ABSTRACT
Changes in climate might have a significant impact on rainfall characteristics, including extreme rainfall. This study aims to project the future daily rainfall, preserving most of the rainfall characteristics, including extreme rainfall incorporating climate changes. This paper presents two hybrid semi-parametric statistical downscaling models for future projection of IDF curves. The precipitation flux from seven scenarios of ten GCMs and observed daily rainfall data are considered as predictors and predictand variables, respectively. At site, daily rainfall occurrence is modeled using a two-state first-order Markov chain. Rainfall amounts on each wet day are modelled using a univariate nonparametric kernel density estimator. Two types of amount generation models are presented in this study. The bounded model (KDE-SP) is developed, considering the support for the kernel distribution as positive. In the unbounded model (KDE-Ext), the wet days are reclassified as extreme and non-extreme rainy days. A significant increasing trend can be observed in the future projected intensity–duration–frequency relationships. The maximum increment using empirical distribution is observed as 93.21 and 80.93% on a 5-year return period in the far future for the SSP5-8.5 scenario, using KDE-Ext and KDE-SP models, respectively. Although both methods show similar results, the KDE-Ext model performs better in simulating extreme rainfall.
HIGHLIGHTS
This study introduces two new statistical downscaling techniques to simulate future daily rainfall time series based on a two-state, first-order Markov chain and the kernel density estimation technique.
Both models can produce long-term synthetic rainfall series different from the historical data, better preserving most of the statistical characteristics and extreme rainfall values.
INTRODUCTION
Global warming and the resulting climate change are anticipated to have significant effects on ecosystems, agriculture, freshwater availability, and human civilization, all of which are susceptible to changes in precipitation (Kannan & Ghosh 2013; Kumar et al. 2023; Sahu et al. 2023). Water is essential for both civilization and the environment, so it is of enormous significance to understand how changes in the global climate may influence regional water availability. General circulation models (GCMs) are the most credible mathematical models for simulating global climate variables. They are run under the shared socioeconomic pathways (SSPs), which differ in their mitigation and socioeconomic challenges, greenhouse gas concentration levels, and radiative forcing levels, and they are generally used to assess the potential effects of climate change on hydroclimate variables (Dey et al. 2022). The spatial and temporal resolution of GCM outputs is often too coarse to clearly assess local or point-scale climatic variables (Kannan & Ghosh 2013; Salvi et al. 2013; Raju & Kumar 2020; Pham et al. 2021). Spatial or temporal downscaling techniques are therefore widely used in the literature to address this scale mismatch and to evaluate point- or local-scale hydroclimatic variables from GCM outputs (Giorgi & Mearns 1991; Ghosh & Mujumdar 2006; Fowler et al. 2007; Kannan & Ghosh 2013; Chandra et al. 2015; Tavakolifar et al. 2017; Halder & Saha 2021; Pham et al. 2021).
There are two primary ways of downscaling large-scale GCM output to a finer resolution: (a) dynamical approach and (b) statistical approach (Fowler et al. 2007; Kannan & Ghosh 2013; Salvi & Ghosh 2013; Chandra et al. 2015). Statistical downscaling approaches are computationally cost-effective and beneficial if there is enough historical data available for building the statistical or empirical relationships between the variables simulated by the large-scale GCMs, known as predictors, and station-scale climate variables, known as predictands (Mearns et al. 2003; Mujumdar & Kumar 2012).
In the literature, statistical downscaling methods are generally classified into three categories: weather generators or stochastic weather generators, weather typing, and regression models or transfer functions (Giorgi & Mearns 1991; Wilby & Wigley 1997; Wilby et al. 2002; Fowler et al. 2007; Kannan & Ghosh 2011; Chandra et al. 2015; Pham et al. 2021). Some statistical downscaling methods used for climate variable projections include Markov chain models based on transitional probability (TP) (Haan et al. 1976; Bardossy & Plate 1991; Hughes et al. 1993; Wilks 1999a), spell length models based on TP (Lall & Sharma 1996; Wilks 1999b), nonhomogeneous Markov model (Rajagopalan et al. 1996), nonhomogeneous hidden Markov model (Hughes & Guttorp 1994), nonparametric nonhomogeneous hidden Markov models (Mehrotra & Sharma 2005, 2006), semi-parametric Markov model (Mehrotra & Sharma 2007), the fuzzy clustering technique (Ghosh & Mujumdar 2006), k-nearest neighbour (k-NN) resampling technique (Rajagopalan & Lall 1999), artificial neural network (Olsson et al. 2004), support vector machine (Pham et al. 2019), and nonparametric kernel regression statistical downscaling model (Kannan & Ghosh 2011; Salvi & Ghosh 2013).
The typical basic Markov chain-based rainfall simulation model has been in the literature for the last few decades (Gabriel & Neumann 1962; Haan et al. 1976; Richardson 1981; Wilby 1994; Wilks 1992, 1989). The first statistical daily rainfall occurrence model using a first-order Markov chain was presented by Gabriel & Neumann (1962). Haan et al. (1976) simulate daily rainfall amounts using exponential distribution and uniform distribution for six nonzero classes, which are also identified by the first-order Markov model. Richardson (1981) and Wilby (1994) have also employed exponential distributions to model the daily rainfall amounts on each wet day. Wilks (1989) has modelled a two-state first-order Markov chain model to obtain daily rainfall occurrence and deployed a two-parameter gamma distribution to get the rainfall amount based on the probability density function to simulate the monthly rainfall. Wilks (1992) developed a statistical weather generator model using a two-state first-order Markov chain and two-parameter gamma distribution to generate daily rainfall amounts and also incorporate GCM outputs to assess the impacts of climate change. Chandra et al. (2015) proposed a statistical weather generator model to simulate extreme rainfall (ER) in three future time slices. They used a three-state first-order Markov chain model to determine the non-rainy days, rainy days of moderate intensity, and rainy days of high intensity. Chandra et al. (2015) fitted the three-state gamma distribution for both moderate and high-intensity rainfall to generate the daily rainfall amount for each wet day.
Rajagopalan et al. (1996) present a single-step, nonhomogeneous Markov model to generate daily rainfall at a single site. The one-step, 2 × 2 transitional probability matrices (TPMs) are estimated using a kernel density estimator through a weighted average of transition counts at the day of interest over the historical period. The rainfall amounts on each wet day were estimated using the kernel density estimation (KDE) technique centred on the day of interest over all the historical observed periods. Harrold et al. (2003a) present a nonparametric model based on the nearest neighbour approach to simulate daily rainfall occurrences for single sites. Harrold et al. (2003b) present a nonparametric stochastic model based on the KDE technique to generate daily rainfall amounts for single-site conditions on each wet day estimated using the method proposed by Harrold et al. (2003a). In that study, however, four distinct classes of previous-day rainfall amount were used as the predictor variable for the current-day rainfall amount. The seasonality of the daily rainfall series was achieved using an l-day moving window approach. Mehrotra & Sharma (2005) developed a nonparametric, nonhomogeneous hidden Markov model based on the k-NN technique to simulate daily rainfall occurrences for multiple sites using four atmospheric circulation variables. Mehrotra & Sharma (2007) proposed a semi-parametric stochastic modelling framework based on the KDE approach to generate multi-site daily rainfall amounts. The rainfall occurrence for each site was modelled using a two-state, first-order Markov chain model modified by the nearest neighbour approach with ‘aggregate’ predictor variables indicating how wet it has been over a particular period. The rainfall amounts were modelled using the nonparametric KDE technique with an l-day moving window.
Most rainfall simulation studies using the Markovian framework model the rainfall occurrence using the Markov chain technique and the rainfall amounts using some parametric distribution (e.g., gamma, exponential, log-normal, generalized extreme value (GEV)). The combination of a modified Markov chain and some nonparametric technique (e.g., k-NN and KDE) has also been used in the literature, as discussed earlier. However, the authors found no study that used both the Markov chain and the nonparametric KDE technique to simulate the daily rainfall time series for future periods incorporating GCM precipitation outputs. This work addresses this gap using two downscaling methods, in which both the Markov chain and the nonparametric KDE technique are used to simulate the daily rainfall time series for three future time slices.
This study presents two stochastic downscaling frameworks to simulate daily rainfall occurrences and amounts for a single site, such that the model is able to represent the sequential future rainfall time series data for daily and longer timescales. The approach is structured to ensure that the model preserves attributes such as ER, the number of wet and dry days, and the other statistics discussed in Table 2, consistent with the observed historical rainfall record. Both downscaling frameworks operate in two parts. The first part involves downscaling of daily rainfall occurrence using a TP-based two-state first-order Markov chain (Richardson 1981). This part of the downscaling framework is named the rainfall occurrence downscaling model (RODM). Details of this method can be found in Wilks & Wilby (1999). The second part of the downscaling framework, named the rainfall amounts downscaling model (RADM), simulates the daily rainfall amounts for each day classified as a wet day by the RODM. The RADM is based on a univariate KDE function (Härdle et al. 2004; Scott 2015; Hollander et al. 2015; Silverman 2018). A major drawback of the nonparametric approach is that the model has limited extrapolation capacity to simulate daily precipitation values beyond the largest value recorded (Rajagopalan et al. 1996). By incorporating the perturbation factor (PF) from GCM data along with the KDE, the proposed method can generate values different from the historical data, which is the novel aspect of the approach.
Sl. No. | Source ID | Institution ID | Grid (Lon-Lat) | SSPs |
---|---|---|---|---|
1. | CanESM5 | CCCma | 128 × 64 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, SSP5-8.5 |
2. | CNRM-ESM2-1 | CNRM-CERFACS | 256 × 128 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, SSP5-8.5 |
3. | IPSL-CM5A2-INCA | IPSL | 96 × 96 | SSP1-2.6, SSP3-7.0 |
4. | MRI-ESM2-0 | MRI | 320 × 160 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, SSP5-8.5 |
5. | CMCC-CM2-SR5 | CMCC | 288 × 192 | SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 |
6. | E3SM-1-0 | E3SM-Project | 360 × 180 | SSP5-8.5 |
7. | EC-Earth3-Veg | EC-Earth-Consortium | 512 × 256 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 |
8. | MIROC-ES2L | MIROC | 128 × 64 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, SSP5-8.5 |
9. | NESM3 | NUIST | 192 × 96 | SSP1-2.6, SSP2-4.5, SSP5-8.5 |
10. | TaiESM1 | AS-RCEC | 288 × 192 | SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 |
Ensemble average statistical results from ten GCMs | Model name | Two-sample t-test for equal means | Two-sample Kolmogorov–Smirnov test for same distribution | Two-sample F-test for equal variances | Wilcoxon rank sum test for equal median of two populations | R2 | RMSE | MPE | MAPE |
---|---|---|---|---|---|---|---|---|---|
30 years AMP series | 1. KDE-Ext | 0.47 | 0.94 | 0.67 | 0.49 | 0.96 | 0.69 | −9.01 | 9.19 |
30 years AMP series | 2. KDE-SP | 0.53 | 0.94 | 0.68 | 0.64 | 0.96 | 0.66 | −7.83 | 8.24 |
GEV-distributed AMP series up to 100 years RP | 1. KDE-Ext | 0.16 | 0.89 | 0.10 | 0.22 | 0.98 | 0.66 | −8.23 | 8.23 |
GEV-distributed AMP series up to 100 years RP | 2. KDE-SP | 0.23 | 0.96 | 0.11 | 0.33 | 0.98 | 0.62 | −7.04 | 7.04 |
Daily mean | 1. KDE-Ext | 0.86 | 0.79 | 0.99 | 0.71 | 0.99 | 0.57 | −55.19 | 58.09 |
Daily mean | 2. KDE-SP | 0.96 | 0.99 | 0.81 | 0.93 | 0.99 | 0.56 | −41.80 | 50.68 |
Monthly mean | 1. KDE-Ext | 0.86 | 0.79 | 0.98 | 0.71 | 0.99 | 17.46 | −55.16 | 58.08 |
Monthly mean | 2. KDE-SP | 0.96 | 0.99 | 0.82 | 0.93 | 0.99 | 17.06 | −41.80 | 50.70 |
Daily median | 1. KDE-Ext | 0.04 | 0.07 | 0.91 | 0.04 | 0.89 | 2.21 | −67.07 | 67.07 |
Daily median | 2. KDE-SP | 0.56 | 0.07 | 0.46 | 0.37 | 0.93 | 0.88 | −26.00 | 27.96 |
Monthly median | 1. KDE-Ext | 0.72 | 0.74 | 0.81 | 0.47 | 0.99 | 29.31 | −66.10 | 66.20 |
Monthly median | 2. KDE-SP | 0.89 | 0.74 | 0.99 | 0.74 | 0.99 | 17.24 | −47.42 | 51.34 |
Daily standard deviation | 1. KDE-Ext | 0.67 | 0.99 | 0.99 | 0.75 | 0.26 | 6.80 | −24.80 | 41.67 |
Daily standard deviation | 2. KDE-SP | 0.63 | 0.79 | 0.98 | 0.71 | 0.26 | 6.80 | −25.78 | 41.39 |
Monthly standard deviation | 1. KDE-Ext | 0.55 | 0.19 | 0.13 | 0.62 | 0.89 | 30.35 | −26.07 | 59.46 |
Monthly standard deviation | 2. KDE-SP | 0.51 | 0.19 | 0.13 | 0.58 | 0.90 | 30.75 | −21.85 | 56.63 |
Daily skewness | 1. KDE-Ext | 0.92 | 0.99 | 0.52 | 0.67 | 0.13 | 0.94 | −2.23 | 25.25 |
Daily skewness | 2. KDE-SP | 0.79 | 0.79 | 0.53 | 0.62 | 0.11 | 0.94 | −0.39 | 25.32 |
Monthly skewness | 1. KDE-Ext | 0.22 | 0.19 | 0.95 | 0.14 | 0.50 | 0.61 | 10.83 | 44.57 |
Monthly skewness | 2. KDE-SP | 0.27 | 0.19 | 1.00 | 0.19 | 0.50 | 0.60 | 7.31 | 44.42 |
Note: Daily and monthly rainfall statistics of daily rainfall time series are evaluated for all twelve months (January–December) for simulated and observed time series during VP, and the performance evaluation is carried out with respect to the observed time series statistics.
In this study, the rainfall time series from large-scale GCM outputs is used as the predictor variable to simulate the point-scale rainfall time series. The KDE-based downscaling methods proposed in the literature (Mehrotra & Sharma 2005, 2006, 2010; Kannan & Ghosh 2013; Shashikanth & Ghosh 2013; Shashikanth et al. 2018) did not use rainfall time series from large-scale GCM outputs as predictor variables. This study proposes a novel approach to simulate daily rainfall time series from large-scale GCM precipitation series and station-scale observed rainfall series through the RODM and RADM, combining a TP-based two-state first-order Markov chain with univariate KDE techniques.
STUDY LOCATION AND DATA USED
A total of 35 GCM outputs containing daily precipitation data, each with historical and at least four scenario outputs, have been downloaded from the official website of the Coupled Model Intercomparison Project Phase 6 (CMIP6). The historical and scenario data used in this study span 1850 to 2014 and 2015 to 2100, respectively. Seven scenarios named SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP4-3.4, SSP4-6.0, and SSP5-8.5 have been used for downscaling the daily rainfall into three different future periods of 2021–2050 (near future [NF]), 2051–2080 (middle future [MF]), and 2071–2100 (far future [FF]) (Verma et al. 2024). GCM historical data have been used for the GCM historical baseline period (G-HBP) (1969–1998) and GCM historical validation period (G-HVP) (1984–2013).
Selection of GCMs
To reduce the computational time of the models, the number of GCMs is reduced to 10; the 10 best-performing GCMs are selected so that the efficiency of the downscaling models is maintained. Initially, 15 GCMs were discarded by comparing the GCM mean monthly rainfall (MMR) with the observed MMR from 1969 to 1988. Next, a threshold is set at the 90th percentile of all rainy-day rainfall values to obtain the peak-over-threshold (POT) values for every month, for all the GCMs as well as for the observed data, during 1969–1988 and 1984–2013. The PF is then calculated for both observed and GCM data for every month. The PFs from the observed and GCM data are used to calculate the root mean square error (RMSE) and mean absolute percentage error (MAPE). A weight based on the observed POT values is assigned to the RMSE and MAPE of each GCM. Finally, two lists of GCMs, one ranked by RMSE and one by MAPE (Verma et al. 2023), are prepared in ascending order, and the last 5 GCMs of each list are discarded. RMSE is given first preference when selecting the ten best GCMs: the first 10 GCMs in the RMSE list that also appear in the MAPE list are shortlisted. The GCM outputs shortlisted and used in this study are listed in Table 1. The GCMs discarded based on MMR, and the list of GCMs with their RMSE and MAPE values, are presented in the Supplementary Material.
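The final shortlisting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `shortlist_gcms`, the GCM labels, and the scores are hypothetical, and the POT-based weighting of RMSE and MAPE is assumed to have been applied already, since its exact form is not given here.

```python
def shortlist_gcms(scores, drop=5, keep=10):
    """Shortlist GCMs from two ascending rankings (RMSE and MAPE).

    scores: dict mapping a GCM name to a (rmse, mape) tuple. The values are
    assumed to be already weighted; the weighting itself is not shown.
    """
    by_rmse = sorted(scores, key=lambda name: scores[name][0])
    by_mape = sorted(scores, key=lambda name: scores[name][1])
    # Discard the worst `drop` entries from each ranking.
    kept_rmse = by_rmse[:max(len(by_rmse) - drop, 0)]
    kept_mape = set(by_mape[:max(len(by_mape) - drop, 0)])
    # RMSE order has first preference; a GCM must also survive the MAPE list.
    return [name for name in kept_rmse if name in kept_mape][:keep]


# Hypothetical scores for four GCMs (illustrative values only).
example_scores = {
    "GCM-A": (0.08, 6.9),
    "GCM-B": (0.23, 20.0),
    "GCM-C": (0.025, 1.0),
    "GCM-D": (0.10, 30.0),
}
print(shortlist_gcms(example_scores, drop=1, keep=2))
```

With the values used in the text (20 remaining GCMs, `drop=5`, `keep=10`), the same function would return at most ten names in ascending RMSE order.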
METHODOLOGY
This study presents two statistical downscaling methods, KDE-Ext (KDE for extreme and non-extreme series) and KDE-SP (KDE for the wet-day rainfall series with positive support), to simulate daily rainfall time series. Both downscaling models are based on a two-state first-order Markov chain and a kernel density estimator. First, the TPs of wet days and ER are obtained for every month from all rainfall time series. Both models then identify each day as wet or dry according to the two-state first-order Markov chain TP. In the KDE-SP method, the rainfall amount on each wet day is simulated from the cumulative distribution function (CDF) obtained with the KDE technique. The support (boundary) of the density function in the KDE-SP method is restricted to positive values for each rainy day rainfall amount series (RRAS). In the KDE-Ext method, a wet day is reclassified as extreme or non-extreme using the probabilities of extreme rainfall (PER). The ER of a series for a particular month is defined as the values equal to or greater than a threshold for that month. The threshold (denoted as Th90), chosen by trial and error, is the 90th percentile of the series of that month. After the extreme and non-extreme wet days are classified, the rainfall amounts on the classified days are obtained from the CDF of the KDE. In the KDE-Ext method, the support of the CDF is unbounded (i.e., −∞ to +∞) for each extreme or non-extreme rainfall series. For both downscaling methods, the bandwidth (h) of the KDE is selected automatically for each RRAS in order to simulate the intra-annual variations. The inverse distance weighting technique (Halder & Saha 2021) is used to determine the point-scale parameters or characteristics required for the downscaling method from the gridded GCM daily rainfall series.
The GEV distribution (Halder & Saha 2021) and empirical distribution are used to construct the intensity–duration–frequency (IDF) curves from a 30-year annual maximum precipitation (AMP) series.
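As a rough illustration of how an empirical return level can be read from a 30-year AMP series, the sketch below uses the Weibull plotting position, i/(n + 1). The plotting position, the linear interpolation between ranked values, and the function names are assumptions made for this example; the text does not state which empirical formula was used.

```python
def empirical_return_level(amp, return_period):
    """Return level for a given return period (years) from an AMP series.

    Assumes the Weibull plotting position p_i = i / (n + 1) and linear
    interpolation between ranked values (illustrative choices only).
    """
    ranked = sorted(amp)
    n = len(ranked)
    p = 1.0 - 1.0 / return_period      # non-exceedance probability
    pos = p * (n + 1)                  # fractional 1-based rank
    if pos <= 1:
        return ranked[0]
    if pos >= n:
        return ranked[-1]
    lower = int(pos)                   # rank just below the target
    frac = pos - lower
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


def percent_change(future, baseline):
    """Percentage change of a future value relative to a baseline value."""
    return 100.0 * (future - baseline) / baseline


# Hypothetical 30-value AMP series (1, 2, ..., 30), for illustration only.
amp_series = [float(v) for v in range(1, 31)]
for rp in (2, 5, 10, 15, 30):
    print(rp, round(empirical_return_level(amp_series, rp), 2))
```

With n = 30 the largest return period that this plotting position resolves without clamping is 31 years, so return periods up to 30 years can be read directly from a 30-year series.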
Rainfall occurrence model
The rainfall occurrence model for downscaling the at-site daily rainfall is done using a two-state, first-order Markov chain technique (Gabriel & Neumann 1962; Wilks & Wilby 1999), assuming the probability of rainfall occurrence on any given day depends only on whether the preceding day was dry or wet. The RODM parameters (Markov chain transition probabilities) are estimated for each rainfall series (OBP, OVP, G-HBP, G-HVP, and GCM scenario series from each SSP available in each GCM for near, middle, and FF) for each month (January–December). The PF of TPM is obtained from the future and historical GCM's daily rainfall series and multiplied with the TPM obtained from OBP data to get the required P-TPM for RODM.
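The occurrence step can be sketched as below. The sketch assumes that the PF is the ratio of the future to the historical GCM transition probability and that the perturbed probability is clipped to [0, 1]; both are assumptions for illustration, as are the function names and the sample values.

```python
import random


def transition_probs(wet):
    """Estimate p01 = P(wet | previous day dry) and p11 = P(wet | previous day wet)."""
    n00 = n01 = n10 = n11 = 0
    for prev, cur in zip(wet, wet[1:]):
        if prev and cur:
            n11 += 1
        elif prev:
            n10 += 1
        elif cur:
            n01 += 1
        else:
            n00 += 1
    p01 = n01 / (n01 + n00) if (n01 + n00) else 0.0
    p11 = n11 / (n11 + n10) if (n11 + n10) else 0.0
    return p01, p11


def perturb(p_obs, p_gcm_hist, p_gcm_fut):
    """Scale an observed probability by an assumed ratio-type PF, clipped to [0, 1]."""
    pf = p_gcm_fut / p_gcm_hist if p_gcm_hist > 0 else 1.0
    return min(1.0, max(0.0, p_obs * pf))


def simulate_occurrence(p01, p11, n_days, rng, first_wet=False):
    """Generate a wet/dry sequence from a two-state first-order Markov chain."""
    states = [first_wet]
    for _ in range(n_days - 1):
        p_wet = p11 if states[-1] else p01
        states.append(rng.random() < p_wet)
    return states


# Hypothetical daily rainfall (mm/day); wet days use the 0.1 mm/day threshold,
# assumed inclusive here.
rain = [0.0, 2.3, 5.1, 0.0, 0.0, 12.4, 0.05, 0.0, 7.7, 1.2]
wet_days = [r >= 0.1 for r in rain]
p01, p11 = transition_probs(wet_days)
p01_future = perturb(p01, p_gcm_hist=0.30, p_gcm_fut=0.36)
p11_future = perturb(p11, p_gcm_hist=0.55, p_gcm_fut=0.60)
series = simulate_occurrence(p01_future, p11_future, 30, random.Random(42))
print(sum(series), "wet days out of", len(series))
```

In practice the probabilities would be estimated separately for each month and each rainfall series, as described above; the sketch handles a single series only.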
Rainfall amount model
The univariate KDE method (Härdle et al. 2004; Scott 2015; Hollander et al. 2015; Gramacki 2018; Silverman 2018) is used to generate the rainfall amounts from the kernel CDF for each day that the RODM simulates as a wet day. A threshold of 0.1 mm/day rainfall intensity is used in this study to define a wet day, as guided by the IMD. The Gaussian kernel function (Sharma et al. 1997; Härdle et al. 2004; Gramacki 2018) is used to estimate the probability density function for each month, with Silverman's rule of thumb for bandwidth estimation (Härdle et al. 2004; Scott 2015; Hollander et al. 2015; Gramacki 2018; Silverman 2018). Because the Gaussian kernel has infinite support, the estimated density assigns a small probability to negative values, which are invalid for hydrological variables such as rainfall.
This probability leakage is addressed by checking at each step whether the simulated value is negative or positive. Each time a negative amount is encountered, a new sample is drawn from the same kernel slice by generating a new random number (RN) μk until a positive amount is attained (Sharma et al. 1997; Kannan & Ghosh 2013). This technique is used in the KDE-Ext method.
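A minimal sketch of drawing a wet-day amount from a Gaussian KDE with redrawing of negative values is given below. It assumes one common form of Silverman's rule of thumb, h = 0.9 · min(σ, IQR/1.34) · n^(−1/5), and samples by picking one observed value as the kernel centre and adding Gaussian noise, which is equivalent in distribution to sampling from the kernel CDF. The function names, the retry cap, and the sample data are hypothetical; the positive-support estimator of KDE-SP is not shown.

```python
import random
import statistics


def silverman_bandwidth(data):
    """One common form of Silverman's rule of thumb (assumed variant)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def sample_amount(data, h, rng, max_tries=1000):
    """Draw one positive amount from a Gaussian KDE built on `data`.

    A kernel centre is picked at random; negative draws are rejected and
    redrawn from the same kernel until a positive value is obtained.
    """
    centre = rng.choice(data)
    for _ in range(max_tries):
        value = centre + h * rng.gauss(0.0, 1.0)
        if value > 0.0:
            return value
    return centre  # fallback after max_tries; the centre itself is a wet-day amount


# Hypothetical wet-day amounts (mm/day) for one month.
wet_amounts = [0.5, 1.2, 3.4, 8.0, 15.5, 22.1, 40.3, 2.2, 5.1, 0.9]
rng = random.Random(7)
h = silverman_bandwidth(wet_amounts)
simulated = [sample_amount(wet_amounts, h, rng) for _ in range(5)]
print(round(h, 3), [round(v, 2) for v in simulated])
```

Because the noise is unbounded, simulated amounts can exceed the largest observed value by a margin controlled by h, which is the limited extrapolation behaviour of the plain KDE noted earlier.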
RESULTS AND DISCUSSION
Model validation
In order to validate the downscaling procedure, the downscaling models are run following the procedure as outlined in Figure 2 to project the daily rainfall time series for the VP (1984–2013), taking the GCM data and observed data for the BP (1969–1998). The models have generated 1,000 independent realizations of daily rainfall of length equal to the baseline historical rainfall record. All the statistics shown in this article are the average of all 1,000 independent realizations of daily rainfall series and MMEA. The performance of both models is assessed based on daily and monthly statistics to determine their ability to replicate the observed rainfall characteristics.
The statistical measures (R2, RMSE, MPE, and MAPE) and the hypothesis tests used to evaluate the two downscaling models indicate good agreement between the observed and simulated variables during the VP for both models (Table 2). The four hypothesis tests listed in Table 2 were carried out at a significance level of 0.05 (i.e., a 95% confidence level). In Table 2, the p-values in columns 3–6 represent the probability of obtaining a test statistic at least as extreme as the observed one if the null hypothesis is true. For almost all statistics, the p-values exceed 0.05, so the null hypothesis cannot be rejected at the 95% confidence level, which means the observed and simulated values of the statistics listed in the first column of Table 2 are consistent with coming from the same population. The exception is the daily median for the KDE-Ext model, for which the t-test and the Wilcoxon rank sum test both give p = 0.04. Our primary focus is to create a model capable of simulating ER well. Table 2 shows that both models simulate ER satisfactorily, as indicated by the statistics for the AMP series.
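For reference, the four performance measures can be computed as sketched below. The definitions are the conventional ones and are assumptions here: R2 is taken as the squared Pearson correlation, and MPE is signed as (simulated − observed)/observed, so that a negative MPE indicates underestimation. The sample values are hypothetical.

```python
import math


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mpe(obs, sim):
    """Mean percentage error (sign convention assumed: simulated minus observed)."""
    return 100.0 * sum((s - o) / o for o, s in zip(obs, sim)) / len(obs)


def mape(obs, sim):
    """Mean absolute percentage error."""
    return 100.0 * sum(abs((s - o) / o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical observed and simulated statistics.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [9.0, 18.0, 27.0, 36.0]
print(r_squared(observed, simulated), rmse(observed, simulated),
      mpe(observed, simulated), mape(observed, simulated))
```

When every simulated value lies below its observed counterpart, MPE and MAPE have the same magnitude with opposite sign, which is the pattern seen in the GEV-distributed AMP rows of Table 2.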
Future projection
Percentage change with respect to observed (1969–1998) rainfall, by model and return period (RP, years)
Scenario | Period | KDE-Ext, RP 2 | KDE-Ext, RP 5 | KDE-Ext, RP 10 | KDE-Ext, RP 15 | KDE-Ext, RP 30 | KDE-SP, RP 2 | KDE-SP, RP 5 | KDE-SP, RP 10 | KDE-SP, RP 15 | KDE-SP, RP 30 |
---|---|---|---|---|---|---|---|---|---|---|---|
Obs^a | | 4.38 | 5.99 | 11.06 | 13.16 | 14.32 | 4.38 | 5.99 | 11.06 | 13.16 | 14.32 |
VP | | 17.60 | 26.05 | −7.22 | −10.87 | −6.04 | 15.75 | 24.83 | −7.48 | −11.19 | −6.97 |
SSP1-1.9 | 1. NF | 18.10 | 22.02 | −13.04 | −14.33 | −2.93 | 16.17 | 20.25 | −13.17 | −14.06 | −3.57 |
SSP1-1.9 | 2. MF | 25.22 | 36.41 | 1.16 | −1.64 | 5.83 | 23.56 | 35.32 | 0.59 | −1.95 | 3.80 |
SSP1-1.9 | 3. FF | 22.48 | 34.24 | −3.13 | −7.28 | −0.93 | 20.23 | 32.91 | −3.92 | −7.65 | −2.72 |
SSP1-2.6 | 1. NF | 27.65 | 34.16 | −0.53 | −3.43 | 3.07 | 26.43 | 33.27 | −0.25 | −2.97 | 2.41 |
SSP1-2.6 | 2. MF | 39.42 | 54.50 | 19.11 | 17.15 | 26.11 | 37.57 | 52.82 | 18.69 | 16.69 | 25.29 |
SSP1-2.6 | 3. FF | 26.43 | 44.09 | 6.86 | 4.63 | 13.46 | 24.69 | 43.43 | 6.91 | 4.46 | 12.32 |
SSP2-4.5 | 1. NF | 18.27 | 24.56 | −10.41 | −13.85 | −8.41 | 16.45 | 23.07 | −11.23 | −14.51 | −9.52 |
SSP2-4.5 | 2. MF | 23.33 | 33.77 | −2.27 | −6.55 | −1.64 | 21.05 | 30.70 | −4.15 | −7.80 | −3.37 |
SSP2-4.5 | 3. FF | 27.37 | 32.09 | −3.62 | −4.57 | 8.50 | 25.61 | 31.94 | −3.62 | −4.18 | 8.46 |
SSP3-7.0 | 1. NF | 52.76 | 71.31 | 36.25 | 33.28 | 40.29 | 36.75 | 53.23 | 15.74 | 16.91 | 30.37 |
SSP3-7.0 | 2. MF | 74.24 | 83.43 | 51.86 | 42.48 | 41.94 | 55.02 | 78.63 | 32.73 | 30.46 | 40.53 |
SSP3-7.0 | 3. FF | 41.84 | 62.13 | 28.95 | 32.24 | 48.41 | 41.37 | 57.49 | 26.72 | 29.10 | 44.30 |
SSP4-3.4 | 1. NF | 34.81 | 33.18 | −6.62 | −10.25 | −3.97 | 34.72 | 32.85 | −5.89 | −10.04 | −4.56 |
SSP4-3.4 | 2. MF | 34.08 | 39.99 | −0.31 | −6.82 | −2.84 | 33.24 | 37.62 | −0.40 | −7.34 | −6.66 |
SSP4-3.4 | 3. FF | 39.29 | 46.53 | 2.93 | −3.05 | 2.03 | 36.60 | 44.61 | 1.98 | −4.84 | −1.45 |
SSP4-6.0 | 1. NF | 23.24 | 21.00 | −15.71 | **−19.51** | −13.57 | 21.90 | 19.83 | −16.47 | **−20.70** | −14.99 |
SSP4-6.0 | 2. MF | 36.53 | 41.21 | 3.23 | −0.56 | 4.69 | 35.35 | 39.86 | 4.21 | −0.26 | 3.53 |
SSP4-6.0 | 3. FF | 50.21 | 46.46 | −2.49 | −8.63 | −2.91 | 49.85 | 44.23 | −3.39 | −9.20 | −3.97 |
SSP5-8.5 | 1. NF | 21.62 | 35.66 | 2.65 | −9.84 | 6.96 | 19.39 | 29.89 | −1.67 | −3.51 | 2.79 |
SSP5-8.5 | 2. MF | 47.99 | 59.34 | 12.83 | 2.84 | 14.87 | 34.05 | 48.68 | 12.78 | 7.94 | 12.56 |
SSP5-8.5 | 3. FF | 75.27 | **93.21** | 52.83 | 38.66 | 62.50 | 61.72 | **80.93** | 39.84 | 38.98 | 51.24 |
^a Obs represents the observed BP outputs.
Bold values signify the maximum and minimum percentage changes.
The changes presented in Table 3 are calculated using the empirically distributed AMP series of 30 years duration. For both downscaling methods, the results from all SSPs show larger increments at lower RPs than at higher RPs. The most extreme situation may occur under the SSP3-7.0, SSP1-2.6, and SSP5-8.5 scenarios in the NF, MF, and FF, respectively. SSP1-2.6 shows a major increasing trend in the MF, which means ER may occur in the middle of the century under SSP1-2.6. Other SSPs show an increasing trend in ER as the time period moves from NF to FF. The maximum increasing trend is observed under the SSP5-8.5 scenario. The highest increment is observed in the FF at the 5-year RP for both KDE-Ext and KDE-SP methods, with values of 93.21 and 80.93%, respectively. Decreasing trends in ER intensity are also found in some scenarios for both downscaling methods, especially at higher RPs. The maximum negative deviation in ER intensity is observed in the SSP4-6.0 scenario in the NF at the 15-year RP, at approximately 20% for both models. The daily IDF curves using the GEV distribution have also been obtained up to a 100-year RP, and the changes found are similar to those observed from the empirical distribution.
DISCUSSION
Figure 6 shows that the simulated number of wet and dry days in every month is close to the observed values for both the BP and the VP. However, the simulated counts are closer to the OBP than to the OVP. The simulated number of wet and dry days is obtained from the RODM, which is based on the two-state, first-order Markov chain model. This model is therefore more skilled at replicating the BP data, because it is a stationary model; this is a limitation of the two-state, first-order Markov chain. However, the simulation also depends on the difference in the number of wet and dry days between the observed BP and VP, and on the PF obtained from the GCM outputs.
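The stationarity limitation can be seen in a minimal sketch of a two-state, first-order Markov chain for rainfall occurrence. The sketch below uses fixed transition probabilities estimated from a binary wet/dry record and does not include the conditioning on GCM precipitation flux used in the RODM; the function names are hypothetical.

```python
import numpy as np

def fit_transition_probs(wet):
    """Estimate P(wet | dry yesterday) and P(wet | wet yesterday).

    `wet` is a sequence of 0 (dry) and 1 (wet) daily states.
    """
    wet = np.asarray(wet, dtype=int)
    prev, curr = wet[:-1], wet[1:]
    p01 = curr[prev == 0].mean()  # dry -> wet
    p11 = curr[prev == 1].mean()  # wet -> wet
    return float(p01), float(p11)

def simulate_occurrence(p01, p11, n_days, first_state=0, seed=0):
    """Simulate a wet/dry sequence with fixed transition probabilities."""
    rng = np.random.default_rng(seed)
    states = np.empty(n_days, dtype=int)
    states[0] = first_state
    u = rng.random(n_days)
    for t in range(1, n_days):
        # The probability of a wet day depends only on the previous state
        p_wet = p11 if states[t - 1] == 1 else p01
        states[t] = int(u[t] < p_wet)
    return states
```

Because `p01` and `p11` do not change over the simulated period, the generated wet-day frequency and spell lengths stay close to those of the fitting period, which is consistent with the closer agreement with the OBP noted above.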
The results in Table 2 suggest that both models performed well, producing outputs similar to the observed historical data. The R² values of the ER series for both models lie between 0.96 and 0.98, which means that the RADM can generate ER values close to those of the OVP rainfall series; this supports the study's focus on simulating ER values. Figure 4 also shows that both models can simulate ER values close to those of the OVP rainfall series. Although both models slightly overestimated the OVP extremes and slightly underestimated the OBP extremes, the ensemble average of the extreme values balances the OVP and OBP extremes. According to Figure 3, the GCMs that agree best with the OVP ER are ‘TaiESM1’, followed by ‘CMCC-CM2-SR5’ and ‘E3SM-1-0’.
The largest increase is observed under the SSP5-8.5 scenario for both the empirical and GEV distributions. This scenario is characterized by high mitigation challenges and low adaptation challenges (Riahi et al. 2017) and follows the highest trajectories of radiative forcing (W/m²), global mean temperature, and global CO2 emissions (Gidden et al. 2019). These findings are consistent with Maity & Maity (2022), who reported a significant increase of about 41–44% in hourly rainfall intensity under the SSP5-8.5 scenario. For both downscaling methods, the maximum percentage increment under the SSP5-8.5 scenario occurs at the 5-year RP with the empirical distribution and at the 2-year RP with the GEV distribution. These increments in ER may significantly affect the design of urban drainage networks in the study area, as IDF curves for 2–5-year RPs are generally used for this purpose. The SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios in particular indicate significant changes in ER in the future periods. Crévolin et al. (2023) simulated IDF curves using the Quantile-Quantile Downscaling method for 30 major cities in Canada. They found that most cities may experience high-intensity storms with an average increment of around 30% under the SSP2-4.5 and 40% under the SSP5-8.5 between 2071 and 2100, whereas this study found a maximum increment of 32.09% under the SSP2-4.5 in the FF and a range of 38.66–93.21% under the SSP5-8.5 in the FF. Xu et al. (2024) observed maximum increments of 40, 31, 27, and 22% in the AMR under the SSP1-2.6, SSP5-8.5, SSP2-4.5, and SSP3-7.0 scenarios, respectively, for Barranquilla, Colombia, which is similar to the findings of this study. Halder & Saha (2021) simulated IDF curves for the same study location (Alipore IMD) using the quantile perturbation downscaling method with CMIP6 data. As in this study, they found a significant increase in ER intensity for most of the GCMs and SSPs in future periods.
Comparing the two downscaling methods on the validation results shows that both models give broadly similar results. The KDE-SP model simulates the values well at lower-order RPs, whereas the KDE-Ext model simulates the values well at higher-order RPs. The KDE-Ext model is therefore preferred when the expected design life of the infrastructure system is very long. In general, however, the KDE-SP model can be used, as it gives better results at lower-order RPs and its results at higher-order RPs are also good. Another advantage of the KDE-SP model is that it is simpler than KDE-Ext.
CONCLUSION
This study has presented two reasonably simple semi-parametric single-site rainfall simulation models that can generate long synthetic daily rainfall sequences and reflect both the short- and long-term variability present in the observed historical record. Rainfall occurrence is replicated using a first-order Markov model and its transition probabilities. The at-site daily rainfall amount on each day identified by the classification model as a rainy day or an extreme rainy day is simulated using the univariate KDE model. The downscaling approach uses rainfall data from GCM outputs as predictor variables, so it can be applied to simulate rainfall in a changing climate as projected by GCMs.
In the literature, weather generators with a Markovian framework use parametric distributions to estimate rainfall amounts. The main disadvantage of this approach is that ER cannot be simulated properly using a single distribution. In addition, the same distribution cannot be applied universally. Even when different distributions are used for extreme and non-extreme rainfall, the approach cannot replicate the observed rainfall properly. In comparison, the methods proposed in this study are also based on the Markovian framework but use a nonparametric distribution, which is more robust because it can be applied to all types of rainfall data. Additionally, the distribution can be adapted to the data by changing the bandwidth. The results indicate that the single kernel distribution (KDE-SP) model is sufficient to simulate both extreme and non-extreme rainfall accurately for different types of rainfall data, which is not possible with a single parametric distribution.
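The role of the bandwidth can be illustrated with a minimal sketch of sampling wet-day amounts from a univariate kernel density estimate restricted to positive values. The Gaussian kernel, Silverman's rule-of-thumb bandwidth, and the reflection of negative draws about zero are all assumptions made for this example; they are not necessarily the choices made in the KDE-SP model, and the function names are hypothetical.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (assumed here)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * sigma * x.size ** (-0.2)

def sample_kde_positive(wet_amounts, n, bandwidth=None, seed=0):
    """Draw n rainfall amounts from a Gaussian KDE with positive support.

    Sampling from a KDE is a smoothed bootstrap: pick an observed amount
    at random, then add kernel noise scaled by the bandwidth. Negative
    draws are reflected about zero, which corresponds to a
    reflection-corrected KDE on [0, inf).
    """
    x = np.asarray(wet_amounts, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=n, replace=True)
    draws = centers + h * rng.standard_normal(n)
    return np.abs(draws)
```

A larger bandwidth spreads each observation over a wider range and allows values beyond the observed maximum to be generated, while a bandwidth of zero reduces the procedure to resampling the observed amounts.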
In the validation stage, the RODM performed very well for both downscaling methods, as reflected in the spell-length checks and the monthly wet/dry day counts during the OVP. The RADM also performed very well, as reflected in the comparison of the model-predicted IDF curve with the 90% CI band of the IDF curve from the GEV-distributed observed AMRs for the OVP.
The daily rainfall intensity is projected for three 30-year periods during the 21st century and is presented as daily IDF curves, which show considerable changes. The largest changes were observed at the end of the century, i.e., in the FF. The rainfall is projected according to the different GCM scenarios available in CMIP6. Different scenarios give different results, but SSP5-8.5 shows the largest increase. These results can be used for the design of different hydraulic components. It is very difficult to recommend a particular scenario for design purposes. The choice of scenario may depend on the design requirements, the accepted level of risk, the design life of the structure, the climate of the location, and the assessment of climate change.
The primary objective of the study was to simulate the ER accurately, and both models achieved this well. The rainfall sequences developed for future emission scenarios will be an important source of data for research on how climate change will affect regional hydrology. In the present downscaling methods, observed and GCM rainfall data series are used to simulate both the rainfall state and the rainfall amounts. However, there is scope for incorporating other climate variables along with rainfall data, which may capture greater variability and thus allow the downscaling models to simulate rainfall state and amounts more robustly.
ACKNOWLEDGEMENTS
The authors acknowledge the Department of Civil Engineering, Indian Institute of Engineering Science and Technology (IIEST), Shibpur, for providing the infrastructure support. The authors also acknowledge the India Meteorological Department (IMD) for providing the required meteorological data for research purposes.
FUNDING
The authors declare that no funding was received for this research.
AUTHORS' CONTRIBUTIONS
S. H. conceptualized the whole article, developed the methodology and software, rendered support in formal analysis, wrote the original draft, and edited the article. U. S. supervised the work, and reviewed and edited the article.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.