Data scarcity and the unavailability of observed rainfall in the northeastern states of India limit the prediction of extreme hydro-climatological changes. To fill this gap, a data assimilation approach, including seasonality assessment, statistical evaluation, and bias correction, has been applied to re-construct accurate high-resolution gridded (5 km2) daily rainfall data (2001–2020). Random forest (RF) and support vector regression were used to predict rainfall time series, and the machine learning and data assimilation-based gridded rainfall data were compared. Five gridded rainfall datasets, namely, Indian Monsoon Data Assimilation and Analysis (IMDAA) (12 km2), APHRODITE (25 km2), India Meteorological Department (25 km2), PRINCETON (25 km2), and CHIRPS (25 and 5 km2), have been utilized. For the re-constructed rainfall dataset (5 km2), a comparative seasonality and change assessment has been performed with respect to the other rainfall datasets. The CHIRPS and APHRODITE datasets showed the closest similarity to IMDAA. The RF and assimilated rainfall (AR) data performed best in terms of bias and extremity, and the AR data were identified as the most accurate (>0.8). Precipitation change analysis (2021–2100) using the bias-corrected and downscaled CMIP6 datasets showed that dry spells will increase. Under the CMIP6 moderate emission scenario, SSP245, wet spells will increase in future; under SSP585 (representing the extreme worst case), however, wet spells will decrease.

  • A unique data assimilation approach is applied to construct an accurate high-resolution gridded (5 km2) daily rainfall time series.

  • Evaluation and bias correction of multisource gridded rainfall datasets were performed.

  • Random forest and support vector regression machine learning methods were applied for the prediction of rainfall.

  • Long-term rainfall changes were assessed in one of the wettest regions of the world.

Several regions of the world lack high-resolution rainfall datasets at a daily or sub-daily time step because of the limited number of rainfall gauges (Gupta et al. 2020a). In India, regions such as the Himalayan river basins and northeastern India (known as one of the wettest regions of the world) have few rainfall gauges, and therefore lack a standard and accurate rainfall data product that could be used for different watershed applications and to analyse extreme hydro-climatic changes (Bharti & Singh 2015; Gupta et al. 2020a). The analysis of extreme events such as floods and droughts, which may be linked to climate change (Mukherjee et al. 2018), requires rainfall data at high spatial resolution and at least daily frequency (Alexander et al. 2019). Where rainfall gauges or accurate high-resolution rainfall datasets (e.g. gridded rainfall datasets) are poorly available, the prediction and simulation of hydrological events can be highly uncertain (Singh & Xiaosheng 2019). Therefore, in a data-scarce region like the northeastern states of India, where the gauge density is low and long-term high-resolution gridded rainfall time series are lacking (most are available at >12.5 km2 scale) (Zahan et al. 2021), there is an urgent need to develop high-resolution (say 5 km2 scale or finer) gridded rainfall datasets, so that near-term and long-term changes in rainfall extremity in the northeastern states of India can be addressed accurately.

The availability of universally acknowledged high-resolution open-source gridded rainfall datasets such as Climate Hazards Group InfraRed Precipitation (CHIRPS) (available at 25 km2 and 5 km2 scale), Tropical Rainfall Measuring Mission (TRMM), APHRODITE (available at 25 km2 scale), Soil Moisture to Rain (SM2RAIN-ASCAT), and PRINCETON rainfall data (available at 25 km2 scale) provides a viable source to assess rainfall variability and patterns in different parts of the world (Aggarwal et al. 2022; Bhattacharyya et al. 2022). The reliability of these gridded rainfall datasets has been explored around the world and in India, which provides valuable information for the long-term assessment of rainfall variability, mostly at a larger scale (Singh & Xiaosheng 2019). Gupta et al. (2020a) compared and tested the applicability of various gridded rainfall datasets across India and showed that CHIRPS and TRMM better captured the rainfall characteristics with reference to India Meteorological Department (IMD) data in most regions (Sulugodu & Deka 2019). However, TRMM has shown some predictions in the northeastern regions. Studies of the applicability of these rainfall datasets in India, mostly performed at a larger scale, have found that each dataset has its own advantages and limitations (Dubey et al. 2021). The temporal and spatial availability of these rainfall data sources restricts the assessment of the short-term and long-term impact of rainfall at a higher-resolution spatial scale (Singh & Xiaosheng 2019).

As per the obtained feedback from the previous studies, the new hybrid and improved (in terms of resolution and accuracy) rainfall datasets can be generated to better analyse the long-term rainfall changes even at the basin scale or smaller scale (Pai et al. 2014). Many studies applied data assimilation (DA) techniques to adjust or generate new datasets for better numerical predictions (Lu et al. 2018; Singh & Xiaosheng 2019). Singh & Xiaosheng (2019) utilized the data assimilating approach for the construction of long-term daily gridded rainfall datasets over Southeast Asia, and by utilizing several statistical methods, they successfully removed the time series gaps in the rainfall data. Several studies demonstrated the utilities and consequences of machine learning methods such as decision forest regression, neural network regression, multilayer perceptron, random forest (RF), and support vector regression (SVR) methods for the prediction of short-term and long-term rainfall datasets (Ridwan et al. 2021; Barrera-Animas et al. 2022). While testing the capabilities of machine learning methods for the prediction of rainfall datasets, some methods are found reliable for short-term rainfall predictions like SVR (Kajewska-Szkudlarek 2020), and some methods are found efficient for long-term predictions of the rainfall such as RF (Pham et al. 2019). Several studies demonstrated the utility of different statistical functions and bias correction methods such as quantile mapping (QM) (Singh & Xiaosheng 2019; Kumar et al. 2021), quantile–quantile analysis (Gupta et al. 2020a), and probability methods (Fang et al. 2015; Shivam et al. 2019) for the correction of rainfall datasets.

A significant impact of climate change has been noticed in the last few years over the Indian Monsoon system, and this has affected rainfall patterns and amounts in terms of both intensity and frequency across India (Gupta et al. 2020b; Kumar et al. 2021). Many regions in India, especially the hilly regions including the northeastern regions, have been threatened by severe extreme events such as flash droughts and extreme floods (Yaduvanshi et al. 2019; Sharma & Goyal 2020). Mukherjee et al. (2018) reported that the annual maximum precipitation will decrease in the northeastern regions of India. It was observed that the northern part of the northeastern states showed a decrease in rainfall, which varied from 3% in the northwestern part to ∼12% in the northeastern part (Ravindranath et al. 2011). The increase in temperature and rainfall variability caused by climate change has exerted pressure on the overall water availability in Mizoram and other northeastern regions by increasing the rate of evapotranspiration and altering the overall water balance (Ravindranath et al. 2011; Monsang et al. 2021). However, region-specific observations of extreme rainfall changes in the northeast are less explored, although they may be crucial for analysing the impact of climate change on current and long-term water availability and water security in the region.

Considering the aforementioned points, this study mainly focuses on the construction of accurate and reliable high-resolution gridded rainfall datasets (i.e. 5 km2 grid scale) for the selected study area to enhance the scope of analysing the current and long-term rainfall changes. The second objective is to analyse the historical and long-term (1991–2100) rainfall changes in the selected study area using the constructed rainfall data and climate model datasets by formulating various extreme rainfall climate indices (RCIs). For this purpose, the latest large-scale gridded (25 km2) Coupled Model Intercomparison Project Phase 6 (CMIP6) climate model datasets with shared socioeconomic pathways (SSPs) experiments (i.e. SSP245 and SSP585) were first bias corrected with reference to the newly generated rainfall datasets, and the near-term and long-term rainfall changes were then analysed using the CMIP6 climate model datasets (Gupta et al. 2020b). For the construction of accurate gridded high-resolution rainfall data, DA with machine learning methods has been performed. For the DA, various open-source gridded rainfall datasets such as Indian Monsoon Data Assimilation and Analysis (IMDAA) (12 km2), APHRODITE (25 km2), IMD (25 km2), PRINCETON (25 km2), and CHIRPS (25 and 5 km2) have been utilized (Gupta et al. 2020a). For assimilation, the datasets with the least error with reference to the observed gridded IMD rainfall dataset were identified using statistical functions and quantile–quantile (Q–Q) plots, and bias correction was then performed to reduce the uncertainty in the rainfall data. For the accuracy assessment of the newly constructed time series gridded rainfall dataset (5 km2), a comparative seasonality and change assessment has been performed with respect to the other rainfall datasets. For the rainfall predictions and assessment of the newly constructed time series gridded rainfall dataset, machine learning methods such as RF and SVR have been employed. SVR and RF have been successfully utilized for the prediction of time series rainfall datasets across the world (Pham et al. 2019). For the assessment of near-term and long-term rainfall changes, standard and widely used RCIs such as annual mean, dry spell frequency, wet spell frequency, and maximum 1-day precipitation per year (Rx1D) have been formulated and analysed.

The present study area lies in the northeastern region between latitudes 22.0°–26.0° and longitudes 92.0°–94.5°, and covers mainly the Mizoram state and several parts of Assam, Tripura, Manipur, Meghalaya, and Nagaland (Figure 1). The selected study area comprises parts of three sub-basins: Barak, minor rivers draining into Bangladesh (MRD-BAN), and minor rivers draining into Myanmar (MRD-MYA). The average rainfall of the region ranges from ∼2,500 to ∼6,000 mm, and the region can be categorized as one of the wettest in the world. The topographical elevation varies from 10 to 3,100 m, and the terrain is the most variegated among all hilly areas in this part of the country. The hills are extremely rugged and steep; the ranges run in the north–south direction, with occasional scattered plains. Furthermore, many rivers and streamlets drain the hill ranges.

Data sources

In this study, six rainfall datasets, namely IMD (25 km2), IMDAA re-analysis (12 km2), PRINCETON University rainfall datasets (25 km2), CHIRPS (25 km2), CHIRPS (5 km2), and APHRODITE (25 km2), have been utilized for the homogeneous time period, i.e. 1979–2020. Among these grid-based rainfall datasets, the IMDAA (12 km2) and IMD (25 km2) datasets, specifically generated for the Indian region using gauged rainfall stations, have been considered the most accurate and reliable observed rainfall datasets (Ashrit et al. 2020). All these gridded rainfall datasets have been obtained in the NetCDF file format. Python scripts were written to extract the rainfall values for the current study region.
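
The extraction step can be illustrated with a minimal sketch. It assumes that xarray is used and that the data expose coordinates named `lat` and `lon` and a variable named `rainfall`; all three names are assumptions, since the actual scripts are not given in the text. A small synthetic dataset stands in for a real NetCDF file so that the example runs on its own.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded daily rainfall file (0.25 degree grid).
# With a real file one would start from xr.open_dataset(<path to .nc file>).
lat = np.arange(20.0, 28.01, 0.25)
lon = np.arange(90.0, 96.01, 0.25)
time = np.arange("2001-01-01", "2001-01-11", dtype="datetime64[D]")
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"rainfall": (("time", "lat", "lon"),
                  rng.gamma(0.8, 10.0, (time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Bounding box of the study area as stated in the text:
# latitudes 22.0-26.0 and longitudes 92.0-94.5.
subset = ds["rainfall"].sel(lat=slice(22.0, 26.0), lon=slice(92.0, 94.5))

print(subset.shape)  # (time, lat, lon) of the clipped region
```

Label-based slicing with `sel` includes both end points, so grid cells lying exactly on the box edges are retained.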

IMDAA re-analysis rainfall datasets

The IMDAA re-analysis is a regional atmospheric re-analysis that encompasses the Indian subcontinent. The IMDAA re-analysis datasets have been generated by the National Centre for Medium-Range Weather Forecasting (NCMRWF) and the IMD (Ashrit et al. 2020) (https://rds.ncmrwf.gov.in/datasets). Previous studies have utilized the IMDAA datasets across India (Ashrit et al. 2020; Rani et al. 2021).

CHIRPS rainfall

Climate Hazards Group Infrared Precipitation with Station Data (CHIRPS) is a quasi-global rainfall dataset spanning 35 years (https://data.chc.ucsb.edu/products/CHIRPS-2.0/). The CHIRPS rainfall datasets are available in two resolutions i.e. 0.05° × 0.05° (i.e. 5 km2) and 0.25° × 0.25° (25 km2), and in this study, both resolution datasets (1981–2020) have been utilized. The CHIRPS rainfall product incorporates Climate Hazards Group Rainfall Climatology (CHP Clim), Tropical Rainfall Measuring Mission (TRMM) 3B42 rainfall product, Geostationary Thermal Infrared Satellite Observations, atmospheric model rainfall from NOAA Climate Forecast System, and gauge rainfall observations from national or regional meteorological sources (Sulugodu & Deka 2019; Gupta et al. 2020a).

APHRODITE rainfall

Asian Precipitation – Highly-Resolved Observational Data Integration Towards Evaluation (APHRODITE) (Yatagai et al. 2012; Banerjee et al. 2020) gridded precipitation is a series of long-term (1951–2016) continental-scale daily products for Asia, including the Himalayas, South and Southeast Asia, and mountainous areas in the Middle East (https://climatedataguide.ucar.edu/climate-data/aphrodite-asian-precipitation-highly-resolved-observational-data-integration-towards). APHRODITE gridded data products are available for four subdomains (Monsoon Asia, Middle East, Russia, and Japan), as well as a unified domain. Except for Japan, which has a 0.05° × 0.05° horizontal resolution, the time-varying datasets have a 0.25° × 0.25° (25 km2) or 0.50° × 0.50° (50 km2) horizontal resolution. In this study, the Monsoon Asia-based climatological daily mean precipitation datasets with a resolution of 0.25° × 0.25° (25 km2) have been utilized (Yasutomi et al. 2011; Bhattacharyya et al. 2022). This dataset was prepared using gauge-based rainfall (around 12,000 rain gauge stations over the entire Asian region), and the angular distance weighting interpolation method was used for the gridding of rainfall observations (Yasutomi et al. 2011; Singh & Xiaosheng 2019).

PRINCETON rainfall

Princeton University's Terrestrial Hydrology Research Group provides a global gridded daily rainfall dataset (PRINCETON) (http://hydrology.princeton.edu/data.pgf.php). The dataset has been distributed at a grid resolution of 0.50° × 0.50° for 1948–2008 and at 0.25° × 0.25° for 1948–2016; the latter is utilized in this study (Singh & Xiaosheng 2019). PRINCETON rainfall datasets have been used in a variety of hydro-climatological investigations all over the world (Sheffield et al. 2006; El Kenawy & McCabe 2016; Singh & Xiaosheng 2019). The PRINCETON dataset was created by combining the NCEP re-analysis dataset with observational data from TRMM, CRU (Sheffield et al. 2006; El Kenawy & McCabe 2016), GPCP (El Kenawy & McCabe 2016), and NASA SRB products (Sheffield et al. 2006). This dataset outperformed comparable global datasets in terms of accuracy (Sheffield et al. 2006; El Kenawy & McCabe 2016) and does not contain gaps over Southeast Asia.

CMIP6 climate models

In this study, the latest climate model datasets from CMIP under the World Climate Research Programme (WCRP), named CMIP6 and developed after CMIP5, have been utilized. CMIP6 is a significant expansion over CMIP5, and a new set of emissions scenarios based on various socioeconomic assumptions, known as ‘shared socioeconomic pathways’ (SSPs), has been developed (Gupta et al. 2020b; Samantaray et al. 2022). These scenarios are called SSP1-2.6, SSP2-4.5, SSP4-6.0, and SSP5-8.5, each of which results in 2100 radiative forcing levels similar to those of its predecessor in AR5 (Mishra et al. 2020).

Mishra et al. (2020) developed bias-corrected CMIP6 climate model data of precipitation, maximum temperature, and minimum temperature for six countries in South Asia. Each zipped country file contains 13 models, and each model includes five scenarios (historical, SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). In this analysis, four of the 13 climate models, namely ACCESS-ESM 1-5, BCC-CSM2-MR, EC-Earth3, and MRI-ESM2-0, have been selected with three scenarios each (i.e. historical, SSP2-4.5, and SSP5-8.5) (https://zenodo.org/record/3987736). To select the most applicable models for the current study region, each model's historical rainfall data were compared with the observed historical data, following previous research works (Gupta et al. 2020b; Samantaray et al. 2022). The overall rainfall and climate data availability is shown in Table 1.

Table 1

Rainfall and climate model data availability

| Sl. No. | Dataset name | Resolution | Time series availability |
|---|---|---|---|
| 1 | CHIRPS | 0.05° × 0.05° and 0.25° × 0.25° | 1981–2021 |
| 2 | IMDAA re-analysis | 0.12° × 0.12° | 1979–2021 |
| 3 | IMD | 0.25° × 0.25° | 1901–2021 |
| 4 | PRINCETON | 0.25° × 0.25° | 1948–2016 |
| 5 | APHRODITE | 0.25° × 0.25° | 1951–2007, 2007–2015 |
| 6 | ACCESS-ESM 1-5 (SSP245 and SSP585) | 0.25° × 0.25° | 2014–2100 |
| 7 | BCC-CSM2-MR (SSP245 and SSP585) | 0.25° × 0.25° | 2014–2100 |
| 8 | EC-Earth3 (SSP245 and SSP585) | 0.25° × 0.25° | 2014–2100 |
| 9 | MRI-ESM2-0 (SSP245 and SSP585) | 0.25° × 0.25° | 2014–2100 |

Data assimilation

In this study, DA has been performed to combine coarse-scale rainfall datasets from various sources in a synergistic way to generate a new and accurate high-resolution gridded rainfall product that is able to capture the seasonality, distribution, and extremity of the observed data (i.e. IMDAA). DA comprises four main steps: (i) evaluation of the various rainfall datasets and selection of the most applicable ones, (ii) downscaling and bias correction of the rainfall datasets, (iii) prediction of rainfall datasets using machine learning methods, and (iv) final comparison and evaluation of the best rainfall dataset.

Previously, various mathematical methods (e.g. distance power method, distance power with high correlation coefficient, linear regression, multiple linear regression, quantile regression) have been used to assimilate time series meteorological datasets (Singh & Xiaosheng 2019). In this study, quantile regression or quantile-based mapping technique has been used, because it is found useful for capturing extreme values and rainfall distribution patterns (Gupta et al. 2020a). Very few studies have used this method in the correction and adjustment of rainfall datasets (Singh & Xiaosheng 2019; Gupta et al. 2020a).

For the construction of a long-term hybrid and improved gridded rainfall dataset at high-resolution scale (0.05° × 0.05°), different rainfall datasets have been incorporated and evaluated. For this purpose, first, the best applicable datasets have been identified, and then the selected best datasets have been downscaled at 0.05° × 0.05° resolution. Then the bias corrections and gap filling of the rainfall datasets were performed to correct the rainfall time series (Figure 2). For the downscaling (or re-gridding) of the gridded rainfall datasets, the Python scripts were written by using the XARRAY and NETCDF libraries by incorporating linear scaling and nearest neighbour interpolation methods (Lang 2015; Westra 2015).
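
The re-gridding step can be sketched as follows. This is a minimal illustration of nearest-neighbour re-gridding from 0.25° to 0.05° with xarray only; the linear scaling mentioned above is not reproduced, and the coordinate names `lat` and `lon` and the synthetic input field are assumptions made for the example.

```python
import numpy as np
import xarray as xr

# Synthetic coarse field on a 0.25 degree grid over the study area.
coarse_lat = np.arange(22.0, 26.01, 0.25)
coarse_lon = np.arange(92.0, 94.51, 0.25)
rng = np.random.default_rng(1)
coarse = xr.DataArray(
    rng.gamma(0.8, 10.0, (3, coarse_lat.size, coarse_lon.size)),
    coords={"time": np.arange(3), "lat": coarse_lat, "lon": coarse_lon},
    dims=("time", "lat", "lon"),
    name="rainfall",
)

# Target 0.05 degree grid covering the same extent.
fine_lat = np.linspace(22.0, 26.0, 81)
fine_lon = np.linspace(92.0, 94.5, 51)

# Each fine cell takes the value of the nearest coarse cell.
fine = coarse.reindex(lat=fine_lat, lon=fine_lon, method="nearest")

print(coarse.shape, "->", fine.shape)
```

Nearest-neighbour re-gridding does not create new rainfall values; it only repeats coarse values on the finer grid, which is why a subsequent bias correction step is still needed.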

Figure 1. Study area map showing the northeastern regions in India.

Figure 2. Workflow of methodology adopted for the construction of new hybrid rainfall datasets using data assimilation.

Seasonality and statistical evaluation

In this study, the selected rainfall datasets were available for different time spans and resolutions. Therefore, for comparison and evaluation, homogeneous rainfall datasets have been prepared for a common time period (1989–2015) at the same grid resolution (i.e. 0.25° × 0.25°). All the datasets were available at this resolution except IMDAA, which was upscaled from 0.12° × 0.12° to 0.25° × 0.25° using the nearest-neighbour interpolation method (Teegavarapu et al. 2018). The rainfall data gaps have also been corrected to improve the accuracy of the overall dataset. Gaps were mostly present in the IMD datasets and have been filled using the gap-filling approach previously applied by Singh & Xiaosheng (2019). To identify the most suitable datasets for the selected study region, the seasonality and statistical evaluation have been performed at 0.25° × 0.25°, with IMDAA considered as the observed dataset. Rainfall seasonality is described as the uneven distribution of rainfall over the course of a normal year (Roffe et al. 2019). In the northeastern regions of India, the rainfall distributions are highly varied, and these regions have been categorized as the wettest regions of India (Dikshit & Dikshit 2014). The northeastern states, especially Mizoram and its environs, are mostly influenced by southwest (SW) summer monsoon rainfall (June–September), and around 70% of the annual rainfall is received during the SW monsoon period (Saha et al. 2015). Therefore, for the seasonality assessment, the whole year has been categorized into four seasons, namely, pre-monsoon (March–May), monsoon (June–September), post-monsoon (October–November), and winter (December–February), following previous studies (Gupta et al. 2020a). For the comparison of different datasets, the seasonal average was calculated for all the datasets.
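
The seasonal grouping described above can be sketched as follows. This is an illustrative NumPy example, not the authors' script; it assumes that the "seasonal average" is the mean of daily rainfall within each season, and the function name `seasonal_means` is hypothetical.

```python
import numpy as np

# Season definitions taken from the text (month numbers).
SEASONS = {
    "pre-monsoon": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11),
    "winter": (12, 1, 2),
}

def seasonal_means(dates, rainfall):
    """Mean daily rainfall per season for one grid cell.

    dates    : array of numpy datetime64 values (daily)
    rainfall : array of daily rainfall values, same length as dates
    """
    # Months since 1970-01, reduced to a calendar month number 1..12.
    months = dates.astype("datetime64[M]").astype(int) % 12 + 1
    return {
        name: float(rainfall[np.isin(months, season_months)].mean())
        for name, season_months in SEASONS.items()
    }

# Toy series: each day's "rainfall" equals its month number, so the
# result is easy to check by hand.
dates = np.arange("2001-01-01", "2002-01-01", dtype="datetime64[D]")
toy_rain = (dates.astype("datetime64[M]").astype(int) % 12 + 1).astype(float)
means = seasonal_means(dates, toy_rain)
print(means)
```

Applying the same function to each dataset at the same grid cell gives directly comparable seasonal averages.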

For the statistical evaluation, widely used statistical functions such as annual mean, standard deviation, root-mean-square error (RMSE), mean-square error (MSE), coefficient of determination (R2), quantile–quantile (Q–Q) plots, the Hannan–Quinn information criterion (HQC), and the Akaike information criterion (AIC) have been used for the selection of the best datasets (Singh & Xiaosheng 2019; Afuecheta & Omar 2021). In total, 13 grids have been randomly selected over the entire study region from each dataset, and all the statistical evaluation was performed on these grids. IMDAA is considered the reference dataset because it is a regional dataset available at a fine resolution (0.12° × 0.12°) and specifically generated for the Indian region using gauged rainfall station data. IMDAA can be considered the most accurate and reliable observed rainfall dataset in the Indian context (Ashrit et al. 2020). For the assessment of long-term precipitation changes, the CMIP6 climate models were first bias corrected with the assimilated rainfall (AR) datasets. For this, the QM bias correction method was applied using the historical experiment of each CMIP6 model and the AR data, and the future rainfall scenarios for all four GCMs and two SSP scenarios were then corrected.
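
A minimal sketch of these evaluation metrics for one grid cell is given below. The exact formulas used in the study are not stated, so two assumptions are made and should be read as such: R2 is taken as the squared Pearson correlation, and AIC and HQC use the common least-squares (Gaussian error) forms n·ln(MSE) + 2k and n·ln(MSE) + 2k·ln(ln n). The function name `evaluate` and the parameter count `n_params` are hypothetical.

```python
import numpy as np

def evaluate(obs, sim, n_params=1):
    """Compare a candidate series with the reference series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n = obs.size
    mse = float(np.mean((sim - obs) ** 2))
    rmse = float(np.sqrt(mse))
    # Assumption: R2 as the squared Pearson correlation coefficient.
    r2 = float(np.corrcoef(obs, sim)[0, 1] ** 2)
    # Assumption: least-squares forms of the information criteria.
    aic = float(n * np.log(mse) + 2 * n_params)
    hqc = float(n * np.log(mse) + 2 * n_params * np.log(np.log(n)))
    return {"mse": mse, "rmse": rmse, "r2": r2, "aic": aic, "hqc": hqc}

# Small hand-checkable example.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)
```

Lower RMSE, AIC, and HQC and higher R2 indicate a candidate dataset that is closer to the reference.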

Downscaling and bias corrections

Downscaling refers to the process of obtaining high-resolution information from low-resolution variables (Kumar & Singh 2021; Kumar et al. 2022). It is based on dynamical or statistical methodologies that are extensively employed in a variety of disciplines, particularly meteorology, climatology, and remote sensing. APHRODITE, being considered the best dataset, was downscaled to the 0.05° × 0.05° scale. CHIRPS is already available at the fine resolution of 0.05° × 0.05° (Section 3.1.3). IMDAA, which was available at 0.12° × 0.12°, was also downscaled to the 0.05° × 0.05° scale.

In this study, the cumulative distribution function (CDF)-based QM has been used to correct the bias in the rainfall datasets (Cannon et al. 2015; Singh & Xiaosheng 2019). The QM methods implement statistical transformations for the post-processing of climate modelling outputs (Cannon et al. 2015). The statistical transformations involve transforming the distribution functions of the modelled variables into the observed ones using a mathematical function, which can be mathematically expressed as follows (Enayati et al. 2021):
$$x^{o} = f(x^{m}) \tag{1}$$
where $x^{o}$ is the observed variable, $x^{m}$ is the modelled variable, and $f(\cdot)$ is the transformation function.
Given that the QM methods use the quantile–quantile relation to converge the simulated variables' distribution function to the observed one, one should note that with the CDFs of both observed and simulated variables' time series, their quantile relation can also be determined, as shown below:
$$x^{o} = F_{o}^{-1}\left(F_{m}(x^{m})\right) \tag{2}$$
where $F_{m}(x^{m})$ is the CDF of $x^{m}$ and $F_{o}^{-1}$ is the inverse form of the CDF of $x^{o}$, which is technically referred to as the quantile function.
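
Equation (2) can be illustrated with a short empirical sketch. The code below is an assumption-laden example rather than the implementation used in the study: both CDFs are approximated by empirical quantiles, the number of quantiles (`n_quantiles`) is an arbitrary choice, ties such as long runs of zero rainfall are not treated specially, and the function name `quantile_map` is hypothetical.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_target, n_quantiles=100):
    """Empirical CDF-based quantile mapping: x_o = F_o^{-1}(F_m(x_m))."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    model_q = np.quantile(model_hist, probs)  # empirical quantiles of the model
    obs_q = np.quantile(obs_hist, probs)      # empirical quantiles of the observations
    # F_m(x_m): non-exceedance probability of each target value.
    p = np.interp(model_target, model_q, probs)
    # F_o^{-1}(p): observed value at the same probability.
    return np.interp(p, probs, obs_q)

# Synthetic example: the "model" is wetter than the "observations".
rng = np.random.default_rng(42)
obs_hist = rng.gamma(2.0, 5.0, 5000)
model_hist = 1.5 * rng.gamma(2.0, 5.0, 5000) + 2.0
corrected = quantile_map(model_hist, obs_hist, model_hist)

print(round(float(model_hist.mean()), 2),
      round(float(corrected.mean()), 2),
      round(float(obs_hist.mean()), 2))
```

The same function can be calibrated on a historical period and then applied to a different target period by passing that period's values as `model_target`; values outside the calibration range are clipped to the observed extremes in this simple version.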

To execute quantile regression (QR) in this study, the closest grid (equal to the observed) was identified for rainfall correction, and the relation fitted over the training period was used to correct the test data. QR was applied twice to obtain the desired results. Initially, both APHRODITE and CHIRPS for the period 2001–2016 were used as training datasets, CHIRPS data for the period 1981–2000 were used for testing, and the CHIRPS data for 1981–2000 were predicted. Then, this predicted dataset was used as a training dataset alongside IMDAA for the same period, CHIRPS for the period 2001–2020 was used as testing data, and QR was applied to obtain the newly generated hybrid gridded rainfall dataset.

Machine learning methods for rainfall predictions

Machine learning methods such as RF regression (RFR) and SVR have been used to predict the time series rainfall at the 0.05° × 0.05° grid scale. For the prediction of time series rainfall, the best-selected datasets have been utilized (Figure 2). These techniques contain certain parameters that need to be optimized for effective use (Ridwan et al. 2021). A few studies have demonstrated that SVR and RFR perform well in the prediction of time series datasets (Pham et al. 2019; Kajewska-Szkudlarek 2020), and therefore, these two methods have been adopted in this study.

Support vector regression

A few studies have demonstrated the ability of SVR to predict rainfall, either as a single technique or as part of a hybrid (combined) technique (Pham et al. 2019; Kajewska-Szkudlarek 2020). In SVR, each data item is represented as a point in n-dimensional space, in which the value of each feature is the value of a particular coordinate (Pham et al. 2019). For SVR, the input variables (or predictors) are the downscaled best rainfall datasets, while IMDAA is taken as the reference dataset (or dependent variable). The SVR method fits a function such that the model errors fall within a specified range (ε) wherever possible. Unlike other regression models that aim to minimize the error between the real and predicted values, SVR seeks the best-fit hyperplane within this threshold. SVR supports different functions (or kernels) such as linear, polynomial (poly), radial basis function (RBF), and sigmoid for the optimization of coefficients (Pham et al. 2019; Kajewska-Szkudlarek 2020). The linear kernel is mostly used when there are a large number of features in a particular dataset. The polynomial kernel measures the similarity of vectors (training samples) in a feature space over polynomials of the original variables and enables the learning of nonlinear models. RBF is a non-parametric model, so its complexity grows with the size of the training set, which also makes a complex model easier to overfit (Auzani et al. 2021). The sigmoid function is preferred for neural networks and is comparable to a two-layer perceptron model of the neural network (Xiang et al. 2020). In this study, the linear, poly, RBF, and sigmoid functions have been tested, and RBF was selected as the best fit and finally used to predict the rainfall datasets. To identify the best-fit kernel, cross-validation was performed by splitting the datasets into training and validation sets, and this was repeated for all the kernels with varying parameters. A few studies have shown that the RBF function is more sensitive than other functions in the case of noisy data (Pham et al. 2019). The RBF kernel on two samples x and y, represented as feature vectors in some input space, is defined as follows (Pham et al. 2019; Auzani et al. 2021):
$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right) \tag{3}$$
where $\lVert x - y \rVert^{2}$ is the squared Euclidean distance between the x and y vectors, and $\sigma$ is a free parameter; the kernel is equivalently written with $\gamma = 1/(2\sigma^{2})$. The RBF kernel can be trained and optimized through the gamma parameter γ (i.e. width of kernel function), epsilon (ε), and the regularization parameter (C) (Xiang et al. 2020) (Table 2). To prevent underestimation/overestimation of the dataset, the whole time series was divided into training (i.e. 1989–2007) and testing (2007–2015) periods, i.e. ∼80% and ∼20%, respectively, using the fixed-size window approach (Xiang et al. 2020). Multiple splits of the time series were tested before the data splitting criteria were set (i.e. ∼80% and ∼20% for training and testing). The quality of the predicted rainfall has been assessed based on the correlation between observed and predicted values (i.e. coefficient of determination, R2) and RMSE.
Table 2

Hyperparameters tuned for SVR

| Sl. No. | Parameter | Range/type | Optimum value |
|---|---|---|---|
| 1 | Kernel | Linear, poly, rbf, sigmoid | rbf |
| 2 | C regularisation parameter | 1.0–100,000.0 | 85,000 |
| 3 | gamma | 1.0–0.0001 | 0.001 |
| 4 | Epsilon (ε) | 0.1–0.00001 | 0.0001 |
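
The SVR tuning described above can be sketched with scikit-learn. This is an illustrative example under stated assumptions, not the study's code: the data are synthetic stand-ins for one grid cell, the search grid is a tiny subset of the ranges in Table 2 so that the example runs quickly, and the use of `GridSearchCV` with `TimeSeriesSplit` is an assumed way of doing the cross-validation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Synthetic stand-ins: the reference series plays the role of IMDAA and
# the two predictor columns play the role of downscaled rainfall products.
rng = np.random.default_rng(0)
n_days = 400
reference = rng.gamma(0.8, 10.0, n_days)
X = np.column_stack([
    0.9 * reference + rng.normal(0.0, 1.0, n_days),
    1.1 * reference + rng.normal(0.0, 2.0, n_days),
])
y = reference

# Chronological ~80/20 split, as in the text.
split = int(0.8 * n_days)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Reduced, illustrative grid over C, gamma, and epsilon.
param_grid = {
    "C": [1.0, 100.0],
    "gamma": [0.001, 0.01],
    "epsilon": [0.1, 0.0001],
}
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

pred = search.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
r2 = float(np.corrcoef(pred, y_test)[0, 1] ** 2)
print(search.best_params_, round(rmse, 3), round(r2, 3))
```

`TimeSeriesSplit` keeps each validation fold later in time than its training fold, which avoids leaking future information into the fit.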

Random forest regression

RF is an ensemble machine learning algorithm, which has been found suitable for the prediction of time series variables (Pham et al. 2019). An RF algorithm is a combination of a large number of trees. In RF, each tree is independently constructed with a bootstrap sample of the original dataset, and each node is split with the most suitable random selection of predictor variables at that node (Pham et al. 2019). For each training set, a new decision tree is grown, and every time, a new split has to be made at a given node of the tree. In RFR, the final prediction is simply the average of all outcomes of the individual trees of the forest.

In this study, two RFR hyperparameters have been tuned: n_estimators (i.e. the number of regression trees that are created) and max depth (i.e. the greatest depth to which trees can grow). The n_estimators (ranging between 10 and 100) and max depth (ranging between 2 and 8) have been tuned to get the best-fit parameters and results (Table 3). To determine the hyperparameters, the grid search method was applied: the model was trained for multiple combinations of parameters, and the combination that gave the best performance was selected (Table 3). The best downscaled rainfall datasets have been utilized as the input (predictor) variables, and IMDAA is used as the dependent variable (or reference dataset). The regression model was trained (1989–2007) and validated (2007–2015) at the 0.05° scale, and the precipitation datasets have been predicted at the 0.05° scale for 2007–2015.

Table 3

Hyperparameters tuned for RF

| Sl. No. | Parameter | Range | Optimum value |
| --- | --- | --- | --- |
| 1 | n_estimators | 10–100 | 100 |
| 2 | max_depth | 2–8 | |
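
The grid search over n_estimators and max_depth, and the averaging of tree outputs described above, can be illustrated as follows. This is a hedged sketch assuming a scikit-learn style API; the synthetic data, the coarse grid values, and the cross-validation scheme are assumptions for the example, not the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(1)

# Hypothetical predictors (two rainfall products) and a reference series.
n_days = 300
X = rng.gamma(2.0, 5.0, size=(n_days, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0.0, 1.0, n_days)
split = int(0.8 * n_days)  # chronological ~80/20 split

# Coarse grid inside the ranges of Table 3.
param_grid = {"n_estimators": [10, 50, 100], "max_depth": [2, 5, 8]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # assumption: folds respect time order
    scoring="r2",
)
search.fit(X[:split], y[:split])
best = search.best_estimator_

# The forest prediction is the mean of the individual tree predictions.
tree_mean = np.stack([tree.predict(X[split:]) for tree in best.estimators_]).mean(axis=0)
forest_pred = best.predict(X[split:])
print(search.best_params_, bool(np.allclose(tree_mean, forest_pred)))
```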

Evaluation of assimilated and predicted rainfall time series

After the construction of the hybrid rainfall time series dataset, its performance was evaluated statistically using the change in mean, the coefficient of determination (R2), and the RMSE. These metrics have been calculated with respect to IMDAA and CHIRPS at the 0.05° × 0.05° scale. To select the best rainfall dataset between AR (i.e. hybrid rainfall) and the predicted rainfall (i.e. SVR and RF), the percentage (%) of change has been computed with respect to the reference rainfall data (i.e. IMDAA). The selected best dataset, i.e. either the hybrid or a predicted dataset, is further used for the evaluation of RCIs.
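
The three evaluation measures named above can be written down compactly. The sketch below is illustrative only: it assumes R2 is taken as the squared Pearson correlation between reference and estimate, and that the percentage of change compares the two means; the function names and the sample numbers are hypothetical.

```python
import numpy as np

def r_squared(ref, est):
    """Squared Pearson correlation between reference and estimate (assumed definition)."""
    r = np.corrcoef(ref, est)[0, 1]
    return float(r ** 2)

def rmse(ref, est):
    """Root mean square error of the estimate against the reference."""
    diff = np.asarray(est, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def percent_change(ref, est):
    """Change of the estimate's mean relative to the reference mean, in percent."""
    ref_mean = float(np.mean(ref))
    return 100.0 * (float(np.mean(est)) - ref_mean) / ref_mean

# Hypothetical rainfall values (mm) for one grid cell.
ref = np.array([10.0, 20.0, 30.0, 40.0])
est = np.array([12.0, 18.0, 33.0, 45.0])

print(round(rmse(ref, est), 2))   # 3.24
print(percent_change(ref, est))   # 8.0
print(round(r_squared(ref, est), 3))
```

Applied grid by grid, these functions would yield maps comparable in kind to the R2, RMSE, and percentage of change plots discussed later in the text.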

Rainfall extreme indices and future changes

For the calculation of the RCIs, four CMIP6 models and, for each model, two SSP scenarios (i.e. SSP245 and SSP585) were considered, as mentioned in Section 3.1.6. Each scenario has been bias corrected with respect to the AR dataset. After bias correction of the CMIP6 models with their SSP245 and SSP585 scenarios, the time series were divided into two categories, namely, near-term (2021–2050) and far-term (2061–2090), and for each category, the annual average and climate indices (CIs) were calculated with respect to the CMIP6 historical scenarios of the four selected climate models. The RCIs considered are annual mean, dry spell frequency, wet spell frequency, and maximum 1-day precipitation per year (Rx1D), as per the guidelines of IPCC and as utilized in previous studies (Singh & Goyal 2016; Kumar et al. 2021). A period of 5 continuous days with rainfall <2.5 mm/day is counted as a dry spell, and a period of 5 continuous days with rainfall >2.5 mm/day is counted as a wet spell; the number of such spells gives the dry and wet spell frequency, respectively. As the name suggests, Rx1D is the maximum precipitation recorded in a single day within a year. Python modules are available for the calculation of CIs, and one of them is XCLIM (https://xclim.readthedocs.io/en/stable/).
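
To make the index definitions concrete, the following sketch counts dry and wet spells and extracts Rx1D from a daily series using plain NumPy rather than XCLIM. It assumes that a spell is any run of at least 5 consecutive days on the relevant side of the 2.5 mm/day threshold and that each run is counted once; the function names and the sample series are hypothetical.

```python
import numpy as np

def count_spells(mask, min_len=5):
    """Count runs of consecutive True values lasting at least min_len days."""
    count, run = 0, 0
    for flag in mask:
        if flag:
            run += 1
        else:
            if run >= min_len:
                count += 1
            run = 0
    if run >= min_len:  # a run that reaches the end of the series
        count += 1
    return count

def rainfall_indices(daily_mm, threshold=2.5, min_len=5):
    """Dry spell count, wet spell count, and Rx1D for one year of daily rainfall."""
    daily_mm = np.asarray(daily_mm, dtype=float)
    return {
        "dry_spells": count_spells(daily_mm < threshold, min_len),
        "wet_spells": count_spells(daily_mm > threshold, min_len),
        "rx1d": float(daily_mm.max()),
    }

# Hypothetical short record (mm/day): 6 dry days, 6 wet days, then 4 dry days.
sample = [0, 0, 1, 0, 2, 0, 30, 12, 8, 5, 40, 3, 0, 0, 0, 1]
print(rainfall_indices(sample))
# {'dry_spells': 1, 'wet_spells': 1, 'rx1d': 40.0}
```

The final run of 4 dry days is not counted because it is shorter than the 5-day minimum.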

Comparative assessment of rainfall datasets

This study utilizes five multiscale rainfall datasets, namely, APHRODITE, CHIRPS, IMD, IMDAA, and PRINCETON. These rainfall datasets have been assimilated, and a new hybrid and improved fine-resolution gridded rainfall dataset has been generated for the selected study area. In this context, the applicability of the different rainfall datasets (i.e. APHRODITE, CHIRPS, IMD, and PRINCETON) has first been evaluated with respect to the IMDAA re-analysis rainfall data. To find the best rainfall datasets, rainfall seasonality, statistical evaluation, and quantile (Q–Q plot) analyses have been carried out.

Figure 3 shows the comparison of mean rainfall among the different datasets at 25 km2 scale. Across the selected study region, the mean rainfall varies from 1,200 to 6,500 mm. In Figure 3, the area corresponding to Meghalaya (Jaintia hills) shows the maximum rainfall. However, only IMDAA shows rainfall >5,500 mm, while the other datasets underestimate rainfall with respect to IMDAA. Among the other rainfall datasets, only CHIRPS shows close similarity in mean rainfall to IMDAA, especially in the Jaintia hills area. Over the Mizoram state, APHRODITE and CHIRPS show better similarity (ranging between 1,300 and 4,000 mm) to IMDAA than PRINCETON and IMD. Overall, CHIRPS performed better with respect to IMDAA in terms of mean rainfall.

Figure 4 shows the seasonal variability in mean rainfall (1989–2015) at 25 km2 resolution among all the selected datasets. To evaluate the seasonality, the mean rainfall was aggregated into four seasons, namely, June–July–August–September (J-J-A-S), October–November (O-N), December–January–February (D-J-F), and March–April–May (M-A-M). Around 75% of the rainfall is received during J-J-A-S, ranging from ∼1,500 to ∼5,000 mm (Figure 4(a), 4(e), 4(i), 4(m), and 4(q)). The next largest share is recorded during the M-A-M season (∼500–1,500 mm), while the least rainfall is captured during the O-N and D-J-F months (∼0 to 1,000 mm). As per the comparative assessment, only IMDAA shows a reliable rainfall pattern during the monsoon/wet season (i.e. J-J-A-S) and is able to capture the extreme rainfall pattern over the Meghalaya region (e.g. Jaintia hills). Meghalaya receives the largest amount of rainfall during the monsoon season, and this region can be categorized as the highest rainfall area in the world (Marak et al. 2020). The other datasets underestimate rainfall in the Jaintia hills region with respect to IMDAA. However, during O-N, D-J-F, and M-A-M, the datasets do not show much variation in rainfall across the study area.

After the seasonality assessment, a statistical evaluation has been performed. Figure 5 shows the spatial variations in mean bias (mm), RMSE, and coefficient of determination (R2) of the APHRODITE, CHIRPS, IMD, and PRINCETON rainfall datasets, computed with respect to the IMDAA rainfall data. In Figure 5(a)–5(d), the least mean bias is computed for CHIRPS, followed by APHRODITE, IMD, and PRINCETON. The minimum RMSE is recorded for APHRODITE, followed by CHIRPS, IMD, and PRINCETON. As per the R2 (Figure 5(i)–5(l)), the maximum correlation exists between APHRODITE and IMDAA across the study area (∼0.5 to 0.8), except in a few areas (e.g. Mamit), followed by CHIRPS, IMD, and PRINCETON. In the case of R2, PRINCETON has underperformed. As per the observations shown in Figure 5, APHRODITE and CHIRPS performed better than IMD and PRINCETON, especially over the Mizoram state. The box plots (Figure 6) have been derived to evaluate the rainfall datasets (i.e. APHRODITE, CHIRPS, IMD, and PRINCETON), considering all grids across the selected study region, using statistical evaluation functions such as the Akaike information criterion (AIC), the Hannan–Quinn criterion (HQC), MSE, mean absolute error (MAE), RMSE, and R2.
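
The seasonal grouping used in the seasonality assessment can be illustrated with a small worked example. The sketch below sums a monthly climatology into the four seasons and reports each season's share of the annual total; the monthly values are invented for illustration, and pooling December with January and February of the same climatological year is a simplification.

```python
# Month numbers belonging to each season, as defined in the text.
SEASONS = {
    "J-J-A-S": (6, 7, 8, 9),
    "O-N": (10, 11),
    "D-J-F": (12, 1, 2),
    "M-A-M": (3, 4, 5),
}

def seasonal_totals(monthly_mm):
    """monthly_mm maps month number (1-12) to mean monthly rainfall in mm."""
    return {name: sum(monthly_mm[m] for m in months) for name, months in SEASONS.items()}

def seasonal_share(monthly_mm):
    """Percentage contribution of each season to the annual total."""
    totals = seasonal_totals(monthly_mm)
    annual = sum(totals.values())
    return {name: 100.0 * value / annual for name, value in totals.items()}

# Hypothetical monthly climatology (mm) for one grid cell.
monthly = {1: 10, 2: 20, 3: 80, 4: 150, 5: 270, 6: 500,
           7: 550, 8: 450, 9: 400, 10: 150, 11: 30, 12: 10}

print(seasonal_totals(monthly))
# {'J-J-A-S': 1900, 'O-N': 180, 'D-J-F': 40, 'M-A-M': 500}
print(round(seasonal_share(monthly)["J-J-A-S"], 1))  # 72.5
```
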
Figure 3

Average annual plots of rainfall (mm) (1989–2015) highlighting spatial variations among the selected different rainfall datasets at 25 km2 scale.

Figure 4

Highlighting seasonal variations in average (1989–2015) rainfall in the selected rainfall datasets.

Figure 5

Statistical evaluation of different rainfall datasets using Mean bias, RMSE and Coefficient of Determination (R2) which is computed with respect to IMDAA reanalysis rainfall datasets (1989–2015).

Figure 6

Comparative assessment of different rainfall datasets using different evaluation criteria which is computed with respect to IMDAA reanalysis rainfall datasets (1989–2015).

As per the AIC and HQC, APHRODITE and CHIRPS recorded lower values, while PRINCETON recorded higher values. As per the MSE, RMSE, and MAE criteria, APHRODITE and CHIRPS recorded the minimum values; however, the spread in the values is smallest in the case of IMD and CHIRPS. As per the R2 values, APHRODITE and CHIRPS performed better than the IMD and PRINCETON datasets. To analyse the extreme behaviour of the different rainfall datasets with respect to IMDAA (1989–2015), quantile–quantile (Q–Q) plots have been derived at randomly selected grids across the study area, and the Q–Q plots at three of these grids are shown in Figure 7. As per Figure 7, in the range of lower order quantiles, the distributions of APHRODITE, CHIRPS, and IMD follow IMDAA well, while PRINCETON shows significant variability. For the middle and higher order quantiles, CHIRPS and APHRODITE follow IMDAA, while IMD and PRINCETON do not follow the IMDAA distribution. Among all datasets, CHIRPS performed better than the others. Based on the statistical and Q–Q plot-based evaluation, the CHIRPS and APHRODITE rainfall datasets performed better with respect to IMDAA than the IMD and PRINCETON rainfall datasets. Therefore, for further analysis, the CHIRPS and APHRODITE datasets have been recognized as the best two datasets.
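
The points of a Q–Q plot are simply matching quantiles of two samples. The following sketch computes them with NumPy for a reference series and a candidate series at one grid; the gamma-distributed synthetic rainfall and the candidate that underestimates the reference by 30% are assumptions chosen so the behaviour is easy to read.

```python
import numpy as np

def qq_points(reference, candidate, probs=None):
    """Return matching quantiles of the reference and candidate samples."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)
    return probs, np.quantile(reference, probs), np.quantile(candidate, probs)

rng = np.random.default_rng(42)
reference = rng.gamma(2.0, 8.0, 2000)   # hypothetical reference rainfall (mm/day)
candidate = 0.7 * reference             # hypothetical dataset that underestimates by 30%

probs, ref_q, cand_q = qq_points(reference, candidate)

# Points on the 1:1 line mean identical distributions; here every candidate
# quantile sits below the reference quantile, most visibly at the upper end.
print(round(float(ref_q[-1]), 1), round(float(cand_q[-1]), 1))
```

Plotting cand_q against ref_q together with the 1:1 line gives the kind of diagram summarized for Figure 7, where departures at the higher quantiles indicate that extremes are not reproduced.
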
Figure 7

Comparative assessment of different rainfall datasets using Q-Q plots which is computed with respect to IMDAA reanalysis rainfall datasets at the selected random grids (locations) (1989–2015).

Construction of hybrid rainfall data

For the construction of the hybrid rainfall dataset at 5 km2 scale, three datasets have been selected, namely, APHRODITE, CHIRPS, and IMDAA. Of these three datasets, CHIRPS is available at 5 km2. To assimilate the new rainfall dataset, the bias correction of the APHRODITE and CHIRPS rainfall datasets has been done with reference to IMDAA using the QM method. In the bias correction, a first-level correction has first been performed on the CHIRPS data with reference to APHRODITE. For this, the time period from 2001 to 2016 was taken as the training period, and then the CHIRPS data were adjusted for the time period 1981–2000. Figure 8 shows the comparative evaluation of bias-corrected CHIRPS (with respect to APHRODITE), bias-corrected CHIRPS (with respect to IMDAA), uncorrected CHIRPS, and IMDAA rainfall. As per the spatial plots of mean rainfall (1981–2000), a significant variation can be observed between bias-corrected (BC) CHIRPS (Figure 8(a)) and uncorrected (UC) CHIRPS (Figure 8(b)). In Figure 8(c) and 8(d), a few areas (belonging to Assam state under the Barak basin) show higher RMSE (>110) and poor R2 (<0.5), while in other areas, especially in Mizoram state, the RMSE of most grids is below 100 and the R2 is greater than 0.5, except for a few grids. This shows a significant variability between APHRODITE and CHIRPS. Therefore, the bias-corrected CHIRPS is then adjusted with IMDAA. After that, the adjusted CHIRPS (1981–2000) and IMDAA (1981–2000) have been used as the training dataset to generate the newly corrected hybrid rainfall (AR) dataset for the time period 2001–2020.
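
The train-then-adjust logic of the bias correction can be sketched with a simple empirical quantile mapping. This is a generic illustration and not necessarily the exact QM variant used in the study: the function names, the number of quantiles, and the synthetic series are assumptions, and values outside the training range are clamped to the end quantiles.

```python
import numpy as np

def fit_quantile_mapping(source_train, reference_train, n_quantiles=100):
    """Estimate matching quantiles of the source and reference over a training period."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    return np.quantile(source_train, probs), np.quantile(reference_train, probs)

def apply_quantile_mapping(values, source_q, reference_q):
    """Map source values onto the reference distribution by linear interpolation."""
    # np.interp clamps inputs outside the training range to the end quantiles.
    return np.interp(values, source_q, reference_q)

rng = np.random.default_rng(7)
reference_train = rng.gamma(2.0, 8.0, 1000)   # hypothetical reference rainfall
source_train = 0.5 * reference_train + 1.0    # hypothetical biased product

# Fit on the training period, then adjust the training data and another period.
source_q, reference_q = fit_quantile_mapping(source_train, reference_train)
corrected_train = apply_quantile_mapping(source_train, source_q, reference_q)
corrected_other = apply_quantile_mapping(np.array([2.0, 10.0, 25.0]), source_q, reference_q)

print(bool(np.allclose(corrected_train, reference_train)))  # True
```

Real daily rainfall contains many zero values, so the source quantiles are not strictly increasing; a production implementation would need to treat dry days explicitly.
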
Figure 8

Comparison of Bias corrected vs uncorrected Rainfall datasets viz. CHIRPS (w.r.t. APHRODITE), CHIRPS (w.r.t. IMDAA) and Assimilated Rainfall. Also, highlighting the correlation and evaluating the strength of different rainfall datasets using R2 and RMSE.

A significant variation can be seen between bias-corrected CHIRPS (with respect to IMDAA) and uncorrected CHIRPS, as shown in Figures 8(e) and 8(f). As per the RMSE computed between bias-corrected CHIRPS (with respect to IMDAA) and uncorrected CHIRPS (Figure 8(g)), most of the grids have RMSE values of around 100 or slightly above, except for a few grids (mostly in the Jaintia hills, Meghalaya region), which show slightly higher RMSE values (>125). As per the R2 computed between bias-corrected CHIRPS (with respect to IMDAA) and uncorrected CHIRPS (Figure 8(h)), the majority of the grids show a good match, with R2 of about 0.5 or higher. Figures 8(i) and 8(j) show the distribution of the mean rainfall (2001–2020) for the newly constructed AR (hybrid rainfall data) and IMDAA. In Figure 8(k), the majority of grids show RMSE <200, and it can be seen that the AR dataset is able to capture the spatial pattern of the IMDAA rainfall data, especially the extreme rainfall values, i.e. >5,200 mm (e.g. over the Jaintia hills, Meghalaya region). In Figure 8(l), the AR data show a good match with the IMDAA rainfall data, and most of the grids have R2 values >0.6, except for very few grids. Overall, the AR dataset, which has been generated for the time period 2001–2020, performed well across the study region and was found to be comparable to the reference IMDAA dataset.

Prediction of rainfalls using SVR and RF

For the prediction of rainfall using the RF and SVR methods, the IMDAA, CHIRPS, and APHRODITE rainfall datasets (1989–2015) were utilized. For both RF and SVR, the time series datasets were divided into training (1989–2007) and testing (2007–2015) datasets. Figure 9 shows the rainfall predicted (2007–2015) by RF and SVR. For the assessment of the predicted rainfall, the average annual mean plots (2007–2015) of the predictions (Figure 9(c) and 9(d)) have been compared with the assimilated rainfall (AR) (Figure 9(b)) and IMDAA (Figure 9(a)) datasets. As per the comparative assessment, the RF-based rainfall distribution and pattern look closer to IMDAA and AR. Based on the annual rainfall plots (Figure 9), SVR did not perform well and was not able to capture the extremely high rainfall values (>4,500 mm), whereas RF was able to capture them. However, comparing SVR vs IMDAA and AR vs IMDAA, the AR-based annual mean plot agrees better with IMDAA than the SVR-based plot.
Figure 9

Comparison of different rainfall datasets through averaged mean (2007–2015), viz. IMDAA, assimilated rainfall (AR), rainfall predicted by RF, and rainfall predicted by SVR.

Evaluation of best-constructed rainfall time series

To find the best rainfall dataset among the predicted rainfall datasets (i.e. RF and SVR) and the newly constructed hybrid rainfall dataset (i.e. AR), a comparative assessment of these datasets has been done using R2, RMSE, and percentage (%) of change analysis. Figure 10 shows the evaluation of the RF (Figure 10(a)–10(d)) and SVR (Figure 10(e)–10(h)) methods during the training and testing periods. It is clearly visible that the RF method achieved better R2 and RMSE values than SVR during the testing phase with respect to IMDAA. In the case of RF (Figure 10(a) and 10(c)), a majority of grids have R2 > 0.8 in both training and testing periods, while in the case of SVR (Figure 10(e) and 10(g)), only very few grids have R2 > 0.8, and the majority of grids have R2 < 0.5. As per the percentage of change analysis (Figure 10(i)–10(k)), the RF versus IMDAA plot shows change values within the range of +20% to −40%, while the SVR versus IMDAA plot shows change values within the range of +80% to −40%. The IMDAA versus AR plot displays change values within the range of +10% to −30%. Overall, the AR dataset was found to be closer to IMDAA and also able to preserve the extremely high and low rainfall values. Therefore, for further analysis and bias correction of the climate model datasets (CMIP6 scenarios), the AR dataset has been considered.
Figure 10

Evaluation results of the rainfall datasets predicted by the RF and SVR methods (a to h), and comparative assessment of the predicted and assimilated rainfall datasets through the computation of percentage (%) of change (i to k).

Long-term assessment of rainfall changes through rainfall indices

This study explores the long-term future changes in rainfall in the selected study region by dividing the total time series length (2021–2100) into two terms, (i) near-term (NT, 2021–2050) and (ii) far-term (FT, 2061–2090), utilizing the statistically downscaled and bias-corrected CMIP6 GCM scenarios, namely, ACCESS-ESM1, BCC-CSM2-MR, EC-EARTH3, and MRI-ESM2-0 with SSP245 (i.e. moderate emission scenario) and SSP585 (extreme emission scenario). To analyse the rainfall changes, the percentage (%) of change was computed between the historical scenario (1991–2020) and NT (2021–2050) and between the historical scenario (1991–2020) and FT (2061–2090) for four rainfall indices, namely, annual mean, Rx1D, dry spell frequency, and wet spell frequency.

Figure 11 shows the changes in average annual rainfall in NT and FT for the four CMIP6 models and two SSP scenarios. In the case of the ACCESS-ESM1 SSP245 scenario, in NT, the rainfall is found to decrease (0 to −60%), mostly in the Mizoram state. However, in some portions of the Nagaland and Manipur states, the rainfall is slightly increasing (0 to +15%). In the FT scenario (Figure 11(b)), the rainfall is slightly decreasing in the majority of areas. In the case of SSP585 with NT and FT scenarios (Figure 11(c) and 11(d)), the rainfall is slightly decreasing (0 to −20%) in most cases. In the case of the BCC-CSM2-MR SSP245 scenario, both NT and FT show slightly decreased rainfall (0 to −20%) or no change, and these changes are mostly observed in the Mizoram state (Figure 11(e)–11(h)). In the case of the SSP585 FT scenario (Figure 11(h)), the rainfall is slightly decreasing (0 to −20%) in most cases.
Figure 11

Showing the variations in average annual rainfall in near term (NT) scenario (2021–2050) and far term (FT) scenario (2061–2090) as per four different CMIP6 climate models with two experimental scenarios (i.e. SSP245 and SSP585).

In the case of EC-EARTH3 SSP245, as per both NT and FT scenarios (Figure 11(i) and 11(j)), the rainfall is found to decrease (0 to −60%), while in the case of SSP585, both NT and FT scenarios show a slight decrease or no change (0 to −20%) (Figure 11(k) and 11(l)). In the case of the MRI-ESM2-0 SSP245 scenario, as per both NT and FT observations (Figure 11(m) and 11(n)), the rainfall is found to decrease slightly (0 to −25%) or show no change. In the case of the SSP585 NT and FT scenarios (Figure 11(o) and 11(p)), the rainfall is slightly decreasing or shows no change (0 to −20%) in most cases. Based on these observations, the rainfall is found to decrease in most of the areas, and a higher rate of change (mostly decreasing) is recorded in the Mizoram area.

To analyse the changes in extreme rainfall events, the Rx1D has been calculated. Figure 12 shows the changes in extreme rainfall events in NT and FT with respect to historical extreme events. In Figure 12(a)–12(d), in the case of ACCESS-ESM1, all scenarios show a decrease (∼0 to −70%) in Rx1D events, except for some grids (in some areas of Nagaland) in the SSP585 NT scenario, which show a slight increase (0 to +20%) in Rx1D values. In the case of BCC-CSM2-MR, almost the whole study area shows a decrease in Rx1D events, except for a very few grids (in the SSP245 NT scenario), which show some increase (0 to +20%). In the case of EC-EARTH3, all scenarios show a decrease in Rx1D events (0 to −80%), and the SSP585 FT scenario shows the maximum change (−10% to −80%) in Rx1D events in most of the areas (Figure 12(l)). However, the MRI-ESM2-0-based observations show a slightly different picture from the other three models. Here, some areas show an increase (0 to +30%) in Rx1D events (areas of Assam, Meghalaya, Nagaland, and a very few grids in Mizoram), while some areas show a decrease in Rx1D events in all scenarios (Figure 12(m)–12(p)). Overall, it is concluded that in most of the areas, especially in Mizoram state, the Rx1D events will decrease in both NT and FT time durations.
Figure 12

Showing the variations in average Rx1D (maximum 1 D rainfall) in near term (NT) scenario (2021–2050) and far term (FT) scenario (2061–2090) as per four different CMIP6 climate models with two experimental scenarios (i.e. SSP245 and SSP585).

The frequency of occurrence of rainfall events can be an important parameter to signify long-term rainfall changes in NT and FT time durations. Therefore, dry spell frequency (Figure 13) and wet spell frequency (Figure 14) have been computed and analysed in the selected study region. Figure 13 displays the dry spell changes in NT and FT under the different GCMs and SSP scenarios. In the case of ACCESS-ESM1 SSP245 (Figure 13(a) and 13(b)), a slight increase (0 to +20%) in dry spell frequency has been observed, mostly in the Mizoram and Assam states; however, over some areas (e.g. Nagaland), a slight decrease (0 to −10%) in dry spell frequency has also been seen. In Figure 13(b), the FT scenario exhibits an almost similar pattern of change. In the case of SSP585 (Figure 13(c) and 13(d)), in NT, the Mizoram state shows an increase in dry spells (0 to +20%), while other parts like Assam, Meghalaya, and Nagaland show a slight decrease in dry spells (0 to −20%). In the case of BCC-CSM2-MR (Figure 13(e)–13(h)), in both SSPs, the NT scenarios show a decrease in dry spells (0 to −20%) in the majority of areas (mostly in Mizoram state), while in the FT scenarios, most of the areas show an increase in dry spells (0 to +20%), except for a few areas in Mizoram state, which show a decrease in dry spells. In the case of EC-EARTH3 (with SSP245 and SSP585), most of the areas show a clear increase (0 to +20%) in dry spells in both NT and FT scenarios, except for a very few areas (mostly in Nagaland), which show a decrease in dry spells (0 to −30%). As per the MRI-ESM2-0 SSP245- and SSP585-based observations (Figure 13(m)–13(p)), the NT scenarios show a decrease in dry spells (0 to −20%) in the majority of areas, while in the FT scenarios, most of the areas show an increase in dry spells (0 to +20%), except for a few areas, which show a decrease in dry spells.
Overall, it has been observed that the dry spells will clearly increase in the Mizoram state and other areas of Meghalaya and Assam, especially during the NT scenario, while in some areas (like Nagaland and a few areas of Assam) during the FT scenario, the dry spells may decrease under extreme conditions (i.e. following the SSP585 scenario).
Figure 13

Showing the variations in rainfall using Dry Spell Frequency in near term (NT) scenario (2021–2050) and far term (FT) scenario (2061–2090) as per four different CMIP6 climate models with two experimental scenarios (i.e. SSP245 and SSP585).

Figure 14

Showing the variations in rainfall using Wet Spell Frequency in near term (NT) scenario (2021–2050) and far term (FT) scenario (2061–2090) as per four different CMIP6 climate models with two experimental scenarios (i.e. SSP245 and SSP585).

Wet spell frequency-based rainfall changes have been analysed for the NT and FT scenarios, as shown in Figure 14. In Figure 14(a)–14(d), as per ACCESS-ESM1 SSP245, the NT and FT scenarios show a mixed response: areas like Nagaland and Manipur show a slight decrease in wet spells (0 to −10%), and areas like Mizoram and Assam show a slight increase in wet spells (0 to +20%). As per Figures 14(c) and 14(d), the SSP585-based observations clearly show an increase in wet spell frequency in the majority of areas (0 to +40%). In the case of BCC-CSM2-MR, a mixed response has been observed: under SSP245, the NT scenario shows a slight increase in wet spells (0 to +20%), except in some areas over Nagaland, while the FT scenario displays a decrease in wet spells (0 to −20%). As per Figures 14(g) and 14(h), under SSP585, both NT and FT scenarios show a clear decrease in wet spells (0 to −40%). In the case of EC-EARTH3, again some mixed responses have been observed: under SSP245, in NT, a slight increase (0 to +10%) in wet spells has been observed, except in a very few areas, which show a slight decrease. In the FT scenario (Figure 14(j)), the majority of areas show a decrease in wet spells (0 to −30%). In the case of SSP585 (Figure 14(k) and 14(l)), a mixed response has been observed in both NT and FT scenarios. As per MRI-ESM2-0 under SSP245 (Figure 14(m)–14(p)), a mixed response has been observed in both NT and FT scenarios, where most of the areas over Mizoram show an increase in wet spells, while a major area of Assam and Nagaland shows a decrease in wet spells. In the case of SSP585 (Figure 14(o)–14(p)), a major area of Mizoram shows an increase in wet spells (0 to +30%), while some areas of Assam, Meghalaya, and Nagaland show a decrease in wet spells.
Overall, all scenarios have shown a mixed response; however, in the case of the SSP245-based observations, the wet spells will increase, while in the case of SSP585, the wet spells will decrease.

This study was performed to accomplish two important objectives. First, this study evaluated the applicability of gridded rainfall datasets from various sources (including satellite-based and gauge-based) in the wettest regions of India, such as Mizoram state and some parts of Meghalaya, Assam, Nagaland, and Manipur. Data assimilation (DA) has been carried out to generate a new hybrid and improved gridded rainfall dataset over the selected study region utilizing six gridded rainfall datasets, namely, IMDAA re-analysis, APHRODITE, IMD, PRINCETON, and CHIRPS (two different versions). After applying various statistical evaluation functions and bias corrections, a new AR product was generated and evaluated against the predicted rainfall datasets. The APHRODITE and CHIRPS rainfall datasets were found to be close to IMDAA; therefore, these two datasets were utilized for the construction of the assimilated rainfall. The RF and SVR machine learning methods have been utilized to predict the rainfall datasets, which proved very useful for the comparison and evaluation of the AR product and other sources of rainfall datasets. Based on the inter-comparison of the predicted rainfall datasets (as per RF and SVR) and the AR product, the RF algorithm performed almost as well as the AR. The AR product is able to capture the seasonality and extremity of the IMDAA rainfall data, and therefore, among all the predicted and constructed datasets, the AR product (at 5 km2 scale) was found to be the best dataset for the selected study area.

Second, different CMIP6-based climate model datasets have been bias corrected with reference to the AR product, and then the future rainfall changes were analysed. For the assessment of long-term rainfall changes, various standard climate change indices have been computed, and the percentage of change in rainfall was calculated to highlight the rainfall variability and changes in the near future term (2021–2050) and far future term (2061–2090) with respect to the historical period (1991–2020). The rainfall indices displayed substantial variability in both terms over the selected study region. The long-term percentage of change analysis based on rainfall extreme indices revealed significant changes in rainfall extremes over the selected study region. As per the annual mean and Rx1D-based observations in NT and FT, the rainfall amount and extreme events are expected to decrease in the future. As per the spell frequency analysis, the dry spells will increase, and the wet spells will increase or decrease in different parts of the study area. Considering the moderate emission scenario, i.e. SSP245, the wet spells will increase in the future, while in the case of SSP585 (representing the extreme worst case), the wet spells will decrease. These observations are crucial for the northeastern states of India; they mostly show that over the wettest regions of India, under the expected climate change, the rainfall variability (in terms of frequency) will increase, while extreme high events and rainfall amount will decrease.

The authors thank the National Hydrology Project (NIH SP-45) for funding the study and supporting this research work. The authors would like to thank the National Institute of Hydrology, India, for providing facilities to carry out this research work. The authors are also thankful to the India Meteorological Department, Pune, for providing the gridded precipitation dataset. The authors are thankful to WRD Mizoram for providing the observed gauge datasets. The authors are obliged to the IPCC CMIP6, CHIRPS, TRMM, APHRODITE, and PRINCETON data generation teams/organizations for providing the rainfall datasets free of cost. The authors are thankful to the Python developers' project team, who made the software/scripts/libraries available free of cost.

All authors have equally contributed to the research work.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Afuecheta E. & Omar M. H. 2021 Characterization of variability and trends in daily precipitation and temperature extremes in the Horn of Africa. Climate Risk Management 32, 100295. https://doi.org/10.1016/j.crm.2021.100295.
Aggarwal D., Attada R., Shukla K. K., Chakraborty R. & Kunchala R. K. 2022 Monsoon precipitation characteristics and extreme precipitation events over Northwest India using Indian high resolution regional reanalysis. Atmospheric Research 267, 105993. https://doi.org/10.1016/j.atmosres.2021.105993.
Alexander L. V., Fowler H. J., Bador M., Behrangi A., Donat M. G., Dunn R., Funk C., Goldie J., Lewis E., Rogé M. & Seneviratne S. I. 2019 On the use of indices to study extreme precipitation on sub-daily and daily timescales. Environmental Research Letters 14 (12), 125008. doi:10.1088/1748-9326/ab51b6.
Ashrit R., Indira Rani S., Kumar S., Karunasagar S., Arulalan T., Francis T., Routray A., Laskar S. I., Mahmood S., Jermey P. & Maycock A. 2020 IMDAA regional reanalysis: Performance evaluation during Indian summer monsoon season. Journal of Geophysical Research: Atmospheres 125 (2), e2019JD030973. https://doi.org/10.1029/2019JD030973.
Auzani H., Has-Yun K. S. & Nazri F. A. M. 2021 Development of trees management system using radial basis function neural network for rain forecast. Computational Water, Energy, and Environmental Engineering 11 (1), 1–10.
Banerjee A., Dimri A. P. & Kumar K. 2020 Rainfall over the Himalayan foot-hill region: Present and future. Journal of Earth System Science 129, 1–16.
Barrera-Animas A. Y., Oyedele L. O., Bilal M., Akinosho T. D., Delgado J. M. D. & Akanbi L. A. 2022 Rainfall prediction: A comparative analysis of modern machine learning algorithms for time-series forecasting. Machine Learning with Applications 7, 100204.
Bharti V. & Singh C. 2015 Evaluation of error in TRMM 3b42v7 precipitation estimates over the Himalayan region. Journal of Geophysical Research: Atmospheres 120 (24), 12458–12473.
Bhattacharyya S., Sreekesh S. & King A. 2022 Characteristics of extreme rainfall in different gridded datasets over India during 1983–2015. Atmospheric Research 267, 105930. https://doi.org/10.1016/j.atmosres.2021.105930.
Dikshit
K. R.
&
Dikshit
J. K.
2014
Weather and climate of north-east India
. In:
North-East India: Land, People and Economy
(Dikshit, K. R. & Dikshit, J. K., eds). Springer, pp.
149
173
.
https://doi.org/10.1007/978-94-007-7055-3
.
Dubey
S.
,
Gupta
H.
,
Goyal
M. K.
&
Joshi
N.
2021
Evaluation of precipitation datasets available on Google earth engine over India
.
International Journal of Climatology
41
(
10
),
4844
4863
.
El Kenawy
A. M.
,
McCabe
M. F.
,
Vicente-Serrano
S. M.
,
Robaa
S. M.
&
Lopez-Moreno
J. I.
2016
Recent changes in continentality and aridity conditions over the Middle East and North Africa region, and their association with circulation patterns
.
Climate Research
69
(
1
),
25
43
.
Enayati
M.
,
Bozorg-Haddad
O.
,
Bazrafshan
J.
,
Hejabi
S.
&
Chu
X.
2021
Bias correction capabilities of quantile mapping methods for rainfall and temperature variables
.
Journal of Water and Climate Change
12
(
2
),
401
419
.
Fang
G. H.
,
Yang
J.
,
Chen
Y. N.
&
Zammit
C.
2015
Comparing bias correction methods in downscaling meteorological variables for a hydrologic impact study in an arid area in China
.
Hydrology and Earth System Sciences
19
(
6
),
2547
2559
.
Gupta
V.
,
Jain
M. K.
,
Singh
P. K.
&
Singh
V.
2020a
An assessment of global satellite-based precipitation datasets in capturing precipitation extremes: A comparison with observed precipitation dataset in India
.
International Journal of Climatology
40
(
8
),
3667
3688
.
doi:10.1002/joc.6419
.
Gupta
V.
,
Singh
V.
&
Jain
M. K.
2020b
Assessment of precipitation extremes in India during the 21st century under SSP1-1.9 mitigation scenarios of CMIP6 GCMs
.
Journal of Hydrology
590
,
125422
.
https://doi.org/10.1016/j.jhydrol.2020.125422.
Kumar
N.
&
Singh
S. K.
2021
Soil erosion assessment using earth observation data in a trans-boundary river basin
.
Natural Hazards
107
(
1
),
1
34
.
https://doi.org/10.1007/s11069-021-04571-6.
Kumar
N.
,
Goyal
M. K.
,
Gupta
A. K.
,
Jha
S.
,
Das
J.
&
Madramootoo
C. A.
2021
Joint behaviour of climate extremes across India: Past and future
.
Journal of Hydrology
597
,
126185
.
Kumar
N.
,
Dubey
A. K.
,
Goswami
U. P.
&
Singh
S. K.
2022
Modelling of hydrological and environmental flow dynamics over a central Himalayan river basin through satellite altimetry and recent climate projections
.
International Journal of Climatology
42
(
16
),
8446
8471
.
https://doi.org/10.1002/joc.7734.
Lang
T. J.
2015
Python-based scientific analysis and visualization of precipitation systems at NASA Marshall Space Flight Center. Available from: www.ntrs.nasa.gov..
Lu
J.
,
Hu
W.
&
Zhang
X.
2018
Precipitation data assimilation system based on a neural network and case-based reasoning system
.
Information
9
(
5
),
106
.
https://doi.org/10.3390/info9050106.
Marak
J. D. K.
,
Sarma
A. K.
&
Bhattacharjya
R. K.
2020
Innovative trend analysis of spatial and temporal rainfall variations in Umiam and Umtru watersheds in Meghalaya, India
.
Theoretical and Applied Climatology
142
,
1397
1412
.
Mishra
V.
,
Bhatia
U.
&
Tiwari
A. D.
2020
Bias-corrected climate projections for South Asia from coupled model intercomparison project-6
.
Scientific Data
7
(
1
),
338
.
https://doi.org/10.6084/m9.figshare.12963008.
Monsang
N. P.
,
Tripathi
S. K.
,
Singh
N. S.
&
Upadhyay
K. K.
2021
Climate change and Mizoram: Vulnerability status and future projections
.
Mizoram: Environment, Development, and Climate Change
2021
,
33
40
.
Mukherjee
S.
,
Aadhar
S.
,
Stone
D.
&
Mishra
V.
2018
Increase in extreme precipitation events under anthropogenic warming in India
.
Weather and Climate Extremes
20
,
45
53
.
Pham
Q. B.
,
Yang
T. C.
,
Kuo
C. M.
,
Tseng
H. W.
&
Yu
P. S.
2019
Combing random forest and least square support vector regression for improving extreme rainfall downscaling
.
Water
11
(
3
),
451
.
https://doi.org/10.3390/w11030451.
Rani
S. I.
,
Arulalan
T.
,
George
J. P.
,
Rajagopal
E. N.
,
Renshaw
R.
,
Maycock
A.
,
Barker
D. M.
&
Rajeevan
M.
2021
IMDAA: High-resolution satellite-era reanalysis for the Indian monsoon region
.
Journal of Climate
34
(
12
),
5109
5133
.
Ravindranath
N. H.
,
Rao
S.
,
Sharma
N.
,
Nair
M.
,
Gopalakrishnan
R.
,
Rao
A. S.
,
Malaviya
S.
,
Tiwari
R.
,
Sagadevan
A.
,
Munsi
M.
&
Krishna
N.
2011
Climate change vulnerability profiles for North East India
.
Current Science
101
(
3
),
384
394
.
https://ssrn.com/abstract=2140671
.
Ridwan
W. M.
,
Sapitang
M.
,
Aziz
A.
,
Kushiar
K. F.
,
Ahmed
A. N.
&
El-Shafie
A.
2021
Rainfall forecasting model using machine learning methods: Case study Terengganu, Malaysia
.
Ain Shams Engineering Journal
12
(
2
),
1651
1663
.
Roffe
S. J.
,
Fitchett
J. M.
&
Curtis
C. J.
2019
Classifying and mapping rainfall seasonality in South Africa: A review
.
South African Geographical Journal
101
(
2
),
158
174
.
Saha
S.
,
Chakraborty
D.
,
Choudhury
B. U.
,
Singh
S. B.
,
Chinza
N.
,
Lalzarliana
C.
,
Dutta
S. K.
,
Chowdhury
S.
,
Boopathi
T.
,
Lungmuana
&
Singh
A. R.
2015
Spatial variability in temporal trends of precipitation and its impact on the agricultural scenario of Mizoram
.
Current Science
109
(
12
),
2278
2282
.
Samantaray
A. K.
,
Ramadas
M.
&
Panda
R. K.
2022
Changes in drought characteristics based on rainfall pattern drought index and the CMIP6 multi-model ensemble
.
Agricultural Water Management
266
,
107568
.
https://doi.org/10.1016/j.agwat.2022.107568.
Sharma
A.
&
Goyal
M. K.
2020
Assessment of the changes in precipitation and temperature in Teesta River basin in Indian Himalayan Region under climate change
.
Atmospheric Research
231
,
104670
.
https://doi.org/10.1016/j.atmosres.2019.104670.
Shivam
G.
,
Goyal
M. K.
&
Sarma
A. K.
2019
Index-based study of future precipitation changes over subansiri river catchment under changing climate
.
Journal of Environmental Informatics
34
(
1
),
1
14
.
Westra
E.
2015
Python Geospatial Analysis Essentials
.
Packt Publishing Ltd
. .
Xiang
Z.
,
Yan
J.
&
Demir
I.
2020
A rainfall-runoff model with LSTM-based sequence-to-sequence learning
.
Water Resources Research
56
(
1
),
e2019WR025326
.
https://doi.org/10.1029/2019WR025326.
Yaduvanshi
A.
,
Zaroug
M.
,
Bendapudi
R.
&
New
M.
2019
Impacts of 1.5 C and 2 C global warming on regional rainfall and temperature change across India
.
Environmental Research Communications
1
(
12
),
125002
.
doi:10.1088/2515-7620/ab4ee2
.
Yasutomi
N.
,
Hamada
A.
&
Yatagai
A.
2011
Development of a long-term daily gridded temperature dataset and its application to rain/snow discrimination of daily precipitation
.
Global Environmental Research
15
(
2
),
165
172
.
Yatagai
A.
,
Kamiguchi
K.
,
Arakawa
O.
,
Hamada
A.
,
Yasutomi
N.
&
Kitoh
A.
2012
APHRODITE: Constructing a long-term daily gridded precipitation dataset for Asia based on a dense network of rain gauges
.
Bulletin of the American Meteorological Society
93
(
9
),
1401
1415
.
Zahan
Y.
,
Mahanta
R.
,
Rajesh
P. V.
&
Goswami
B. N.
2021
Impact of climate change on North-East India (NEI) summer monsoon rainfall
.
Climatic Change
164
,
1
19
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).