Data scarcity and the unavailability of observed rainfall records in the northeastern states of India limit the prediction of extreme hydro-climatological changes. To fill this gap, a data assimilation approach was applied to reconstruct accurate, high-resolution gridded (5 km²) daily rainfall data (2001–2020); the workflow included seasonality assessment, statistical evaluation, and bias correction. Random forest (RF) and support vector regression (SVR) were used to predict rainfall time series, and the machine learning predictions were compared with the data assimilation-based gridded rainfall data. Five gridded rainfall datasets were used: Indian Monsoon Data Assimilation and Analysis (IMDAA) (12 km²), APHRODITE (25 km²), India Meteorological Department (25 km²), PRINCETON (25 km²), and CHIRPS (25 and 5 km²). For the reconstructed rainfall dataset (5 km²), seasonality and change were assessed comparatively against the other rainfall datasets. The CHIRPS and APHRODITE datasets showed greater similarity to IMDAA. The RF and assimilated rainfall (AR) data performed best in terms of bias and extremes, and the AR data were identified as the most accurate (>0.8). Precipitation change analysis (2021–2100) using bias-corrected and downscaled CMIP6 datasets showed that dry spells will increase. Under the CMIP6 moderate emission scenario, SSP245, wet spells will increase in the future; under SSP585 (the extreme worst case), however, wet spells will decrease.

  • A unique data assimilation approach was applied to construct an accurate high-resolution gridded (5 km²) daily rainfall time series.

  • Evaluation and bias correction of multisource gridded rainfall datasets were performed.

  • Random forest and support vector regression machine learning methods were applied for the prediction of rainfall.

  • Long-term rainfall changes were assessed in the wettest regions of the world.
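
The evaluation and bias correction step named in the highlights can be illustrated with a minimal sketch. The abstract does not state which bias-correction method was used, so the example below assumes simple multiplicative linear scaling, a common baseline, and is not a description of the authors' procedure. The function names and the rainfall values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def percent_bias(estimate, reference):
    """Percent bias of an estimate relative to a reference (0 means unbiased)."""
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("reference rainfall sums to zero")
    return 100.0 * (sum(estimate) - total_ref) / total_ref


def linear_scaling_factor(estimate, reference):
    """Factor that matches the mean of the estimate to the mean of the reference."""
    mean_est = mean(estimate)
    if mean_est == 0:
        raise ValueError("estimated rainfall mean is zero")
    return mean(reference) / mean_est


def apply_linear_scaling(estimate, factor):
    """Scale every daily value by the same multiplicative factor."""
    return [value * factor for value in estimate]


# Hypothetical daily rainfall (mm) at one grid cell; not data from the study.
gauge = [0.0, 12.0, 30.0, 5.0, 0.0, 48.0, 25.0]
gridded = [1.0, 10.0, 22.0, 4.0, 0.0, 35.0, 18.0]

factor = linear_scaling_factor(gridded, gauge)
corrected = apply_linear_scaling(gridded, factor)

print("bias before correction (%):", percent_bias(gridded, gauge))
print("bias after correction (%):", percent_bias(corrected, gauge))
```

Linear scaling removes the mean bias but leaves the shape of the rainfall distribution unchanged, so a study concerned with extremes and dry or wet spells would likely need a distribution-aware method such as quantile mapping.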

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (