Cloud properties are pivotal in analyzing rainfall patterns in monsoon-dependent countries such as India. The impact of climate change is especially important in regions susceptible to hydrometeorological events under different monsoon regimes. To examine the regional heterogeneity of cloud properties, this study investigates long-term trends and predictive capabilities for cloud properties in drought- and flood-prone regions of western India, using satellite data and machine learning (ML) models to capture intricate data patterns and enhance predictive accuracy. The results show higher mean and variability in cloud parameters over the flood-prone area owing to favorable rain conditions, reflecting higher cloud microphysical and optical properties. These parameters correlate negatively with some cloud macrophysical properties and with the aerosol property in the drought-prone area. A moderate correlation exists between certain cloud characteristics of one region and those of the other. A comparison of ML regression models for cloud effective radius across the regions shows promising results, with random forest achieving a high coefficient of determination (0.86, 0.93) and a low root mean squared error (0.76, 1.15), reflecting its robustness and accuracy. This research enhances the understanding of regional heterogeneity in India and shows that ML, with a suitable model, can help in predicting future cloud dynamics and climate.

  • The correlation among cloud properties of drought- and flood-prone regions showed significant mutual dependence.

  • Higher mean and variability were observed for most of the cloud variables in the flood-prone area than in the drought-prone region.

  • Among all the utilized ML models, RF performed the best for the flood-prone and the drought-prone regions.

  • Using ML techniques, it is observed that cloud top temperature is the most influential parameter for predicting cloud effective particle radius.

The projected changes in climate, mainly triggered by anthropogenic activities that enhance global warming, result in transitional precipitation patterns and distributions across various geographical regions. Human-induced change in global and regional climate and its effects are apparent in recent studies based on observations and simulations obtained by the climate models (Pörtner et al. 2022). The global surface air temperature has shown a rising trend in almost every region around the world in recent decades (Lee et al. 2015; Goyal et al. 2022; Panicker et al. 2023). As the hydrological cycle plays an important role in driving sustainable livelihood in the world, it was considered that ‘wet areas become wetter and dry areas become drier’, but precipitation shows a different trend in the research of Rajesh & Goswami (2023), which is based on a combination of observations and climate model simulations and indicates that the mean precipitation in the semi-arid northwest regions of India and Pakistan is anticipated to rise significantly higher, considering the human influences on climate.

The study of change in the hydrological cycle is important for a country such as India that is directly or indirectly driven by the rainfall. India has a tropical monsoon climate and receives an annual rainfall of over 1,150 mm (Gupta et al. 2023; Rakkasagi et al. 2023). Rainfall analysis along with prediction becomes even more important in cases of deficient or excess rain (Brunner 2023). During the summer monsoon season, extended periods of dry weather, known as prolonged rain breaks, can result in drought conditions that can have a significant impact on the Indian subcontinent (Karadan et al. 2021). These precipitation patterns such as active and/or break days and cloud properties at different locations get influenced by climatic conditions such as global warming and change in land usage. The deficient rainfall also has an adverse effect on water quantity especially in arid or semi-arid areas such as deserts (Thakur et al. 2021). To achieve precise seasonal and sub-seasonal forecasts of precipitation and atmospheric conditions, it is crucial to thoroughly understand the variations in monsoon features. This necessitates considering the complex interplay of orography and land–atmosphere–ocean interactions (Singh 2016).

Hydrometeorological extreme events, including droughts, floods, and drought–flood transitions that occur at varying time intervals and over localized extents, have become significant sources of concern and pose formidable challenges for water management. The frequency and intensity of weather extremes such as floods, droughts, and cyclones have increased, and vulnerability has grown under the changing climate (Singh et al. 2014; Rakkasagi et al. 2023). Hence, to prevent these flood and drought conditions, manage resources, predict rainfall with greater accuracy, and implement proper policies, it is important to analyze weather extremes and their characteristics. However, the complex aspects of floods and droughts are frequently overlooked when they are studied from a single-variable perspective, potentially resulting in the underestimation or overestimation of risks. The multivariate nature of hydrological extreme events can manifest in various forms, including regional extremes, consecutive extremes, extremes with multiple characteristics, and transitions between extremes (Brunner et al. 2021). Recent research focusing on regional aspects has given new insights into the spatiotemporal analyses of lakes for flood hazard (Gupta et al. 2023) and the impact of climate change on precipitation extremes over cities.

Clouds envelop over two-thirds of the Earth's expanse, wielding significant influence over the Earth's energy distribution and water circulation (Zhou et al. 2018). They stand as pivotal components within the climate framework, intricately linked to precipitation mechanisms and dynamics through microphysical processes and aqueous-phase chemistry (Qian et al. 2012). Atmospheric aerosols, functioning as cloud condensation nuclei (CCN), play an indispensable role in cloud genesis, underscoring their significance in cloud formation (Shikwambana 2022). Indeed, clouds owe their existence to atmospheric aerosols and variations in aerosols directly affect cloud properties as well as precipitation. However, it has also been shown that rainfall intensity can alter aerosol concentrations over India, leading to changes in cloud properties (Karmakar et al. 2017). Cloud optical thickness (COT) and cloud effective particle radius (CER) are key parameters that determine the radiative properties of clouds, including their reflection, transmission, and absorption of solar radiation (Kikuchi et al. 2006). Other cloud properties such as cloud fraction (CF), cloud liquid water (CLW), cloud top temperature (CTT), and cloud top pressure (CTP) are also studied for a comprehensive analysis of cloud dynamics. Satellites offer a comprehensive vantage point for observing clouds globally or nearly so, facilitating the resolution of uncertainties in understanding cloud feedback mechanisms and their interplay with radiation, thereby shaping climate dynamics. However, the precise influence of these parameters on the climate in different geographical locations remains quite uncertain (Shah & Srivastava 2019). Studying the disparities and variabilities in cloud properties over the coastal as well as inland regions is important to understand uncertain precipitation scenarios.

Climate models facilitate climate forecasts spanning seasonal to decadal timescales, thus enabling estimations of future climate projections over centuries based on various parameters and scenarios (Eyring et al. 2019; Saponaro et al. 2020). While climate models excel at replicating many observed climate features, challenges remain in disentangling cloud retrievals from meteorological influences and addressing inaccuracies in retrieval algorithms and underlying assumptions (Koren et al. 2022). These drawbacks, along with limitations in uncertainty analysis, computational overhead, and the need for extensive datasets, have spurred the application of machine learning (ML) and deep learning techniques, whose computational efficiency offers promising solutions to these challenges. ML methodologies can also help optimize the execution of weather and climate models on heterogeneous hardware. Current research further aims to enhance parameterization within weather and climate models or to refine existing kernels using ML algorithms. Only in recent years have these methodologies gained widespread traction within research communities (Amato et al. 2020). A noticeable recent trend is the rapid advancement of ML and deep learning applications in hydrological processes, climate change studies, and broader investigations into Earth systems and their subfields. Because the growing prominence of these methodologies is readily apparent, there is a pressing need to examine these approaches systematically and to discern prevailing trends in their adoption and advancement. The vast datasets produced by climate systems, from satellite imagery to weather station readings and intricate climate models, pose a daunting challenge, but ML techniques can unearth hidden patterns, trends, and intricate relationships within these data and connect them with changes in climate.

Regression models with different techniques such as ridge, random forest (RF), support vector regression (SVR), principal component analysis (PCA), neural networks, and ensemble methods can now tackle crucial tasks. These tasks include predicting climate variables, assessing human impacts on climate change, and refining climate simulation (Li & Palazzolo 2022). By pinpointing extreme weather events, optimizing resource management, and enhancing climate risk assessment, ML becomes a potent force for better analysis and prediction (Khastagir et al. 2022). In one of the studies, the neural network approach has been used for cloud top height retrieval from the imager instrument the Moderate Resolution Imaging Spectroradiometer (MODIS) for better results (Håkansson et al. 2018). The ability of such algorithms to effortlessly navigate vast, multidimensional datasets allows scientists to unravel the delicate tapestry of climate systems. This deeper understanding paves the way for informed decision making, guiding us toward sustainable environmental practices and preparing us to face the challenges that accompany climate change. As climate studies continue to evolve, the integration of ML will undoubtedly play a crucial role. Its potential to unlock the secrets of our planet's climate and empower us to address the complex challenges is truly boundless.

Generally, for precipitation, factors such as temperature, humidity, and sea surface temperature (SST) are widely taken into consideration. However, the inclusion of cloud properties for getting more insights into changing precipitation patterns from the regional perspective has not been studied much. To predict cloud properties and precipitation, using ML techniques will be helpful in analyzing the decadal changes through the heterogeneity in cloud properties as well as other variables and in better forecasting, too. The aim of the present work is to understand the regional changes in these hydrometeorological extreme-prone areas through cloud properties as regional weather events have altered patterns. Here, a comprehensive analysis of cloud macrophysical, microphysical, and optical properties, meteorological parameters, and aerosol properties are taken into account. We need to identify these changes along with extremes being frequent, having transitions, or with changing characteristics using ML techniques. This study conducts a comparative analysis of two distinct geographical areas to gain insights into the monsoon's heterogeneity by examining cloud properties and precipitation patterns in the western region of India. Due to the recent advancements in ML, the accuracy of predicting various atmospheric conditions with the available data and the application of various algorithms has improved. To analyze this, the present paper tries to apply different ML techniques to understand the changing scenario of cloud properties and precipitation over both regions (drought prone and flood prone) and explore complex variations in predicting flood and drought events based on cloud properties.

The dependence of precipitation on clouds can be investigated through cloud properties such as CF, CER, and CLW. Expecting the trend reversal in precipitation, there has to be a continuous evaluation of regional climate and updating the future alteration considering flood- and drought-prone regions. Specifically, variations in rainfall offset water management plans and threaten water security in various ways as precipitation is a product of environmental and cloud microphysical conditions (Gao & Li 2010). In turn, this variation in precipitation pattern has a different impact on drought-prone and flood-prone regions. The present study gains insights into the monsoon's shifting patterns by examining clouds and precipitation in two western regions of India.

Western India, being a gateway for monsoon winds, is an important region to experience the onset of the monsoon. The desert region of western India may influence monsoon dynamics through the formation of heat lows, affecting rainfall distribution, and contributing to localized weather phenomena as well as rainfall affecting drought scenarios in these regions. Understanding the interaction between the climatic conditions in the desert and the monsoon is essential for mitigating the impacts of climate variability in this region. From a climate perspective, understanding the characteristics of the monsoon over flood-prone regions is essential for regulating local and global water cycles and changes in hydrological balance.

Study regions of western India are taken into account as depicted in Figure 1. Both regions are indicated with boxes, where (a) is the drought-prone inland, arid area in Rajasthan and (b) is the coastal area mainly of South Gujarat, which is flood prone, and a smaller region from Maharashtra. Variability of precipitation in India has been taken into account to visualize the drought-prone (lower precipitation) and flood-prone regions (higher precipitation). Thus, the result's relevance from the cloud parameters perspective is connected to precipitation variability. The Thar Desert, located in the western region of Rajasthan state, is particularly susceptible to heat waves and a rise in both mean and maximum temperatures indicating a potential decrease in precipitation. Consequently, the Thar Desert of Rajasthan (inland region) is more vulnerable to drought conditions. The flood risk in the South Gujarat plains of Gujarat state (coastal region) is comparatively higher than that in the Saurashtra region. This distinction arises from the relatively flat terrain in the lower basins, coupled with hilly catchment areas in the upper parts of South Gujarat, which magnify flood risks in the Surat–Bharuch regions. Hence, the present study considers two distinct regions, showing extreme variations in the precipitation characteristics even though they are in proximity in western India itself.
Figure 1

Map of mean annual precipitation of western India including the study regions. (a) Drought-prone region (25.47–27.69°N to 70.72–75.50°E) and (b) flood-prone region (19.49–21.60°N to 72.80–74.53°E).


Various parameters are used to examine monsoon heterogeneity, such as the cloud macrophysical property – CF, optical property – COT, and microphysical property – CLW and CER, among which CER is the most thoroughly researched. Observations from previous studies also indicate that CER had played a crucial role in the relationship between precipitation and anthropogenic impact on climate, particularly when utilizing data retrieved from satellites (Shah et al. 2021). The interdependence of these parameters and their correlations need to be studied from the perspective of regional variability taking into consideration the rainfall scenarios. Some of the most influential cloud properties and the meteorological factors correlated to precipitation are described briefly as follows.

CER represents the weighted mean values of the size distribution of cloud droplets in the atmosphere. CF represents the portion of the sky covered by clouds. CF is a crucial parameter in understanding the Earth's radiative balance and climate. Higher CF can lead to reduced incoming solar radiation reaching the surface (cooling effect), and increased trapping of outgoing terrestrial radiation (warming effect). However, the net effect depends on cloud type, altitude, and other factors. COT quantifies the attenuation of light passing through the atmosphere caused by the scattering and absorption of cloud droplets. CLW gives a mass of liquid water contained within a unit vertical column of atmosphere above a specific area and quantifies the amount of liquid water present in clouds, which plays a role in precipitation processes. CTT can be used to monitor cloud top changes during convection and represents the temperature at the uppermost level of a cloud. Atmospheric water vapor (AWV) and aerosol optical depth (AOD) are key players in regulating Earth's climate. AWV dominates the radiative balance, the exchange of energy between Earth and space, and the hydrological cycle, the movement of water through the atmosphere. AOD, conversely, represents the ability of aerosols to obstruct sunlight from reaching the Earth's surface. Interestingly, the amount of aerosols present significantly influences cloud microphysics, the formation and properties of clouds. These changes in cloud properties can significantly impact the regional and global radiation budget, potentially affecting climate patterns.

Data collection

The present study uses MODIS remote sensing data from the Aqua satellite. The dataset spans 2002 to 2023, allowing an extensive study of cloud properties over this period, and consists of daily observations that are aggregated into monthly or annual intervals as needed. Monsoon characteristics such as precipitation, supported by cloud properties, are analyzed using these long-term (2002–2023) satellite observations, with an emphasis on the monsoon. The study focuses on the key MODIS cloud and aerosol products that align with its objectives: CER, COT, CTT, CF, CTP, CLW, and AOD. MODIS gathers data across 36 spectral bands and provides valuable information about the Earth's atmosphere, oceans, and terrestrial environments. The Level-3 MODIS Atmosphere Daily Global Product contains roughly 600 statistical datasets derived from approximately 80 scientific parameters from four Level-2 MODIS Atmosphere Products: Aerosol, Water Vapor, Cloud, and Atmosphere Profile (Platnick et al. 2015). Readily available MODIS daily data product files for temporal profiles, collected from the Aqua platform, are used in the present paper. Notably, the visible and infrared bands are instrumental in assessing cloud optical and microphysical properties. The MODIS CTP is converted to cloud height and cloud temperature through the National Centers for Environmental Prediction (NCEP) Global Forecast System (Menzel et al. 2008). Cloud properties such as CF, CER, and CLW are considered for the analysis of multi-decadal changes (2002–2023). However, the cloud properties observed in MODIS data carry measurement uncertainty: CER has an error of less than 0.1 μm, COT of less than 50, and CLW of less than 1 dB (King et al. 1997; Iguchi et al. 2015), while AOD has an uncertainty over land of ±(0.03 + 0.05 AOD) (Remer et al. 2005). The Integrated Multi-Satellite Retrievals for GPM (IMERG) data products are also used to study precipitation. The IMERG algorithm combines and calibrates various infrared (IR), microwave (MW), and gauge data to generate precipitation estimates with high spatial (0.1° × 0.1°) and temporal (30 min) resolution (Huffman et al. 2015). IMERG offers three datasets: IMERG-Early (IMERG-E), IMERG-Late (IMERG-L), and IMERG-Final (IMERG-F). IMERG-E and IMERG-L are near-real-time products, available with 4-h and 14-h latencies, respectively, making them suitable for flood forecasting and real-time disaster management (Huffman et al. 2020, 2023). The data are available on Giovanni's online portal (Giovanni documentation 2020). The details of the parameters are given in Table 1.

Table 1

List of utilized parameters from remote sensing for study regions

| S. No. | Name of the parameter | Unit of the parameter | Source of data | Spatial and temporal resolution | Study period |
|---|---|---|---|---|---|
| 1 | CLW | g/m² | MODIS | 1° × 1°, daily | 2002–2023 |
| 2 | CER | μm | MODIS | 1° × 1°, daily | 2002–2023 |
| 3 | CF | – | MODIS | 1° × 1°, daily | 2002–2023 |
| 4 | COT | – | MODIS | 1° × 1°, daily | 2002–2023 |
| 5 | CTT | | MODIS | 1° × 1°, daily | 2002–2023 |
| 6 | CTP | hPa | MODIS | 1° × 1°, daily | 2002–2023 |
| 7 | AOD | – | MODIS | 1° × 1°, daily | 2002–2023 |
| 8 | AWV | cm | MODIS | 1° × 1°, daily | 2002–2023 |
| 9 | Precipitation | mm/day | GPM | 0.1° × 0.1°, daily | 2002–2023 |

Data preprocessing

Data preprocessing converts the data into an input format suitable for further analysis. To maintain the integrity and usability of the dataset, several preprocessing techniques, including interpolation and feature extraction, are applied. Missing or corrupted values are identified throughout the dataset and imputed using linear interpolation and temporal averaging. The preprocessing specific to the ML models includes selecting relevant feature variables, filling the gaps left by missing values, normalization, and dividing the dataset into training and testing subsets (Sunder et al. 2023). In terms of feature selection and engineering, key variables such as COT, CLW, CF, and others are chosen as predictors, with CER serving as the target variable. Wavelet transformation is used to create new features that capture both approximation and detail coefficients. Additionally, regression models are used to identify the most influential features based on their predictive capabilities. The models are constructed from the daily time series data using the Keras and TensorFlow packages in Python 3. Training and testing are performed for the period from 2002 to 2023, for which satellite data are available. Of the total data, 70% are used for training, 15% for validation, and 15% for testing.
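
The preprocessing steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the sample CER values, the choice to copy the nearest value at series edges, and the chronological (unshuffled) split are all assumptions made for the example.

```python
from typing import List, Optional, Sequence, Tuple


def interpolate_linear(series: Sequence[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest valid neighbors.

    Gaps at the start or end copy the nearest valid value (an assumption).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid observations")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def split_70_15_15(rows: Sequence[float]) -> Tuple[list, list, list]:
    """Chronological 70/15/15 split into training, validation, and test parts."""
    n = len(rows)
    n_train = (70 * n) // 100
    n_val = (15 * n) // 100
    return (list(rows[:n_train]),
            list(rows[n_train:n_train + n_val]),
            list(rows[n_train + n_val:]))


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values with a given range; lo and hi come from the training part."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily CER values (micrometres) with missing days marked as None.
cer = [12.0, None, 14.0, 15.0, None, None, 18.0, 17.0, None, 16.0,
       15.5, 14.5, 13.0, None, 12.0, 12.5, 13.5, 14.0, 15.0, 16.0]

cer_filled = interpolate_linear(cer)
train, val, test = split_70_15_15(cer_filled)

# The scaling range is taken from the training part only, so that no
# information from the validation or test parts leaks into training.
lo, hi = min(train), max(train)
train_scaled = min_max_scale(train, lo, hi)
val_scaled = min_max_scale(val, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

Whether the original study shuffled the samples or kept them in time order before splitting is not stated; the sketch keeps them in time order, which is a common choice for time series.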

Analysis techniques

For understanding the variations in cloud and aerosol properties, these preprocessed data have been analyzed for variations in interquartile range (IQR), mean, and median considering both the regions individually. Apart from this, to enhance and interpret interdependency among variables as well as regions, correlation coefficients are taken into account among all variables. For analyzing the preprocessed data for the prediction of a variable, a range of statistical and ML techniques are utilized. Wavelet analysis has been conducted using the Haar wavelet transformation, allowing for the extraction of coefficients that represent both the approximation and detailed characteristics of the data. For the model's evaluation and comparison, performance metrics such as root mean squared error (RMSE) and coefficient of determination (r2) are employed to assess the accuracy and reliability of the models. Additionally, correlation coefficients including Pearson and Spearman (r) have been calculated to determine the strength and direction of the relationships between the predicted and actual values, providing a comprehensive evaluation of model performance.
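
A single level of the Haar wavelet transformation mentioned above can be written in a few lines. The sketch below assumes the orthonormal convention (division by the square root of 2) and an even-length input; the sign convention for detail coefficients differs between libraries, and the sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients, each half the input length.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one Haar level, recovering the original samples."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical short series of a cloud variable.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 5.0, 3.0]
approx, detail = haar_dwt(signal)
reconstructed = haar_idwt(approx, detail)
```

The approximation coefficients carry the smoothed, pairwise-averaged behavior of the series and the detail coefficients carry the pairwise differences, which is why both can serve as engineered features.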

ML models

In this section, six different models are used to predict cloud properties and precipitation. A short description of each model, namely SVR, RF, K-nearest neighbor (KNN), decision tree (DT), ridge, and eXtreme Gradient Boosting (XGBoost or XGB), is presented here for better understanding. SVR is an ML model belonging to the support vector machine class. In SVR, a subset of training data points, known as support vectors, is used to find a hyperplane that optimally fits the data while minimizing the error between predicted and actual values. Its versatility in handling both linear and non-linear relationships, along with its effectiveness in high-dimensional spaces, makes the support vector approach well suited to intricate regression as well as classification tasks. It has been found most effective in forecasting financial data and in time series prediction (Kleynhans et al. 2017). SVR uses a cost function to measure the estimated risk in order to reduce the regression error. A loss function must be chosen to calculate this cost, for example the least modulus loss function, the quadratic loss function, or the ε-insensitive loss function. The ε-insensitive loss function yields a sparse solution and contains a fixed, symmetrical margin term. If the margin is zero or very small, the model runs the risk of overfitting the data, with poor generalization. Conversely, a large margin leads to better generalization at the risk of a higher testing error. Generally, the estimation function in SVR takes the form $f(x) = \langle \omega, \varphi(x) \rangle + b$, where $\langle \cdot, \cdot \rangle$ denotes the inner product in $\Omega$, a feature space of possibly different dimensionality such that $\varphi: X \to \Omega$ and $b \in \mathbb{R}$. The two parameters, $\omega$ and $b$, are determined from the training dataset by minimizing the regression risk based on the estimated risk.
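
To make the role of the margin concrete, the ε-insensitive loss and the SVR estimation function can be sketched as below. This is an illustration only: the feature map is taken to be the identity (a linear kernel), and the weights, bias, margin, and CER values are hypothetical.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Errors inside the margin [-epsilon, +epsilon] cost nothing;
    larger errors are penalized linearly by the amount they exceed it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """f(x) = <w, phi(x)> + b, with phi assumed to be the identity map."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical case: observed CER of 10.0 with a margin of 0.5.
inside_margin = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5)   # within the tube
outside_margin = epsilon_insensitive_loss(10.0, 11.0, epsilon=0.5)  # exceeds it by 0.5

prediction = svr_predict([1.0, 2.0], weights=[0.5, -1.0], bias=3.0)
```

Because points inside the margin contribute zero loss, only the points on or outside it (the support vectors) shape the fitted function, which is the sparsity property noted above.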

To achieve high accuracy in predictions, RF employs an ensemble learning approach. It combines the strengths of multiple DTs, resulting in a more robust and reliable predictive model. By aggregating the predictions of individual trees, RF mitigates overfitting and enhances generalization performance. It excels in both classification and regression problems, offering versatility in capturing complex patterns within the data. The algorithm's analysis of feature importance provides valuable insights into the contribution of different variables, aiding in the interpretation of the results (Meenal et al. 2021). KNN is a non-parametric, instance-based learning algorithm that relies on the proximity of data points in the feature space. It predicts the target variable from the majority class (for classification) or the average (for regression) of the k nearest neighbors. Its simplicity and intuitive approach make it particularly useful for tasks with locally varying patterns. However, its performance may be influenced by the choice of distance metric and the determination of an optimal k value (Li & Sui 2021). As a supervised classification algorithm, KNN operates by storing all the data and classifying new data points based on their distances to the stored data. The distance from the test point to each training point is computed using the Minkowski distance: $d(x_{\mathrm{train}}, x_{\mathrm{test}}) = \left( \sum_{i} \lvert x_{\mathrm{train},i} - x_{\mathrm{test},i} \rvert^{p} \right)^{1/p}$. Here, $p$ is the order of the Minkowski metric (with $p$ values ranging from 1 to 4), $x_{\mathrm{train}}$ is a training data point, $x_{\mathrm{test}}$ is the data point for which we want to determine the class, and the sum runs over the feature dimensions $i$. The algorithm identifies the 'K' nearest neighbors to the test point, and the most common class among these neighbors is assigned to the test point (Rajput et al. 2023).
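
The Minkowski distance and the averaging variant of KNN used for regression can be sketched as follows. The feature vectors and CER targets are hypothetical, and the features are assumed to be already normalized so that no single variable dominates the distance.

```python
from typing import List, Sequence


def minkowski_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)


def knn_regress(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int, p: float = 2.0) -> float:
    """Predict the target as the mean of the k nearest training targets."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    order: List[int] = sorted(
        range(len(train_x)),
        key=lambda i: minkowski_distance(train_x[i], query, p),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


# Hypothetical normalized features and CER targets (micrometres).
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [10.0, 12.0, 14.0, 30.0]

cer_estimate = knn_regress(train_x, train_y, query=[0.2, 0.1], k=3, p=2.0)
```

For classification, the final averaging step would be replaced by a majority vote over the neighbors' classes.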

The DT is a fundamental and interpretable algorithm that recursively partitions the data based on feature conditions to predict the target variable. Its hierarchical structure makes it adept at capturing complex decision boundaries and interactions within the data. DTs offer transparency in model interpretation and are valuable for assessing feature importance. However, they are prone to overfitting, which can be mitigated by ensemble techniques such as RF (Marhain et al. 2021). Ridge regression, a linear regression variant, introduces regularization by adding a penalty term to the traditional least squares objective function; this regularization term helps prevent overfitting. XGBoost is a powerful and efficient ML algorithm known for its speed and accuracy. It builds a series of DTs sequentially, with each tree correcting the errors of the previous trees. Its standout features include regularization to prevent overfitting, parallel processing for faster computations, and effective handling of missing data. XGBoost's flexibility with custom objectives and evaluation metrics, along with its strong performance, makes it a preferred choice for diverse applications in finance, healthcare, and beyond (Liu et al. 2022). In addition, feature-importance measures such as Gini importance and SHAP values can identify the most influential features, which in turn brings out the most important parameter for prediction.
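
The idea of building trees sequentially, each one fitted to the errors left by the previous ones, can be illustrated with depth-one trees (stumps). The sketch below is plain gradient boosting with squared error on a single feature; it is not XGBoost itself, which adds a regularized objective and many engineering optimizations. The CTT and CER values, the learning rate, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)


def fit_stump(x: Sequence[float], y: Sequence[float]) -> Stump:
    """Find the single split on x that minimizes the squared error of y."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((yi - left_mean) ** 2 for yi in left)
               + sum((yi - right_mean) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, xi: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def boost(x: Sequence[float], y: Sequence[float], n_rounds: int,
          learning_rate: float) -> Tuple[float, List[Stump]]:
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for xi, pi in zip(x, pred)]
    return base, stumps


def training_mse(x: Sequence[float], y: Sequence[float], n_rounds: int,
                 learning_rate: float) -> float:
    base, stumps = boost(x, y, n_rounds, learning_rate)
    preds = [base + learning_rate * sum(predict_stump(s, xi) for s in stumps)
             for xi in x]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


# Hypothetical cloud top temperature (K) and CER (micrometres) pairs.
ctt = [210.0, 220.0, 230.0, 240.0, 250.0, 260.0, 270.0, 280.0]
cer = [22.0, 21.0, 19.0, 18.0, 15.0, 13.0, 12.0, 10.0]

mse_by_rounds = {n: training_mse(ctt, cer, n, learning_rate=0.5) for n in (0, 1, 10)}
```

On the training data the error falls as rounds are added, which is also why boosted models need regularization or validation-based stopping to avoid overfitting.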

The selection of ML models for this study is based on their distinct strengths and suitability for the dataset and problem at hand. SVR has been chosen for its ability to model complex, non-linear relationships through versatile kernel functions. RF has been selected for its ensemble approach, providing robust performance and insights into feature importance while managing large datasets and interactions effectively. KNN has been included for its simplicity and capability to capture local patterns in the data. DT has been used for its interpretability and capacity to handle various types of data, although its tendency to overfit is mitigated in ensemble methods such as RF. Ridge regression has been applied to address multicollinearity and enhance model generalization through regularization. Finally, XGBoost has been incorporated for its efficiency and high performance in capturing complex patterns, leveraging gradient boosting to improve accuracy and handle large-scale data.

Hyperparameter tuning has been performed to optimize model performance. For SVR, the tuned hyperparameters are ‘C’, ‘gamma’, and ‘kernel’, which regulate the regularization strength, kernel coefficient, and type of kernel function used, respectively. The RF model has been fine-tuned using ‘n_estimators’, ‘max_depth’, and ‘max_features’, which determine the number of trees, the maximum depth of each tree, and the number of features considered at each split, respectively. The DT model has been configured with ‘max_depth’, ‘min_samples_split’, and ‘criterion’, which set the tree's depth, the minimum number of samples required to split an internal node, and the function used to measure split quality, respectively. For the KNN algorithm, ‘n_neighbors’, ‘weights’, and ‘algorithm’ have been adjusted to define the number of neighbors, the weight function applied in prediction, and the algorithm used to compute the nearest neighbors, respectively. Ridge regression has been optimized using ‘alpha’, ‘solver’, and ‘normalize’, which control the regularization parameter, the optimization solver, and whether the data are normalized prior to model fitting, respectively. Lastly, the XGBoost model has been tuned using ‘learning_rate’, ‘n_estimators’, ‘max_depth’, ‘subsample’, and ‘colsample_bytree’, which govern the learning rate, the number of boosting iterations, the maximum tree depth, the fraction of samples used per tree, and the fraction of features sampled per tree, respectively. These models and their hyperparameters have been systematically selected and tuned to maximize predictive performance and ensure robust results. The models are trained using the wavelet-transformed features. Both train–test split and cross-validation have been used: the dataset is first divided into training and test subsets, as described in the preprocessing steps, which allows a straightforward evaluation of model performance on unseen data, and cross-validation is then employed to further assess each model's robustness and generalizability.
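
The tuning and validation procedure described above can be sketched in a few lines. The study does not state which search tool or grid values were used, so the following is only a minimal pure-Python illustration for the KNN case: it assumes an exhaustive grid over ‘n_neighbors’ and ‘weights’, five-fold cross-validation on the training subset, and an 80/20 train–test split. The data are synthetic, and every function name (`knn_predict`, `cv_rmse`, `k_fold_indices`) is hypothetical.

```python
import itertools
import math
import random

def knn_predict(train_X, train_y, x, n_neighbors, weights):
    """Predict one target from its nearest neighbours ('uniform' or 'distance' weights)."""
    nearest = sorted((math.dist(row, x), y) for row, y in zip(train_X, train_y))[:n_neighbors]
    if weights == "distance":
        w = [1.0 / (d + 1e-12) for d, _ in nearest]  # small constant avoids division by zero
    else:
        w = [1.0] * len(nearest)
    return sum(wi * y for wi, (_, y) in zip(w, nearest)) / sum(w)

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]

def cv_rmse(X, y, params, k=5):
    """Mean RMSE over k folds for one hyperparameter combination."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in held_out]
        tr_y = [y[i] for i in range(len(X)) if i not in held_out]
        preds = [knn_predict(tr_X, tr_y, X[i], **params) for i in fold]
        scores.append(rmse([y[i] for i in fold], preds))
    return sum(scores) / k

# Synthetic stand-in data: two made-up features (a CTT-like and a CF-like column).
rng = random.Random(0)
X = [(rng.uniform(220, 290), rng.uniform(0, 1)) for _ in range(200)]
y = [25 - 0.05 * ctt + 2 * cf + rng.gauss(0, 0.3) for ctt, cf in X]

# 80/20 train-test split (rows are already in random order).
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Exhaustive grid search scored by cross-validated RMSE on the training subset only.
grid = {"n_neighbors": [3, 5, 9], "weights": ["uniform", "distance"]}
candidates = [dict(zip(grid, combo)) for combo in itertools.product(*grid.values())]
best_params = min(candidates, key=lambda p: cv_rmse(X_train, y_train, p))

# Final check on the held-out test subset.
test_preds = [knn_predict(X_train, y_train, x, **best_params) for x in X_test]
test_rmse = rmse(y_test, test_preds)
print(best_params, round(test_rmse, 3))
```

Keeping the test subset out of the grid search is the design choice that matters here: the cross-validated score selects the hyperparameters, and the test score is computed only once for the selected combination.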

Studying the changing rainfall scenario from the perspective of clouds, as climate and geographic location vary across India, is essential, because analyzing cloud properties and their heterogeneity gives better insight into variations in precipitation. Recent studies suggest that a more accurate long-range forecast may require the consideration of multiple parameters, beyond just two or three. Anthropogenic variations need to be taken into account along with these parameters so that natural variability can be understood alongside human influence (Sinha et al. 2015). The intensity and frequency of precipitation events may increase due to climate change, leading to increased risks of floods and droughts. For the flood- and drought-prone regions, a whisker plot indicates that the mean precipitation in the flood-prone region (8.83 mm/day) is notably higher than in the drought-prone region (2.32 mm/day) (Figure 2(a)). Similarly, the median precipitation in the flood-prone region (2.46 mm/day) is higher, indicating a central tendency toward higher precipitation in this region.
Figure 2

(a) Precipitation variation and (b) probability distribution curve for the drought- and flood-prone regions.

The IQR, representing the middle 50% of the data, is wider in the flood-prone region (10.05 mm/day), suggesting a broader spread of precipitation within this region. Meanwhile, the drought-prone region exhibits a much narrower IQR (1.36 mm/day), indicating far less variability in precipitation. The histogram (Figure 2(b)) also shows that low precipitation magnitudes (∼0 to 3 mm/day) occur more frequently in the drought-prone region, while the flood-prone region shows a wider range (∼5 to 13 mm/day). Along with this clear contrast in precipitation frequency and magnitude, it is also important to examine cloud micro- and macrophysical aspects to understand precipitation patterns and improve predictability. Thus, this study focuses on annually averaged variations in precipitation to understand heterogeneity in cloud properties over coastal as well as inland regions, including the flood- and drought-prone areas of Gujarat and Rajasthan, respectively, during 2002–2023.
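
The whisker-plot statistics quoted above (mean, median, and IQR) can be reproduced for any series with the Python standard library. The sketch below is illustrative only: the two short lists are made-up daily precipitation values, not the study's data, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not say which one was used.

```python
import statistics

def box_stats(values):
    """Mean, median, and interquartile range of a sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": statistics.fmean(values), "median": q2, "iqr": q3 - q1}

# Hypothetical daily precipitation samples in mm/day (illustrative only).
drought = [0.0, 0.4, 1.1, 1.6, 2.0, 2.4, 3.1, 7.9]
flood = [0.2, 0.9, 1.8, 2.5, 6.0, 11.5, 14.0, 30.0]

drought_stats = box_stats(drought)
flood_stats = box_stats(flood)

for name, stats in (("drought-prone", drought_stats), ("flood-prone", flood_stats)):
    print(f"{name}: mean={stats['mean']:.2f}, "
          f"median={stats['median']:.2f}, IQR={stats['iqr']:.2f} mm/day")
```

A mean well above the median, as in the flood-prone values reported in the text, points to a right-skewed distribution dominated by a few heavy-rain days.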

Variation and interdependency among cloud properties

Variations in rainfall patterns have been mutually dependent on cloud properties such as CLW, CER, COT, CTT, CTP, and CF as these properties have a major impact on rain formation and its delay. Variations in magnitude of cloud properties along with other climate variables such as humidity, temperature, atmospheric circulations, and SST are responsible for the deficit or excess of rainfall over different regions. The impact of changing climate on monsoon and its heterogeneity can be well understood with analysis of such parameters at the regional level especially considering changing extreme weather conditions at various temporal scales (Srivastava & Shah 2019). The analysis of cloud variables over flood- and drought-prone regions shows a distinct behavior of these parameters from the regional point of view. The variation in magnitude as well as interdependency of cloud properties over both regions needs to be taken into account for a precise understanding of changing precipitation.

Regional comparison of cloud properties

The comparative analysis of cloud properties between drought- and flood-prone regions reveals significant variations as depicted by spatial plots (Figure 3(a)–3(h)). The flood-prone region exhibits substantially higher values for CLW, COT, and CF compared with the drought-prone region. These differences suggest that clouds in the flood-prone area are generally thicker, more extensive, and contain more liquid water content, while clouds in the drought-prone region are thinner, less optically dense, and have lower cloud cover. A higher CLW (ranging from 111 to 152 g/m²) in the flood-prone area indicates the presence of thicker clouds, such as nimbostratus or deep convective clouds, which are often associated with heavier precipitation. By contrast, a lower CLW (ranging from 94 to 108 g/m²) in the drought-prone region suggests the possibility of stratiform clouds, which typically produce lighter precipitation or drizzle. Higher COT values (12.55–17.14) in the flood-prone area suggest denser clouds with more effective light scattering, characteristic of clouds with significant vertical development. Lower COT values (11.52–13.17) in the drought-prone region imply thinner, less optically dense clouds allowing more solar radiation to penetrate.
Figure 3

Spatial plots indicating regional variations for (a and e) COT, (b and f) CER, (c and g) CF, and (d and h) CLW for drought-prone and flood-prone regions, respectively.

The CF is also substantially higher (0.41–0.46) in the flood-prone region, indicating more widespread cloud cover, which can play a critical role in cooling the surface by reflecting incoming solar radiation. By contrast, the lower CF (0.26–0.35) in the drought-prone region suggests more localized or patchy cloud cover. Additionally, the flood-prone area has a larger CER (13.65–14.54 μm), indicating the presence of larger cloud droplets, which can enhance precipitation efficiency. Comparatively smaller droplet sizes (12.77–13.10 μm) in the drought-prone region, influenced by higher aerosol concentrations, can suppress precipitation by inhibiting droplet growth. These regional differences in cloud properties are likely influenced by local meteorological conditions, aerosol concentrations, and topographical factors. Understanding these differences is crucial for improving cloud representation in climate models and predicting the impacts of clouds on regional and global climate systems.

From the perspective of regional variability, an analysis of various parameters reveals clear differences between the drought- and flood-prone regions (Figure 4(a)–4(h)). In general, CER, CLW, CF, and COT are higher in the flood-prone region than in the drought-prone region, while CTT and CTP are lower. AWV and AOD, however, follow different patterns, behaving distinctly in the flood-prone and drought-prone regions. CER affects the development and evolution of clouds through processes such as collision and coalescence of cloud droplets. The mean value of CER in the flood-prone region (16.27 μm) is notably higher than in the drought-prone region (13.04 μm). Similarly, the median CER in the flood-prone region (16.20 μm) is the higher of the two, indicating a central tendency toward larger CER in this region. The IQR, representing the middle 50% of the data, is wider in the drought-prone region (4.30 μm), suggesting a broader spread of CER within this region. Conversely, the flood-prone region exhibits a narrower IQR (3.78 μm), indicating comparatively less variability in CER.
Figure 4

Whisker plots indicating regional variations for (a) COT, (b) CER, (c) CF, (d) CLW, (e) CTT, (f) CTP, (g) AWV, and (h) AOD, respectively.

The flood-prone region exhibits higher mean (18.25) and median (16.30) COT than the drought-prone area. While the drought-prone region demonstrates a smaller IQR (7.84), the flood-prone region displays a wider spread of COT values, as indicated by the larger IQR (11.86). The contrast between the two regions is noticeable, with the flood-prone region generally showing higher COT; clouds observed there are thicker and indicate a greater possibility of precipitation. CLW also varies across the two regions. The flood-prone region shows the higher average CLW (194.87 g/m²) compared with the drought-prone region (81.47 g/m²). This is further supported by the median values, with the flood-prone region again showing a higher median CLW (163.98 g/m²) than the drought-prone region (55.03 g/m²). The IQR shows that the flood-prone region has a wider spread (139.62 g/m²), indicating a broader range of CLW values within this region, whereas the drought-prone region has the narrower IQR (82.06 g/m²), suggesting more consistent CLW levels there. This analysis suggests a regional tendency toward lower CLW in the drought-prone region; however, AWV may show variations. The higher mean and median CLW in the flood-prone area are also consistent with the observed precipitation pattern. In addition, the mean exceeds the median in both regions, which indicates a positive (right) skew, with the median shifted toward lower values.

The mean value of CF is higher in the flood-prone region (0.88) than in the drought-prone region (0.51). Similarly, the median CF in the flood-prone region (0.98) surpasses that in the drought-prone region (0.52), highlighting a central tendency toward elevated CF in the former. The IQR is broader in the drought-prone area (0.70), suggesting greater variability in CF within this region, whereas the flood-prone region has a narrower IQR (0.14), indicating less variability. The elevated CF in the flood-prone area is consistent with the results for CER, COT, and CLW, pointing to more favorable conditions for rain formation. CTT is an indicator of cloud altitude, as temperature generally decreases with altitude in the troposphere. For CTP and CTT, the flood-prone region shows lower means (444.12 hPa for CTP and 258 K for CTT) as well as lower medians (411.99 hPa for CTP and 261.55 K for CTT). This pattern is exactly opposite to that observed for other cloud properties such as CER, COT, CF, and CLW, and indicates that CTT and CTP are negatively correlated with them. The IQRs of CTT (43.75 K) and CTP (314.52 hPa) show wide variability within the flood-prone region itself, while a comparatively lower IQR of CTT (40.01 K) and a higher IQR of CTP (411.35 hPa) for the drought-prone area show notable variability in CTP over this inland region. Factors affecting CTT include cloud type (higher clouds typically have colder CTT), atmospheric lapse rate (which controls the rate of temperature decrease with altitude), and the presence of inversions (warm layers within the atmosphere can lead to higher CTT for clouds trapped below the inversion). AWV and AOD behave differently in the flood- and drought-prone regions. A higher mean AWV (5.57 cm) is observed for the flood-prone region, as expected, while the drought-prone region shows a lower mean of 4.85 cm. Considering the IQR, AWV and AOD both show higher ranges in the drought-prone region, 2.53 cm (AWV) and 0.45 (AOD), compared with 1.73 cm (AWV) and 0.29 (AOD) for the flood-prone region. This indicates that the drought-prone region has a higher variability, with a wider range in AWV and AOD values, than the flood-prone area. In summary, both AWV and AOD play crucial roles by influencing the amount of solar radiation absorbed and reflected, shaping Earth's climate.

Regional correlation among cloud properties

The further analysis of cloud properties and other variables across both regions, one of which is flood prone and the other drought prone (Figure 4(a)–4(h)), demonstrates distinct correlations along with regional disparities in these parameters. The correlation matrix (heatmap) illustrates the relationships between variables in our dataset as it provides quantitative insights and visual tools that enhance the interpretability to understand the interdependency of these variables. The linear relation is quantified using the Pearson correlation coefficient. Darker shades on the heatmap represent stronger correlations, while lighter shades indicate weaker correlations. Positive correlations are depicted in warm colors such as red, while negative correlations are shown in cool colors like blue. The diagonal line on the heatmap signifies the perfect correlation of each variable with itself, always equal to 1. Here, only statistically significant values (high confidence, p-value < 0.05 indicated by *) are considered for analyzing relationships among the variables. Dependency of these variables has been examined considering the two different regions in this study.
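
As a rough illustration of how such a starred correlation matrix can be assembled, the sketch below computes Pearson coefficients for three made-up columns and flags pairs with p < 0.05. The p-value here comes from the Fisher z-transform, which is a large-sample approximation chosen for this example; the text does not state which significance test the study used. The column values and function names are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (approximation, needs |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Illustrative values only (not from the study).
columns = {
    "COT": [8.1, 10.4, 12.9, 15.2, 11.7, 17.8, 9.6, 14.3, 13.1, 16.5],
    "CLW": [60, 85, 110, 140, 95, 170, 70, 125, 118, 150],
    "CTT": [282, 274, 266, 259, 271, 250, 279, 262, 268, 255],
}

names = list(columns)
n = len(columns["COT"])
matrix = {}
for a in names:
    for b in names:
        if a == b:
            matrix[(a, b)] = (1.0, 0.0)  # diagonal: a variable with itself
            continue
        r = pearson_r(columns[a], columns[b])
        matrix[(a, b)] = (r, approx_p_value(r, n))

# Print the matrix, marking off-diagonal coefficients with p < 0.05 by '*'.
for a in names:
    cells = []
    for b in names:
        r, p = matrix[(a, b)]
        cells.append(f"{r:+.2f}{'*' if a != b and p < 0.05 else ' '}")
    print(f"{a:>4} " + " ".join(cells))
```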

The correlation matrix provides insights into the relationships among parameters across the study regions (Figure 5). In the drought-prone region, CER is strongly correlated with CLW (0.72) and moderately correlated with COT (0.55). Some studies suggest a positive correlation for certain cloud types, although the relationship can also be weak. COT exhibits a strong positive correlation with CLW (0.96). However, in this drought-prone region, COT is negatively correlated with AOD (−0.35), CTP (−0.27), and CTT (−0.34). This may be because of the lack of moisture in the region as well as the high dust loading in that desert area. The correlation coefficient between CLW and CF is similar in the two regions, 0.63 for the drought-prone region and 0.65 for the flood-prone region, showing a good connection between the available liquid water and the vertical distribution of the cloud. CLW is negatively correlated with CTP in both regions (−0.42 for the drought-prone region and −0.67 for the flood-prone region), and its negative correlation with CTT (−0.51 for the drought-prone region) is stronger than that with CTP in the same region. Cloud top temperatures exhibit a notable negative correlation with rising AOD over land in deep mixed-phase clouds, but conversely, no significant changes have been observed for uniformly liquid clouds (Niu & Li 2012).
Figure 5

Correlation matrix heatmap showing correlation coefficients for the (a) drought-prone region and (b) flood-prone region (−1 means a perfect negative linear relation, 1 indicates a perfect positive linear relation, and 0 shows no relation among the variables).

In the drought-prone region, CF indicated negative correlations with CTP, CTT, and AOD, with CF showing a moderate correlation with AOD (−0.56) and stronger correlations with CTT (−0.86) and CTP (−0.80). A strong positive correlation emerges for AOD with CTP (0.58) and CTT (0.59), indicating a significant association between aerosols and cloud properties. By contrast, in the flood-prone region, CF and CER have a higher positive correlation (0.65), opposite to the expected Twomey effect, where higher CF leads to smaller droplets (lower CER). CLW and CER are moderately correlated (0.47), while CER in the flood-prone region shows a moderate negative correlation with CTP (−0.66). The correlation between COT and CLW is stronger (0.90) in the flood-prone region, with CF (−0.88) and CLW (−0.67) showing strong negative correlations with CTP. Overall, the spatial coverage and COT influence radiative transfer and, along with CTT, determine the thermal effect. CTT is negatively correlated with other cloud properties in all regions. As CTT increases, evaporation may increase, reducing CLW and CER, which in turn lowers CF and COT. Higher CF with smaller droplets (lower CER) might result in a similar CLW as lower CF with larger droplets (higher CER). Low COT indicates more heat-absorbing aerosols, which can make thin clouds by causing droplet evaporation. Higher AOD can suppress cloud formation, lowering CF and potentially COT, but the effects depend on the aerosol type and size. In the drought-prone region, cloud microphysical properties like CER, COT, and CLW show significant correlations, increasing over time, whereas the AOD, CTT, and CTP correlations are negative. In the flood-prone region, the correlation of COT and CLW increases over time, while the CTT, CTP, and AOD correlations vary between regions, reflecting the different climatic influences of the drought- versus flood-prone areas.

Inter-regional analysis for connection among cloud properties

Understanding the variations in COT and CTT alongside CER, CF, and CLW provides a more comprehensive picture of cloud properties in the drought-prone and flood-prone regions from the interdependence aspect (Figure 6). The correlations reveal how cloud properties of the drought-prone region relate to those of the flood-prone region across western India. Notably, CER shows a moderate negative correlation (with a magnitude of ∼0.50) with CTT and CTP, irrespective of region. CER (0.52) and CF (0.76) of the drought-prone region are also positively correlated with AWV in the flood-prone region. Meanwhile, AOD of the drought-prone region and AWV of the flood-prone region appear to be negatively correlated, which suggests that aerosols and available water vapor in one region may have a moderate impact on the other. This knowledge is valuable for studies related to radiative transfer, precipitation patterns, and climate change impacts.
Figure 6

Heatmap for the interdependency of cloud properties for the selected study regions.

Analysis based on ML techniques

The uncertainty in variations of cloud properties such as CF, CER, COT, and CLW is a major limitation in understanding and projecting future climate scenarios. The ability of ML techniques to detect patterns and anomalies, their scalability to massive and complex datasets, and their adaptive learning with higher empirical accuracy and better computational efficiency are advantages over traditional methods in atmospheric sciences. The application of ML models such as SVR and RF for regression-based analysis of all parameters can help to capture complex trends observed over the decades during the monsoon period (JJAS).

Prediction of CER using multiple ML models

CER (the effective radius of cloud droplets), being a microphysical property of clouds, has a strong impact on raindrop formation, thereby influencing the magnitude and duration of precipitation. Accurate prediction of CER with ML models can therefore help to investigate precipitation scenarios more effectively. Training and testing multiple supervised ML models for predicting CER can identify how efficient the models are over different regions, given a suitable selection of features such as cloud properties for these data-driven models. In this section, the main results of our analysis for the prediction of CER with different models are presented (Figure 7). All models are evaluated on the test dataset using RMSE, coefficient of determination (r²), and Pearson coefficient (r). Almost 920 data points have been used for testing and more than 3,700 data points for training the models.
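
For reference, the three evaluation measures named above can be computed as in the following sketch. It assumes the common definition r² = 1 − SS_res/SS_tot, which can differ slightly from the square of the Pearson coefficient; the text does not say which form the study used. The observed and predicted CER values are made up for illustration.

```python
import math

def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(var_o * var_p)

# Illustrative CER values in micrometres (made up, not from the study).
cer_obs = [11.2, 12.8, 13.5, 14.1, 15.9, 16.4, 17.8, 19.0]
cer_pred = [11.9, 12.5, 13.9, 14.6, 15.2, 16.9, 17.1, 18.3]

print(f"RMSE = {rmse(cer_obs, cer_pred):.2f} um")
print(f"r2   = {r_squared(cer_obs, cer_pred):.2f}")
print(f"r    = {pearson_r(cer_obs, cer_pred):.2f}")
```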
Figure 7

Prediction of CER by various ML models for drought- and flood-prone regions (a to f) represent SVR, RF, KNN, DT, ridge, and XGBoost, respectively.

RF, DT, and XGBoost have performed better than the other models over both regions. The RMSE values are lowest for RF (1.15 and 0.76), followed by XGBoost (1.21 and 0.80) and DT (1.61 and 1.19), for the drought-prone and flood-prone regions, respectively. The corresponding values of r are 0.93 and 0.97 (RF), 0.86 and 0.92 (DT), and 0.92 and 0.96 (XGBoost), and the values of r² are 0.86 and 0.93 (RF), 0.74 and 0.84 (DT), and 0.84 and 0.93 (XGBoost). Among all the models used, RF provides the best accuracy for simulating CER over both regions in terms of RMSE, coefficient of determination (r²), and Pearson coefficient (r) after validation. The ensemble learning approach, robustness to noise, high accuracy, scalability to large datasets, and adaptability in parameter tuning make RF a favored option for prediction tasks across diverse domains. Nonetheless, the efficacy of RF can vary with the attributes of the dataset and the intricacies of the prediction objective. Based on the above comparisons, and considering its low RMSE and high correlation coefficient, RF is preferred over both regions, as it improves on the other models in all statistical parameters mentioned here. For the drought-prone region, the ML models predict cloud properties better than over the flood-prone region, capturing variations more effectively. As shown in Figure 7, the low slopes (less than ∼0.50) obtained by SVR, KNN, and ridge indicate that these models strongly underpredict CER in both regions, while the higher slopes of RF, DT, and XGBoost reflect comparatively better predictability. The study has also tried adding AWV and AOD as features along with cloud properties, and the results have shown a slightly lower coefficient of correlation.
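
The slope diagnostic mentioned above can be illustrated with an ordinary least-squares fit of predicted against observed values. In this sketch, which uses made-up numbers and a hypothetical helper `least_squares_slope`, one set of predictions is deliberately shrunk toward the mean, so its fitted slope falls well below 1: large CER values are underpredicted and small ones overpredicted. A second set tracks the observations closely and yields a slope near 1.

```python
def least_squares_slope(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Illustrative observed CER values in micrometres (not from the study).
cer_obs = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
mean_obs = sum(cer_obs) / len(cer_obs)

# Hypothetical model that shrinks every prediction toward the mean.
pred_compressed = [mean_obs + 0.4 * (o - mean_obs) for o in cer_obs]
# Hypothetical model that follows the observations with small errors.
pred_close = [o + e for o, e in zip(cer_obs, [0.3, -0.2, 0.1, -0.1, 0.2, -0.3])]

slope_compressed, _ = least_squares_slope(cer_obs, pred_compressed)
slope_close, _ = least_squares_slope(cer_obs, pred_close)
print(f"compressed model slope: {slope_compressed:.2f}")
print(f"close model slope:      {slope_close:.2f}")
```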

As stated in the recent study by Nandgude et al. (2023), the parameters used to predict the drought index mostly include precipitation, temperature, soil moisture, and El Niño–Southern Oscillation (ENSO). However, including cloud parameters along with these variables can give better insights for drought prediction. Cloud microphysics plays an important role in understanding extreme rainfall in India (Samantray & Gouda 2024). Our findings on regional heterogeneity, such as the correlation of CER with CLW and COT in the drought- and flood-prone regions, can provide further inputs for relating cloud microphysical properties of different regions. From a different perspective for specific types of geographical regions, recent studies such as Goyal et al. (2023) and Rakkasagi et al. (2024) have indicated that trend patterns for wetlands are correlated with several variables, including mean annual precipitation, mean temperature, average annual-maximum temperature, elevation, and climatic class. Trend analysis from the regional perspective using statistical tests such as the Mann–Kendall test can also be applied to predictions from ML models.

Several studies have demonstrated the efficacy of ML techniques in predicting various hydrological events. Mosavi et al. (2018) have conducted a comprehensive review of ML models applied to flood prediction, highlighting the utility of hybrid and ensemble models, neural networks, and so on in analyzing precipitation patterns and hydrological factors. In a review of cyclone forecasting using ML, Chen et al. (2020) have stated that precipitation, tide, etc., are used to predict rainfall and storm-related variables. Precipitation nowcasting using radar echo systems along with long short-term memory (LSTM) models has been addressed by Shi et al. (2015). However, applying ML models to predict CER from cloud properties is a novel approach to studying precipitation patterns from the perspective of clouds.

Comparing ML models for better efficiency for prediction of CER

Comparing different ML regression models involves evaluating them on multiple criteria, such as performance metrics, interpretability, robustness, computational efficiency, and generalization ability over various regions as well as for various parameters. This thorough comparison ensures that the chosen model aligns well with the goals and constraints of the specific application of predicting CER over both regions. To identify the best predictive model, Taylor plots are used to compare the models across both regions and the same time period based on correlation coefficient and standard deviation (Figure 8). Based on the higher standard deviations obtained, RF (2.85, 2.84), XGBoost (2.82, 2.93), and DT (3.08, 3.00) for the drought- and flood-prone regions, respectively, have given better results than the other models for both regions, as depicted in the Taylor diagrams. The standard deviation error is also lower for the RF model than for the other models. From a statistical perspective, models with low standard deviation and high correlation coefficient are considered good, as shown in the Taylor diagram. SVR, KNN, and ridge models have low standard deviation but also low correlation coefficient, while RF, DT, and XGBoost show comparatively high SD along with high correlation coefficient, with almost similar results for both regions.
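
A Taylor diagram places each model using three linked statistics: the standard deviation of its predictions, its correlation with the observations, and the centered root-mean-square difference (CRMSD). The sketch below computes them for made-up values and checks the law-of-cosines identity that ties them together, CRMSD² = σ_pred² + σ_obs² − 2·σ_pred·σ_obs·R. Population standard deviations are assumed, and the function name `taylor_stats` is hypothetical.

```python
import math

def taylor_stats(obs, pred):
    """Statistics that locate one model on a Taylor diagram."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sd_obs = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sd_pred = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    corr = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * sd_obs * sd_pred)
    # Centered RMS difference: RMS error after removing each series' mean.
    crmsd = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return {"sd_obs": sd_obs, "sd_pred": sd_pred, "corr": corr, "crmsd": crmsd}

# Illustrative CER values in micrometres (not from the study).
cer_obs = [11.2, 12.8, 13.5, 14.1, 15.9, 16.4, 17.8, 19.0]
cer_pred = [11.9, 12.5, 13.9, 14.6, 15.2, 16.9, 17.1, 18.3]

stats = taylor_stats(cer_obs, cer_pred)
# Right-hand side of the law-of-cosines identity underlying the diagram.
rhs = (stats["sd_pred"] ** 2 + stats["sd_obs"] ** 2
       - 2 * stats["sd_pred"] * stats["sd_obs"] * stats["corr"])
print({k: round(v, 3) for k, v in stats.items()})
print("identity holds:", math.isclose(stats["crmsd"] ** 2, rhs))
```

On the diagram, a model sits closest to the reference point when its standard deviation is near that of the observations and its correlation is high, which together give a small CRMSD.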
Figure 8

Taylor diagrams showing correlation coefficients and standard deviations among ML models: (a) drought-prone region and (b) flood-prone region.

Variable importance in a model indicates which variables have the greatest influence on the model's predictions. Among the cloud properties, CTT is the most influential feature for the prediction of CER, with a feature importance of 69.92 and 80.98% for the drought- and flood-prone regions, respectively; CTT also plays an important role in precipitation along with other climate variables. CTT, being the temperature at the top of a cloud, helps in the formation of precipitation along with cloud base temperature: the larger the difference between them, the more favorable the conditions. Other measures such as the Spearman coefficient, Kendall coefficient, mean absolute error (MAE), and mean absolute percentage error (MAPE) are also taken into account when selecting the suitable model (these are not reported here). In general, the models have given better results for the drought-prone region. Comparing the various models, RF has performed well over both regions in terms of the performance metrics. To comprehend the intricacies of extreme conditions, ML along with satellite-based observations can provide invaluable insights into the heterogeneity of clouds and precipitation arising from uncertainty due to climate change scenarios in future projections.
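
The text does not specify how feature importance was computed; tree ensembles such as RF usually report an impurity-based score. As a model-agnostic stand-in, the sketch below uses permutation importance instead: each feature column is shuffled in turn, and the resulting increase in mean squared error is expressed as a percentage of the total increase. The fitted model `toy_model`, its coefficients, and the data are all hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of a prediction function over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Percentage share of the MSE increase caused by shuffling each feature."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    raw = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, shuffled, y) - baseline)
        raw.append(max(0.0, sum(increases) / n_repeats))
    total = sum(raw)
    return [100.0 * v / total for v in raw]

def toy_model(row):
    """Hypothetical fitted predictor of CER from (CTT, COT, CF); coefficients are made up."""
    ctt, cot, cf = row
    return 40.0 - 0.1 * ctt + 0.05 * cot + 0.5 * cf

# Synthetic rows: CTT in K, COT (unitless), CF as a fraction.
rng = random.Random(1)
X = [(rng.uniform(220, 290), rng.uniform(5, 30), rng.uniform(0, 1)) for _ in range(300)]
y = [toy_model(row) + rng.gauss(0, 0.2) for row in X]

importances = permutation_importance(toy_model, X, y)
for name, value in zip(("CTT", "COT", "CF"), importances):
    print(f"{name}: {value:.1f}%")
```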

The contributions from this study collectively aid the field by providing a more detailed, comprehensive, and interpretable analysis of cloud parameters, improving our understanding of their long-term variations along with altering precipitation, spatial distributions, and the effectiveness of various ML models for prediction of a variable. For prediction of precipitation, SST, humidity, pressure, elevation, and so on have been utilized by Mosavi et al. (2018). Our study gives a new insight into the heterogeneity of cloud properties from the spatial and temporal perspectives for prediction of CER with the most suitable ML model, which in turn will be a remarkable variable contributing to precipitation analysis.

Cloud properties, which play a vital role in analyzing global and regional rainfall patterns evolving due to climate change and diverse geographic factors, are integral to analyzing drought- and flood-prone regions, specifically in monsoon-dependent countries such as India. This study therefore investigates the long-term trends and predictive capabilities for cloud properties in the drought- and flood-prone regions of western India using satellite-derived data and ML models. The comparison of cloud properties over the drought- and flood-prone regions reveals a complex behavior of the variables, which depends on other climate variables as well as regional variability. As a general observation from the above analysis, CTT is negatively correlated with all other cloud properties such as CER, CF, COT, and CLW. As CTT increases, CER tends to decrease due to evaporation of more CLW, thinning the clouds and lowering COT and CF. CTT of the flood-prone region, which is a coastal region, can also be affected by warm SST, high stability, and high cloud tops with low CTT for convective clouds. For the drought-prone region, which is an inland region, land-surface heating, topography, and convective processes strongly influence clouds. CLW depends on both CF and CER: the same amount of CLW may be observed with high CF and low CER as well as with low CF and high CER. Due to high humidity and moisture availability, CLW in the flood-prone region can be higher, while dry air masses and strong entrainment of dry air can keep CLW low over the drought-prone region, as observed. CER plays an important role in rain formation, delay of rain, the cloud-precipitation feedback loop, and the radiation budget, influencing the radiation reaching the surface of the Earth. Over the drought- and flood-prone regions, the low or high values obtained depend on geographical influences, such as coastal versus inland location, as well as cloud types. Rain-bearing clouds such as cumulonimbus have higher CER values, while non-rain-bearing clouds such as stratiform clouds can have smaller values.

Due to the importance of CER, the present study has performed regression-based prediction for it using six different supervised ML models. In practice, comparing different ML regression models involves evaluating them based on multiple criteria, such as performance metrics, interpretability, robustness, computational efficiency, and generalization ability. This thorough comparison ensures that the chosen model aligns well with the goals and constraints of the specific application. The evaluation of predictive models for a specific region involved comparing Taylor plots for CER based on correlation coefficients and standard deviations. RF (0.93, 0.97), XGBoost (0.92, 0.96), and DT (0.86, 0.92) exhibited higher correlation coefficients across both regions. RF is often superior as it aggregates the predictions of multiple DTs, reducing overfitting and enhancing accuracy. It efficiently handles large, high-dimensional datasets and is robust to outliers and noise. The model is sensitive to hyperparameter settings and provides valuable insights into feature importance. Additionally, the RF model is versatile, effectively managing both classification and regression tasks across various applications. This research underscores the importance of exploiting ML techniques to provide valuable insights into predicting cloud dynamics through various cloud properties and precipitation patterns facing climate change uncertainties.

This study has some limitations. Moderate-resolution datasets were used because of computational constraints, although high-resolution data could improve precision. The datasets were chosen to balance spatial coverage and temporal resolution within the available computational resources, and higher-resolution data are planned for future work. Traditional ML models were used rather than deep learning techniques, which may offer superior performance but are more complex and computationally demanding. Future research can explore deep learning models on temporal and spatial data, including permutations of all cloud properties, to predict cloud dynamics alongside climate variables for improved future projections. Such an approach can enable a detailed investigation of variations in cloud variables and capture regional differences more effectively, contributing to a better understanding of climate change. Furthermore, incorporating a wider range of atmospheric variables beyond cloud parameters can improve the comprehensiveness and predictive accuracy of the analysis and is another area for future exploration.

All relevant data are available from an online repository or repositories.

The authors declare there is no conflict.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).