Abstract
This article examines machine learning models for precipitation forecasting in the Ambica River basin, addressing the pressing need for accurate hydrological forecasts in water resource management. Using a comprehensive collection of meteorological variables, including temperature, humidity, wind speed, and precipitation, four models are employed: Support Vector Regression (SVR), Random Forest (RF), Decision Tree (DT), and Multiple Linear Regression (MLR). The performance of these models is rigorously evaluated using several assessment indicators. The cross-correlation function (XCF) is used to evaluate the correlations between climatic variables and precipitation. The XCF analysis reveals several noteworthy trends, such as a strong link between maximum temperature and precipitation, with correlation maxima occurring at consistent monthly lags across all four sites. Relative humidity and wind speed also show significant connections with precipitation. The findings highlight the value of machine learning approaches in improving precipitation forecast accuracy: the RF and SVR models typically outperform the others, with coefficient-of-determination (R2) values ranging from 0.74 to 0.91, beating the competing models in both the training and testing stages. These findings have significant consequences for hydrological processes in the Ambica River basin, where accurate precipitation forecasting is critical for sustainable water resource management.
HIGHLIGHTS
To create precise precipitation forecasting models for the Ambica River basin and to evaluate the efficiency of various machine learning techniques.
To identify the ML model that estimates precipitation in the Ambica River basin most accurately.
INTRODUCTION
Models for precipitation forecasting are essential tools for predicting and managing water resources, lessening the effects of extreme weather, and guaranteeing community safety and security (Ludwig et al. 2014; Mehta & Kumar 2022; Mehta et al. 2022a, 2022b). Agriculture, energy production, water management, emergency services, and disaster management organizations all depend on precise precipitation forecasts (Wang & Xie 2018). Rainfall is crucial to agriculture, and variations in rainfall patterns may have a big influence on agricultural output and food security. The productivity of farmers can be increased and crop losses from droughts, floods, and other extreme weather events can be decreased by using accurate precipitation forecasting models to guide planting, irrigation, and other agricultural operations (Ali et al. 2017). Another area where precise precipitation forecasting is crucial is water resource management. Water availability is directly impacted by precipitation patterns, which also have an influence on reservoir levels, river flows, and groundwater recharge rates (Kumar & Yadav 2022). Water managers can limit the danger of flooding, maintain water availability during droughts and other occurrences that cause water scarcity, and improve water allocation and distribution with the use of accurate precipitation forecasts (Gong et al. 2016; Mehta & Kumar 2021).
Precipitation patterns have an impact on the energy industry as well because hydroelectric power generation is highly dependent on rainfall. Energy firms may limit the risk of power outages caused by extreme weather by planning for changes in water supply, maximizing power output, and using precipitation forecasting models (Sharma et al. 2023). Precipitation forecasting models are also used by emergency services and disaster management organizations to plan for and respond to severe weather occurrences. Planning evacuations, allocating resources, and preparing for probable flooding, landslides, or other dangerous events can all be facilitated by accurate precipitation forecasts. Recent variations in rainfall patterns brought on by global climate change have increased the frequency and severity of severe precipitation events. Therefore, it is essential to manage the impact of these events on communities and infrastructure through accurate precipitation forecasting. To lessen the effects of severe occurrences, it is crucial to create trustworthy and accurate precipitation forecasting models (Schumann et al. 2016; Mehta & Yadav 2021).
Models for predicting precipitation may be created using a variety of modelling approaches. Statistical models, which are frequently used in precipitation forecasting, use historical data to find patterns and relationships between variables such as temperature, humidity, pressure, and precipitation (Dastorani et al. 2016). These models forecast future precipitation patterns by exploiting the relationships between these factors, and they have the benefits of being straightforward, simple to use, and computationally inexpensive. The autoregressive model, which analyses past precipitation data to produce forecasts for the future, is one of the most widely used statistical models in precipitation forecasting (Pham et al. 2019). Since past precipitation patterns are thought to be a good indicator of future precipitation patterns, autoregressive models are well suited to short-term forecasts (Pérez-Alarcón et al. 2022). They are also good at recognizing trends and cycles in precipitation patterns, which makes them appropriate for long-term predictions. Multiple regression models, which take numerous independent factors into account to forecast precipitation patterns, are another frequently used statistical approach (Jobson 1991). They can be helpful when several factors may affect precipitation patterns and their relationships with precipitation may not be linear (Themeßl et al. 2011).
Statistical models have several drawbacks despite their benefits. The accuracy of statistical models can be constrained by changes in the underlying connections between variables over time, which is one of their most important limitations. For instance, changes in land use or urbanization may affect how temperature and precipitation relate to one another, making historical data less relevant for generating forecasts about the future. Furthermore, because extreme precipitation events may not exhibit the same patterns as more typical precipitation patterns, statistical models may have difficulty capturing these events (Benyahya et al. 2007; Patel et al. 2023).
The most sophisticated and reliable precipitation forecasting models now in use are numerical weather prediction models (Markovics & Mayer 2022). To simulate atmospheric conditions and forecast weather patterns, they employ complex mathematical and physical equations (Moosavi et al. 2021). These models may incorporate data from several sources, including weather stations, satellites, and radar, to provide very accurate forecasts at various spatial and temporal scales. Meteorological agencies frequently utilize numerical weather prediction models to produce official weather predictions. One benefit of these models is their high precision (Weyn et al. 2020): they can produce forecasts up to a week in advance, which can be crucial for planning and decision-making in a variety of industries. Because they can generate forecasts at many spatial and temporal scales, from global to regional to local, they are adaptable tools for a variety of uses. Additionally, numerical weather prediction models can provide probabilistic forecasts that help decision-makers prepare for uncertain events. However, these models have a number of drawbacks. A key limitation is the requirement for large computing resources and expertise to construct and maintain them: they need to be regularly updated and calibrated with the most recent data and scientific developments, and they require powerful computers and sophisticated software to operate. Furthermore, they are sensitive to input data errors, such as inaccurate temperature or pressure readings, which can seriously impair forecast accuracy (Yoon 2019).
Artificial intelligence models known as ‘machine learning (ML)’ models have grown in popularity recently. They handle the complicated and varied data associated with precipitation forecasting and use algorithms to find patterns and correlations in huge databases (Akkem et al. 2023). In comparison to other models, such as statistical and numerical weather prediction models, ML models have a number of advantages (Cho et al. 2020). One of their benefits is the capacity to handle vast and complicated datasets (Dhal & Azad 2022). Precipitation forecasting takes into account many different factors, including terrain, pressure, humidity, temperature, and wind speed and direction (Espeholt et al. 2022). Traditional statistical models struggle to find relationships and trends because these variables can combine in various ways. ML models can handle large and complicated datasets and can spot patterns and connections that statistical models might miss (Choudhury et al. 2021).
The capacity of ML models to learn from data and adjust to changing circumstances is another advantage. As new data become available, ML models can update their predictions by using historical data to find patterns and relationships between variables (Reichstein et al. 2019). For forecasting precipitation, where weather patterns can change quickly and unpredictably, this capacity to learn from data and adapt to changing conditions is particularly crucial (Basha et al. 2020). Additionally, ML models can be used to create ensemble models, which combine the output of several models to increase accuracy. When forecasting precipitation, ensemble models are especially helpful because different models may be required to capture different aspects of weather patterns. As an illustration, one model could perform better at forecasting precipitation in hilly areas, while another performs better in coastal areas. Ensemble models are able to provide forecasts that are more reliable and accurate by integrating the output of numerous models. Finally, probabilistic predictions may be generated using ML models, which can be helpful for decision-making in a variety of industries. Probabilistic predictions give information on the likelihood of various events, such as the chance that a specified amount of rain will fall over a particular time frame. This knowledge supports informed decisions regarding water management, agriculture, and other industries sensitive to precipitation patterns. Various researchers have used different ML methods for precipitation forecasting, such as deep learning (Salman et al. 2015), artificial neural networks (ANN) (Shah et al. 2018; Basha et al. 2020; Kumar & Yadav 2021), support vector regression (SVR) (Cramer et al. 2017; Singh et al. 2023), recurrent neural networks (RNN) (Tang et al. 2022), long short-term memory (LSTM) networks (Barrera-Animas et al. 2022), and decision trees (Rahman et al. 2022).
Need for the study
The region of South Asia, including the Ambica River basin in South Gujarat, India, is seeing an increase in the frequency and severity of severe precipitation events, which prompted this study. These events significantly affect water resource management, agriculture, infrastructure, and human livelihoods. Accurate precipitation forecasting models are essential for managing water resources successfully and reducing the effects of severe catastrophes. However, current precipitation forecasting models have drawbacks, such as their inability to account for the intricate non-linear relationships between meteorological variables and precipitation. To overcome these limitations, this study develops and evaluates the precision of various machine learning models for precipitation forecasting in the Ambica River basin. By identifying the most accurate model for precipitation forecasting in the area, the study will add to the body of knowledge, improve water resource management strategies, and lessen the effects of extreme precipitation events. The findings will be especially useful to decision-makers, water resource managers, and other stakeholders in vulnerable areas who must manage water resources under changing climate conditions.
Research gap
Recent catastrophic flooding in the area has highlighted the need for precise precipitation forecasting models to enhance water resource management and lessen the effects of extreme events. Despite the significance of the problem, little research has been conducted specifically on the Ambica River watershed. Additionally, no studies compare how well various machine learning models perform in predicting the region's monthly precipitation. Finding the most accurate ML model for precipitation forecasting in the Ambica River basin is crucial, because it may support the creation of efficient mitigation plans to lessen the effects of floods on local infrastructure and inhabitants.
Research objective and novelty
The objective of this research is to create precise precipitation forecasting models for the Ambica River basin and to evaluate the efficiency of various machine learning techniques. The study will determine the most accurate ML model for estimating precipitation in the basin, which will aid in reducing the effects of severe precipitation events and enhancing water resource management. Three ML models, Support Vector Regression, Random Forest, and Decision Tree, are used to assess prediction accuracy, and they are compared against the statistical Multiple Linear Regression (MLR) method. This study will be helpful for regions such as South Asia that are susceptible to severe precipitation events, supporting decision-making and enabling water resource managers to take the steps required to lessen the effects of severe occurrences.
STUDY AREA AND DATASET
Except during the southwest monsoon season, the basin experiences a hot and relatively dry climate. Daytime temperatures range from 32 to 40 °C and night temperatures from 8 to 25 °C. The southwest monsoon, which lasts from June to September, accounts for nearly 98% of the annual rainfall in the basin; outside the monsoon season there is very little rainfall.
Table 1 lists the rain gauge stations of Ahwa, Borkhal, Gandevi, and Mankunia in the Navsari and Dang districts, including latitude, longitude, and elevation information. Table 2 summarizes the data used, which included mean monthly precipitation, wind speed, minimum and maximum monthly temperature, and relative humidity. The mean monthly data, available from 1981 to 2021, were obtained from the NASA POWER Data Access Viewer. These parameters describe the climatic conditions in the study area, which aids in understanding the impact of climate on the Ambica River basin's hydrological regime.
Table 1 | Rain gauge station information

| Station name | District | Taluka | Latitude | Longitude | Elevation (m) |
| --- | --- | --- | --- | --- | --- |
| Ahwa | Dang | Ahwa | 20°45′33″ | 73°41′23″ | 468 |
| Borkhal | Dang | Ahwa | 20°42′33″ | 73°43′23″ | 318 |
| Gandevi | Navsari | Navsari | 20°48′49″ | 73°00′16″ | 15 |
| Mankunia | Navsari | Vansda | 20°41′48″ | 73°23′55″ | 98 |
Table 2 | Climate data for the Ambica River basin, 1981–2021 (source: NASA)

| Data | Period | Source |
| --- | --- | --- |
| Mean monthly precipitation (mm) | 1981–2021 | NASA POWER Data Access Viewer (https://power.larc.nasa.gov/data-access-viewer/) |
| Wind speed (m/s) | 1981–2021 | |
| Minimum monthly temperature (°C) | 1981–2021 | |
| Maximum monthly temperature (°C) | 1981–2021 | |
| Relative humidity (%) | 1981–2021 | |
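As a hedged illustration of how such station data could be requested programmatically, the sketch below builds a NASA POWER request URL for the Ahwa station. The endpoint path and parameter codes (PRECTOTCORR, T2M_MAX, T2M_MIN, RH2M, WS2M) are assumptions based on the public POWER API, not details taken from this article.

```python
# Hypothetical sketch: building a NASA POWER monthly-point request URL.
# Endpoint and parameter codes are assumptions about the public POWER API.
BASE = "https://power.larc.nasa.gov/api/temporal/monthly/point"

def power_url(lat, lon, start=1981, end=2021):
    """Build a request URL for one station over the 1981-2021 study period."""
    params = {
        "parameters": "PRECTOTCORR,T2M_MAX,T2M_MIN,RH2M,WS2M",
        "community": "AG",
        "latitude": lat,
        "longitude": lon,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    query = "&".join(f"{k}={v}" for k, v in params.items())
    return f"{BASE}?{query}"

# Ahwa station: 20°45'33" N, 73°41'23" E converted to decimal degrees
url = power_url(round(20 + 45/60 + 33/3600, 3), round(73 + 41/60 + 23/3600, 3))
```

The returned JSON would then be parsed into the monthly series used throughout the study.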
METHODOLOGY
Support vector machine
Support Vector Machine (SVM) techniques are a subset of supervised learning algorithms used for classification and regression applications. Support Vector Regression (SVR), one of several prediction algorithms, is renowned for being trustworthy and highly efficient in regression applications (Shams et al. 2023; Zhu et al. 2023). SVR, an SVM variant, uses an integrated method for working with continuous data. SVR is capable of both linear and nonlinear regression, making it a flexible choice for various dataset types. Compared with other local models and algorithms that rely on traditional chaotic techniques, SVR is more resilient and robust for datasets with a high level of noise, and more reliable for datasets with mixed noise. The goal of SVR is to maximize the number of instances that fit within the ‘street’ while minimizing the number of margin violations (Pan et al. 2023). The width of the street is determined by a hyper-parameter known as epsilon (ε).
Formally, the SVR training problem can be written as

$$\min_{w,\,b,\,\zeta,\,\zeta^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*)$$

subject to

$$y_i - w^{T}\phi(x_i) - b \le \varepsilon + \zeta_i, \qquad w^{T}\phi(x_i) + b - y_i \le \varepsilon + \zeta_i^*, \qquad \zeta_i,\, \zeta_i^* \ge 0$$

Here, C is a hyperparameter that controls the trade-off between maximizing the margin and minimizing the errors, and ζ_i and ζ_i* are the slack variables that allow deviations from the ε-tube. The objective function penalizes samples whose predictions are far from their true targets, with the penalty depending on whether the prediction lies above or below the ε-tube.
Pseudocode for the SVM
1. Start
2. Import libraries, i.e. numpy, pandas, seaborn, matplotlib.pyplot
3. Input: dataset with the meteorological variables
4. Define the X and Y variables.
5. Preprocess the dataset: find missing values and normalize the data.
6. from sklearn.model_selection import train_test_split
7. X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
8. Import the SVR model with a kernel from the sklearn library.
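The steps above can be sketched as a short, runnable example. The synthetic feature columns stand in for Tmax, Tmin, RH, and WS, and the RBF kernel with these C and epsilon values is an illustrative assumption, not the study's tuned configuration.

```python
# Runnable sketch of the SVR pseudocode on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                      # synthetic Tmax, Tmin, RH, WS
y = X @ np.array([1.5, -0.5, 2.0, 0.8]) + rng.normal(scale=0.1, size=200)

# 70/30 split, matching the pseudocode (test_size=0.3, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

scaler = StandardScaler().fit(X_train)             # normalization step (step 5)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)     # illustrative hyperparameters
model.fit(scaler.transform(X_train), y_train)
r2 = model.score(scaler.transform(X_test), y_test) # coefficient of determination
```

Scaling the inputs before fitting matters for SVR because the RBF kernel is distance-based; unscaled features with different units would dominate the kernel.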
Decision tree
Decision trees (DTs) are a supervised learning technique used for classification and regression problems. They are non-parametric in nature and aim to develop a model that predicts the value of a target variable by learning simple decision rules based on the data attributes. A decision tree can be thought of as a piecewise-constant approximation of the target function. Bagging is a parallelized approach in which the base learners are trained independently of one another (Méndez et al. 2023); examples of such models include random forest and extra-trees regression. Random forest creates multiple decision tree models using training data and feature bootstrapping, and then averages the base learners to arrive at the final prediction. Extra-trees regression, on the other hand, trains the base learners using all the data, and the node splitting is more random (Cojbasic et al. 2023).
For a candidate split θ of the data Q_m at node m (with n_m samples) into left and right subsets,

$$Q_m^{\text{left}}(\theta) = \{(x, y) \mid x_j \le t_m\}, \qquad Q_m^{\text{right}}(\theta) = Q_m \setminus Q_m^{\text{left}}(\theta)$$

the quality of the split is measured with an impurity function H:

$$G(Q_m, \theta) = \frac{n_m^{\text{left}}}{n_m} H\big(Q_m^{\text{left}}(\theta)\big) + \frac{n_m^{\text{right}}}{n_m} H\big(Q_m^{\text{right}}(\theta)\big)$$

The algorithm selects the candidate split that minimizes the impurity measure,

$$\theta^* = \operatorname*{arg\,min}_{\theta} G(Q_m, \theta)$$

and recursively applies the same process to the left and right subsets Q_m^left and Q_m^right until a stopping criterion is met. The stopping criterion can be reaching the maximum allowable depth, where no further splits are allowed, the number of samples in a node n_m falling below a minimum threshold, or n_m = 1. Appendix 7 shows the flowchart of the DT model.
Pseudocode for the DT
1. Import libraries, i.e. numpy, pandas, seaborn, matplotlib.pyplot
2. Input: training dataset with n observations; test dataset with m predictors.
3. Split the dataset into the best partitions for the individual classes.
4. Import the Decision Tree model.
5. Create the root node for the tree.
6. Apply the best conditions for the tree, i.e. max depth, leaf node, decision node.
   - Root node: where the splitting of the tree starts.
   - Decision node: where decisions are made.
   - Child node: a node that cannot be split further.
7. Check the accuracy; if it is not good, repeat with the above conditions until the best accuracy is reached.
8. Output: decision tree
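A minimal sketch of this procedure with scikit-learn is shown below; the stopping conditions mirror step 6, while the sine-shaped synthetic data and the specific parameter values are illustrative assumptions.

```python
# Sketch of the DT pseudocode with a DecisionTreeRegressor on synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=300)

# max_depth and min_samples_leaf are the stopping criteria from step 6
tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10, random_state=0)
tree.fit(X, y)                       # the fit is a piecewise-constant function

depth = tree.get_depth()             # never exceeds the max_depth stopping rule
score = tree.score(X, y)             # training R2 of the fitted tree
```

Capping the depth and leaf size keeps the piecewise-constant fit from memorizing noise, which is the practical meaning of the stopping criteria described above.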
Multiple linear regression
Multiple linear regression (MLR) models the dependent variable as a linear combination of the independent variables:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon$$

where Y is the predicted precipitation, X_1, …, X_n are the independent variables, β_0 is the intercept, β_1, …, β_n are the slope coefficients, and ε is the error term.
Pseudocode for the MLR
1. Start
2. Import libraries, i.e. numpy, pandas, seaborn, matplotlib.pyplot
3. Input: training dataset with n observations; test dataset with m predictors.
4. Read the data for the multiple variables.
5. Compute the F-statistic by OLS and check whether it is significant.
6. If the p-value < 0.05, find (β1X1, β2X2, …, βnXn); otherwise, the model is not significant.
7. Interpret the slope values (β1, β2, β3, β4).
8. Generate the MLR model.
9. Find the accuracy from the evaluation metrics.
10. End
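Steps 5-7 can be sketched with ordinary least squares and the overall F-test. The four synthetic predictors stand in for Tmax, Tmin, RH, and WS; the coefficients and noise level are made-up values for illustration only.

```python
# Sketch of the MLR pseudocode: OLS fit, overall F-statistic, and p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 150, 4
X = rng.normal(size=(n, p))
y = 3.0 + X @ np.array([1.0, -0.3, 2.5, 0.5]) + rng.normal(scale=0.5, size=n)

Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)    # OLS estimates beta0..beta4

resid = y - Xd @ beta
ss_res = resid @ resid                           # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()             # total sum of squares
# Overall F-statistic: does the regression explain significant variance?
f_stat = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
p_value = stats.f.sf(f_stat, p, n - p - 1)       # upper-tail probability
significant = p_value < 0.05                     # step 6: significance check
```

If the F-test's p-value falls below 0.05, the slope coefficients beta[1:] are interpreted as in step 7; otherwise the model as a whole is not significant.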
Random forest regression
Random forest is a powerful and accurate regression model that can handle a variety of problems, including those with non-linear relationships. It is an ensemble learning method used for regression in supervised machine learning (Zhang et al. 2022). During the training phase, multiple decision trees are built, and the forest averages their predictions. The steps involved in the random forest algorithm are as follows: first, a random set of p data points is chosen from the training set. Then, a decision tree is constructed using these p data points. This process is repeated for a total of N trees. Finally, each of the N trees predicts the value of y for a new data point, and the average of all the predicted y values is assigned to the new data point. For predicting rainfall from environmental input variables, the random forest algorithm is selected as the predictive model. The algorithm builds a large number of decision trees during the training phase, and the mean of the individual tree predictions determines the final output. According to Kusiak et al. (2013), the random forest technique is effective in handling large datasets and can produce positive experimental results even with a significant amount of missing data. Appendix 9 shows the flowchart of the random forest model.
Pseudocode for the RFR
1. Import libraries, i.e. numpy, pandas, seaborn, matplotlib.pyplot
2. Input: training dataset with n observations; test dataset with m predictors.
3. Split the dataset into the best partitions for the individual classes.
4. Import the random forest model.
5. Randomly select n observations from the observation set.
6. Generate a tree from the randomly chosen data.
7. Split the nodes into sub-nodes.
8. Train the model on each bootstrap sample independently.
9. Create the random forest model by averaging the predictions of the trees.
10. Check the accuracy; if it is not good, repeat with the above conditions until the best accuracy is reached.
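The bootstrap-and-average procedure above can be sketched as follows; the choice of 100 trees and the synthetic data are illustrative assumptions, not the study's settings.

```python
# Sketch of the RF pseudocode: N bootstrapped trees, predictions averaged.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))                    # synthetic Tmax, Tmin, RH, WS
y = X @ np.array([1.5, -0.5, 2.0, 0.8]) + rng.normal(scale=0.2, size=300)

# bootstrap=True: each tree trains on its own resampled subset (steps 5-8)
rf = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=3)
rf.fit(X, y)

pred = rf.predict(X[:5])     # forest output = mean of the 100 tree predictions
```

Because each tree sees a different bootstrap sample, averaging them reduces the variance of any single overfitted tree, which is the rationale behind step 9.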
Developed model and measuring performance
In this study, the machine learning models were developed to predict precipitation using three different combinations of exogenous inputs. The models were evaluated with five metrics: the coefficient of determination (R2), mean absolute error (MAE), root mean square error (RMSE), mean squared error (MSE), and explained variance score (EVS). R2 assesses how well the model can replicate measured values and predict future values. The MAE is the average magnitude of the differences between predicted and measured values, while the RMSE is the square root of the average squared difference between predicted and measured values. The MSE is the average of the squared errors between predicted and measured values, and the EVS quantifies the proportion of the variance in the measured values explained by the model. Table 3 summarizes the three input combinations. Model A used four inputs, namely maximum temperature (Tmax), minimum temperature (Tmin), relative humidity (RH), and wind speed (WS). Model B used RH and WS as two inputs, and Model C used only RH.
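The five metrics can be computed directly with scikit-learn, as the sketch below shows; the measured and predicted arrays are toy values, not the study's data.

```python
# Computing R2, MAE, RMSE, MSE, and EVS with scikit-learn on toy values.
import numpy as np
from sklearn.metrics import (explained_variance_score, mean_absolute_error,
                             mean_squared_error, r2_score)

measured = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 38.0])

mse = mean_squared_error(measured, predicted)   # average squared error
rmse = np.sqrt(mse)                             # RMSE = square root of MSE
mae = mean_absolute_error(measured, predicted)  # average absolute error
r2 = r2_score(measured, predicted)              # coefficient of determination
evs = explained_variance_score(measured, predicted)
```

Note that R2 and EVS coincide only when the mean residual is zero; a systematic bias in the predictions lowers R2 but not EVS, which is why reporting both is informative.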
Table 3 | Models developed based on different inputs

| Model | Inputs |
| --- | --- |
| A | Tmax, Tmin, RH, WS |
| B | RH, WS |
| C | RH |
RESULTS AND DISCUSSION
Time series analysis
The statistics of the monthly data for the four stations Ahwa, Borkhal, Gandevi, and Mankunia are shown in Appendix 1. The variables included in the table are relative humidity, wind speed, maximum temperature, minimum temperature, and precipitation. The statistics reported for each variable include the mean, standard deviation (σ), and coefficient of variation (CV). The maximum and minimum values for each variable are also reported. The CV is the ratio between the standard deviation and the mean, and it provides a measure of the variability of the data. The table shows that there are variations in the monthly statistics of the four stations, indicating that the climate conditions differ across the studied area.
The patterns observed at the four stations were very similar. Appendix 10 shows that the XCF between Tmin and precipitation peaked at 0.63 for Borkhal, 0.60 for Ahwa and Gandevi, and 0.62 for Mankunia, while Appendix 11 shows XCF peaks between Tmax and precipitation of 0.65 for Ahwa and Gandevi, 0.70 for Borkhal, and 0.65 for Mankunia, indicating a stronger correlation between maximum temperature and precipitation. In all four stations, the peaks were observed at consistent monthly lags. The cross-correlation between relative humidity and precipitation is shown in Appendix 12; all stations showed an XCF value close to 0.7, indicating a good relationship between humidity and precipitation. Similarly, for the cross-correlation between wind speed and precipitation, shown in Appendix 13, peaks were observed at consistent monthly lags, and an XCF value close to 0.7 was obtained, indicating a good relationship between wind speed and precipitation.
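For readers unfamiliar with the XCF, the sketch below computes a lagged Pearson correlation between two monthly series and locates its peak lag. The sinusoidal "temperature" and "rainfall" series are synthetic stand-ins, not the station data.

```python
# Minimal sketch of the cross-correlation function (XCF) at a given lag.
import numpy as np

def xcf(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return np.corrcoef(x, y)[0, 1]

t = np.arange(480)                           # 40 years of monthly time steps
temp = np.sin(2 * np.pi * t / 12)            # synthetic annual temperature cycle
rain = np.sin(2 * np.pi * (t - 2) / 12)      # rainfall trailing by two months

# The lag with the highest XCF recovers the built-in two-month offset
peak_lag = max(range(-6, 7), key=lambda k: xcf(temp, rain, k))
```

Scanning lags this way is how peak XCF values such as the 0.7 reported above are located in practice.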
Ahwa station
This section discusses the predictions made for the Ahwa station and presents the computed evaluation metrics for both the training and testing stages in Appendix 2. During the training stage, SVR Model A, which included all the monitored inputs, performed strongly, with an R2 value of 0.85, a low MAE of 0.80 mm, an RMSE of 2.01 mm, an MSE of 2.96 mm², and an EVS of 0.85. Random Forest Model A performed even better, with an R2 value of 0.95, an MAE of 0.87 mm, an RMSE of 1.94 mm, an MSE of 3.75 mm², and an EVS of 0.95. However, the performance dropped significantly for SVR Model B, which did not include Tmax and Tmin as inputs, with an R2 value of 0.71, an MAE of 1.18 mm, an RMSE of 2.73 mm, an MSE of 7.48 mm², and an EVS of 0.73. Model C, which took only relative humidity as input, performed even worse, with an R2 value of 0.60, an MAE of 1.46 mm, an RMSE of 3.25 mm, an MSE of 10.54 mm², and an EVS of 0.62. MLR Model C had the lowest performance, with an R2 value of 0.54, an MAE of 0.51 mm, an RMSE of 3.56 mm, an MSE of 12.69 mm², and an EVS of 0.54. In the testing stage, SVR Model A again performed best, with an R2 value of 0.87, an MAE of 1.18 mm, an RMSE of 1.98 mm, an MSE of 1.98 mm², and an EVS of 0.87. Random Forest Model A also performed well, with an R2 of 0.71, an MAE of 3.16 mm, an RMSE of 5.92 mm, an MSE of 3.11 mm², and an EVS of 0.72.
Appendix 14(a) depicts a scatter plot, with an R2 value of 0.87, of the association between precipitation measured and predicted by SVR Model A for the Ahwa station. Appendix 14(b) compares the predicted precipitation (in mm) with the measured precipitation (in mm) for SVR Model A at the Ahwa station, showing close agreement between the two. Appendix 14(c) compares the predicted and measured precipitation (in mm) for Decision Tree Model A at the Ahwa station; the coefficient of determination (R2 = 0.51) indicates how well the trend line fits the data.
Borkhal station
In this section, the predictions for the Borkhal station are discussed, and the evaluation metrics for both the training and testing stages are presented in Appendix 3. For the training stage, the best performance was obtained with SVR Model A, which included all monitored inputs (R2 = 0.84, MAE = 0.89 mm, RMSE = 2.17 mm, MSE = 4.71 mm², EVS = 0.84), and Random Forest Model A (R2 = 0.97, MAE = 0.48 mm, RMSE = 0.98 mm, MSE = 0.97 mm², EVS = 0.97), as shown in Appendix 3. However, the performance decreased for SVR Model B, which did not include Tmax and Tmin as inputs (R2 = 0.76, MAE = 1.26 mm, RMSE = 2.68 mm, MSE = 7.18 mm², EVS = 0.76), and for Model C, which only included relative humidity (R2 = 0.64, MAE = 1.58 mm, RMSE = 3.25 mm, MSE = 10.57 mm², EVS = 0.65). The worst performance was achieved by Multiple Linear Regression Model C in the testing stage (R2 = 0.51, MAE = 2.82 mm, RMSE = 3.88 mm, MSE = 13.64 mm², EVS = 0.51). In the testing stage there was a marked difference between the models, with better performance for SVR Model A (R2 = 0.87, MAE = 1.26 mm, RMSE = 2.04 mm, MSE = 4.17 mm², EVS = 0.84) and Random Forest Model A (R2 = 0.84, MAE = 1.17 mm, RMSE = 2.24 mm, MSE = 5.00 mm², EVS = 0.84).
Appendix 15(a) depicts a scatter plot, with an R2 value of 0.87, of the association between precipitation measured and predicted by SVR Model A for the Borkhal station. Appendix 15(b) compares the predicted precipitation (in mm) with the measured precipitation (in mm) for SVR Model A at the Borkhal station, showing close agreement between the two. Appendix 15(c) compares the predicted and measured precipitation (in mm) for Decision Tree Model A at the Borkhal station; the coefficient of determination (R2 = 0.66) indicates how well the trend line fits the data.
Gandevi station
This section discusses the predictions made for the Gandevi station and presents the evaluation metrics computed for the training and testing stages in Appendix 4. Among the models tested during the training stage, SVR Model A with all the monitored inputs (R2 = 0.83, MAE = 0.98 mm, RMSE = 2.65 mm, MSE = 7.05 mm², EVS = 0.83) and Random Forest Model A (R2 = 0.96, MAE = 0.63 mm, RMSE = 1.39 mm, MSE = 1.94 mm², EVS = 0.96) performed the best, as shown in Appendix 4. The performance dropped for SVR Model B, which excluded Tmax and Tmin as inputs (R2 = 0.69, MAE = 1.46 mm, RMSE = 3.55 mm, MSE = 12.59 mm², EVS = 0.70). Restricting the inputs to relative humidity alone in Model C resulted in a further reduction in performance (R2 = 0.59, MAE = 1.76 mm, RMSE = 4.11 mm, MSE = 16.92 mm², EVS = 0.60). The worst performance was observed for Multiple Linear Regression Model C during the testing stage (R2 = 0.51, MAE = 3.29 mm, RMSE = 4.69 mm, MSE = 21.99 mm², EVS = 0.51). In the testing stage, better performance was observed for SVR Model A (R2 = 0.85, MAE = 1.49 mm, RMSE = 2.75 mm, MSE = 7.59 mm², EVS = 0.86) and Random Forest Model A (R2 = 0.84, MAE = 1.35 mm, RMSE = 2.65 mm, MSE = 7.02 mm², EVS = 0.84).
Appendix 16(a) depicts a scatter plot, with an R2 value of 0.85, of the association between precipitation measured and predicted by SVR Model A for the Gandevi station. Appendix 16(b) compares the predicted precipitation (in mm) with the measured precipitation (in mm) for SVR Model A at the Gandevi station, showing close agreement between the two. Appendix 16(c) compares the predicted and measured precipitation (in mm) for Decision Tree Model A at the Gandevi station; the coefficient of determination (R2 = 0.74) indicates how well the trend line fits the data.
Mankunia station
In this section, the predictions for the Mankunia station are discussed, and the evaluation metrics for both the training and testing stages are presented in Appendix 5. For the training stage, the best performance was achieved by SVR Model A, which included all the monitored inputs (R2 = 0.84, MAE = 1.16 mm, RMSE = 3.00 mm, MSE = 9.21 mm², EVS = 0.84), and Random Forest Model A (R2 = 0.97, MAE = 0.66 mm, RMSE = 1.38 mm, MSE = 1.92 mm², EVS = 0.97), as shown in Appendix 5. Performance decreased for SVR Model B, which did not include Tmax and Tmin as inputs (R2 = 0.74, MAE = 1.63 mm, RMSE = 3.81 mm, MSE = 14.52 mm², EVS = 0.74), and for Model C, which included relative humidity (R2 = 0.63, MAE = 2.05 mm, RMSE = 4.55 mm, MSE = 20.72 mm², EVS = 0.64). The worst performance was observed for Multiple Linear Regression Model C in the testing stage (R2 = 0.54, MAE = 3.80 mm, RMSE = 5.29 mm, MSE = 27.99 mm², EVS = 0.54). In contrast, SVR Model A (R2 = 0.87, MAE = 1.58 mm, RMSE = 3.00 mm, MSE = 9.00 mm², EVS = 0.87) and Random Forest Model A (R2 = 0.91, MAE = 1.17 mm, RMSE = 1.38 mm, MSE = 5.25 mm², EVS = 0.92) achieved the best testing-stage performance.
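The Model A/B/C comparison repeated at each station amounts to an input-ablation study: the same learner is refit on progressively reduced feature sets. A minimal sketch of that procedure, using a Random Forest, synthetic data, and assumed column names (Tmax, Tmin, RH, wind) rather than the study's records:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 300
# Synthetic stand-ins for the monitored inputs; names are assumptions.
X = {
    "Tmax": rng.uniform(25, 40, n),   # deg C
    "Tmin": rng.uniform(15, 28, n),   # deg C
    "RH": rng.uniform(30, 95, n),     # %
    "wind": rng.uniform(0, 12, n),    # m/s
}
# Toy precipitation signal driven by humidity and diurnal temperature range.
y = 0.2 * X["RH"] - 0.3 * (X["Tmax"] - X["Tmin"]) + rng.normal(0, 2, n)

input_sets = {
    "Model A": ["Tmax", "Tmin", "RH", "wind"],  # all monitored inputs
    "Model B": ["RH", "wind"],                  # Tmax/Tmin excluded
}
scores = {}
for name, cols in input_sets.items():
    features = np.column_stack([X[c] for c in cols])
    Xtr, Xte, ytr, yte = train_test_split(
        features, y, test_size=0.3, random_state=0
    )
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(Xtr, ytr)
    scores[name] = r2_score(yte, model.predict(Xte))
    print(name, "test R2:", round(scores[name], 2))
```

On this toy signal, dropping Tmax and Tmin removes genuinely informative inputs, so Model B's test R2 falls below Model A's, mirroring the pattern reported for the stations above.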
Appendix 17(a) shows a scatter plot (R2 = 0.87) of measured versus predicted precipitation for SVR Model A at the Mankunia Station. Appendix 17(b) compares the predicted precipitation (in mm) with the measured precipitation (in mm) for SVR Model A at the Mankunia Station, showing close agreement between the two series. Appendix 17(c) compares the predicted and measured precipitation (in mm) for Decision Tree Model A at the Mankunia Station; the coefficient of determination (R2 = 0.67) indicates how well the trend line fits the data.
CONCLUSION
The study demonstrated the potential of ML methods for building accurate precipitation prediction models for hydrological systems. SVR, RF, DT, and MLR methods were investigated, and model performance was assessed using five evaluation metrics. The findings demonstrated that the RF and SVR models outperformed the other models, providing high-accuracy precipitation forecasts for all four stations in the Ambica River basin. This study emphasizes the importance of accounting for local meteorological variables such as temperature, humidity, wind speed, and precipitation when building precise precipitation prediction models. The findings also highlight the need for further research to confirm these models' efficacy in other regions with distinct characteristics. Overall, effective precipitation prediction models are critical for efficient water resource management, particularly in places where irrigation and agriculture rely strongly on rainfall, and ML approaches offer a feasible means of improving precipitation forecasting accuracy. Our future goals include developing a rainfall prediction model that incorporates wind direction, sea surface temperature, sunshine data, cloud cover, and climate indices. We also intend to examine how climate change is affecting rainfall, and further research should be conducted in regions with tropical monsoon climates as well as other distinct climates, such as semi-arid climates. Further study and development are recommended to improve the models' suitability for predicting urban precipitation.
DECLARATIONS
All authors have read, understood, and have complied as applicable with the statement on ‘Ethical responsibilities of Authors’ as found in the Instructions for Authors.
FUNDING
This research received no external funding.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.