Abstract
Rainfall–runoff (R–R) analysis is essential for sustainable water resource management. This study of the Peddavagu River Basin compared the widely used Soil and Water Assessment Tool (SWAT) model with seven artificial intelligence (AI) models. The AI models comprised six data-driven models, namely support vector regression, artificial neural network, multiple linear regression, Extreme Gradient Boosting (XGBoost) regression, k-nearest neighbour regression, and random forest regression, along with one deep learning model, long short-term memory (LSTM). Model performance was evaluated over a calibration period from 1990 to 2005 and a validation period from 2006 to 2010 using R2 (coefficient of determination) and NSE (Nash–Sutcliffe Efficiency). All eight models yielded generally acceptable results for modelling the R–R process in the Peddavagu River Basin. The LSTM demonstrated very good performance in simulating R–R during both the calibration period (R2 is 0.88 and NSE is 0.88) and the validation period (R2 is 0.88 and NSE is 0.85). The study highlights the growing trend of adopting AI techniques, particularly the LSTM model, for R–R analysis.
HIGHLIGHTS
The study used SWAT and seven AI models for the Peddavagu River Basin.
LSTM performed well in simulating R–R during calibration (R2 is 0.88 and NSE is 0.88) and validation (R2 is 0.88 and NSE is 0.85).
These models are valuable for sustainable water management in the Peddavagu River Basin.
LIST OF ABBREVIATIONS
- CWC: Central Water Commission
- SWAT: Soil and Water Assessment Tool
- CN2: Initial SCS CN II value
- GW_DELAY: groundwater delay (days)
- IMD: Indian Meteorological Department
- ALPHA_BF: baseflow alpha factor (days)
- SW: sub-watershed
- SURLAG: surface runoff lag time in the HRU (days)
- R2: coefficient of determination
- GWQMN: threshold depth of water in the shallow aquifer required for return flow to occur (mm)
- NSE: Nash–Sutcliffe Efficiency
- ALPHA_BNK: baseflow alpha factor for bank storage (days)
- GW_REVAP: groundwater ‘revap’ coefficient
- PBIAS: percent of bias
- SOL_AWC: available water capacity of the soil layer
- R–R: rainfall–runoff
- OV_N: Manning's ‘n’ value for overland flow
- AI: artificial intelligence
- ESCO: soil evaporation compensation factor
- LSTM: long short-term memory
- SOL_K: saturated hydraulic conductivity (mm/h)
- FAO: Food and Agriculture Organization
- CH_N2: Manning's n value for main channel
- CH_K2: effective hydraulic conductivity (mm/h)
- RCHRG_DP: deep aquifer percolation fraction
- SOL_BD: moist bulk density (Mg/m3 or g/cm3)
- CANMX: maximum canopy storage (mm)
- REVAPMN: threshold depth of water in the shallow aquifer for ‘revap’ to occur
- EPCO: plant uptake compensation factor
INTRODUCTION
The management and planning of water resources, including irrigation water management, river basin engineering, reservoir operation, flood control, and navigation, rely heavily on rainfall–runoff (R–R) modelling (Santos & da Silva 2014; Noori & Kalin 2016; Shekar & Mathew 2022c). It is also crucial for the early detection and mitigation of natural catastrophes such as flooding and drought (Shamseldin 2010; Nourani et al. 2013; Suwarno et al. 2020; Kumar et al. 2021; Gupta et al. 2023). R–R models are the most popular methods for predicting discharge and estimating water balance (Beven 2012). However, modelling the R–R process is a challenging hydrologic problem because the process exhibits randomness and complicated spatial and temporal dynamics (Singh & Sankarasubramanian 2014).
There have been numerous attempts over the past few decades to pinpoint and fully comprehend the R–R process (Omar et al. 2020, 2021a, 2021b, 2022a, 2022b; Gaur et al. 2021; Shekhar et al. 2021; Singh et al. 2023; Srivastava et al. 2023). Modelling methodologies fall into two primary groups: physically based, conceptual models and mathematically based, data-driven models (Fathian et al. 2018). Physically based models require a large amount of hydroclimatic data and input parameters to simulate complex hydrologic processes such as R–R, and their use is frequently constrained by the lack of access to such physical data (Liu & Todini 2002; Lu et al. 2013). Mathematically based, data-driven models, by contrast, capture the connection between meteorological data and runoff without explicit understanding of the physical behaviour of the watershed (Modarres & Ouarda 2013; Kan et al. 2015).
There are numerous models in the hydrology literature; however, investigations on river discharge prediction have indicated that R–R models that directly represent the physical processes governing streamflow are more effective (Malago et al. 2016; Pandey et al. 2016). The Soil and Water Assessment Tool (SWAT), one of many watershed hydrology models, is used extensively around the world for applications, policy creation, environmental conditions, analysis for water restoration at several geographic scales, and decision-making (Guse et al. 2016; Himanshu et al. 2017; Dhami et al. 2018; Frizzle et al. 2021; Getachew et al. 2021; Dash et al. 2023). The model is semi-distributed because it operates at the hydrologic response unit (HRU) level. The HRU is the smallest spatial unit in the model, and it is typically defined by lumping together all identical soils, land use/land cover (LULC), and slopes within a sub-watershed (Neitsch et al. 2011; Nerantzaki et al. 2015; Schmalz et al. 2015; Swain et al. 2018; Aadhar et al. 2019; Veettil & Mishra 2020; Gupta et al. 2022; Mathew et al. 2022; Shekar & Mathew 2022a, 2022b; Mathew & Shekar 2023).
The SWAT model has been effectively employed in a number of applications, including the assessment of the consequences of hydropower projects, surface water, impacts on groundwater, climate change, snowmelt, the cycling of non-point source pollutants, etc. (Tokar & Markus 2000; Sinnathamby et al. 2017; Gupta et al. 2020a, 2020b; Nazari-Sharabian et al. 2020; Gupta et al. 2022; Khajuria et al. 2022; Nyakundi et al. 2022; Rautela et al. 2022, 2023; Umugwaneza et al. 2022; Gaur et al. 2023). However, the model needs a lot of temporal and spatial data, as well as parameters that are often difficult to estimate (Makwana & Tiwari 2014). The accuracy of the input data and model parameters determines how well the model performs. Additionally, the numerous parameters, their wide ranges of values, and the intricate connections between them make calibration and validation lengthy and difficult (Rezaeianzadeh et al. 2013; Omani et al. 2017).
On the other hand, with the development of artificial intelligence (AI) during the past few decades, academics have become increasingly interested in its use for the estimation of hydrological variables (He et al. 2014; Chutiman et al. 2022). AI techniques are becoming more popular among engineers and are increasingly important in the modelling of water resources and R–R (Jeong & Kim 2005). AI techniques can effectively handle large amounts of noisy, non-linear, and dynamic data, particularly when the underlying physical relationships are not properly known (Ateeq-ur-Rauf et al. 2018; Elkiran et al. 2019). As a result, they are good choices for data-driven time series modelling problems (Lallahem & Maina 2003). Black-box models that can capture the non-linear and non-stationary behaviour of the R–R process include support vector regression, adaptive neuro-fuzzy systems, artificial neural networks (ANNs), and others. These models have already been successfully applied in numerous papers (Dawson & Wilby 1998; Sajikumara & Thandaveswara 1999; Tokar & Johnson 1999; Antar et al. 2006; Gazzaz et al. 2012).
Data-driven models have been widely employed in the fields of sustainable water management and hydrology since they are recognised as being capable of simulating extremely non-linear and complicated hydrological processes (Shoaib et al. 2014). Low complexity and computing costs, high adaptability, and transferability are all distinct benefits of neural network-based models (Wu et al. 2009). Long short-term memory (LSTM)-based techniques perform quite well in hydrological applications, notably for simulations of rainfall and runoff (Yin et al. 2021; Schmidhuber 2015; Shen 2018; Xiang et al. 2020; Bai et al. 2021). LSTM, an improved recurrent neural network, can manage the long-distance dependencies that conventional recurrent neural networks (RNNs) are unable to handle because of vanishing and exploding gradients (Kratzert et al. 2018). It is currently among the more widely used time series forecasting models (Marcais & de-Dreuzy 2017; Karim et al. 2018; Yuan et al. 2018; Zhang et al. 2018; Bai et al. 2019; Fan et al. 2020).
The study addresses several significant gaps in existing research. Firstly, there have been few comparative studies between the SWAT model and AI techniques for simulating R–R. Additionally, there have been very limited efforts to employ advanced deep learning methods, such as LSTM, for hydrological issues. Moreover, no previous study has compared R–R modelling using the SWAT model with AI techniques comprising six data-driven models and an LSTM deep learning model. Furthermore, the Peddavagu watershed has not previously been subjected to R–R modelling using SWAT and seven AI models. Therefore, the primary objective of this study was to compare the performance of three distinct modelling approaches: the SWAT model, six data-driven models (support vector regression model, ANN, multilinear regression model, k-nearest neighbour (KNN) regression model, XGBoost regression model, and random forest (RF) regression model), and the deep learning model (LSTM) for monthly streamflow modelling in the Peddavagu River Basin. By conducting this comparative analysis and introducing the application of LSTM in R–R modelling, the study contributes to addressing these research gaps and provides insights into the performance of different modelling approaches and their suitability for the specific context of the Peddavagu River Basin.
STUDY AREA
MATERIALS AND METHODS
The study design involved comparing the performance of the SWAT model with seven AI techniques: ANN, KNN regression, multiple linear regression, XGBoost regression, RF regression, support vector regression, and the LSTM deep learning model. Each model was implemented for monthly streamflow modelling in the Peddavagu River Basin. The calibration period encompassed the years 1990–2005, while the validation period covered the years 2006–2010. The models were calibrated and validated using observed streamflow data. Model performance was assessed using two evaluation metrics, R2 and NSE, which measure the correlation between simulated and observed streamflow values and the agreement between simulated and observed streamflow patterns, respectively. The results of the different models were compared to determine their effectiveness in R–R analysis.
Data
(a) SWAT
SWAT is a physically based, semi-distributed parameter model developed to forecast sediment movement, nutrient transport, erosion from agricultural sources, and runoff in watersheds under various management options (Arnold & Allen 1996; Neitsch et al. 2004). In this research, the hydrologic model for the Peddavagu River Basin was set up using ArcSWAT, an interface in ArcGIS 10.7. Geo-environmental data with a high degree of spatial variability, such as soil data, LULC, and DEM, were used. The Peddavagu River Basin comprises 13 sub-watersheds containing 83 HRUs altogether. Figure 2 depicts the delineated watersheds and sub-watersheds that were derived using the SWAT model. The SWAT model was run from 1987 to 2010, with the first 3 years of the simulation serving as the warm-up period. The weather inputs were rainfall, maximum temperature, minimum temperature, relative humidity, solar radiation, and wind speed, and the spatial inputs were the DEM, LULC, and soil map shown in Figure 2. Potential evapotranspiration was estimated with the Penman–Monteith approach, and channel flow was routed with the variable storage method. SWAT calculates the surface runoff volume from each HRU using the curve number method developed by the Soil Conservation Service (SCS). The model was then run to simulate the surface runoff.
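The curve number step mentioned above can be illustrated with a minimal sketch of the basic SCS equation in metric units. This is not SWAT's implementation: SWAT adjusts the curve number daily with soil moisture, which is omitted here. The function name `scs_cn_runoff` and the initial abstraction ratio of 0.2 are assumptions, and the value 68.72 is the basin-average curve number reported in the results section, used here purely for illustration.

```python
def scs_cn_runoff(rain_mm: float, cn: float, ia_ratio: float = 0.2) -> float:
    """Surface runoff depth (mm) for one rainfall depth, basic SCS-CN equation."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    s = 25400.0 / cn - 254.0   # retention parameter S (mm)
    ia = ia_ratio * s          # initial abstraction Ia (mm), assumed 0.2 * S
    if rain_mm <= ia:
        return 0.0             # all rainfall is abstracted, no runoff
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# Illustrative rainfall depths (mm); these are not observations from the basin.
for rain in (10.0, 50.0, 100.0):
    print(f"rain={rain:6.1f} mm  runoff={scs_cn_runoff(rain, cn=68.72):6.2f} mm")
```

Runoff is zero until rainfall exceeds the initial abstraction and then grows non-linearly, which is why low curve numbers suppress runoff from small storms.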
(b) Data-driven and LSTM models
Model performance evaluation
Performance evaluation is a crucial step in any modelling study. To ensure reliable results from SWAT and the data-driven models, the effectiveness of each deployed model or procedure must be assessed using one or more metrics. Various indicators are available for evaluating model performance, with NSE and R2 being the most commonly used (Moriasi et al. 2007).
Nash–Sutcliffe Efficiency
Performance rating | PBIAS (%) | NSE |
---|---|---|
Unsatisfactory | PBIAS > ±25 | NSE < 0.50 |
Satisfactory | ±15 < PBIAS < ±25 | 0.50 < NSE < 0.65 |
Good | ±10 < PBIAS < ±15 | 0.65 < NSE < 0.75 |
Very good | PBIAS < ±10 | 0.75 < NSE < 1.00 |
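The equation for NSE is not reproduced in this section. The following is a minimal sketch, assuming the standard definition (one minus the ratio of the sum of squared errors to the variance of the observations about their mean) and the usual PBIAS definition. The function names are hypothetical, and placing boundary values such as exactly 0.75 in the lower class is an assumption, since the table uses strict inequalities.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    mean_obs = sum(observed) / len(observed)
    sq_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sq_err / sq_dev


def pbias(observed, simulated):
    """Percent bias: positive values mean the model underestimates on average."""
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


def nse_rating(value):
    """Map an NSE value to the performance classes in the table above."""
    if value > 0.75:
        return "Very good"
    if value > 0.65:
        return "Good"
    if value > 0.50:
        return "Satisfactory"
    return "Unsatisfactory"


# Validation NSE values reported in this study for LSTM and SWAT.
print(nse_rating(0.85), nse_rating(0.71))
```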
Coefficient of determination
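The equation for this metric is not reproduced in this section. A minimal sketch follows, assuming R2 is computed as the squared Pearson correlation between observed and simulated flows, which is a common convention in SWAT studies; the function name `r_squared` is hypothetical.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated series."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# A simulation that is exactly twice the observation still scores R2 = 1,
# so R2 alone cannot reveal systematic bias.
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Under this definition R2 is insensitive to proportional over- or under-prediction, which is one reason to report it alongside NSE.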
RESULTS AND DISCUSSIONS
(a) R–R modelling using SWAT
Although the SWAT model produces a variety of outputs at the outlet of each sub-watershed, this study focuses on the streamflow at the main outlet of the whole watershed because observed streamflow data for the Peddavagu are available there (site Bhatpalli). The Peddavagu watershed is divided into 13 sub-watersheds, and the main outlet is located at sub-watershed 12 (Figure 2). The total contributing area of the Peddavagu watershed is 3,150 km2. The period from 1987 to 2010 is divided into three parts, with the years 1987 through 1989 serving as the warm-up period, 1990 through 2005 as the calibration period, and 2006 through 2010 as the validation period.
Sl. No. | Parameter name | Fitted value | Min value | Max value |
---|---|---|---|---|
1 | R__CN2.mgt | −0.1 | −0.25 | −0.05 |
2 | R__SOL_BD(..).sol | 0.259 | −0.5 | 0.6 |
3 | A__GW_DELAY.gw | 69 | 0 | 100 |
4 | V__ESCO.hru | 0.4753 | 0.01 | 1 |
5 | V__GW_REVAP.gw | 0.17 | 0 | 1 |
6 | V__CH_K2.rte | 36.25 | 25 | 100 |
7 | V__CH_N2.rte | 0.0685 | 0.05 | 0.1 |
8 | V__CANMX.hru | 61 | 50 | 150 |
9 | R__OV_N.hru | 0.79 | 0 | 1 |
10 | A__GWQMN.gw | 1,820 | 0 | 2,000 |
11 | R__SOL_K(..).sol | 0.272 | −0.8 | 0.8 |
12 | R__SURLAG.bsn | 46.5 | 0 | 150 |
13 | V__ALPHA_BNK.rte | 0.085 | 0 | 0.5 |
14 | A__RCHRG_DP.gw | 0.27 | 0 | 1 |
15 | R__SOL_AWC(..).sol | −0.062 | −0.1 | 0.1 |
16 | V__REVAPMN.gw | 325 | 0 | 500 |
17 | V__EPCO.hru | 0.69 | 0 | 1 |
18 | V__ALPHA_BF.gw | 0.925 | 0.7 | 1 |
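The R__, V__, and A__ prefixes in the parameter names look like the change-type qualifiers used by SWAT-CUP. Assuming that convention (V replaces the existing value, A adds to it, R multiplies it by one plus the given value), the effect of a fitted value can be sketched as below. The function name `apply_change` and the initial values in the demonstration are hypothetical and are not taken from the study.

```python
def apply_change(initial: float, name: str, fitted: float) -> float:
    """Apply a fitted value according to the prefix of the parameter name."""
    kind = name[0].upper()
    if kind == "V":                     # replace the existing value
        return fitted
    if kind == "A":                     # add to the existing value
        return initial + fitted
    if kind == "R":                     # relative change: scale by (1 + fitted)
        return initial * (1.0 + fitted)
    raise ValueError(f"unknown change type in {name!r}")


# Hypothetical initial values, chosen only to show the three change types.
print(apply_change(75.0, "R__CN2.mgt", -0.1))      # CN2 reduced by 10 %
print(apply_change(31.0, "A__GW_DELAY.gw", 69.0))  # delay increased by 69 days
print(apply_change(0.95, "V__ESCO.hru", 0.4753))   # ESCO replaced outright
```

Read this way, the fitted value of −0.1 for CN2 means every HRU's curve number is lowered by 10%, not that CN2 itself equals −0.1.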
Once the fitted parameters have been obtained, they are written back into ArcSWAT to produce the final output values. The Peddavagu watershed's water balance simulated by the SWAT model covers a 21-year period (1990–2010). The overall curve number for the Peddavagu watershed is 68.72. From 1990 to 2010, average annual precipitation was 1,160.7 mm, which produced 224.13 mm of runoff at the main outlet. Evapotranspiration amounts to 571.6 mm. The depths of shallow aquifer percolation, recharge to the deep aquifer, and lateral flow are 316.74, 101.36, and 47.29 mm, respectively.
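To put these depths in proportion, each component can be expressed as a fraction of mean annual precipitation. The sketch below only divides the values quoted above; the dictionary keys are hypothetical labels, and the components are deliberately not summed because this section does not state how they partition the water balance.

```python
# Mean annual depths (mm) quoted in the text for 1990-2010.
water_balance_mm = {
    "precipitation": 1160.7,
    "runoff": 224.13,
    "evapotranspiration": 571.6,
    "percolation": 316.74,
    "deep_recharge": 101.36,
    "lateral_flow": 47.29,
}


def fractions_of_precipitation(balance):
    """Return each component as a fraction of precipitation."""
    precip = balance["precipitation"]
    return {k: v / precip for k, v in balance.items() if k != "precipitation"}


for name, frac in fractions_of_precipitation(water_balance_mm).items():
    print(f"{name:20s} {100 * frac:5.1f} % of precipitation")
```

On these figures, runoff is roughly 19% and evapotranspiration roughly 49% of precipitation.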
(b) Data-driven models
The same calibration and validation datasets are used to construct data-driven models in addition to the physically based SWAT model. Similar to the SWAT model, 16 years of data (1990–2005) are used for training (calibration), and an additional 5 years of data (2006–2010) are used for testing (validation).
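The chronological split described above can be sketched as follows. The record layout `(year, month, value)` and the function name `split_by_year` are assumptions for illustration; the placeholder series spans 1987–2010 so that the warm-up years fall into neither subset.

```python
def split_by_year(records, calib=(1990, 2005), valid=(2006, 2010)):
    """Split (year, month, value) records into calibration and validation sets."""
    calibration = [r for r in records if calib[0] <= r[0] <= calib[1]]
    validation = [r for r in records if valid[0] <= r[0] <= valid[1]]
    return calibration, validation


# Placeholder monthly series with dummy values (no real data).
records = [(year, month, 0.0) for year in range(1987, 2011) for month in range(1, 13)]
calibration, validation = split_by_year(records)
print(len(calibration), len(validation))  # 16 and 5 years of monthly records
```

Splitting by date rather than at random keeps the validation years strictly after the training years, so the test resembles forecasting on unseen data.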
R–R modelling using ANN
ANNs are commonly employed as a forecasting technique in hydrological studies. Engineers frequently use feed-forward, back-propagation (BP) network models. It has been shown that a three-layer BP network model can forecast and simulate a wide range of engineering problems (Hornik 1988; ASCE 2000). The input variables in this study are monthly rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation, and the output is the monthly observed discharge. The number of neurons and the network structure were determined by trial and error, which led to a two-layer structure with 25 neurons in each layer. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively.
R–R modelling using support vector regression model
SVR is based on the support vector machine (SVM) principle, which is used for classification and non-linear regression problems (Nourani et al. 2020). Unlike many other black-box forecasting techniques, SVR takes the minimisation of operational risk as its objective function rather than only minimising the error between the observed and predicted values. SVR is a category of AI model that is built via supervised learning. For SVM modelling, the choice of the kernel function is essential since the performance of the SVM model depends on the selection of the kernel parameters (Xu et al. 2012). In the current study, six independent variables are included: rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation. The observed discharge is predicted from these six factors. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively. The regressor is trained using a polynomial kernel of degree 3. The accuracy of the model prediction is assessed using R2, the coefficient of determination.
R–R modelling using KNN model regression
KNN regression predicts a numerical target using a similarity metric. On the assumption that similar objects lie close together, the target is predicted by local interpolation of the targets associated with the nearest neighbours in the training set. Rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation are the six independent variables in the present study, and they are used to predict the observed discharge. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively. Standardisation and normalisation of the data are carried out as a pre-processing step before fitting the regression model. Parameter tuning showed the optimum number of neighbours to be four. The accuracy of the model prediction is assessed using R2, the coefficient of determination.
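The pre-processing and neighbour count described above can be sketched in plain Python. This is an illustration only: the Euclidean distance, the unweighted mean of the neighbours' targets, and the helper names `standardise` and `knn_predict` are assumptions, since this section states only that the data were standardised and that four neighbours were used. The toy numbers are invented.

```python
import math


def standardise(train_rows, rows):
    """Scale rows to zero mean and unit variance using training statistics only."""
    columns = list(zip(*train_rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_x, train_y, query, k=4):
    """Predict a target as the mean of the k nearest training targets."""
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Invented single-predictor example: rainfall (mm) -> discharge (arbitrary units).
rain = [[0.0], [1.0], [2.0], [3.0], [10.0]]
flow = [0.0, 1.0, 2.0, 3.0, 10.0]
train_x = standardise(rain, rain)
query = standardise(rain, [[1.5]])[0]
print(knn_predict(train_x, flow, query, k=4))
```

Standardising with training statistics matters for KNN because distances would otherwise be dominated by whichever predictor has the largest numeric range.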
R–R modelling using multiple linear regression model
Linear regression is a statistical method for predicting continuous or real variables. It models a linear relationship between the variables and can thus show how the value of the dependent variable changes as the independent variables change. In multiple linear regression (MLR), one dependent variable is predicted from a number of independent variables. Rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation are the six independent variables in the present study, and they are used to predict the observed discharge. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively. Before fitting the regression model, data standardisation and normalisation are performed as a pre-processing step. R2 is calculated to assess the efficacy of the model's prediction.
R–R modelling using RF model regression
RF is an ensemble learning method for classification and regression (Liaw & Wiener 2002). The RF algorithm combines the ensemble learning and decision tree frameworks to build a number of decision trees from randomly selected subsets of the data. It then averages the trees' predictions to produce the final result, which usually yields accurate classifications and forecasts. The six independent variables in the current study are rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation, and they are used to predict the observed discharge. Monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010 are used for training and testing, respectively. Standardisation and normalisation of the data are carried out as a pre-processing step before fitting the regression model. The accuracy of the model prediction is assessed using R2, the coefficient of determination.
R–R modelling using XGBoost regression model
Chen & Guestrin (2016) proposed XGBoost, a library implemented in C++ (Nobre & Neves 2019). The model has had a lot of success since it first appeared and frequently places among the top models in data mining competitions. XGBoost is an efficient gradient boosting technique for regression predictive modelling. Gradient boosting is a family of ensemble machine learning approaches in which decision tree models are added to the ensemble one at a time, each fitted to correct the prediction errors of the prior models. Rainfall, maximum temperature, minimum temperature, relative humidity, wind, and solar radiation are the six independent variables in the present study, and they are used to predict the observed discharge. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively. The number of gradient-boosted trees (estimators) is 1,000, the learning rate is 0.1, and the objective is set to minimise absolute error. R2 is calculated to assess the efficacy of the model's prediction.
(c) Deep learning model
R–R modelling using the LSTM model
LSTM networks, a specific sort of neural network that can tackle problems involving historical data and sequential information, overcome the inability of a basic ANN to learn long-term dependencies (Xiang et al. 2020). A conventional RNN struggles to learn such dependencies because the error gradient vanishes during training, so information cannot be propagated to layers close to the input layer. LSTM was designed to address these problems. It is employed, often in combination with other neural networks, in a number of fields such as speech recognition, language modelling, and translation, and it can also be used to capture long-term temporal dependencies. In the present study, six independent variables are considered: rainfall, maximum temperature, minimum temperature, wind, relative humidity, and solar radiation. The observed discharge is predicted from these six factors. Training and testing are performed using monthly data from the Peddavagu watershed from 1990 to 2005 and from 2006 to 2010, respectively. Five LSTM layers, each with two units, were fitted to the training data with the objective of minimising mean squared error. The optimiser is Adam, the batch size is 8, and training ran for 400 epochs. R2 is calculated to assess the efficacy of the model's prediction.
Comparison of SWAT, data-driven, and LSTM models
The SWAT model is compared below with six data-driven models (ANN, multilinear regression, support vector regression, XGBoost regression, RF regression, and KNN) and one deep learning model (LSTM). R2 values for the training (calibration) dataset and the testing (validation) dataset were used to compare the fitness of the models. The outcomes from the models are displayed in Table 3. The multilinear regression model produced the lowest R2 of 0.59 during the training period (calibration), whereas the RF and XGBoost models both had the highest R2 of 0.95. For the testing period (validation), the LSTM model generated the highest R2 of 0.88, while the KNN and multilinear regression models achieved the lowest R2 of 0.62. In terms of R2 for both training (calibration) and testing (validation), the SWAT model and the seven AI models are all acceptable (Santhi et al. 2001; Van-Liew et al. 2007).
Model | R2 (training/calibration) | R2 (testing/validation) | NSE (training/calibration) | NSE (testing/validation) |
---|---|---|---|---|
SWAT | 0.63 | 0.75 | 0.62 | 0.71 |
ANN | 0.86 | 0.75 | 0.85 | 0.75 |
SVR | 0.77 | 0.73 | 0.74 | 0.71 |
KNN | 0.74 | 0.62 | 0.73 | 0.61 |
MLR | 0.59 | 0.62 | 0.59 | 0.59 |
RF | 0.95 | 0.66 | 0.94 | 0.65 |
XGBoost | 0.95 | 0.68 | 0.95 | 0.68 |
LSTM | 0.88 | 0.88 | 0.88 | 0.85 |
The multilinear regression model achieved the lowest NSE of 0.59 for the training period (calibration), whereas the XGBoost model produced the highest NSE of 0.95. For the testing period (validation), the multilinear regression model again achieved the lowest NSE of 0.59, while the LSTM model achieved the highest NSE of 0.85. With respect to NSE, the LSTM and ANN results are rated very good according to the criteria of Moriasi et al. (2007), indicating that both models were efficient at simulating the monthly flow in the Peddavagu watershed.
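The comparison can be reproduced from the values in Table 3 with a short sketch. The ranking by validation NSE and the difference between training and testing R2 are simple diagnostics added here for illustration; the function names are hypothetical, and reading a large gap as a sign of overfitting is an interpretation rather than a finding stated by the study.

```python
# (R2 train, R2 test, NSE train, NSE test) as listed in Table 3.
scores = {
    "SWAT":    (0.63, 0.75, 0.62, 0.71),
    "ANN":     (0.86, 0.75, 0.85, 0.75),
    "SVR":     (0.77, 0.73, 0.74, 0.71),
    "KNN":     (0.74, 0.62, 0.73, 0.61),
    "MLR":     (0.59, 0.62, 0.59, 0.59),
    "RF":      (0.95, 0.66, 0.94, 0.65),
    "XGBoost": (0.95, 0.68, 0.95, 0.68),
    "LSTM":    (0.88, 0.88, 0.88, 0.85),
}


def validation_ranking(table):
    """Model names ordered from highest to lowest validation NSE."""
    return sorted(table, key=lambda model: table[model][3], reverse=True)


def r2_gap(table):
    """Training R2 minus testing R2 for each model, rounded to two decimals."""
    return {model: round(v[0] - v[1], 2) for model, v in table.items()}


ranking = validation_ranking(scores)
gaps = r2_gap(scores)
print(ranking)
print(max(gaps, key=gaps.get), max(gaps.values()))
```

RF and XGBoost have the highest training scores but the largest drops on the validation data, whereas LSTM scores the same R2 in both periods.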
The main finding for the SWAT model is that it is unable to match the peak observed discharge in the study region, although it does reproduce the smaller observed peaks. R2 was 0.63 and 0.75 during the calibration and validation periods, respectively, and the NSE was 0.62 and 0.71, respectively. SWAT therefore performed better in the validation period than in the calibration period. Compared with other literature (Thavhana et al. 2018; Zakizadeh et al. 2020), the SWAT model for the present study area performed well. It also outperformed the multilinear regression model during both the calibration and validation periods. The SWAT model's comparatively poor performance relative to the AI models (with the exception of MLR) may stem from how the most sensitive parameters were identified (Cibin et al. 2010). The LSTM model captured both the peaks and the lows of the observed discharge very well during the calibration and validation periods, better than SWAT and the other models in the current study. Compared with the LSTM model, the SWAT model therefore performs rather poorly when simulating streamflow. The LSTM model achieves the best overall result of all the models, with NSE values rated very good for both the training period (calibration) and the testing period (validation). R–R in the Peddavagu watershed can thus be efficiently simulated using the LSTM model.
CONCLUSIONS
The analysis of R–R is an essential step in managing and planning for water resources. Traditionally, hydrologic models have been utilised for R–R analysis, considering the complex interactions within the water cycle. In recent years, the field of hydrology has increasingly integrated AI techniques, which have shown promising results, sometimes even outperforming conventional hydrological models in simulating runoff. In this study, the SWAT model and AI models were applied together to runoff analysis in the Peddavagu River Basin in India, an approach new to this region, and their results were compared. Simulations covered 1990 to 2010, with a calibration period from 1990 to 2005 and a validation period from 2006 to 2010. The R2 and NSE values for simulating R–R were satisfactory for all eight models. The LSTM and ANN results are very good in terms of NSE values, indicating that both models were highly efficient in simulating the monthly flow. Overall, the deep learning model, LSTM, appears to have the best performance of all the models for simulating R–R in the Peddavagu watershed during both the training period (calibration) (R2 is 0.88 and NSE is 0.88) and the testing period (validation) (R2 is 0.88 and NSE is 0.85). The calibrated and validated models from this study can be valuable tools for decision-makers to plan for sustainable water management in the Peddavagu watershed.
ACKNOWLEDGEMENTS
The authors would like to thank the editor and reviewers for their valuable comments and suggestions, which helped improve the quality of this paper. The authors would like to thank the Central Water Commission of India for providing stage discharge data (http://www.cwc.gov.in/). The authors would like to thank the US Geological Survey (USGS) for making the satellite data available (https://earthexplorer.usgs.gov/). Thank you to the Indian Meteorological Department for providing rainfall and temperature data on their website at https://mausam.imd.gov.in/. We would also especially like to thank NASA Power for providing the wind speed, relative humidity, and solar data on their website at https://power.larc.nasa.gov/data-access-viewer/.
FUNDING
There was no funding for this project.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.