## ABSTRACT

The present study evaluates the potential of Bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Networks (CNNs), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LGBM), and Random Forest (RF) for predicting daily inflows to the Sri Ram Sagar Project (SRSP), Telangana, India. Inputs to the models are rainfall, evaporation, time-lagged inflows, and climate indices. Seven input combinations (S1–S7) were formulated. Fifteen and a half years of data were considered, of which 11 years were used for training. Hyperparameter tuning was performed with the Tree-Structured Parzen Estimator, and the performance of the algorithms was assessed using the Kling–Gupta efficiency (KGE). Results indicate that Bi-LSTM with combination S7 performed better than the others, as evident from KGE values of 0.92 and 0.87 during training and testing, respectively. Furthermore, the Stacking Ensemble Mechanism (SEM) was employed to ascertain its efficacy over the chosen algorithms, resulting in KGE values of 0.94 and 0.89 during training and testing. It also simulated peak inflow events satisfactorily. Thus, SEM is a better alternative for reservoir inflow predictions.

## HIGHLIGHTS

The Stacking Ensemble Mechanism (SEM) performed better than the other ML algorithms (Bi-LSTM, CNN, RF, XGBoost, LGBM) in simulating the daily inflow to the Sri Ram Sagar Project, with KGE values of 0.94 and 0.89 during training and testing, respectively.

The optimal combination of climate indices attributing to reservoir inflow was found using Shannon's entropy weighted measure.

SEM has performed satisfactorily in replicating the peak inflow events.

## INTRODUCTION

Reservoirs are essential for water resource management and cater to flood control, irrigation, hydropower generation, domestic and industrial purposes, etc. In this regard, inflow is a crucial factor in the optimal reservoir operation. Also, extreme rainfall events and higher evaporation rates due to climate change impact the inflow pattern. It necessitates accurate prediction of inflows with consideration of climate indices. Over the past decade, various researchers have proposed physical, conceptual, empirical, and machine learning (ML) models for simulating reservoir inflows (Alquraish *et al.* 2021). However, due to the extensive data requirements and underlying assumptions, physical and conceptual models may not mimic the inflow precisely. Also, the limitation of empirical-based algorithms considering mainly linearity in the time series is another challenge in forecasting studies (Wang *et al.* 2022).

The capability of addressing non-linearity, underlying patterns in the data, and advantage over conventional statistical algorithms make ML more appropriate for prediction (Jothiprakash & Magar 2012; Rozos *et al.* 2022; Xu *et al.* 2022a). However, the significant challenge is identifying a suitable algorithm that best fits a chosen situation. In this context, researchers are making efforts to explore multiple algorithms based on their applicability.

Random Forest (RF) is one of those techniques used in inflow prediction due to its simple mathematical background. Yang *et al.* (2017) applied RF, Support Vector Regression, and Artificial Neural Networks for inflow forecasting of Danjiangkou Reservoir, China, and Clair Engle Lake Reservoir, USA. It is found that climate indices influence the inflow. Similar work has been reported by Kim *et al.* (2019) using RF, Artificial Neural Networks, and Adaptive Neural-Based Fuzzy Inference System for several dams in South Korea. The inputs for their study included various climatic indicators. RF is found to be the best for two case studies. RF's limitations include handling asymmetric and higher dimensional data (Pes 2021). This prompted researchers to explore boosting algorithms, which have high computational efficiency and can handle large datasets (Pan *et al.* 2022).

Mehraein *et al.* (2022) predicted the monthly streamflow at three stations in Turkey using eXtreme Gradient Boosting (XGBoost), RF, Artificial Neural Networks, and nonlinear regression. Ibrahim *et al.* (2022) applied XGBoost, the Adaptive Neuro Fuzzy Inference System, the Multilayer Perceptron, and Support Vector Regression to forecast inflow to Bukit Tabur Dam, Malaysia; hyperparameter tuning was conducted with a grid search method. In both case studies, XGBoost performed best, echoing the outcomes of other case studies (Ni *et al.* 2020; Dong *et al.* 2023). However, the challenge for boosting algorithms lies in effectively capturing temporal patterns in sequential data (Kiangala & Wang 2021).

This leads to the usage of Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks, which can handle nonlinear relationships and spatial-temporal patterns (Najafabadi *et al.* 2015; Semmelmann *et al.* 2022). Apaydin *et al.* (2020) employed three such algorithms for inflow prediction of the Ermenek hydroelectric dam, Turkey, with LSTM performing best. Kim *et al.* (2021) applied LSTM and the Gated Recurrent Unit to the Andong and Imha Dams, South Korea; the employed algorithms performed well except for peak events. Khorram & Jehbez (2023) utilized LSTM to analyze the medium and high inflows of the Doroodzan reservoir in Iran, where LSTM performed better than the Support Vector Machine (SVM) and the Adaptive Neuro Fuzzy Inference System. Similar studies on LSTM for inflow prediction have been reported by other researchers (Latif *et al.* 2021; Skariah & Suriyakala 2022). LSTM was further extended into a bidirectional form, Bi-LSTM, in which sequential processing is performed both backward and forward, improving its ability to produce reliable predictions (Wegayehu & Muluneh 2022). The application of Bi-LSTM to daily reservoir inflow prediction remains limited.

CNN has convolution and pooling layers in its architecture, enabling it to extract vital information from the input promptly. Shu *et al.* (2021) employed CNN to predict streamflow in three reservoirs in China. It was found that CNN performed exceedingly well as compared to artificial neural networks and extreme learning machines. Xu *et al.* (2022b) proposed an improved CNN with a Leaky ReLU activation function for predicting inflow in the Zhexi Reservoir, China. The model retained its accuracy for moderate inflows. Several other investigations have also demonstrated the effectiveness of CNN in simulating inflow (Li *et al.* 2022; Song 2022).

It has been observed that an ML algorithm performs best only in some scenarios. Researchers have therefore emphasized ensemble learning, the combination of ML algorithms to improve prediction accuracy. Kermani *et al.* (2021) reviewed a variety of ensemble architectures of ML algorithms based on boosting, bagging, and stacking approaches for modeling river flow and streamflow. Zhou *et al.* (2018) ensembled Artificial Neural Networks and regression to enhance the monthly prediction accuracy for the Jinsha River, China; the ensemble performed better than the weighted average and simple mean. Huang *et al.* (2022) examined the efficacy of different deep learning and ML models for forecasting the short-, medium-, and long-term inflows of Shimen Reservoir, Taiwan, using 14 years of rainfall data. They employed Ensemble Means and Switched Prediction approaches and concluded that integrating forecasting techniques was more effective and stable than individual ML and deep learning algorithms. Lu *et al.* (2023) employed the Stacking Ensemble Mechanism (SEM), integrating XGBoost, RF, and Adaptive Boosting with an attention mechanism as the meta-model, for simulating the daily inflow of the Fuchu reservoir. Results showed better performance of SEM compared to the other algorithms.

Climate change significantly impacts streamflow patterns. Many large-scale climate indices have lately been introduced as predictors in hydrological modeling (Rajagopalan *et al.* 2019). These indices can identify extreme events based on long-term climate data records (Moradi *et al.* 2020). Islam & Imteaz (2020) considered two climate indicators, the Indian Ocean Dipole and El Niño Southern Oscillation, as predictors and trained an Autoregressive Integrated Moving Average model with exogenous input for rainfall forecasting in Western Australia. Rajesh *et al.* (2023) considered the North Atlantic Oscillation (NAO), Southern Oscillation Index (SOI), and Arctic Oscillation (AO) as input features to feed LSTM, RF, K-Nearest Neighbor, and Gradient Boosting Regressor models for short-range forecasting of inflow to the Bhadra Reservoir, India.

According to the comprehensive review, the following salient research gaps were observed:

Limited research has been conducted on ensemble mechanisms of ML algorithms based on stacking for predicting reservoir inflow.

A study identifying the significant climate indices associated with a reservoir inflow has yet to be conducted.

Light Gradient Boosting Machine (LGBM), with quicker training speed and minimal memory consumption, has not been extensively studied in the context of inflow prediction.

The mentioned research works motivated the implementation of ML algorithms to predict daily inflow to the Sri Ram Sagar Project (SRSP), Telangana, India. Water requirements from this project have increased manifold over the past few years due to substantial agricultural activities. This necessitates precise estimation of inflows to the project, which will help water managers with effective planning.

Thus, the envisaged objectives that can be studied for SRSP are summarized as follows:

- (1)
To identify input variables, including climate indices suitable for reservoir inflow prediction

- (2)
To explore the applicability of ML algorithms, namely, XGBoost, LGBM, Bi-LSTM, CNN, and RF, for simulating inflows

- (3)
To develop SEM with a suitable ML algorithm based on objective (2).

The remaining manuscript is organized as follows: The case study and data details are shown in Section 2. The framework for methods is detailed in Section 3. In Section 4, results and discussion are presented. Section 5 comprises the summary and conclusions.

## STUDY AREA AND DATA COLLECTION

Nirmal, Jagtial, Peddapalli, Jayashankar, Karimnagar, Mahabubabad, Hanumakonda, and Warangal districts in Telangana receive irrigation water, and partly drinking water, from the SRSP. Figure 1 presents an index map of the project with the selected rain gauge stations, namely, Betmogrra, Bhainsa, Sriram Sagar, Yelli, Nanded, Nizam Sagar, and Degloor, located upstream of the project.

The daily reservoir inflow data of SRSP from August 2007 to December 2022 (5,362 inflow records) has been collected from the Department of Irrigation and Command Area Development, Telangana, India. Water mainly flows into the reservoir during the monsoon season (June–September). The rest of the year experiences minimal or no inflow.

Daily meteorological data (rainfall, *T*_{max}, and *T*_{min}) has been obtained from the India Meteorological Department, Pune, India. The Thiessen polygon approach is used to compute the mean areal rainfall over the catchment from seven rain gauge stations (Hwang *et al.* 2020). Evaporation is calculated using Ivanov, an empirical technique based on temperature and relative humidity (Muhammad *et al.* 2019). The minimum and maximum rainfall values are 0 and 229.05 mm, whereas these are 0.76 and 19.31 mm/day for evaporation; 0 and 11,846 m^{3}/s for the inflow. The mean values of rainfall, evaporation, and inflow, respectively, are 2.06 mm, 7.42 mm/day, and 175.02 m^{3}/s.
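As an illustration of the Thiessen polygon computation described above, the sketch below forms the mean areal rainfall as an area-weighted average of the station values. The polygon areas and rainfall depths are hypothetical, not those of the SRSP catchment.

```python
import numpy as np

# Hypothetical Thiessen polygon areas (km^2) for the seven rain gauge
# stations; the true areas depend on the catchment geometry.
areas = np.array([310.0, 420.0, 505.0, 280.0, 390.0, 615.0, 240.0])
weights = areas / areas.sum()          # Thiessen weights sum to 1

# One day of station rainfall (mm); illustrative values only.
station_rain = np.array([0.0, 4.2, 12.5, 0.8, 3.1, 0.0, 6.7])

# Mean areal rainfall = area-weighted average of station rainfall.
mean_areal_rain = float(np.dot(weights, station_rain))
print(round(mean_areal_rain, 2))
```

The same weights can be reused to combine the station-wise gridded circulation indices into a single series, as described later for the NOAA data.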

The present work studies the implication of climate phenomenon indices on reservoir inflow prediction. Information about these indices was downloaded from the respective home pages and used for modeling (not computed by the authors). Details are as follows:

The daily values of gridded ocean-atmospheric circulation indices, namely, Sea Level Pressure (SLP), Meridional Wind (MW), Greenland Blocking Index (GBI), Geopotential Height (GH), and Zonal Wind (ZW), were downloaded from the National Oceanic and Atmospheric Administration of the United States (NOAA) (https://psl.noaa.gov/data/timeseries/daily/) for the seven rain gauge stations and combined into a single series using the Thiessen polygon formula.

Similarly, teleconnection patterns, namely, Western Pacific Oscillation (WPA), Pacific North American Index (PNA), NAO, and AO, were downloaded from NOAA.

NINO1.2, NINO 3, NINO 3.4, and NINO 4 have been downloaded from the World Meteorological Organization (https://climexp.knmi.nl/start.cgi).

SOI is downloaded from https://www.longpaddock.qld.gov.au/soi/soi-data-files/

Dipole Mode Index (DMI) has been downloaded from Japan Agency for Marine-Earth Science and Technology (https://www.jamstec.go.jp/aplinfo/sintexf/e/iod/dipole_mode_index.html).

The climate indices data are globally averaged.

## METHODS

The present section provides brief information about techniques employed: XGBoost, LGBM, Bi-LSTM, CNN, RF, and SEM for simulating inflow.

### XGBoost

XGBoost builds an ensemble of decision trees (DTs) sequentially, with each new tree fitted to the residuals of the current ensemble in a gradient boosting framework. Figure 2 describes the general structure of XGBoost. The various steps involved are as follows:

- Initializing the model with a baseline prediction, such as the mean, median, or mode of the target variable.
- Residuals are calculated by subtracting the baseline prediction from the target value (Equation (1)): $r_i = y_i - \hat{y}_0$, where $r_i$ is the error for the *i*th sample, $y_i$ is the actual value of the target variable, and $\hat{y}_0$ is the baseline prediction.
- A DT discovers the most prominent feature and attempts to predict the residuals. The training process aims to minimize the weighted squared difference between the residuals and the projected values, with the weights determined by the gradient of the loss function. The weighted squared difference is computed using Equation (2): $L_t = \sum_{i} w_i \left(r_i - \hat{r}_i^{(t)}\right)^2$, where $w_i$ is the weight of the *i*th sample, $\hat{r}_i^{(t)}$ is the prediction of the *t*th DT for the *i*th sample, and $w_i (r_i - \hat{r}_i^{(t)})^2$ denotes the loss for the *i*th sample.
- The predictions are updated by adding the predicted residual to the baseline prediction (Equation (3)): $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \hat{r}_i^{(t)}$.
- Multiple DTs are constructed in a similar approach, and predictions are updated until the desired accuracy is achieved (Equation (4)): $\hat{y}_i = \hat{y}_0 + \sum_{t=1}^{T} \hat{r}_i^{(t)}$, where $\hat{y}_i$ is the final prediction value and *T* represents the number of DTs.

XGBoost consists of several parameters, such as n_estimators, max_depth, reg_alpha, reg_lambda, min_child_weight, gamma, max_leaf_nodes, max_delta_step, sub_sample, colsample_bylevel, and scale_pos_weight. Out of these, the first six are considered hyperparameters. The optimal hyperparameter values have been discussed in the results and discussion section.
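The residual-fitting loop in the steps above can be sketched with plain decision trees. This is a minimal gradient-boosting illustration in the spirit of XGBoost (with an assumed shrinkage factor and toy data), not the tuned models of the study.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Step 1: baseline prediction (here the mean of the target).
pred = np.full_like(y, y.mean())

# Steps 2-4: fit each tree to the current residuals and add its
# (shrunken) output to the running prediction.
learning_rate = 0.1       # shrinkage, an assumed XGBoost-style safeguard
for _ in range(100):      # T decision trees
    residuals = y - pred                        # Equation (1)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)     # Equations (3)-(4)

mse_before = float(np.mean((y - y.mean()) ** 2))
mse_after = float(np.mean((y - pred) ** 2))
print(mse_after < mse_before)   # boosting reduces the training error
```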

### LGBM

LGBM works on an ensemble of DTs in a gradient boosting framework. However, unlike a conventional DT, the continuous values of each feature column are split into discrete bins, which makes the algorithm run faster. This technique is called histogram or bin splitting (Chakraborty *et al.* 2020). Another distinctive feature of LGBM that enhances its efficiency and helps avoid overfitting is Gradient-based One-Side Sampling (GOSS): when building each tree, it retains the instances with large gradients of the loss function and randomly samples from those with small gradients.

Parameters of LGBM are max_depth, n_estimators, learning_rate, num_leaves, max_bin, feature fraction, bagging_fraction, min_data_in_leaf, min_sum_hessian_in_leaf, bagging_freq, lambda_l1, lambda_l2, min_gain_to_split, and min_data_in_bin. Among them, the first seven are considered hyperparameters. Detailed information about hyperparameters is presented in the results and discussion section.
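The histogram (bin) splitting idea can be illustrated without the LGBM library itself: discretize a feature into quantile bins and scan only the bin edges, rather than every unique value, for the best variance-reducing split. The data, threshold, and bin count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=5000)           # one feature column
y = (x > 42).astype(float) * 3.0 + rng.normal(scale=0.3, size=5000)

# Histogram (bin) splitting: discretize the feature into max_bin bins,
# then scan only the bin edges instead of every unique value.
max_bin = 255
edges = np.quantile(x, np.linspace(0, 1, max_bin + 1)[1:-1])
bins = np.searchsorted(edges, x)             # bin index per sample

best_gain, best_edge = -np.inf, None
for b in range(max_bin - 1):                 # candidate split after bin b
    left, right = y[bins <= b], y[bins > b]
    if len(left) == 0 or len(right) == 0:
        continue
    # Variance-reduction gain of splitting at this bin edge
    gain = y.var() * len(y) - left.var() * len(left) - right.var() * len(right)
    if gain > best_gain:
        best_gain, best_edge = gain, edges[b]

print(round(best_edge, 1))   # lands near the true threshold of 42
```

Scanning 254 edges instead of 5,000 raw values is the source of LGBM's speed advantage on large datasets.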

### Bi-LSTM

LSTM replaces the implicit function of Recurrent Neural Networks with a memory unit consisting of input, forget, update, and output gates. The forget gate determines how much past information to forget. The input gate decides how much new input to add to the cell state. The update phase calculates new values by combining the outputs of the input and forget gates. The output gate determines the new hidden state premised on the updated cell state. A detailed description of the working of LSTM has been provided by Liang *et al.* (2021).

Bi-LSTM processes the input sequence in both the forward and backward directions. Figure 4 presents the architecture of Bi-LSTM. It incorporates two LSTM neural networks that perform the forward and backward calculations of the hidden vectors $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, respectively (Equations (5) and (6)). Both LSTMs learn from the sequence data, but one looks backward and the other forward. The results from the two LSTMs are then averaged to provide a forecast (Equation (7)). The associated parameters are the weights from the input to the forward and backward layers, the weights of the hidden-to-hidden layers, and the weights from the forward and backward layers to the output layer, respectively.

The parameters of Bi-LSTM are lstm_units, dropout_rate, learning_rate, epochs, and activation function. All of these parameters are considered for hyperparameters. Detailed information about these hyperparameters is presented in the results and discussion section.
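A minimal, untrained numpy sketch of the gate equations and the bidirectional pass described above, assuming a single hidden unit and simple averaging of the two directions in the spirit of Equations (5)–(7):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x_seq, params):
    """Run a single-unit LSTM over a sequence and return its hidden states.
    Gates follow the forget/input/update/output description above."""
    W, U, b = params            # input, recurrent, and bias parameters (4 gates)
    h, c = 0.0, 0.0
    hs = []
    for x in x_seq:
        f = sigmoid(W[0] * x + U[0] * h + b[0])   # forget gate
        i = sigmoid(W[1] * x + U[1] * h + b[1])   # input gate
        g = np.tanh(W[2] * x + U[2] * h + b[2])   # candidate (update) values
        o = sigmoid(W[3] * x + U[3] * h + b[3])   # output gate
        c = f * c + i * g                          # new cell state
        h = o * np.tanh(c)                         # new hidden state
        hs.append(h)
    return np.array(hs)

rng = np.random.default_rng(0)
params_fwd = rng.normal(size=(3, 4))   # toy, untrained weights
params_bwd = rng.normal(size=(3, 4))

x_seq = np.sin(np.linspace(0, 3, 10))             # toy inflow-like sequence

h_fwd = lstm_pass(x_seq, params_fwd)              # forward pass  (Equation (5))
h_bwd = lstm_pass(x_seq[::-1], params_bwd)[::-1]  # backward pass (Equation (6))
y_hat = 0.5 * (h_fwd + h_bwd)                     # combined output (Equation (7))
print(y_hat.shape)
```

In practice both LSTMs are trained jointly and the combination is learned; this sketch only shows how the two directions see the same sequence from opposite ends.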

### CNN

The convolution layer is the fundamental unit of a CNN model, and the kernels that define its parameters are its most essential components. Each kernel slides over the input matrix of a training sample to compute a dot product, and the resulting matrix is passed through an activation function to obtain the convolution features during the feature extraction step (Equation (8)), where $W_i$ is the weight matrix of the *i*th kernel, $b_i$ is the bias of the *i*th kernel (the kernels being the learnable parameters of the convolution layer), and *f* is the activation function. The optimal values of the hyperparameters have been discussed in the results and discussion section.

The fully connected layer then relates its weights *W* and biases *b* to the fully connected vector *p* and the projected value *Q* through the relation shown in Equation (10). The steps of the CNN are presented in Figure 5.

Filters, strides, kernel_size, pool_size, hidden_nodes, activation function, epochs, number of layers, learning_rate, and batch_size are all parameters of CNN. The first seven of them are termed hyperparameters. Detailed information on the optimal values of these hyperparameters is given in the results and discussion section.
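The kernel dot-product and pooling operations described above can be sketched in one dimension. The signal and the "peak detector" kernel below are toy values for illustration, not anything from the study.

```python
import numpy as np

def conv1d(signal, kernel, bias, f):
    """Equation (8)-style convolution: each kernel position takes a dot
    product with the input window, adds a bias, and applies activation f."""
    n = len(signal) - len(kernel) + 1
    return np.array([f(np.dot(signal[i:i + len(kernel)], kernel) + bias)
                     for i in range(n)])

def max_pool(features, pool_size):
    """Pooling layer: keep the strongest response in each window."""
    return np.array([features[i:i + pool_size].max()
                     for i in range(0, len(features) - pool_size + 1, pool_size)])

relu = lambda z: np.maximum(z, 0.0)

signal = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0])
kernel = np.array([-1.0, 2.0, -1.0])   # toy "peak detector" kernel
features = conv1d(signal, kernel, bias=0.0, f=relu)
pooled = max_pool(features, pool_size=2)
print(features, pooled)   # features peak where the signal peaks
```

The pooled output keeps only the strongest responses, which is how the convolution–pooling stack condenses the input before the fully connected layer.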

### RF

The RF algorithm works in the following steps:

- The entire dataset of size *T* is randomly divided into *N* subsamples using the bootstrap sampling technique, and a DT is constructed on each set of bootstrapped data.
- Each DT recursively splits the data into smaller subsets according to a threshold value (typically the point that minimizes the Mean Square Error) associated with a feature variable. The contribution of a feature to the reduction in prediction error is determined by estimating the decrease in the Mean Square Error; the larger this decrease, the higher the feature's Random Forest importance (RFI).

RF predicts a new data point for each DT by passing data through it. The optimal number of trees is determined through hyperparameter tuning. The outputs of all the DTs are averaged to obtain the final prediction. Figure 6 outlines the sequential operation of the RF.

The RF algorithm comprises several parameters such as n_estimators, max_sample, max_features, min_sample_split, max_depth, max_leaf_nodes, and min_samples_leaf, out of which the first four are considered hyperparameters. The optimal values of the hyperparameters have been discussed in the results and discussion section.
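A sketch of the RF workflow using scikit-learn, with the four hyperparameters named above set to illustrative values, not the study's tuned ones. The final check confirms the averaging step: the forest's prediction equals the mean of its individual trees' outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 2))
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(scale=0.5, size=300)

# The four hyperparameters named in the text; values are illustrative.
rf = RandomForestRegressor(n_estimators=50,
                           max_samples=0.8,        # bootstrap subsample size
                           max_features=1.0,
                           min_samples_split=4,
                           random_state=0).fit(X, y)

x_new = np.array([[5.0, 5.0]])
# The forest's prediction is the average of its individual trees' outputs.
tree_preds = [t.predict(x_new)[0] for t in rf.estimators_]
print(np.isclose(rf.predict(x_new)[0], np.mean(tree_preds)))
```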

### SEM

In SEM, the predictions of the trained base ML algorithms are used as input features for a higher-level meta-model. In addition, the original features of the dataset are included, resulting in a new dataset. The choice of meta-model is crucial in SEM; here, the best-performing ML algorithm has been chosen as the meta-model for training the new dataset. Hyperparameter optimization of the meta-model is performed to determine the optimal architecture of the algorithm. Figure 7 shows the workflow of an ensemble of ML algorithms using SEM. The stepwise workflow of SEM is as follows (Lu *et al.* 2023):

Step 1: Training of ML algorithms with hyperparameter tuning

Here, *p* denotes the number of training and testing samples and *n* the number of features. Each *i*th ML algorithm, with its tuned hyperparameters (Equation (11)), is trained on the training data (Equation (12)).

Step 2: Development of a meta-model

Step 3: Final predictions using the meta-model
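The three SEM steps can be sketched with scikit-learn's stacking implementation. The base learners and the Ridge meta-model below are placeholders for the study's tuned algorithms, and `passthrough=True` mirrors the inclusion of the original features in the new dataset.

```python
import numpy as np
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + 0.1 * rng.normal(size=400)

# Chronological-style split: no shuffling, like a time-series hold-out.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=False)

stack = StackingRegressor(
    estimators=[('rf', RandomForestRegressor(n_estimators=50, random_state=0)),
                ('gb', GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge(),   # stand-in meta-model
    passthrough=True,          # append original features to base predictions
    cv=5)                      # out-of-fold base predictions avoid leakage
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))   # R^2 on the held-out block
```

The `cv` argument matters: the meta-model is trained on out-of-fold base predictions, so it learns how to weight the base learners without seeing their in-sample fit.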

### Other related supporting approaches employed

The Bayesian sampling technique Tree-Structured Parzen Estimator (Nguyen *et al.* 2020; Sipper 2022) is used for hyperparameter tuning.
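A toy, numpy-only sketch of the Tree-Structured Parzen Estimator idea for a single hyperparameter: split past trials into "good" and "bad" sets by loss, model each set with a kernel density, and propose the candidate that maximizes their density ratio. The objective, search range, and bandwidth are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(log_lr):
    """Toy validation loss, minimized at log10(learning_rate) = -1 (assumed)."""
    return (log_lr + 1.0) ** 2

def kde(x, data, bw=0.2):
    """Tiny Gaussian kernel density estimate."""
    x = np.atleast_1d(x)[:, None]
    z = (x - data[None, :]) / bw
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (bw * np.sqrt(2 * np.pi))

# Random startup trials over log10(learning_rate) in [-4, 0].
trials = list(rng.uniform(-4, 0, size=20))
losses = [objective(t) for t in trials]

for _ in range(30):
    cut = np.quantile(losses, 0.25)          # lowest ~25% losses are "good"
    good = np.array([t for t, l in zip(trials, losses) if l <= cut])
    bad = np.array([t for t, l in zip(trials, losses) if l > cut])
    # Sample candidates from the "good" density: a good point plus noise.
    cands = rng.choice(good, size=32) + rng.normal(scale=0.2, size=32)
    cands = np.clip(cands, -4, 0)
    # Keep the candidate maximizing the density ratio l(x)/g(x),
    # TPE's proxy for expected improvement.
    ratios = kde(cands, good) / np.maximum(kde(cands, bad), 1e-12)
    nxt = float(cands[np.argmax(ratios)])
    trials.append(nxt)
    losses.append(objective(nxt))

best = trials[int(np.argmin(losses))]
print(round(best, 2))   # log10 of the suggested learning rate, near -1
```

Production TPE implementations (e.g., in Optuna or Hyperopt) use tree-structured search spaces and adaptive bandwidths; the ratio-maximization step above is the core idea.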

Spearman correlation coefficient (SCC) is a statistical metric that defines the association between two variables (Zhou *et al.* 2023).

Forward feature selection (FFS) is a dimensionality reduction technique that selects the most significant features of the dataset. At each stage, the algorithm adds one feature, trains the model, and calculates its efficiency using the root mean square error (Amiri *et al.* 2011). The features are ranked using F-values, the ratio of the model's root mean square error with and without the feature, which indicates how much the feature enhances model performance.
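A minimal FFS loop, here greedily adding the feature that most reduces the in-sample RMSE of a linear model; the data, base learner, and stopping rule are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def rmse(model, X, y):
    pred = model.predict(X)
    return float(np.sqrt(np.mean((y - pred) ** 2)))

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=n)  # 0 and 3 matter

selected, remaining = [], list(range(5))
best_rmse = np.inf
while remaining:
    # Try adding each remaining feature; keep the one that lowers RMSE most.
    scores = {}
    for j in remaining:
        cols = selected + [j]
        m = LinearRegression().fit(X[:, cols], y)
        scores[j] = rmse(m, X[:, cols], y)
    j_best = min(scores, key=scores.get)
    if scores[j_best] >= best_rmse:        # no improvement: stop
        break
    best_rmse = scores[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(selected)   # the informative features 0 and 3 are picked first
```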

The entropy weighted method (EWM) assigns weights to various features based on a decision matrix (Lee & Chang 2018). The weight computation involves formulating the decision matrix, normalizing it, and estimating the entropy of each criterion.
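The EWM steps (decision matrix, normalization, entropy) can be sketched as follows; the matrix values are hypothetical scores of four indices under three selection methods, not the values of Table 1.

```python
import numpy as np

def entropy_weights(D):
    """EWM on a decision matrix D with one row per alternative (climate index)
    and one column per criterion (feature-selection method)."""
    P = D / D.sum(axis=0)                          # normalize each column
    n = D.shape[0]
    with np.errstate(divide='ignore', invalid='ignore'):
        logs = np.where(P > 0, np.log(P), 0.0)
    e = -(P * logs).sum(axis=0) / np.log(n)        # entropy per criterion
    d = 1.0 - e                                    # degree of diversification
    return d / d.sum()                             # criterion weights

# Hypothetical scores (columns: RFI-like, SCC-like, FFS-like criteria).
D = np.array([[0.20, 0.32, 0.45],
              [0.02, 0.05, 0.10],
              [0.25, 0.24, 0.30],
              [0.12, 0.26, 0.15]])
w = entropy_weights(D)
scores = (D / D.sum(axis=0)) @ w     # normalized weighted score per index
print(w.round(3), scores.round(3))
```

Criteria whose scores vary more across alternatives carry more information (lower entropy) and thus receive larger weights, which is how the study arrives at 0.5, 0.26, and 0.24 for RFI, SCC, and FFS.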

KGE is a composite metric that accounts for the bias *b*, the variability, and the correlation between observed and predicted data (Mathevet *et al.* 2023). KGE ranges from −∞ to 1, with 1 representing maximum efficacy (Equation (20)):

$\mathrm{KGE} = 1 - \sqrt{(R - 1)^2 + \left(\frac{\sigma_s}{\sigma_o} - 1\right)^2 + \left(\frac{\mu_s}{\mu_o} - 1\right)^2}$

where $\sigma_o$ and $\sigma_s$ are the standard deviations of the observed and predicted data, respectively; $\mu_s$ and $\mu_o$ are the means of the simulated and observed values; and *R* is the Pearson correlation coefficient between the observed and predicted data.
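The KGE computation can be sketched directly from its three components (correlation, variability ratio, bias ratio); the inflow values below are illustrative.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency: combines correlation, variability, and bias;
    1 is a perfect score."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()          # variability ratio
    beta = sim.mean() / obs.mean()         # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([0.0, 10.0, 120.0, 800.0, 60.0, 5.0])
print(round(kge(obs, obs), 3))        # identical series -> 1.0
print(round(kge(obs, 0.8 * obs), 3))  # 20% underestimation lowers KGE
```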

The overall methodology is implemented in four phases:

- The best climate indices attributed to reservoir inflow are selected in the first phase.
- Seven input scenarios, incorporating the selected climate indices, are studied in the second phase to train the ML algorithms.
- In the third phase, SEM is implemented and comparatively assessed against the other ML algorithms.
- Comparative evaluation of the ML algorithms in simulating peak flows is performed in the fourth phase.

## RESULTS AND DISCUSSION

This section divides the results into four phases (Figure 8).

### Selection of best climate indices influencing reservoir inflow discharges

Fifteen climate indices are considered for this study based on previous research works (Moradi *et al.* 2020). The RFI, FFS, and SCC methods determine the most significant indices corresponding to daily reservoir inflow. The relative importance of the three feature selection methods has been calculated using EWM: the highest weight of 0.5 is assigned to RFI, followed by SCC (0.26) and FFS (0.24). The normalized weighted scores are estimated for each index and ranked accordingly. A few studies (Rucker *et al.* 2015) assert that median splits are robust for identifying critical variables, so the median is employed in the present study.

Table 1 shows the results of feature selection methods employed to find the most significant climate indices. Rankings of the index for each of the techniques have also been presented. The most significant climate indices attributed to reservoir inflow have been highlighted in bold.

| Climate indices | RFI score | RFI rank | SCC absolute coefficient | SCC rank | FFS F-value | FFS rank | EWM normalized weighted score | EWM final rank |
|---|---|---|---|---|---|---|---|---|
| AO | 0.000 | 13 | 0.002 | 14 | 5.800 | 13 | 0.002 | 14 |
| **GBI** | 0.201 | 3 | 0.321 | 1 | 176.340 | 1 | 0.194 | 1 |
| NAO | 0.016 | 10 | 0.004 | 13 | 4.290 | 14 | 0.012 | 11 |
| PNA | 0.001 | 13 | 0.002 | 14 | 1.360 | 15 | 0.001 | 15 |
| **DMI** | 0.028 | 7 | 0.237 | 6 | 145.870 | 4 | 0.075 | 6 |
| SOI | 0.001 | 13 | 0.193 | 9 | 90.010 | 7 | 0.042 | 8 |
| WPA | 0.007 | 11 | 0.025 | 11 | 11.280 | 11 | 0.009 | 12 |
| **GH** | 0.043 | 5 | 0.245 | 4 | 148.230 | 3 | 0.085 | 5 |
| MW | 0.020 | 9 | 0.013 | 12 | 11.180 | 12 | 0.016 | 10 |
| SLP | 0.039 | 6 | 0.038 | 10 | 32.970 | 10 | 0.034 | 9 |
| ZW | 0.004 | 12 | 0.206 | 8 | 99.050 | 6 | 0.046 | 7 |
| NINO 1.2 | 0.022 | 8 | 0.290 | 2 | 67.710 | 9 | 0.06 | 13 |
| **NINO 3** | 0.241 | 2 | 0.235 | 7 | 73.660 | 8 | 0.188 | 2 |
| **NINO 3.4** | 0.124 | 4 | 0.241 | 5 | 115.540 | 5 | 0.125 | 4 |
| **NINO 4** | 0.252 | 1 | 0.260 | 3 | 158.020 | 2 | 0.183 | 3 |

*Note:* The most significant climate indices attributing to reservoir inflow have been highlighted in bold.

As observed from Table 1, no uniformity is found in the rankings across RFI, SCC, and FFS, which may be due to different philosophies of scoring patterns. According to FFS and SCC, GBI is the most significant index, whereas NINO 4 is the best predictor based on RFI. The other indices based on teleconnection patterns (AO, NAO, PNA, MW, and WPA) have exhibited the poorest performance across all three methods. This necessitates the usage of EWM.

The normalized weighted scores of the indices by EWM suggest that GBI is the most sensitive climatic indicator of reservoir inflow, followed by NINO 3 and NINO 4. Six indices have weighted scores greater than the median value of 0.06. Accordingly, the predictors (with respective weights) GBI (0.194), NINO 3 (0.188), NINO 4 (0.183), NINO 3.4 (0.125), GH (0.085), and DMI (0.075) have been chosen as inputs for the ML algorithms to simulate inflow and construct cause–effect models.

### Selection of input scenarios and training of ML algorithms

The 70:30 splitting ratio has been chosen for training and testing (Khosravi *et al.* 2018). The training dataset spans 11 years (1 August 2007 to 31 July 2018), whereas testing covers 4½ years (1 August 2018 to 31 December 2022). In addition, standardization is performed wherever required. Table 2 provides a statistical description of the training (3,751 records) and testing (1,611 records) datasets, including the mean, standard deviation, minimum, and maximum values of each feature.
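The chronological 70:30 split can be expressed with pandas date slicing. The synthetic series below uses a complete daily calendar as a stand-in; the study's actual record (3,751 + 1,611 entries over this period) evidently contains gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for the SRSP inflow record.
dates = pd.date_range('2007-08-01', '2022-12-31', freq='D')
df = pd.DataFrame({'inflow': np.random.default_rng(0).random(len(dates))},
                  index=dates)

# Chronological (not shuffled) split, as in the study: training through
# 31 July 2018, testing from 1 August 2018 onward.
train = df.loc[:'2018-07-31']
test = df.loc['2018-08-01':]
print(len(train), len(test))
```

Splitting by date rather than at random preserves the temporal ordering that the lagged-inflow features depend on.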

| Statistical parameter | P(t) (mm) | E(t) (mm/day) | Q(t) (m^{3}/s) | GBI (m) | NINO 3 (°C) | NINO 3.4 (°C) | NINO 4 (°C) | GH (m) | DMI (°C) |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 2.39 (2.30) | 7.60 (7.26) | 129.88 (267.60) | 5,320.14 (5,329.54) | 0.24 (−0.44) | 0.20 (−0.44) | 0.26 (−0.17) | 80.39 (80.25) | 0 (0.14) |
| Standard deviation | 5.93 (9.60) | 4.83 (5.03) | 549.42 (1,086.74) | 179.68 (180.74) | 0.91 (0.49) | 0.91 (0.55) | 0.70 (0.64) | 33.99 (34.76) | 0.51 (0.79) |
| Minimum | 0 (0) | 0.79 (0.76) | 0 (0) | 4,853.03 (4,853.01) | −1.70 (−1.45) | −1.81 (−1.72) | −1.56 (−1.21) | −4.25 (−29.50) | −1.54 (−1.37) |
| Maximum | 92.80 (229.05) | 19.31 (18.80) | 11,846 (11,786.41) | 5,707.53 (5,703.43) | 3.07 (0.74) | 3.29 (0.90) | 2.09 (1.32) | 167.75 (164.00) | 1.43 (3.09) |

*Note:* Values in parentheses represent values corresponding to the testing dataset.

Training and testing datasets are statistically comparable. Rainfall (*P*(*t*)) in the catchment during the training period has a mean of 2.39 mm, similar to that of the testing period (2.30 mm). The mean daily evaporation rate (*E*(*t*)) in the training period (7.60 mm/day) is slightly higher than in testing (7.26 mm/day). Inflow (*Q*(*t*)) in the training phase has a slightly higher maximum value (11,846 m³/s) and a lower mean value (129.88 m³/s) than in the testing phase, where the maximum is 11,786.41 m³/s and the mean is 267.60 m³/s. The maximum GBI values for the training and testing phases (5,707.53 and 5,703.43 m) are similar, and the comparable means (5,320.14 m in training and 5,329.54 m in testing) suggest that the average GBI stays consistent across both datasets. The standard deviation of GH is 33.99 m in training and marginally larger (34.76 m) in testing. The standard deviation of DMI is higher in testing (0.79) than in training (0.51), indicating greater variability of DMI in the testing dataset. The standard deviation of NINO 4 is similar for the training (0.70 °C) and testing (0.64 °C) phases. The NINO indices during training exhibit comparatively higher values than during testing, as evidenced by their minimum, maximum, and mean values.

*Q*(*t* − 12) has the lowest correlation of 0.29 with the current inflow, whereas *Q*(*t* − 1) and *Q*(*t* − 2) have correlation coefficients of 0.87 and 0.72, respectively. These are chosen as input variables to construct time series models, as their values exceed the threshold of 0.7 (Ratner 2009). Lagged values of rainfall and evaporation are also considered up to 2 days to maintain consistency.
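Lag selection against a correlation threshold can be sketched on a synthetic persistent series; the AR(1) process below is an assumed stand-in for daily inflow, with persistence chosen so that, as in the study, only the first couple of lags typically pass the 0.7 cut-off.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic AR(1)-like daily series with strong day-to-day persistence.
q = np.zeros(2000)
for t in range(1, 2000):
    q[t] = 0.86 * q[t - 1] + rng.normal()

def lag_corr(x, k):
    """Correlation between the series and its k-day lagged copy."""
    return float(np.corrcoef(x[k:], x[:-k])[0, 1])

# Keep only lags whose correlation with Q(t) exceeds the 0.7 threshold.
lags = [k for k in range(1, 13) if lag_corr(q, k) > 0.7]
print(lags)
```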

Table 3 summarizes the seven input scenarios, in which *Q*(*t*) is the output. Scenarios S1 and S2 are time series models in which only lagged inflow values are used to train the ML algorithms. Scenarios S3–S6 contain cause-and-effect variables such as *P*(*t*), *E*(*t*), and climate indices. In scenario S7, the inputs of all preceding scenarios are combined. A generalized framework representing the input–output of the scenarios is presented in Equation (21), where *Q*(*t* − *n*) is the reservoir inflow lagged by *n* days (m^{3}/s), *P*(*t*) is the present-day rainfall (mm), *P*(*t* − *n*) is the rainfall lagged by *n* days (mm), *E*(*t*) is the present-day evaporation (mm/day), and *E*(*t* − *n*) is the evaporation lagged by *n* days (mm/day).

Scenarios . | Model inputs . |
---|---|

S1 | Q(t − 2), Q(t − 1) |

S2 | Q(t − 1) |

S3 | P(t), P(t − 1), P(t − 2), E(t), E(t − 1), E(t − 2) |

S4 | P(t), E(t) |

S5 | NINO 3, NINO 3.4, NINO 4, GBI, DMI, GH |

S6 | NINO 3, NINO 3.4, NINO 4, GBI, DMI, GH, P(t), E(t) |

S7 | NINO 3, NINO 3.4, NINO 4, GBI, DMI, GH, P(t), P(t − 1), P(t − 2), E(t), E(t − 1), E(t − 2), Q(t − 1), Q(t − 2) |


Tables 4–8 present the hyperparameter tuning of the ML algorithms using the Tree-Structured Parzen Estimator for S1–S7. The KGE values for training and testing are also included in these tables.
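The KGE reported throughout Tables 4–8 combines correlation, variability ratio, and bias ratio. A minimal NumPy implementation of the standard formulation (not the authors' code):

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency:
    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2),
    where r is the Pearson correlation, alpha the ratio of standard
    deviations, and beta the ratio of means (simulated over observed)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

print(kge([1, 2, 3], [1, 2, 3]))  # perfect agreement gives 1.0
```

A KGE of 1 indicates a perfect match; values fall as correlation, variability, or bias degrade, which is why identical training/testing KGE gaps are read as generalization indicators below.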

**Table 4** Optimal hyperparameters and KGE values of XGBoost for scenarios S1–S7

| Hyperparameter | Range of parameter | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|---|
| n_estimators | 100–1,000 | 400 | 310 | 115 | 105 | 255 | 555 | 755 |
| max_depth | 1–40 | 1 | 1 | 1 | 1 | 13 | 27 | 3 |
| reg_alpha | 0–1 | 0.6 | 0.5 | 1 | 0.1 | 0.6 | 0.5 | 0.7 |
| reg_lambda | 0–1 | 0.2 | 0.2 | 0.4 | 0.5 | 0.8 | 0.5 | 0.8 |
| min_child_weight | 1–10 | 6 | 9 | 6 | 9 | 4 | 2 | 6 |
| gamma | 0–1 | 0.3 | 0.3 | 0.5 | 0.1 | 0.6 | 0.9 | 0.5 |
| Training KGE | – | 0.81 | 0.75 | 0.56 | 0.41 | 0.99 | 0.97 | 0.99 |
| Testing KGE | – | 0.80 | 0.79 | 0.51 | 0.40 | 0.73 | 0.69 | 0.85 |


**Table 5** Optimal hyperparameters and KGE values of LGBM for scenarios S1–S7

| Hyperparameter | Range of parameter | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|---|
| max_depth | 3–20 | 3 | 4 | 12 | 14 | 19 | 17 | 4 |
| n_estimators | 100–1,000 | 741 | 812 | 893 | 339 | 913 | 461 | 841 |
| learning_rate | 0.001–0.1 | 0.09 | 0.04 | 0.003 | 0.02 | 0.04 | 0.03 | 0.01 |
| num_leaves | 10–500 | 129 | 429 | 314 | 350 | 149 | 265 | 389 |
| max_bin | 50–300 | 236 | 127 | 222 | 232 | 239 | 177 | 281 |
| feature_fraction | 0.1–1 | 0.11 | 0.83 | 0.51 | 0.97 | 0.88 | 0.65 | 0.95 |
| bagging_fraction | 0.1–1 | 0.7 | 0.75 | 0.3 | 0.7 | 0.75 | 0.9 | 0.55 |
| Training KGE | – | 0.81 | 0.78 | 0.47 | 0.42 | 0.98 | 0.97 | 0.90 |
| Testing KGE | – | 0.82 | 0.82 | 0.46 | 0.41 | 0.71 | 0.66 | 0.85 |


**Table 6** Optimal hyperparameters and KGE values of Bi-LSTM for scenarios S1–S7

| Hyperparameter | Range of parameter | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|---|
| lstm_units | 64–256 | 230 | 182 | 148 | 208 | 249 | 188 | 95 |
| dropout_rate | 0.0–0.5 | 0.12 | 0.37 | 0.41 | 0.32 | 0.49 | 0.23 | 0.05 |
| learning_rate | 0.0001–0.1 | 0.001 | 0.001 | 0.003 | 0.003 | 0.008 | 0.015 | 0.005 |
| activation | [relu, tanh, sigmoid] | relu | relu | relu | relu | tanh | tanh | relu |
| epochs | 1–100 | 50 | 29 | 98 | 94 | 92 | 89 | 55 |
| Training KGE | – | 0.78 | 0.80 | 0.24 | 0.17 | 0.70 | 0.86 | 0.92 |
| Testing KGE | – | 0.83 | 0.81 | 0.22 | 0.14 | 0.56 | 0.70 | 0.87 |


**Table 7** Optimal hyperparameters and KGE values of CNN for scenarios S1–S7

| Hyperparameter | Range of parameter | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|---|
| filters | 32–316 | 214 | 298 | 67 | 315 | 308 | 165 | 228 |
| kernel_size | 3–7 | 3 | 7 | 3 | 7 | 5 | 3 | 6 |
| pool_size | 1–3 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| strides | 1–3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| hidden nodes | 64–256 | 159 | 181 | 243 | 255 | 90 | 227 | 213 |
| activation | [relu, tanh, sigmoid] | relu | relu | sigmoid | tanh | relu | tanh | relu |
| Training KGE | – | 0.75 | 0.76 | 0.14 | 0.18 | 0.47 | 0.62 | 0.85 |
| Testing KGE | – | 0.78 | 0.81 | 0.14 | 0.17 | 0.43 | 0.53 | 0.86 |


**Table 8** Optimal hyperparameters and KGE values of RF for scenarios S1–S7

| Hyperparameter | Range of parameter | S1 | S2 | S3 | S4 | S5 | S6 | S7 |
|---|---|---|---|---|---|---|---|---|
| n_estimators | 10–500 | 458 | 181 | 387 | 445 | 340 | 314 | 284 |
| max_depth | 1–20 | 5 | 6 | 2 | 4 | 15 | 18 | 4 |
| min_samples_split | 1–10 | 2 | 4 | 2 | 2 | 2 | 2 | 1 |
| min_samples_leaf | 1–5 | 2 | 4 | 4 | 2 | 1 | 2 | 1 |
| Training KGE | – | 0.82 | 0.81 | 0.49 | 0.42 | 0.84 | 0.82 | 0.85 |
| Testing KGE | – | 0.81 | 0.82 | 0.46 | 0.40 | 0.63 | 0.62 | 0.84 |


XGBoost (refer to Table 4): In each scenario, the number of decision trees (n_estimators) varied; however, in most cases the optimal value of this hyperparameter lies on the lower side of the search space. In S1–S4, a minimum tree depth of 1 is obtained. In most instances, reg_alpha falls between 0.5 and 0.7. Similarly, the minimum child-node weight (min_child_weight) is primarily at the upper end of the range (6–9). Parameters such as max_leaf_nodes and max_delta_step have been assigned a default value of 1, whereas subsample, colsample_bylevel, and scale_pos_weight have a value of 0.

XGBoost achieved its highest training KGE values of 0.99, 0.97, and 0.99 in S5, S6, and S7, respectively. Interestingly, the testing KGE values for these scenarios are unsatisfactory, specifically for S5 and S6. In S3 and S4, the performance is unsatisfactory in both training and testing.

LGBM (refer to Table 5): The optimal number of decision trees (n_estimators) lies on the upper side of the range (741–913) for all scenarios except S4 and S6. The optimal depth of the trees (max_depth) varied across the search space for all scenarios. In most scenarios, the num_leaves hyperparameter obtained values greater than 200, indicating higher model complexity. LGBM obtained optimal values for max_bin and feature_fraction on the upper side of the search space for most scenarios. Default values of 20 and 0.003 have been retained for min_data_in_leaf and min_sum_hessian_in_leaf, respectively. The default values for other parameters, including lambda_l1, lambda_l2, bagging_freq, and min_gain_to_split, have been set to zero.

LGBM has performed similarly to XGBoost across the scenarios. In training, S5 attained the highest KGE of 0.98, followed by S6 (0.97) and S7 (0.90). In testing, S7 achieved a KGE value of 0.85. Notably, the testing efficacy of S5 and S6 decreased by 28 and 32%, respectively, compared to training, indicating weaker generalization in these scenarios.

Bi-LSTM (refer to Table 6): The optimal number of lstm_units in the LSTM layer ranges from 95 to 249 across S1–S7. The optimal values for dropout_rate and learning_rate varied across the search space. The ‘relu’ activation function performed best in all scenarios except S5 and S6. In most instances, the optimal number of epochs lies on the upper side of the range.

Bi-LSTM achieved its highest KGE values of 0.92 and 0.87 in training and testing for S7. S1, S2, and S6 also performed well, as evidenced by their training KGE values (0.78, 0.80, and 0.86) and testing KGE values (0.83, 0.81, and 0.70), respectively. The algorithm did not perform well in S3 and S4.

CNN (refer to Table 7): The number of filters in the convolution layer is on the upper side of the search space for all scenarios except S3 and S6. The optimal kernel_size varies from 3 to 7 across S1–S7. In S3–S7, the optimal pool_size is 2. The activation function is not consistent across scenarios. The number of convolution layers, learning_rate, and batch_size have been set to their default values of 1, 0.001, and 64, respectively.

The best performance of CNN has been obtained in S7 for training and testing with KGE values of 0.85 and 0.86, respectively. S1 and S2 have also shown good performance, achieving KGE values of 0.75 and 0.76 during training and 0.78 and 0.81 during testing. The cause–effect models (S3 and S4) performed unsatisfactorily, as evidenced by their low KGE values.

RF (refer to Table 8): The optimal values for min_samples_split and min_samples_leaf are on the lower side of the search space across S1–S7. The optimal values of n_estimators do not exhibit significant variation, except for S2. The optimal values for max_depth (2–6) are on the lower side of the search space for most scenarios. The max_samples parameter has been set to None, meaning all samples are used for each tree. The max_features parameter has been set to ‘sqrt’, meaning the number of features considered for optimal node splitting is the square root of the total number of features. Setting max_leaf_nodes to None places no upper limit on the number of leaf nodes.

RF achieved its highest KGE values in training (0.85) and testing (0.84) for S7. RF also performed well for S5 and S6 in training, as evidenced by their respective KGE values (0.84 and 0.82); notably, the performance in these scenarios is only moderate in testing. RF also did not perform well in S3 and S4, similar to the other algorithms.

### Selection of best input combination and implementation of SEM

All the ML algorithms performed well under S7, with KGE values greater than 0.8 in testing. Bi-LSTM achieved the maximum KGE of 0.87 during the testing phase, followed by CNN (0.86), XGBoost (0.85), LGBM (0.85), and RF (0.84). In training, XGBoost under S7 attained the maximum KGE of 0.99; nonetheless, its performance decreased by about 14% during the testing phase (0.85). Compared to its training phase, the testing performance of Bi-LSTM reduced by only 5%. Consequently, Bi-LSTM is considered superior to XGBoost in S7 and is used as the meta-model for SEM.
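The stacking mechanism behind SEM can be sketched as follows. The study uses Bi-LSTM as the meta-model; this illustration substitutes a least-squares combiner purely to show how base-model predictions become meta-model inputs, and all names and data are hypothetical.

```python
import numpy as np

def stack_predict(base_preds_train, y_train, base_preds_test):
    """Minimal stacking: fit a linear meta-learner on the base models'
    training predictions, then combine their test predictions.

    Each element of base_preds_* is one base learner's prediction vector
    (e.g. XGBoost, LGBM, CNN, RF in the study).
    """
    # Design matrix: intercept column plus one column per base learner
    A = np.column_stack([np.ones(len(y_train)), *base_preds_train])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(base_preds_test[0].shape[0]), *base_preds_test])
    return B @ w

y = np.array([1.0, 2.0, 3.0, 4.0])
p1 = y + 0.5   # biased base model
p2 = 0.9 * y   # scaled base model
blended = stack_predict([p1, p2], y, [p1, p2])
```

The meta-model's job is exactly this: learn how to weight and correct the base learners' systematic errors, which is why SEM can outperform its best individual member.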

Figure 11 presents the KGE values of SEM (0.94 and 0.89 during training and testing) and of the five ML algorithms. The KGE values of SEM are better than those of Bi-LSTM (0.92 and 0.87).

### Peak reservoir inflow analysis

The efficacy of the ML algorithms is also explored for peak reservoir inflow analysis. The 98th percentile has been used to identify extreme events (Bharti *et al.* 2016). Thirty-four extreme events were identified, of which 20 were used for training. The resulting training KGE values of XGBoost, LGBM, Bi-LSTM, CNN, and RF are 0.93, 0.46, 0.85, 0.80, and 0.67; the corresponding testing values are 0.77, 0.31, 0.81, 0.80, and 0.59, respectively. The testing efficacy of XGBoost decreased by 17% compared to training, indicating weaker generalization. Thus, Bi-LSTM, with KGE values of 0.85 and 0.81, has been selected as the meta-model for SEM. SEM obtained KGE values of 0.86 and 0.81 during training and testing, indicating its strong performance in simulating peak inflow events.
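The percentile-based event selection can be sketched as follows, using a synthetic skewed series; only the 98th-percentile threshold (after Bharti *et al.* 2016) comes from the study.

```python
import numpy as np

def peak_events(inflow, pct=98):
    """Flag days whose inflow exceeds the given percentile threshold."""
    threshold = np.percentile(inflow, pct)
    idx = np.where(inflow > threshold)[0]
    return idx, threshold

rng = np.random.default_rng(1)
q = rng.gamma(shape=2.0, scale=50.0, size=1000)  # skewed, inflow-like series
idx, thr = peak_events(q)
```

With 1,000 daily values, the 98th percentile isolates roughly the top 2% of days, mirroring how the study arrives at a small set of extreme events from the full record.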

Figure 12(a) shows that XGBoost, SEM, and Bi-LSTM performed well in simulating peak inflows during training. All the algorithms underestimated most of the peak inflow events, except SEM. XGBoost replicated the observed peak flows quite efficiently. Bi-LSTM underestimated 16 of the 20 extreme events, with an average residual of 875 m³/s. The performance of the LGBM and RF algorithms was moderate, as they consistently underestimated the majority of peak inflows, with average errors of 1,947 and 1,726 m³/s, respectively. In contrast, SEM marginally overestimated all the peak inflows, with an average residual of 598 m³/s.

It is observed from Figure 12(b) that Bi-LSTM and XGBoost mostly underestimated the peak inflow events in testing. LGBM was unable to simulate the peaks, as seen from the larger gap between the observed and simulated lines. Bi-LSTM underestimated 10 peak inflow events, with an average residual of 1,294 m³/s. CNN underestimated 8 of 14 events, with an average error of 1,219 m³/s. On the other hand, SEM marginally overestimated almost every peak inflow, with an average residual of 978 m³/s. RF also underestimated the majority of peak inflow events, with an average error of 1,885 m³/s.

### Discussion

Most previous studies have simulated reservoir inflows on a monthly time scale (Noorbeh *et al.* 2020; Allawi *et al.* 2021; Han *et al.* 2023). Aggregating data to monthly totals over extended periods reduces the noise and variability inherent in daily data, making it easier for ML algorithms to identify patterns. The present study, however, takes a more intricate approach by predicting inflow to the SRSP at a daily timescale. In this regard, a substantial amount of training data spanning 11 years (3,751 records) has been considered, in line with previous research (Bai *et al.* 2016; Lee & Kim 2021; Saab *et al.* 2022). A larger training dataset allows the algorithms to learn from a broader range of climatic fluctuations affecting the inflow.

The frequency and distribution of rainfall are partially governed by the intricate interplay of climate patterns across various continents and oceans (Maity & Kashid 2011; Hasanean *et al.* 2023). These patterns are expressed as inflow-related climate indices (Muluye & Coulibaly 2007; Kim *et al.* 2019). Relevant discussion in the context of the case study is as follows:

NINO indices (NINO 3.4, NINO 3, and NINO 4) measure differences in sea surface temperature (SST), indicating the development of El Niño and La Niña events. These events drive variability in global wind patterns, leading to shifts in monsoon circulation over India (Azad & Rajeevan 2016). Thus, the NINO indices have a strong association with the inflows of SRSP, as also evidenced by the results of the present work (refer to Table 1).

DMI directly influences atmospheric circulation over the Indian Ocean, strongly impacting cloud formation and rainfall patterns. GH indicates pressure gradients in the atmosphere; its variability affects wind patterns, impacting the Indian monsoon. A lower GH value for SRSP (mean GH of 80.39 m and 80.25 m in training and testing) indicates areas of low atmospheric pressure, leading to frequent cloud formation and increased rainfall over the catchment area (Jabbar & Hassan 2022).

GBI reflects the atmospheric blocking patterns over Greenland, which impacts global atmospheric circulation, notably in the Northern Atlantic Ocean (Barrett *et al.* 2020). This blocking pattern over the Atlantic Ocean influences the timing, strength, and distribution of Indian summer monsoon rainfall. A higher value of GBI in the present study (mean GBI of 5,320.14 m and 5,329.54 m in training and testing) indicates enhanced rainfall in the SRSP catchment area and higher inflows correspondingly.

On the other hand, climate indices based on teleconnection patterns such as the WPA, PNA, NAO, and AO have shown relatively weaker associations with the daily inflows of SRSP. SRSP is located in the tropics, whereas these indices primarily represent atmospheric circulation patterns of the mid-latitude Northern Hemisphere, far from the study area (Yu & Lin 2016). Thus, the influence of such indices on tropical atmospheric dynamics could be weaker than that of the other climate indices. Indices such as SLP, MW, and ZW are based on coarser spatial scales covering wider regions of atmospheric conditions; their direct impact on the localized rainfall patterns of the SRSP catchment area might therefore not be strong. The present study employed a mechanism based on the entropy weight method (EWM) to identify the most significant climate indices, along the same lines as previous studies (Gelati *et al.* 2014; Amnatsan *et al.* 2018).
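The entropy-weighting step can be sketched as follows. The matrix layout (rows = candidate climate indices, columns = scores from methods such as RFI, FFS, and SCC) and the example numbers are illustrative assumptions; the study does not publish its scoring matrix.

```python
import numpy as np

def entropy_weights(X):
    """Shannon entropy weights for a criteria matrix X.

    Columns with more dispersed (informative) scores receive
    larger weights; the weights sum to 1.
    """
    X = np.asarray(X, float)
    P = X / X.sum(axis=0)                 # column-wise proportions
    n = X.shape[0]
    # Shannon entropy per column, normalized by ln(n) so that 0 <= E <= 1
    E = -(P * np.log(P, where=P > 0, out=np.zeros_like(P))).sum(axis=0) / np.log(n)
    d = 1.0 - E                           # degree of diversification
    return d / d.sum()

# Hypothetical scores of three indices under three ranking methods
scores = np.array([[0.9, 0.8, 0.7],
                   [0.2, 0.3, 0.1],
                   [0.6, 0.5, 0.4]])
w = entropy_weights(scores)
```

The weighted scores can then be summed per index to rank candidates, which is how the three selection methods could be aggregated into a single ordering.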

The present work also reveals that the combined model (S7, with inputs *Q*(*t* − 1), *Q*(*t* − 2), *P*(*t*), *P*(*t* − 1), *P*(*t* − 2), *E*(*t*), *E*(*t* − 1), *E*(*t* − 2), GBI, GH, NINO 3, NINO 3.4, NINO 4, and DMI) performed better than the cause–effect and time series models for simulating inflows, which echoes the work of Magar & Jothiprakash (2011). All the ML algorithms (XGBoost, LGBM, Bi-LSTM, CNN, and RF) performed very well in simulating the inflow patterns under S7. Among them, Bi-LSTM has shown the best efficiency, matching the analysis of Feizi *et al.* (2022). Bi-LSTM performs better since it learns from both forward and backward directions, while the other algorithms are unidirectional (Mughees *et al.* 2021). SEM, with Bi-LSTM as the meta-model, has been implemented to improve predictive accuracy. The results show that SEM (KGE of 0.94 and 0.89 in training and testing) performed relatively better than Bi-LSTM (KGE of 0.92 and 0.87 in training and testing). Granata *et al.* (2022) drew similar conclusions in their study on streamflow forecasting. Lu *et al.* (2023) also found that SEM performed better than XGBoost, RF, and AdaBoost.

The present research indicates that climate indices in conjunction with rainfall, evaporation, and time-lagged inflow significantly improve predictive accuracy, in contrast to using climate indices as the sole inputs (S5: GBI, GH, NINO 3, NINO 3.4, NINO 4, and DMI). This may be due to the global averaging of the climate indices, which are not adjusted or downscaled for the study area.

In the peak inflow analysis, SEM has shown promising results in matching the extreme values. Neural ML algorithms such as Bi-LSTM and CNN have marginally underestimated the peak flows; similar observations have been reported by Forghanparast & Mohammadi (2022) and Granata *et al.* (2022). Underestimation of extreme inflows can aid decision-makers in preparing for the worst-case scenario and implementing water conservation measures. The findings of the present study have significant implications for research on reservoir inflow forecasting, specifically from the viewpoint of climate indices. There are many uncertainties in the data of climate indices; inflow prediction can be improved if these uncertainties are minimized, which will be considered in future work.

## SUMMARY AND CONCLUSIONS

The present study modeled the nonlinear interactions of climate indices, meteorological variables, and time-lagged inflow data for daily inflow prediction at SRSP, India. Five ML algorithms (XGBoost, LGBM, Bi-LSTM, CNN, and RF) were employed for simulating inflow. Climate indices were evaluated using RFI, FFS, and SCC, and the results from these three methods were aggregated using EWM. The combination of time-lagged historical inflows, rainfall, evaporation, their time-lagged variables, and the six influential indices (scenario S7) proved the most suitable input set for the ML algorithms. An ensemble framework based on the stacking mechanism was also employed. The conclusions are as follows:

1. GBI, GH, NINO 3, NINO 3.4, NINO 4, and DMI are the most influential climate indices on the inflow to the reservoir.
2. SEM utilizing Bi-LSTM (the best-performing ML algorithm) as the meta-model performed marginally better than Bi-LSTM, LGBM, RF, and CNN.
3. SEM predicted the peak inflow events quite well, with KGE values of 0.86 and 0.81 during training and testing, respectively. Notably, SEM marginally overestimated the inflows.

Further research may explore land use characteristics, groundwater data, soil properties, finer-resolution climate indices, and a climate change framework.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.