## Abstract

The present study applies three machine learning algorithms, namely Bi-directional Long Short-Term Memory (Bi-LSTM), Wavelet Neural Network (WNN), and eXtreme Gradient Boosting (XGBoost), to assess their suitability for streamflow projections of the Lower Godavari Basin. Thirty-nine years of historical daily rainfall, evapotranspiration, and discharge data were used, of which 80% was applied for model training and 20% for validation. A Random Search method was used for hyperparameter tuning. XGBoost performed better than WNN and Bi-LSTM, with an R2, RMSE, NSE, and PBIAS of 0.88, 1.48, 0.86, and 29.3% during training, and 0.86, 1.63, 0.85, and 28.5% during validation, indicating model consistency. Therefore, it was further used for projecting streamflow from a climate change perspective. The global climate model EC-Earth3 was employed in the present study. Four Shared Socioeconomic Pathways (SSPs) were considered and downscaled using Empirical Quantile Mapping. Eight decadal streamflow projections were computed, D1 to D8 (2021–2030 to 2091–2099), exhibiting significant changes within the warming period. They were compared with three historical time periods, H1 (1982–1994), H2 (1995–2007), and H3 (2008–2020). The highest daily streamflow projections were observed in D1, D3, D4, D5, and D8 under SSP245 as per the XGBoost analysis.

## HIGHLIGHTS

- The eXtreme Gradient Boosting (XGBoost) model displays an exceptional performance with *R*^{2} = 0.88, RMSE = 1.48, NSE = 0.86, and PBIAS = 29.3% in training. These values in validation are 0.86, 1.63, 0.85, and 28.5%.

- XGBoost has the edge over the four variants of WNN and a Bi-LSTM model.

- The highest daily streamflow projections were observed in D_{1}, D_{3}, D_{4}, D_{5}, and D_{8} under SSP245 as per the XGBoost analysis.

### Graphical Abstract

## INTRODUCTION

Streamflow projections assist decision-makers in perceiving a basin's water status, enabling effective basin planning and management initiatives. In this regard, hydrological models play a crucial role, necessitating the identification of a reliable model. Another dimension is climate change, where rainfall and temperature patterns are expected to vary in future time horizons with increasing greenhouse gas emissions, significantly impacting water resources (Konapala *et al.* 2020). Thus, it is essential to analyze streamflow changes at a smaller time scale that can be useful to decision-makers (Galavi *et al.* 2019).

Simulating a catchment response with a great extent of non-linearity using a hydrological model is challenging (Kabir *et al.* 2022). In this context, diverse process-based hydrological models (Horton *et al.* 2022) were examined to achieve improved streamflow simulations. Despite their better interpretability of the underlying mechanisms, the many approximations involved in conceptualizing the model and the high data requirements remain challenges (Moges *et al.* 2020). Machine learning (ML) models, in contrast, are well suited for establishing intricate non-linear relationships within a system, with computational efficiency as an added advantage. Unlike process-based hydrological models, ML models do not entail in-depth knowledge of the basin (Kim *et al.* 2021; Rozos *et al.* 2022). Many studies reported that Adaptive Neuro Fuzzy Inference Systems and Artificial Neural Networks (ANNs) were well suited for simulating streamflow (Galavi & Lee 2012; Galavi *et al.* 2013). However, one challenge that may be encountered is overfitting; these models also frequently fail to capture long-term dependencies in time series data.

Recurrent Neural Networks (RNNs) (Zhang *et al.* 2021) were used to address these shortcomings, demonstrating better streamflow accuracy with less training time than ANNs. However, these models suffer from the vanishing gradient problem. Accordingly, Long Short-Term Memory (LSTM) and its variants were investigated and found reliable for streamflow predictions (Mirzaei *et al.* 2021). Bi *et al.* (2020) compared a deep neural network, CAGANet, with LSTM, Support Vector Machine (SVM), and an attention-mechanism-based LSTM incorporating data augmentation in forecasting daily streamflow in the Qingxi river basin, Southwest China. They found that CAGANet showed exceptional capability with an NSE of 0.98, followed by the LSTM variants with NSEs of 0.85–0.92. However, these models require high computational effort, necessitating simpler hybrid models such as WNN to achieve satisfactory predictions.

Kasiviswanathan *et al.* (2016) applied ANN and WNN combined with Block Bootstrap (BB) sampling to forecast streamflow in the Bow River Basin, Alberta, Canada, employing a percent volumetric error indicator. Comparative analysis revealed the dominance of WNN-BB over ANN-BB. Honorato *et al.* (2018) compared ANN and WNN over the Sobradinho dam in Northeast Brazil, in which the approximations of the wavelet decompositions were fed to the ANN as inputs. The results confirmed that WNN proved superior to ANN at lead times of 1, 3, and 6 months but deteriorated at 12-month lead times. They opined that a high-frequency component improved model accuracy in shorter forecast horizons and vice versa for longer forecasts. Sharghi *et al.* (2019) developed an emotional ANN (EANN), WNN, and Wavelet-EANN for streamflow modeling of the Trinity and West Nishnabotna rivers; Wavelet-EANN was found promising for simulating streamflow on daily and monthly temporal scales. Criswell & Lin (2021) employed ANN, WNN, and the Inverse Discrete Wavelet Transform (IDWT) to forecast river flow in Tittabawassee, USA. They revealed that WNN outperformed ANN and IDWT because it incorporates higher-level wavelet decomposition, and opined that model accuracy could be further improved using appropriate mother wavelets.

Furthermore, studies have been conducted to identify a model structure that works effectively even without pre-processing, which was possible in XGBoost. The superiority of this model and its variants has been demonstrated in streamflow simulation studies conducted across various catchments. Sahour *et al.* (2021) applied XGBoost and Random Forest (RF) models for simulating streamflow. The performance of XGBoost was slightly better than RF. Ni *et al.* (2020) employed the Gaussian mixture model (GMM)-XGBoost, SVM, and XGBoost for monthly streamflow forecasting of two stations in the Yangtze River Basin. GMM-XGBoost displayed a performance superior to SVM and XGBoost in streamflow simulation. Bai *et al.* (2021) developed a hybrid model based on XGBoost, Bayesian optimization algorithm, and Gaussian process regression for forecasting streamflow. They applied the developed model and eight streamflow prediction methods to the Yangtze River Basin, China. The hybrid XGBoost model showed satisfactory performance compared with other models with an *R*^{2} of 0.97, MAPE of 8.09%, and RMSE of 1.85.

Rabezanahary *et al.* (2021) applied ANN and the Soil and Water Assessment Tool (SWAT) from a climate change perspective for streamflow forecasting over the Mangoky River, Africa. They employed the Shared Socioeconomic Pathways (SSPs) SSP370 and SSP585 and concluded that there would be a decrease in annual streamflow in the basin. Song *et al.* (2022) applied SWAT and LSTM to forecast streamflow using the SSP245 and SSP585 scenarios over the Yeongsan Basin, South Korea. SWAT projected an increase in future streamflow of 17.7%; contrarily, LSTM predicted a decrease of 13.6%. In summary, precise information about streamflow changes would be handy for policy-makers in decision-making.

It is observed from the literature survey that the three ML algorithms LSTM, WNN, and XGBoost were promising, as evident from different case studies, yet have had limited applications. In addition, these models were never compared simultaneously for Indian conditions but were explored individually. Accordingly, the objectives framed are as follows:

- (1) To investigate the prediction performance of Bi-LSTM, WNN, and XGBoost from a historical perspective and identify the suitable one.

- (2) To project streamflow for eight decadal time horizons, D_{1} (2021–2030), D_{2} (2031–2040), D_{3} (2041–2050), D_{4} (2051–2060), D_{5} (2061–2070), D_{6} (2071–2080), D_{7} (2081–2090), and D_{8} (2091–2099), using the suitable ML model obtained from (1).

## CASE STUDY AND DATA COLLECTION

The study area, the Lower Godavari Basin (LGB), is in the lower reaches of the Godavari basin, nearer to the Bay of Bengal (refer to Figure 1). There is a considerable water requirement in the basin for various purposes such as drinking, irrigation, industry, stabilization of canals, and inter-state dependencies (Polavaram Project Authority 2021). This requires efficient planning and management initiatives to ensure water security and satisfy the vast requirements of the basin. Accordingly, future streamflow is required to be estimated.

Rainfall and maximum and minimum temperatures for 52 grid locations of the LGB for the period 1982–2020 were collected from the India Meteorological Department. Daily discharge for Polavaram and Perur was obtained from the Central Water Commission. Future rainfall and maximum and minimum temperatures for 2021–2099 were obtained using the global climate model EC-Earth3 and the downscaling method Empirical Quantile Mapping for SSP126, SSP245, SSP370, and SSP585 (Mishra *et al.* 2020). Evapotranspiration was computed using the Penman–Monteith method (https://www.fao.org/land-water/databases-and-software/eto-calculator/en/). EC-Earth3 was selected as it was found promising in the South Asian region, as evident from previous research (Mishra *et al.* 2020; Parding *et al.* 2020; Iqbal *et al.* 2021; Desmet & Ngo-Duc 2022), and because of its high grid resolution of 0.7° (Liu *et al.* 2022). Average rainfall, evapotranspiration, and observed discharge were the inputs to Bi-LSTM, WNN, and XGBoost for simulating streamflow. Additionally, the future rainfall and evapotranspiration were averaged and fed into the best algorithm for projecting streamflows under SSP126, SSP245, SSP370, and SSP585.
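Empirical Quantile Mapping, used here to downscale the EC-Earth3 outputs, can be sketched with plain NumPy. The function below is an illustrative simplification (no seasonal stratification or wet-day correction), and its name and arguments are hypothetical:

```python
import numpy as np

def empirical_quantile_mapping(obs_hist, mod_hist, mod_future, n_quantiles=100):
    """Map modelled values onto the observed distribution via empirical quantiles.

    obs_hist:   observed historical series (e.g., IMD gridded rainfall)
    mod_hist:   GCM output for the same historical period
    mod_future: GCM output for a future period to be bias-corrected
    """
    q = np.linspace(0.0, 1.0, n_quantiles)
    obs_q = np.quantile(obs_hist, q)   # observed quantiles
    mod_q = np.quantile(mod_hist, q)   # modelled quantiles
    # Replace each future value by the observed value at the quantile
    # position it occupies in the modelled historical distribution.
    return np.interp(mod_future, mod_q, obs_q)
```

In practice, the mapping is usually built per month or season and per grid cell before the corrected series are spatially averaged.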

Furthermore, decadal time horizons were chosen as they facilitate understanding the most significant changes within the warming range (Mahmood *et al.* 2021) that would be experienced in the basin. The historical streamflow is split into three time horizons, H_{1} (1982–1994), H_{2} (1995–2007), and H_{3} (2008–2020), for a comprehensive comparison of the hydrologic behaviour of the basin. Supplementary Material, Table S1, presents streamflow information for the three historical time horizons.

## DESCRIPTION OF METHODS

This section briefly explains the Bi-LSTM, WNN, and XGBoost methods.

### Bi-LSTM (Bi-directional Long Short-Term Memory)

Bi-LSTM (Li *et al.* 2021), an improved version of LSTM, is a series processing model comprising two LSTMs, one that receives input forward (past to future) and another backward (future to past), thus providing computational flexibility. Bi-LSTM is a cyclic RNN variant that adds gating units to address the issue of long-term dependence on long-sequence data. It restrains the vanishing gradient and helps retain each cyclic unit's various time-scale information features from the meteorological data. It uses three sigmoid functions and a *tanh* function to implement the input, forget, and output gates. Input gates quantify the extent of new information to be carried into the memory. Forget gates determine whether previous time-stamp information on averaged rainfall and evapotranspiration should be retained or removed. Cell gates specify the memory of the LSTM, and output gates control the values of the hidden states.

#### Forward pass mechanism

Let $x_t$ be the input at time $t$; $c_t$ and $c_{t-1}$ be the cell states at times $t$ and $t-1$; $a_t$ and $a_{t-1}$ be the outputs at times $t$ and $t-1$; $i_t$, $o_t$, and $f_t$ be the input, output, and forget gates, respectively, at time $t$; $W_i$, $W_o$, and $W_f$ be the corresponding weight vectors; and $b_i$, $b_o$, and $b_f$ be the biases of the input, output, and forget gates. $\sigma$ represents the sigmoid activation function. The input, output, and forget gates (Qi *et al.* 2019) are governed by Equations (1)–(3), which are as follows:

$$i_t = \sigma(W_i[a_{t-1}, x_t] + b_i) \qquad (1)$$

$$o_t = \sigma(W_o[a_{t-1}, x_t] + b_o) \qquad (2)$$

$$f_t = \sigma(W_f[a_{t-1}, x_t] + b_f) \qquad (3)$$

The candidate cell state, the updated cell state, and the output are obtained through further activation functions (*softsign*, *tanh*, and the Rectified Linear Activation Function, *relu*, are candidates; the standard formulation shown here uses *tanh*). The symbol $\otimes$ represents element-wise multiplication. $W_c$ and $b_c$ are the input state matrix and the corresponding bias. The obtained outputs $\tilde{c}_t$, $c_t$, and $a_t$ are presented in Equations (4)–(7):

$$\tilde{c}_t = \tanh(W_c[a_{t-1}, x_t] + b_c) \qquad (4)$$

$$c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t \qquad (5)$$

$$a_t = o_t \otimes \tanh(c_t) \qquad (6)$$

$$y_t = \left[\overrightarrow{a_t}, \overleftarrow{a_t}\right] \qquad (7)$$

where $y_t$ in Equation (7) concatenates the hidden states of the forward and backward passes of the bidirectional layer.

Let *i*_{1}, *i*_{2}, …, *i _{n}* denote the data entered, and *c*_{t−1}, *a*_{t−1} denote the data obtained from Equations (4)–(7). The six distinct Bi-LSTM weights (Wang *et al.* 2019) will be updated until the desired output is achieved.

The working steps of Bi-LSTM are as follows:

- (1) Identify Bi-LSTM layer nodes, the number of neurons, learning rate, batch size, dropout, the number of epochs, and termination criteria.

- (2) Compute the cell states (*c*_{t−1}, *a*_{t−1}).

- (3) Let the cell states pass through the forward and backward layers for updating the six unique weights. This updating of weights (Wang *et al.* 2019) continues until the termination criteria are satisfied.
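The forward-pass gating described above can be sketched as a single LSTM step in NumPy. The standard gate equations are assumed, and all weight shapes and names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, W, b):
    """One forward LSTM step.

    x_t: input at time t (e.g., averaged rainfall and evapotranspiration)
    a_prev, c_prev: output and cell state at time t-1
    W, b: dicts of weight matrices and bias vectors for the input ("i"),
          forget ("f"), output ("o") gates and candidate cell state ("c").
    """
    z = np.concatenate([a_prev, x_t])       # combined [a_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    c_hat = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat        # updated cell state
    a_t = o_t * np.tanh(c_t)                # output (hidden state)
    return a_t, c_t
```

A bidirectional layer would run one such LSTM past-to-future and a second future-to-past over the sequence, concatenating the two hidden states at each time step.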

Internal components of the configured model are referred to as parameters, and how these perform depends on the training dataset used. The explicitly stated settings that regulate the training process are known as hyperparameters and are established before training begins. In Bi-LSTM, all such parameters are considered important and treated as hyperparameters. These are the Bi-LSTM layer node (P_{L1}), learning rate (P_{L2}), batch size (P_{L3}), the number of neurons (P_{L4}), epochs (P_{L5}), and dropout (P_{L6}). The weights updated in the input, forget, memory, and output gates are influenced by P_{L1} to P_{L6} (refer to Table 1). P_{L1}, P_{L4}, and P_{L5} positively influence the model performance when increased; they help store information in the cell states and enhance the scope for efficiently updating weights. P_{L2} and P_{L3} govern the pace at which the weights are updated; lower values of P_{L2} and P_{L3} are preferable, as the model can then learn complex patterns effectively. P_{L6} helps improve the processing time, thereby reducing biases. Therefore, these are assessed for identifying optimum values using the Random Search hyperparameter tuning method, chosen for its ease of applicability and ability to generate a practically unlimited number of parameter combinations (Elgeldawi *et al.* 2021). The ranges of parameters related to Bi-LSTM, with remarks, are described in Table 1 (columns 2–3).

| Index (1) | Parameters, their range, and optimal value (2) | Remarks (3) |
|---|---|---|
| P_{L1} | Bi-LSTM layer node 16–256 (64) | Bi-LSTM layer nodes are hidden layer nodes that aid in data conveyance from input, forget, memory, and output gates. The larger the Bi-LSTM layer nodes, the more comprehensive the information from the given data stored in the cell state, resulting in a better output. |
| P_{L2} | Learning rate 0–1 (0.2) | The learning rate describes the step size at which the weights are altered to achieve minimal loss function. However, larger learning rates may result in underfitting, while very low values may result in an overfitting scenario. |
| P_{L3} | Batch size 16–256 (64) | Batch size specifies the number of samples sent to the network instantly. More batch sizes result in an erroneous capture of complex data patterns. |
| P_{L4} | Number of neurons 16–256 (128) | The number of neurons available in a dense model layer plays a major role in model performance. Overfitting may arise if there are more neurons than what is desirable. |
| P_{L5} | Epochs 10–100 (15) | Epoch refers to one cycle through the entire training dataset. However, a more significant number of epochs will consume memory, rapidly increasing the processing time. |
| P_{L6} | Dropout 0–0.5 (0.25) | It refers to data or noise intentionally dropped from a neural network to improve processing and reduce overall computational time. |


Hyperparameters influencing Bi-LSTM and their optimal values are presented in Table 1 (column 2). The optimal value of bi-layer nodes is closer to the lower threshold, implying that a smaller number of forward and backward hidden layer nodes is sufficient to provide the desired output while preserving the most insightful information in the cell state. The optimal learning rate is detected nearer to the lower threshold, indicating that a smaller step size is picked to increase the likelihood of obtaining a minimal loss function. Optimal batch size and bi-layer node values are similar. Greater bi-layer nodes allow better conveyance of weight modifications from the input, forget, memory, and output gates, resulting in superior output. In contrast, a smaller batch size is recommended for improved model performance. The ideal number of neurons lies precisely in the middle of the lower and upper threshold limits. Interestingly, the thresholds for batch size, bi-layer nodes, and the number of neurons are identical. The optimal number of epochs is closer to the lower threshold, implying a relatively short computational time. The optimal dropout value lies exactly at the center of the lower and upper thresholds, implying that the input data is moderately noisy.
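The Random Search procedure over the Table 1 ranges can be sketched as follows. The `evaluate` callback, which would train a Bi-LSTM and return a validation score, is a hypothetical placeholder, and the discrete choice sets are illustrative:

```python
import random

# Search space following the ranges reported in Table 1 (Bi-LSTM).
SPACE = {
    "layer_nodes": lambda: random.choice([16, 32, 64, 128, 256]),
    "learning_rate": lambda: random.uniform(0.0, 1.0),
    "batch_size": lambda: random.choice([16, 32, 64, 128, 256]),
    "neurons": lambda: random.choice([16, 32, 64, 128, 256]),
    "epochs": lambda: random.randint(10, 100),
    "dropout": lambda: random.uniform(0.0, 0.5),
}

def random_search(evaluate, n_trials=50, seed=42):
    """Draw n_trials random configurations and keep the best-scoring one.

    `evaluate` is user-supplied (e.g., train a Bi-LSTM and return the
    validation NSE); higher scores are assumed better.
    """
    random.seed(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: draw() for name, draw in SPACE.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Unlike grid search, each trial samples every hyperparameter independently, so the budget is spread over many distinct combinations.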

### WNN (Wavelet Neural Network)

WNN (Guo *et al.* 2022) works based on signal processing. This study chooses the WNN method to pre-process the meteorological inputs by removing noise through wavelet decomposition, thereby enhancing the simulated streamflow accuracy over a traditional ANN. This pre-processing aids in better capturing patterns from the given meteorological information. The philosophy behind using wavelets is to capture adequate information from the primary meteorological data and decompose it into details and approximations with the help of wavelet transforms; these behave distinctly in each sub-time series compared with the primary data. The decomposition process can be iterated with successive approximations so that the primary data is partitioned into many lower-resolution components. Scaling (dilation) and shifting (translation) are the two prominent factors in wavelet decomposition. The wavelet transforms are processed via high-pass and low-pass filters (Alexandridis & Zapranis 2013).

Here, $y(x)$ is the simulated output value; $\lambda$ is the number of wavelons; $x_i$ is the input value; $v_i$ and $w_j$ are the input and hidden network weights; $w_0$ is a constant; $n$ is the number of inputs; $z_j$ is the input to a wavelon, normalized as $(x_i - b_j)/a_j$; $b_j$ and $a_j$ are the translation and dilation parameters; and $\psi(z_j)$ is the intermediate output based on the chosen mother wavelet:

$$y(x) = w_0 + \sum_{j=1}^{\lambda} w_j\,\psi\!\left(\frac{x_i - b_j}{a_j}\right) + \sum_{i=1}^{n} v_i x_i \qquad (8)$$

Equation (8) typically represents the single hidden layer mechanism of a feed-forward WNN. When no hidden units exist, Equation (8) acts like a linear model. The first term in the equation is the constant. The second term represents the hidden layer, quantified as the product of the associated network weight and a mother wavelet; the mother wavelet is a function of the weighted input, translation, and dilation parameters. The third term represents the product of the input network weight and the corresponding input value.
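A minimal sketch of the Equation (8) mechanism, assuming a Mexican hat mother wavelet and, for simplicity, a single summed input per wavelon (function and variable names are illustrative):

```python
import numpy as np

def mexican_hat(z):
    # A common mother wavelet choice (Ricker wavelet).
    return (1.0 - z**2) * np.exp(-z**2 / 2.0)

def wnn_forward(x, w0, w, v, a, b, psi=mexican_hat):
    """Single-hidden-layer feed-forward WNN output.

    x:  inputs, shape (n,)
    w0: constant term
    w:  hidden-layer weights, shape (lambda,)
    v:  direct input weights, shape (n,)
    a, b: dilation and translation parameters, shape (lambda,)
    """
    z = (x.sum() - b) / a        # normalized wavelon inputs (one per wavelon)
    hidden = np.dot(w, psi(z))   # second term: weight times mother wavelet
    linear = np.dot(v, x)        # third term: direct linear part
    return w0 + hidden + linear
```

With no wavelons (empty `w`, `a`, `b`), the hidden term vanishes and the function reduces to the linear model noted above.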

The working steps of WNN are as follows:

- (1) Identification of hidden units, the number of epochs, termination criteria, the type of mother wavelet, and initialization of *a* and *b*.

- (2) Assigning the initial weight vectors (the input and hidden network weights).

- (3) Computation of the mother wavelet outputs based on *a* and *b* and the corresponding weights.

- (4) Comparison of the simulated outputs with the observed values to compute the associated error.

- (5) Updating the weights and other parameters until the termination criteria are satisfied.

#### Mother wavelets

The WNN mechanism is significantly influenced by the parameters described in Table 2. Here also, all the parameters are considered important and treated as hyperparameters. These are the number of hidden layers (P_{W1}), the number of wavelons in the first layer (P_{W2}), the number of wavelons in the second layer (P_{W3}), activation function (P_{W4}), dropout (P_{W5}), and learning rate (P_{W6}). The parameters P_{W1}, P_{W2}, P_{W3}, P_{W4}, and P_{W6} control the network weights as the input data is navigated via the hidden layers to generate the desired outcome. P_{W5} specifies the fraction of wavelons intentionally dropped from the neural network to improve processing and reduce the overall computation time. Table 2 presents the parameter ranges with remarks for the four wavelets employed (columns 3–7).

| Index (1) | Parameters (2) | Range (Optimal value) for Gaussian (3) | Range (Optimal value) for Mexican hat (4) | Range (Optimal value) for Morlet (5) | Range (Optimal value) for Shannon (6) | Remarks (7) |
|---|---|---|---|---|---|---|
| P_{W1} | Number of hidden layers | 0–3 (2) | 0–3 (1) | 0–3 (2) | 0–3 (0) | The number of hidden layers is associated with an input and output layer, allowing wavelons to take input and generate output. Overfitting takes place due to an increase in hidden layers. Although fewer hidden layers reduce computational work, it may result in underfitting. |
| P_{W2} | Number of wavelons in the first layer | 32–512 (256) | 32–512 (128) | 32–512 (128) | NA | The number of wavelons refers to the hidden units in the first hidden layer that would navigate the updated weighted inputs through the corresponding hidden layers. A large number of wavelons increases the accuracy of the model. |
| P_{W3} | Number of wavelons in the second layer | 32–512 (32) | NA | 32–512 (32) | NA | The number of wavelons refers to the hidden units in the second hidden layer that would navigate the updated weighted inputs through the first hidden layer resulting in an output. A large number of wavelons increases the accuracy of the model. |
| P_{W4} | Activation function | tanh, relu (tanh) | tanh, relu (relu) | tanh, relu (relu) | tanh, relu (tanh) | This function controls the activation and deactivation of wavelons as they move from the input to the output layer. The suitable activation function is chosen based on the type of problem and the mother wavelet. |
| P_{W5} | Dropout | 0–0.5 (0.25) | 0–0.5 (0.25) | 0–0.5 (0) | 0–0.5 (0.25) | Dropout is a parameter that allows specific wavelons to be selectively removed from a neural network to optimize processing and reduce computing time. If the dropout rate exceeds a specific threshold, it shows that the model is unsuitable for the provided input datasets. Higher dropout indicates that the given datasets are noisy. |
| P_{W6} | Learning rate | 0–1 (8.68 × 10^{−4}) | 0–1 (10^{−3}) | 0–1 (2.04 × 10^{−3}) | 0–1 (5.92 × 10^{−4}) | Learning rate determines the step size where weights are updated to achieve the smallest loss function. Higher learning speeds up convergence, but it may result in the loss of intricate patterns from information available, leading to underfitting. |


Optimal values of the hyperparameters influencing WNN for the four wavelets, Gaussian, Mexican hat, Morlet, and Shannon, are presented in Table 2 (columns 3–6). Two hidden layers were identified for the Gaussian wavelet, with 256 and 32 wavelons in the first and second hidden layers, respectively; the Morlet wavelet likewise has two layers with 128 and 32 wavelons. On the other hand, the Mexican hat has a single hidden layer with 128 wavelons, and no hidden layer is identified for the Shannon wavelet. The Mexican hat and Morlet wavelets use the *relu* activation function, whereas the Gaussian and Shannon wavelets use *tanh* to activate weights and biases. Dropout values are similar in the Gaussian, Mexican hat, and Shannon wavelets, whereas no dropout is seen in the Morlet wavelet. Compared with the other two mother wavelets, the learning rates for the Shannon and Morlet wavelets are lower and higher, respectively.
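The four mother wavelets compared in Table 2 have common textbook closed forms. The exact definitions used in the study are not reproduced here, so the following are standard choices:

```python
import numpy as np

def gaussian(z):
    # Gaussian wavelet: first derivative of a Gaussian.
    return -z * np.exp(-z**2 / 2.0)

def mexican_hat(z):
    # Mexican hat (Ricker): negative second derivative of a Gaussian.
    return (1.0 - z**2) * np.exp(-z**2 / 2.0)

def morlet(z):
    # Real-valued Morlet with central frequency 5.
    return np.cos(5.0 * z) * np.exp(-z**2 / 2.0)

def shannon(z):
    # Band-limited, sinc-based Shannon wavelet.
    z = np.where(z == 0.0, 1e-12, z)  # avoid division by zero at the origin
    return (np.sin(2.0 * np.pi * z) - np.sin(np.pi * z)) / (np.pi * z)
```

These functions would be substituted for the mother wavelet term in Equation (8), with the dilation and translation parameters stretching and shifting each wavelon's support.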

### XGBoost (eXtreme Gradient Boosting)

XGBoost (Wang *et al.* 2022) is designed to be computationally efficient and highly effective. This method is chosen for its simple yet effective structure, which needs no pre-processing or updating of information in gated units. The statistical framework comprises features, weak learners, and an additive model, with the aim of loss function minimization. Decision trees are considered weak learners and are constructed based on similarity scores. They are constrained by the number of leaves, nodes, splits, or layers. A shorter decision tree possessing a single split is called a stump (decision stump). Larger decision tree splits generally range from 4 to 8; more splits can be used but may be computationally expensive. The additive trees are introduced without replacing the existing trees, and a gradient descent procedure is implemented to minimize the losses incurred when adding trees.
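The additive, stage-wise mechanism described above can be illustrated with a minimal gradient-boosting loop using decision stumps on squared loss. This is a generic sketch of boosting, not XGBoost's exact regularized algorithm with similarity scores:

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on 1-D inputs (squared loss)."""
    best = None
    for split in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= split]
        right = [r for xi, r in zip(x, residuals) if xi > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda xi: lmean if xi <= split else rmean

def boost(x, y, n_trees=20, learning_rate=0.3):
    """Additive model: each new stump fits the current residuals."""
    pred = [sum(y) / len(y)] * len(y)  # initial guess: the mean of y
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        # Shrink each tree's contribution by the learning rate.
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return pred
```

Each stump is added to the existing ensemble rather than replacing it, and the learning rate controls how aggressively the residuals are reduced at every stage.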

During tree construction, each node ‘*i*’ is divided into either a left or right branch, depending on the chosen separation criteria. The loss reduction can be calculated for each candidate split, and the branch with the highest loss reduction is preferred. XGBoost can be mathematically expressed (Chen & Guestrin 2016) as follows (refer to Equations (13)–(15)):

Here, the quantities appearing in Equations (13)–(15) are the predicted value based on an initial guess; the initial guess; the similarity score for the *j*th leaf node; the learning rate; the observed values; the predicted values; the loss function; the ordinates of a particular leaf node; and the regularization parameter.

The working steps of XGBoost are as follows:

- (1) Identification of tree depth, the number of leaves, verbosity, learning rate, regularization parameter, termination criteria, and other related parameters.

- (2) Initial guess of the predictor values and selection of the type of loss function.

- (3) Selection of the similarity score as the splitting criterion for the decision trees; the leaf nodes having maximum similarity scores are chosen.

- (4) Computation of the predictor values.

- (5) Continuation of epochs until the objective of minimizing the loss function is achieved.

Here, the parameters considered are subsample (P_{X1}), the number of estimators (P_{X2}), minimum child weight (P_{X3}), maximum depth (P_{X4}), learning rate (P_{X5}), colsample bytree (P_{X6}), lambda (P_{X7}), alpha (P_{X8}), verbosity (P_{X9}), and maximum bin (P_{X10}). Of these, P_{X1} to P_{X6} significantly influence the loss function and are thus treated as hyperparameters. P_{X1} selects the subsample of data to be considered for training. P_{X2}, P_{X3}, P_{X4}, and P_{X6} govern the partitioning of decision trees, impacting the similarity score. P_{X5} decides the pace of the learning process from which iterated predictor values are obtained with a minimal loss function. P_{X7} to P_{X10} do not significantly influence the simulated predictor values and are thus assigned default values. P_{X7} and P_{X8} are regularization parameters added as a penalty to the loss function to minimize the overall error and counter overfitting. Table 3 presents the parameter ranges with remarks (columns 2–3).

| S.No (1) | Parameters, their range, and optimal value (2) | Remarks (3) |
|---|---|---|
| P_{X1} | Subsample 0.5–1 (0.8) | Subsample is a randomly chosen portion of the trained data before building decision trees which would help avoid overfitting. |
| P_{X2} | Number of estimators 100–1,000 (100) | The number of estimators is the actual population of estimators used in forming decision trees for achieving the minimum loss function. However, the number of estimators picked for a specific problem should be determined, resulting in a considerable reduction in the loss function. |
| P_{X3} | Minimum child weight 0–10 (5.5) | The weight assigned for deciding the successive partitioning of the decision tree is described as the minimum child weight. The greater the minimal child weight values, the more conservative the partitioning in the building tree. Lower values are recommended to produce more decision trees, which improve the convergence rate; nevertheless, lower values are computationally more expensive. |
| P_{X4} | Maximum depth 3–15 (8) | The number of leaves from the root to the farthest leaf defines the maximum depth. The greater the maximum depth of a tree, the more complex the model, which overfits and aggressively consumes memory. Lower than optimal values, on the other hand, would result in the development of an insufficient number of decision trees, which would have a negative impact on the convergence criteria. |
| P_{X5} | Learning rate 0–1 (0.55) | The learning rate is the step size at which the weights are updated to get the minimal loss function. A lower learning rate would increase the likelihood of pinpointing precise outcomes. The higher the learning values, the more conservative the boosting procedure develops. |
| P_{X6} | Colsample bytree 0.5–1 (0.65) | Colsample bytree is the column subsample ratio used in the development of each tree. The value specifies the fraction of columns to be subsampled. Higher values of colsample make the model more conservative. |
| P_{X7} | Lambda Default (1) | Lambda value indicates the L_{2} regularization term on leaf weights which are used for adjusting the loss function and countering the overfitting by summing up the square of feature coefficients. |
| P_{X8} | Alpha (α) Default (2.75) | Alpha values indicate the L_{1} regularization term on leaf weights, which are used to adjust the loss function and counter the overfitting by summing up the feature coefficients. |
| P_{X9} | Verbosity Default (1) | Verbosity is the adjustment made in the XGBoost to facilitate visualization of the training process. |
| P_{X10} | Maximum bin Default (256) | The maximum bin is chosen to bucket the feature values. |


Optimal values of the hyperparameters influencing XGBoost are presented in Table 3 (column 2). The optimal number of estimators equals the lower threshold, revealing that the minimal number of estimators is adequate for this problem; the model is inferred to be conservative in developing decision trees to attain the minimal loss function. The optimal values for minimum child weight and colsample bytree lean slightly towards the upper threshold, while the optimal tree depth leans slightly towards the lower threshold, suggesting a simple model. The optimal learning rate is slightly inclined towards the upper threshold, implying fairly low computational time with a higher convergence rate.

## RESULTS AND DISCUSSION

### Comparison of the observed and simulated discharge

The simulated discharges obtained from the three hyperparameter-tuned ML models are compared with the observed discharge and evaluated using the *R*^{2}, RMSE, NSE, and PBIAS metrics (Table 4). The WNN-Gaussian wavelet displayed training performance similar to that of the WNN-Mexican hat but fell short in the validation period based on the *R*^{2} obtained. The WNN-Mexican hat performed better than the WNN-Gaussian wavelet in both the training and validation periods, with relatively lower RMSE values. Despite a higher learning rate, WNN-Morlet dominated all other WNN models in the training period; however, it failed to show consistency in the validation period compared with the other WNN models.
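For reference, the four metrics used throughout Table 4 can be computed as follows (a minimal pure-Python sketch; the PBIAS sign convention assumed here treats positive values as average underestimation).

```python
import math

def r2(obs, sim):
    """Coefficient of determination (squared Pearson correlation)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (vo * vs)

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mo = sum(obs) / len(obs)
    return 1 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / sum((o - mo) ** 2 for o in obs)

def pbias(obs, sim):
    """Percent bias; positive when the model underestimates on average."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)
```

A perfect simulation gives *R*^{2} = NSE = 1, RMSE = 0, and PBIAS = 0%; NSE can go negative when the model is worse than the observed mean.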

| S.No | Model | Training *R*^{2} | Training RMSE | Training NSE | Training PBIAS (%) | Validation *R*^{2} | Validation RMSE | Validation NSE | Validation PBIAS (%) | Remarks |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Bi-LSTM | 0.60 | 1.73 | 0.59 | 12.4 | 0.58 | 1.96 | 0.57 | 13.0 | Satisfactory performance, with an *R*^{2} of 0.60 and 0.58 in the training and validation periods. Falls short of XGBoost, although better than WNN. |
| 2 | WNN (Gaussian) | 0.23 | 2.12 | 0.23 | 0.3 | 0.22 | 2.20 | 0.22 | 1.0 | Unsatisfactory performance in both the training and validation periods. |
| 3 | WNN (Mexican hat) | 0.22 | 2.06 | 0.23 | 0.3 | 0.22 | 2.13 | 0.22 | 1.0 | Relatively low metrics compared with the other WNN models; overall unsatisfactory during both the training and validation periods. |
| 4 | WNN (Shannon) | 0.26 | 2.30 | 0.26 | 1.14 | 0.25 | 2.31 | 0.25 | 1.46 | Better consistency in the validation period than the other mother wavelets, although overall performance is unsatisfactory in both periods. |
| 5 | WNN (Morlet) | 0.29 | 2.06 | 0.29 | 0.9 | 0.22 | 2.39 | 0.21 | 1.8 | Slightly better than the other mother wavelets in training but lacks consistency in validation; overall unsatisfactory for the present study. |
| 6 | XGBoost | 0.88 | 1.49 | 0.86 | 29.3 | 0.86 | 1.63 | 0.85 | 28.5 | Exceptional, with an *R*^{2} of 0.875 in training and great consistency in validation (*R*^{2} of 0.863). Yields the most reliable streamflow forecasts among the models employed in this study. |


The WNN-Gaussian was inferior to other WNN models, even at low learning rates. WNN-Shannon wavelet showed a better consistency than other WNN models in the training and validation periods based on the *R*^{2} value. However, the RMSE values were relatively higher than the other WNN models. Similar inferences are drawn for NSE and PBIAS.

The WNN could not achieve good simulation accuracy with any of the four mother wavelets, possibly due to the lack of spatial information in the averaged rainfall. Bi-LSTM showed a satisfactory performance, dominating all WNN models, but fell short of XGBoost; a possible reason for this under-performance is the inability of its gated units to extract sufficient information from the provided data. XGBoost showed exceptional performance compared with Bi-LSTM and WNN across the identified metrics in both the training and validation periods, as shown in Table 4.

The simulated hydrographs obtained by XGBoost during training (1982–2013) and validation (2014–2020) matched the observed discharges well, with minimal deviations. No phase errors were observed in either the training or the validation period. Amplitude errors arise from model underestimation in 1983, 1988, 2002, 2003, 2006, and 2008 in the training period and in 2018 and 2020 in validation. This may be due to the spatial variability of rainfall and the continuous simulation approach, which adversely affects peak streamflow (Potdar *et al.* 2021). The parameters trained and tested on historical data are therefore considered to hold for future time horizons when projecting streamflow.

### Projecting streamflow using XGBoost

XGBoost is further applied to project streamflow in the study area for 2021–2099 using the climate data obtained (Mishra *et al.* 2020). The future period is split into eight decadal horizons, namely, D_{1} (2021–2030), D_{2} (2031–2040), D_{3} (2041–2050), D_{4} (2051–2060), D_{5} (2061–2070), D_{6} (2071–2080), D_{7} (2081–2090), and D_{8} (2091–2099), for in-depth analysis. These are compared with the historical time horizons H_{1}, H_{2}, and H_{3} (refer to Supplementary Material, Table S1) to assess how streamflow varies/deviates across the eight future decadal time horizons.
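The decadal split described above reduces to a simple mapping from year to horizon label; a sketch:

```python
def horizon(year):
    """Map a year in 2021-2099 to its decadal label D1..D8.

    D1 covers 2021-2030, D2 covers 2031-2040, and so on; D8 spans only
    nine years (2091-2099), matching the end of the projection period.
    """
    if not 2021 <= year <= 2099:
        raise ValueError("year outside the 2021-2099 projection period")
    return f"D{(year - 2021) // 10 + 1}"
```

Grouping daily projected records by this label (and by SSP scenario) yields the per-horizon summaries reported in Table 5.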

The daily average streamflow, highest streamflow, occurrence of significant streamflow events, and percentage increase in daily average streamflow for eight decadal time horizons SSP-wise are discussed in this section (refer to Table 5).

| Horizon (1) | Scenario | Total number of daily records (2) | Records <10 mm/day (3) | Records 10–20 mm/day (4) | Records 20–30 mm/day (5) | Records 30–50 mm/day (6) | Records 50–65 mm/day (7) | Records >65 mm/day (8) | Highest streamflow (mm/day) (9) | Daily average streamflow (mm/day) (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| D_{1} | SSP126 | 3,652 | 3,512 | 117 | 15 | 8 | 0 | 0 | 41.34 | 2.10 |
| | SSP245 | 3,652 | 3,502 | 110 | 21 | 19 | 0 | 0 | 45.95 | 2.35 |
| | SSP370 | 3,652 | 3,510 | 111 | 20 | 11 | 0 | 0 | 40.96 | 2.47 |
| | SSP585 | 3,652 | 3,520 | 102 | 18 | 12 | 0 | 0 | 38.14 | 2.53 |
| D_{2} | SSP126 | 3,653 | 3,542 | 84 | 19 | 8 | 0 | 0 | 44.33 | 2.25 |
| | SSP245 | 3,653 | 3,529 | 78 | 34 | 12 | 0 | 0 | 38.39 | 2.32 |
| | SSP370 | 3,653 | 3,545 | 69 | 28 | 11 | 0 | 0 | 36.19 | 2.46 |
| | SSP585 | 3,653 | 3,561 | 59 | 25 | 8 | 0 | 0 | 35.95 | 2.51 |
| D_{3} | SSP126 | 3,652 | 3,497 | 121 | 22 | 12 | 0 | 0 | 38.56 | 2.48 |
| | SSP245 | 3,652 | 3,490 | 122 | 22 | 17 | 1 | 0 | 51.54 | 2.78 |
| | SSP370 | 3,652 | 3,497 | 123 | 16 | 16 | 0 | 0 | 37.81 | 2.79 |
| | SSP585 | 3,652 | 3,516 | 106 | 13 | 17 | 0 | 0 | 41.45 | 2.89 |
| D_{4} | SSP126 | 3,653 | 3,412 | 178 | 33 | 30 | 0 | 0 | 49.60 | 3.41 |
| | SSP245 | 3,653 | 3,369 | 200 | 41 | 34 | 9 | 0 | 59.66 | 3.48 |
| | SSP370 | 3,653 | 3,392 | 181 | 38 | 38 | 4 | 0 | 51.41 | 3.83 |
| | SSP585 | 3,653 | 3,413 | 166 | 32 | 40 | 2 | 0 | 50.58 | 3.87 |
| D_{5} | SSP126 | 3,652 | 3,584 | 60 | 8 | 0 | 0 | 0 | 28.02 | 1.98 |
| | SSP245 | 3,652 | 3,570 | 67 | 14 | 1 | 0 | 0 | 38.25 | 2.12 |
| | SSP370 | 3,652 | 3,610 | 41 | 1 | 0 | 0 | 0 | 21.73 | 1.99 |
| | SSP585 | 3,652 | 3,598 | 48 | 6 | 0 | 0 | 0 | 24.78 | 2.27 |
| D_{6} | SSP126 | 3,653 | 3,288 | 257 | 65 | 31 | 11 | 1 | 65.72 | 4.10 |
| | SSP245 | 3,653 | 3,422 | 168 | 41 | 21 | 1 | 0 | 53.58 | 3.04 |
| | SSP370 | 3,653 | 3,350 | 208 | 54 | 28 | 13 | 0 | 63.87 | 3.86 |
| | SSP585 | 3,653 | 3,300 | 237 | 67 | 31 | 11 | 7 | 72.82 | 4.41 |
| D_{7} | SSP126 | 3,652 | 3,459 | 130 | 38 | 23 | 2 | 0 | 54.86 | 3.26 |
| | SSP245 | 3,652 | 3,438 | 141 | 20 | 41 | 11 | 1 | 66.59 | 3.31 |
| | SSP370 | 3,652 | 3,476 | 123 | 18 | 26 | 8 | 1 | 69.32 | 2.98 |
| | SSP585 | 3,652 | 3,440 | 147 | 24 | 29 | 7 | 5 | 79.04 | 3.40 |
| D_{8} | SSP126 | 3,287 | 3,163 | 102 | 20 | 2 | 0 | 0 | 35.72 | 2.60 |
| | SSP245 | 3,287 | 3,146 | 114 | 16 | 11 | 0 | 0 | 48.77 | 2.74 |
| | SSP370 | 3,287 | 3,206 | 71 | 10 | 0 | 0 | 0 | 27.70 | 2.55 |
| | SSP585 | 3,287 | 3,163 | 105 | 18 | 1 | 0 | 0 | 31.59 | 2.90 |


Bold numbers represent the highest value.

### Daily streamflow analysis

The percentage of streamflow events below 10 mm/day (the ratio of records below 10 mm/day to the total records in each decade) under SSP585 is 96.39, 97.48, 96.28, 93.43, 98.52, 90.34, 94.19, and 96.23% for D_{1} to D_{8}, respectively (column 3). A similar trend is observed for the other SSPs. D_{7} has 41 events in the 30–50 mm/day class under SSP245; D_{6} has 67 events in the 20–30 mm/day class under SSP585; and D_{6} has 257 events in the 10–20 mm/day class under SSP126 (columns 6, 5, and 4, respectively). However, no streamflow events in the 50–65 mm/day class were identified in D_{1}, D_{2}, D_{3}, and D_{5} (column 7). Streamflow events greater than 65 mm/day are found appreciably only in D_{6} and D_{7} under SSP585; no significant events in this class are identified in the remaining future time horizons (column 8).
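The class counts behind columns 3–8 follow from binning each decade's daily streamflow records; a sketch with illustrative values (not the study's projections):

```python
from bisect import bisect_right

BOUNDS = [10, 20, 30, 50, 65]  # mm/day class edges used in Table 5

def bin_counts(flows):
    """Count daily streamflow records per Table 5 class:
    <10, 10-20, 20-30, 30-50, 50-65, >65 mm/day."""
    counts = [0] * (len(BOUNDS) + 1)
    for q in flows:
        counts[bisect_right(BOUNDS, q)] += 1
    return counts

def pct_below_10(flows):
    """Percentage of records in the <10 mm/day class."""
    return 100 * bin_counts(flows)[0] / len(flows)
```

With 3,520 of 3,652 records below 10 mm/day, `pct_below_10` reproduces the 96.39% quoted above for D_{1} under SSP585.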

The highest streamflows for D_{1}, D_{3}, D_{4}, D_{5}, and D_{8} (45.95, 51.54, 59.66, 38.25, and 48.77 mm/day) occurred under SSP245; for D_{6} and D_{7} (72.82 and 79.04 mm/day) under SSP585; and for D_{2} (44.33 mm/day) under SSP126 (column 9). Daily average streamflow (column 10) increases from SSP126 to SSP585 for D_{1} to D_{4}. In contrast, for D_{5}, D_{7}, and D_{8}, it increases from SSP126 to SSP245, decreases from SSP245 to SSP370, and increases again from SSP370 to SSP585.

### Monthly average streamflow analysis

Monthly average streamflow was analyzed for the eight future decadal time horizons (D_{1} to D_{8}). It is observed that streamflow is contributed majorly in July–September. For D_{1}, the July–September monthly streamflow contributions were 166.53, 176.25, 172.23, and 166.36 mm for SSP126 to SSP585, respectively. They were significantly lower than H_{1} (236.61 mm) and slightly lower than H_{2} (205.56 mm) and H_{3} (196.65 mm).

However, the remaining months (other than July–September) have higher streamflow (88.53, 110.52, 127.94, and 141.86 mm) compared with the historical values of 65.35, 66.43, and 48.8 mm (Figure 2(a)). The status of D_{2}, D_{3}, D_{5}, and D_{8} is similar to that of D_{1} (Figure 2(b), 2(c), 2(e) and 2(h)). The monthly streamflow contributions of D_{4} from July to September were 236.99, 252.98, 245.83, and 239.73 mm, higher than H_{1}, H_{2}, and H_{3}. Similarly, streamflow contributions in the remaining months are dominant at 177.68, 171.08, 220.42, and 231.27 mm compared with the historical scenarios (Figure 2(d)). The situation of D_{6} is almost similar to that of D_{4}; the only difference is that the July–September contribution under SSP245 is 226.8 mm, less than H_{1} (Figure 2(f)). The monthly streamflow contributions of D_{7} from July to September were 185.99, 239, 181.1, and 206.52 mm for SSP126 to SSP585, respectively. These are lower than H_{1}, H_{2}, and H_{3} for SSP126 and SSP370; higher than all three historical time horizons for SSP245; and, for SSP585, lower than H_{1} but higher than the other two historical time horizons. However, the remaining months have higher streamflow (210.3, 164.15, 181.5, and 206.96 mm) compared with the historical scenarios (Figure 2(g)).

The highest percentage increases in future average streamflow relative to H_{1}, H_{2}, and H_{3}, respectively, are: D_{6} for SSP126 (114.66, 138.37, and 164.52%); D_{4} for SSP245 (82.2, 102.33, and 124.52%); D_{6} for SSP370 (102.01, 124.24, and 149.03%); and D_{6} for SSP585 (130.9, 156.4, and 184.52%).

The lowest percentage increases in future average streamflow relative to H_{1}, H_{2}, and H_{3}, respectively, are: D_{4} for SSP126 (3.66, 15.12, and 27.74%); D_{5} for SSP245 (10.99, 23.26, and 36.77%); D_{5} for SSP370 (4.19, 15.70, and 28.39%); and D_{5} for SSP585 (18.85, 31.98, and 46.45%).

In each future time horizon, the highest deviation is observed from H_{3}, followed by H_{2} and H_{1}. This indicates that future streamflow is significantly higher than in the historical time horizons and is likely to serve the requirements of the LGB. Interestingly, the deviations increase from SSP126 to SSP585 for D_{1}: for example, the deviation of D_{1} from H_{1} under SSP126, SSP245, SSP370, and SSP585 is 9.95, 23, 29.3, and 32.5%, respectively. Similar inferences hold for D_{2} to D_{4}.

## SUMMARY AND CONCLUSION

Three ML models, Bi-LSTM, WNN, and XGBoost, were applied to simulate streamflows of the LGB. Four mother wavelets, namely, Gaussian, Mexican hat, Morlet, and Shannon, were employed in WNN. Random Search-based hyperparameter tuning was employed to fine-tune the parameters in these methods. Thirty-nine years of daily rainfall, evapotranspiration, and streamflow data were used for training and validation. XGBoost was found superior and was applied for projecting streamflow in the basin. Eight future decadal time horizons were proposed to study the climate change impacts. The following are the notable conclusions:

- XGBoost performed exceptionally well, with an *R*^{2}, RMSE, NSE, and PBIAS of 0.88, 1.48, 0.86, and 29.3% during training; the corresponding validation values are 0.86, 1.63, 0.85, and 28.5%.

- Bi-LSTM showed a satisfactory performance, with an *R*^{2} of 0.60 in the training and 0.58 in the validation period.

- The Morlet mother wavelet showed better metrics in training, even with a high learning rate.

- The highest streamflows were identified mainly under the SSP245 scenario in the future decadal time horizons.

- SSP245 shows a higher percentage of July–September monthly streamflow contributions than the other SSPs in all future decadal time horizons. Relative to the three historical time horizons, streamflow contributions in the remaining months were more prominent than those in July–September across the eight future decadal time horizons.

## ACKNOWLEDGEMENTS

This research work is supported by the Council of Scientific and Industrial Research, New Delhi, through Project no. 22(0782)/19/EMR-II dated 24.7.19. Special acknowledgements and gratitude to Prof Vimal Mishra, IIT Gandhinagar, for making CMIP6 data available. The authors are thankful to Prof D. Nagesh Kumar, Indian Institute of Science, Bangalore and Dr RN Sankhua, Mr Shankar Rao, officials of Godavari River Basin, for providing time for brainstorming discussions; IMD and CWC for providing necessary data.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.