Streamflow forecasting is crucial for planning, designing, and managing water resources. Accurate streamflow forecasting is essential in developing water resource systems that are both technically and economically efficient. This study tested several machine learning techniques to estimate monthly streamflow in the Hunza River Basin, Pakistan, using streamflow, precipitation, and air temperature data between 1985 and 2013. The techniques tested included adaptive boosting (AB), gradient boosting (GB), random forest (RF), and K-nearest neighbors (KNN). The models were developed with river discharge as the target variable and air temperature and precipitation as the input variables. The models' performance was assessed via four statistical performance indicators, namely root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2). The results obtained for RMSE, MSE, MAE, and R2 using the AB, GB, RF, and KNN techniques are (16.8, 281, 6.53, and 0.998), (95.1, 9,047, 61.5, and 0.921), (126.8, 16,078, 74.6, and 0.859), and (219.9, 48,356, 146.3, and 0.775), respectively. The results indicate that AB outperforms GB, RF, and KNN in predicting monthly streamflow for the Hunza River Basin. Machine learning, particularly AB, offers a reliable approach for streamflow forecasting, aiding hazard and water management in the area.

  • This study used machine learning techniques to estimate monthly streamflow in Hunza River Basin, Pakistan.

  • The dataset included the mean monthly streamflow, precipitation, and temperature between 1985 and 2013.

  • AB, GB, RF, and KNN models were used for forecasting.

  • RMSE, MSE, MAE, and R2 were used as performance indicators.

  • The AB model outperformed the other models in terms of prediction accuracy.

The hydrologic system is complex, nonlinear, and dynamic (Sivakumar & Singh 2012). Nonlinearity is common in hydrological time-series due to the coupling of natural elements such as climate, landform, soil, vegetation, and anthropogenic activities (Saco et al. 2018). River flow forecasting is necessary for managing water resources, especially controlling reservoir outflows during low and high river flow (Ficchì et al. 2016). Proper management of reservoir outflow needs accurate streamflow prediction (Ghumman et al. 2018), as do hydroelectric project design, real-time operation of water resource projects, efficient management strategies, and proactive mitigation efforts to reduce the environmental impact of climatic events (Anusree & Varghese 2016). A range of models has been used to simulate the complicated nonlinearity of streamflow processes, including autoregressive (AR) and autoregressive moving average (ARMA) models, which use classical statistics to assess historical data and create streamflow projections (Mehdizadeh & Kozekalani Sales 2018). However, these models may not always work well because they cannot represent the nonlinear dynamics involved in transforming rainfall to runoff (Weerts & El Serafy 2006). Machine learning and artificial neural networks (ANNs) have become popular since about 2000 for their ability to model complex nonlinear processes, particularly in the area of hydrological time-series (Yu et al. 2014). ANNs have been found superior to AR models in simulating river flow (Kisi & Kerem Cigizoglu 2007). ANN-based models have also been found superior to other models in simulating streamflow time-series (Gholami & Sahour 2022). ANNs have been used to simulate monthly streamflow in the River Nile, Egypt, and found superior to linear ARMA models (Elganiny & Eldwer 2018).

Although ANNs have been effective in modelling streamflow, they can still face issues such as overfitting, slow convergence, and local minima (Afan et al. 2016). To address these challenges, researchers have explored other subcategories of artificial intelligence (AI) in the hydrological and environmental fields. Machine learning-based approaches such as AB, GB, RF, and KNN have recently been proposed to tackle complex hydrologic processes (Hadi & Tombul 2018). For instance, Liao et al. (2020) developed a hybrid inflow prediction framework by combining gradient-boosting regression trees (GBRT) and the maximum information coefficient (MIC). GBRT models, which consist of a group of decision trees, can capture nonlinear correlations between input and output with extended lead periods (Goldstein et al. 2018). Ren et al. (2014) created a combined model using empirical mode decomposition (EMD) and KNN for forecasting yearly average rainfall. Snieder et al. (2021) used a hybrid approach, combining ANNs with additional algorithms such as Synthetic Minority Over-sampling Technique for Regression (SMOTER-AB) to achieve accurate results for two Canadian watersheds, the Bow River in Alberta and the Don River in Ontario. The Nash–Sutcliffe coefficients of efficiency for the Bow and Don River base models using SMOTER-AB were 0.95 and 0.80, respectively.

RF and extreme learning machine (ELM) are two popular algorithms used for time-series prediction in the hydrologic and environmental fields (Feng et al. 2022). RF requires minimal data pre-processing, making it convenient for practical applications (Kannangara et al. 2018). For instance, Tongal & Booij (2018) used RF to develop a simulation framework for predicting streamflow in four American rivers, based on temperature, precipitation, and potential evapotranspiration. ELM has also been applied in hydrologic and environmental studies. It is a feedforward neural network that can perform well in large-scale and complex data analysis (Ni et al. 2006). ELM can handle nonlinearity, noise, and high dimensionality, and does not require parameter tuning, which can be time-consuming (Song et al. 2017).

The aim of this study was to evaluate the effectiveness of various machine learning techniques, including AB, GB, RF, and KNN algorithms, in modelling monthly streamflow data for the Hunza River, Pakistan.

Study area

The Hunza watershed is in Pakistan's northern Karakoram Mountain range, in the Upper Indus Basin, and covers 13,567.23 km2 (Garee et al. 2017). It is fed by 14 small and medium tributaries, including the Danyore, which maintains the main river flow between them. The highest point in the area is Distaghil Sar, Pakistan's 7th-highest peak and the 19th-highest mountain in the world, standing 7,885 metres above sea level. The lowest is the Danyore gauging station, 1,370 metres above sea level. About 80% of the total inflow comes from the snowy and heavily glaciated region above 3,500 metres. The study area is a primary source of the Indus River, which contributes more than 12% of inflow to the Tarbela Dam (Ali & De Boer 2007).

Data collection

Several datasets were used for simulating monthly streamflow data in the river.

Digital elevation model

The digital elevation model (DEM) data for the study area were obtained from the National Aeronautics and Space Administration (NASA) website (https://earthdata.nasa.gov/). The resolution of the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) downloaded was 30 m. The DEM data were used to delineate the Hunza River watershed – see Figure 1.
Figure 1

The study area and the Hunza watershed.


Hydroclimatic datasets

Meteorological data on mean precipitation and temperature for the period 1985 to 2013 were collected from the Pakistan Meteorological Department (PMD). Monthly streamflow data for the period 1985 to 2013 were obtained from the Water and Power Development Authority (WAPDA).
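To make the modelling setup concrete, the monthly records can be arranged as an input matrix (precipitation, temperature) and a target vector (discharge). The sketch below shows a hypothetical layout only; the PMD/WAPDA file formats, column names, and values are assumptions for illustration, not the actual data:

```python
import pandas as pd

# Hypothetical monthly records - column names and values are illustrative,
# not the actual PMD/WAPDA data.
records = {
    "precip_mm": [12.0, 3.5, 0.8],
    "temp_c": [-5.2, -1.1, 4.3],
    "discharge_cumecs": [95.0, 110.0, 240.0],
}
df = pd.DataFrame(records, index=pd.period_range("1985-01", periods=3, freq="M"))

# Inputs (precipitation, temperature) and target (discharge), as in the study
X = df[["precip_mm", "temp_c"]].to_numpy()
y = df["discharge_cumecs"].to_numpy()
print(X.shape, y.shape)  # → (3, 2) (3,)
```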

Methods

A diverse set of algorithms was selected, based on their performance in various fields and their complementary characteristics (Figure 2).
Figure 2

Flowchart illustrating the methodology.


Gradient boosting

Gradient boosting (GB) is a well-known machine learning algorithm used for regression and classification tasks. It builds an ensemble of decision trees sequentially to approximate the underlying function. Pseudo-residuals, which represent the gradients of the loss function, are employed to train each individual tree. The final prediction of the ensemble is obtained by combining the previous approximation with the contribution of the current tree, weighted appropriately. GB incorporates regularization techniques and early stopping to mitigate overfitting. Moreover, it demonstrates fast execution, scalability, and efficiency, making it highly suitable for handling large datasets.
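The boosting loop described above can be sketched with scikit-learn's GradientBoostingRegressor. This is an illustration under assumed settings, not the study's actual configuration; the synthetic data stand in for the precipitation and temperature inputs:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(240, 2))  # 240 synthetic "months" x 2 predictors
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=0.5, size=240)

# Each tree is fitted to the pseudo-residuals (loss gradients) of the
# current ensemble; learning_rate shrinks every tree's contribution,
# which acts as regularization against overfitting.
gb = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                               max_depth=3, random_state=0)
gb.fit(X[:200], y[:200])
r2 = gb.score(X[200:], y[200:])  # R2 on held-out "months"
```

On this easy synthetic target the held-out R2 is close to 1; real streamflow data would of course be harder.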

Adaptive boosting

Adaptive boosting (AB) is an ensemble method that combines multiple classifiers iteratively. It selects weak classifiers from a predefined set and assigns them coefficients. The algorithm learns from the training data to create a discriminant function through weighted voting of the weak classifiers. This enables reliable classification by aggregating the predictions of the weak classifiers.
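As a minimal sketch (hyperparameters assumed, not taken from the paper), AdaBoost regression with shallow-tree weak learners looks like this in scikit-learn; each boosting round reweights the training samples so later learners concentrate on the cases with the largest errors:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(7)
X = rng.uniform(size=(240, 2))  # synthetic predictors
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=0.5, size=240)

# The default weak learner is a shallow decision tree; the final
# prediction aggregates all weak learners via weighted voting.
ab = AdaBoostRegressor(n_estimators=100, random_state=0)
ab.fit(X[:200], y[:200])
r2 = ab.score(X[200:], y[200:])  # held-out R2
```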

Random forest

Random forest (RF) is a popular machine learning approach developed by Breiman that offers stability and generalization capabilities. It consists of an ensemble of decision trees, where each tree is built by randomly selecting samples and attributes from all predictors. RF's final output is determined through majority voting of the decision trees' outputs. Out-of-bag samples, obtained by removing certain items from the original dataset, are commonly used to evaluate RF's performance. Two key parameters in RF calibration are the number of trees (ntree) and the number of predictors evaluated at each node (mtry). The optimal value of mtry can be determined empirically, while the choice of ntree significantly affects the forecast results and is determined through experimentation.
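The two calibration parameters named above map onto scikit-learn as n_estimators (ntree) and max_features (mtry), and the out-of-bag evaluation is available via oob_score. A hedged sketch on synthetic data (the settings are assumptions, not the study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(240, 2))  # synthetic predictors
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(scale=0.5, size=240)

# n_estimators ~ ntree; max_features ~ mtry (predictors tried per split).
# oob_score=True scores each tree on the samples left out of its bootstrap.
rf = RandomForestRegressor(n_estimators=300, max_features=1,
                           oob_score=True, random_state=1)
rf.fit(X, y)
oob_r2 = rf.oob_score_  # out-of-bag R2, no separate test set needed
```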

K-nearest neighbors

K-nearest neighbors (KNN) is a widely used algorithm that requires no training. During testing, predictions are made by comparing an input to the values of its nearest neighbours in the training data. It operates by classifying a selected feature based on its proximity to its nearest neighbour. The distance between objects is typically calculated using the Euclidean distance formula, which measures the square root of the sum of squared differences between each feature of the objects. KNN is applied in various fields for its simplicity and effectiveness in classification tasks.
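Because KNN stores the training data and predicts by averaging the targets of the nearest neighbours under Euclidean distance, it can be written in a few lines. The following toy example (data invented for illustration) mirrors that description:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training points
    under Euclidean distance (square root of summed squared differences)."""
    d = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return y_train[nearest].mean()

X_train = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array([10.0, 12.0, 50.0, 52.0])

pred = knn_predict(X_train, y_train, np.array([1.05, 0.95]), k=2)
print(pred)  # → 11.0 (mean of the two nearby targets, 10 and 12)
```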

Evaluation of a machine learning model's performance is essential in its development. Several statistical indicators were used in this study, including root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2). These indicators are commonly used in regression tasks to quantify the difference between the predicted and actual values.

Root mean square error

The RMSE is the square root of the mean square error (MSE) and summarizes the model's overall performance as the standard deviation of the residuals (Equation (1)):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2} \quad (1)$$
where $y_j$ is the observed value, $\hat{y}_j$ is the predicted value, and $n$ is the number of observations.

Mean square error

The MSE measures the average squared difference between the predicted and actual values (Equation (2)):
$$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2 \quad (2)$$

Mean absolute error

The MAE is the mean of the absolute residuals, i.e. the average absolute difference between the actual and predicted values (Equation (3)):
$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right| \quad (3)$$

Coefficient of determination (R2)

The R2 is an important indicator for comparing a model's accuracy to actual data. It expresses the proportion of variability in the dependent variable (y) explained by the independent variable(s) or predictors. R2 ranges from 0 to 1; a value closer to 1 denotes a better fit, whereas a lower value shows that the data variance was only partially captured. R2 is computed using Equation (4):
$$R^2 = 1 - \frac{\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}{\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2} \quad (4)$$
where $\bar{y}$ is the mean of the observed values.
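The four indicators follow directly from their definitions in Equations (1)-(4); the NumPy sketch below is one straightforward way to compute them (the study does not state its own implementation):

```python
import numpy as np

def metrics(y_obs, y_pred):
    """RMSE, MSE, MAE, and R2 as defined in Equations (1)-(4)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    resid = y_obs - y_pred
    mse = np.mean(resid ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(resid))
    r2 = 1 - (resid ** 2).sum() / ((y_obs - y_obs.mean()) ** 2).sum()
    return rmse, mse, mae, r2

# Sanity check: a perfect prediction gives zero errors and R2 = 1
rmse, mse, mae, r2 = metrics([1, 2, 3, 4], [1, 2, 3, 4])
print(rmse, mse, mae, r2)  # → 0.0 0.0 0.0 1.0
```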
Monthly streamflow data from 1985 to 2013, shown in Figure 3, track changes in the observed streamflow data over different time periods and expose temporal patterns.
Figure 3

Monthly streamflow time-series of the study area.


Table 1 presents an overview of the statistical outcomes of the various models' prediction accuracy.

Table 1

Model performance comparisons

Model    RMSE     MSE      MAE      R2
AB       16.8     281      6.53     0.998
GB       95.1     9,047    61.5     0.921
RF       126.8    16,078   74.6     0.859
KNN      219.9    48,356   146.3    0.775

Table 1 demonstrates that AB outperforms the other three in terms of prediction accuracy. Previous studies have also reported the better prediction capability of AB, especially for large-scale watersheds or low runoff periods (Liu et al. 2014). AB has demonstrated superior generalization ability during data input processing, indicating its capability to effectively capture underlying patterns and make accurate predictions on unseen data. This highlights AB's reliability and suitability for real-world applications, showcasing its potential for accurate predictions across various domains (Ying et al. 2013).

KNN had the lowest prediction performance of the four. This is because KNN has no training procedure and does not attempt to maximize any effectiveness metric. On the other hand, GB performed better than RF and KNN in terms of RMSE, MSE, MAE, and R2 (Table 1). The literature also suggests that the Extreme GB approach outperforms both RF and KNN in terms of accuracy (Venkatesan & Mahindrakar 2019).

In comparison to KNN, RF demonstrates superior performance, although it has lower performance indicators when compared to both AB and GB.

The monthly observed streamflow data from 1985 to 2013 were compared with those predicted by the four models. The comparison, shown in Figures 4 and 5, indicates how close the predicted values are to the observed ones.
Figure 4

Performance of models (AB, GB, RF, and KNN) compared to the observed data.

Figure 5

Individual diagrams showing actual flows vs those predicted by the four models.


It is clear from Figures 4 and 5 that AB predicts streamflow in the Hunza River Basin much better than the other three models, as the overlap between its predicted flows and the observed flows is greatest. Conversely, KNN exhibits the least overlap with the observed flows.

The box plot in Figure 6 is a comparison between the distributions of the observed and forecast streamflow data from the models, which highlights data outliers (Kauffmann & Huber 2010). The distribution of the observed data is similar to that of AB's forecast data, but dissimilar, in terms of central tendency and outliers, to that of the KNN model.
Figure 6

Box plots for actual and forecast values by the models.


The median value represents the typical streamflow condition, excluding the effects of extreme events such as floods and droughts. The median values of the observed data and of the forecasts from AB, GB, RF, and KNN are 91.23, 102.87, 121.69, 132.11, and 202.32, respectively: AB's median is the closest to the observed one, GB's and RF's are similar to each other, and KNN's is the highest. The distributions of the observed data and of the AB predictions are also similar in both general and extreme conditions (Figure 6).

The sieve diagrams in Figure 7 offer further evidence of AB's strong prediction performance. The darker tones, which indicate a stronger association between predicted and actual flows, are more prominent for AB than for the other models, confirming its superiority.
Figure 7

Sieve diagrams of observed flows vs the models' predictions.

The scatter plot – Figure 8 – is a graphical representation of the relationship between the observed and predicted streamflow values for all four models. Again, the predictions from AB are clearly much closer to the observed values than those from the other models. The data points for AB are more tightly clustered along the diagonal line, which represents perfect prediction, while those for the other models are more spread out and further from the line. This also supports the finding that AB is the best performing model for predicting monthly streamflow in the Hunza River Basin.
Figure 8

Scatter plots for actual flow vs predictions.


Four machine learning models were used in this study – AB, GB, RF, and KNN – to predict monthly streamflow for the Hunza River Basin, Pakistan, using precipitation and temperature data as input and discharge as the output. The performance indicators used to evaluate the models were RMSE, MSE, MAE, and R2. AB outperformed the other three models, with the lowest RMSE, MSE, and MAE and the highest R2 (16.8, 281, 6.53, and 0.998, respectively). In contrast, the KNN model had the lowest performance among the models due to its lack of a training procedure. The findings suggest that the AB model can be used to forecast monthly streamflow reliably for the Hunza River and potentially for other watersheds.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

References

Afan, H. A., El-shafie, A., Mohtar, W. H. M. W. & Yaseen, Z. M. 2016 Past, present and prospect of an artificial intelligence (AI) based model for sediment transport prediction. Journal of Hydrology 541, 902–913.

Feng, B.-f., Xu, Y.-s., Zhang, T. & Zhang, X. 2022 Hydrological time series prediction by extreme learning machine and sparrow search algorithm. Water Supply 22 (3), 3143–3157.

Ficchì, A., Raso, L., Dorchies, D., Pianosi, F., Malaterre, P.-O., Van Overloop, P.-J. & Jay-Allemand, M. 2016 Optimal operation of the multireservoir system in the Seine river basin using deterministic and ensemble forecasts. Journal of Water Resources Planning and Management 142 (1), 05015005.

Gholami, V. & Sahour, H. 2022 Prediction of groundwater drawdown using artificial neural networks. Environmental Science and Pollution Research 29 (22), 33544–33557.

Ghumman, A. R., Ahmad, S. & Hashmi, H. N. 2018 Performance assessment of artificial neural networks and support vector regression models for stream flow predictions. Environmental Monitoring and Assessment 190, 1–20.

Goldstein, A., Fink, L., Meitin, A., Bohadana, S., Lutenberg, O. & Ravid, G. 2018 Applying machine learning on sensor data for irrigation recommendations: revealing the agronomist's tacit knowledge. Precision Agriculture 19, 421–444.

Kisi, O. & Kerem Cigizoglu, H. 2007 Comparison of different ANN techniques in river flow prediction. Civil Engineering and Environmental Systems 24 (3), 211–231.

Liao, S., Liu, Z., Liu, B., Cheng, C., Jin, X. & Zhao, Z. 2020 Multistep-ahead daily inflow forecasting using the ERA-Interim reanalysis data set based on gradient-boosting regression trees. Hydrology and Earth System Sciences 24 (5), 2343–2363.

Ni, Y., Zhao, X., Bao, G., Zou, L., Teng, L., Wang, Z., Song, M., Xiong, J., Bai, Y. & Pei, G. 2006 Activation of β2-adrenergic receptor stimulates γ-secretase activity and accelerates amyloid plaque formation. Nature Medicine 12 (12), 1390–1396.

Ren, Y., Suganthan, P. & Srikanth, N. 2014 A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods. IEEE Transactions on Sustainable Energy 6 (1), 236–244.

Saco, P. M., Moreno-de las Heras, M., Keesstra, S., Baartman, J., Yetemen, O. & Rodríguez, J. F. 2018 Vegetation and soil degradation in drylands: nonlinear feedbacks and early warning signals. Current Opinion in Environmental Science & Health 5, 67–72.

Sivakumar, B. & Singh, V. 2012 Hydrologic system complexity and nonlinear dynamic concepts for a catchment classification framework. Hydrology and Earth System Sciences 16 (11), 4119–4131.

Snieder, E., Abogadil, K. & Khan, U. T. 2021 Resampling and ensemble techniques for improving ANN-based high-flow forecast accuracy. Hydrology and Earth System Sciences 25 (5), 2543–2566.

Song, Y.-Q., Yang, L.-A., Li, B., Hu, Y.-M., Wang, A.-L., Zhou, W., Cui, X.-S. & Liu, Y.-L. 2017 Spatial prediction of soil organic matter using a hybrid geostatistical model of an extreme learning machine and ordinary kriging. Sustainability 9 (5), 754.

Venkatesan, E. & Mahindrakar, A. B. 2019 Forecasting floods using extreme gradient boosting – a new approach. International Journal of Civil Engineering and Technology 10 (2), 1336–1346.

Ying, C., Qi-Guang, M., Jia-Chen, L. & Lin, G. 2013 Advance and prospects of AdaBoost algorithm. Acta Automatica Sinica 39 (6), 745–758.

Yu, Y., Zhu, Y., Li, S. & Wan, D. 2014 Time series outlier detection based on sliding window prediction. Mathematical Problems in Engineering 2014, 879736.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).