Abstract
Estimating missing historical streamflows and forecasting streamflows for future periods are challenging tasks. This paper presents open-source, data-driven machine learning models for streamflow prediction. The Random Forests algorithm is employed and the results are compared with other machine learning algorithms. The developed models are applied to the Kızılırmak River, Turkey. The first model is built with the streamflow of a single station (SS), and the second model is built with the streamflows of multiple stations (MS). The SS model uses input parameters derived from one streamflow station. The MS model uses streamflow observations of nearby stations. Both models are tested to estimate missing historical streamflows and to predict future streamflows. Model prediction performances are measured by root mean squared error (RMSE), Nash–Sutcliffe efficiency (NSE), coefficient of determination (R2), and percent bias (PBIAS). The SS model has an RMSE of 8.54, NSE and R2 of 0.98, and PBIAS of 0.7% for the historical period. The MS model has an RMSE of 17.65, NSE of 0.91, R2 of 0.93, and PBIAS of −13.64% for the future period. The SS model is useful for estimating missing historical streamflows, while the MS model provides better predictions for future periods because it captures flow trends better.
HIGHLIGHTS
An open-source machine learning model to estimate streamflow.
Single- and multi-station streamflow datasets are compared as model inputs.
A single streamflow station can be used to estimate its missing flows in a historical period.
A multi-station model can better capture streamflow trends.
LIST OF ACRONYMS
- ANNs: artificial neural networks
- DL: deep learning
- ELM: extreme learning machine
- ML: machine learning
- MS: multi-station
- NSE: Nash–Sutcliffe efficiency
- RF: Random Forests
- RMSE: root mean squared error
- SS: single station
- SVMs: support vector machines
- SVR: support vector regression
- SWAT: Soil and Water Assessment Tool
- WNNs: wavelet neural networks
- XGBoost: extreme gradient boosting
INTRODUCTION
Streamflow is widely used in water resource planning and management, including hydropower planning and operations, water supply operations for urban, agriculture and environment, drought management, and flood mitigation. Accurate, timely and continuous streamflow predictions provide stakeholders and decision-makers with essential information (Besaw et al. 2010) for better managing complex water systems and efficient water resource management (Ghobadi & Kang 2022). Many parameters, such as precipitation, temperature, evapotranspiration, land use, topography and soil characteristics, affect and contribute to streamflow, characterized by a nonlinear relationship between streamflow and watershed characteristics (Adnan et al. 2019; Shah et al. 2021). Streamflow is often measured at stream gauge stations, calculated via physically-based hydrological models, or statistically estimated via data-driven empirical models. Although in situ observations are important for obtaining accurate streamflow records, spatiotemporal availability of stream gauge observations for desired locations can be limited. Moreover, these stations can wrongly measure streamflow due to human or instrument error, requiring re-evaluation, and estimation of flows for the missing periods. Hydrological models, based on explicit relationships between inputs and outputs, are data-intensive and require good system knowledge, involving physical formulas to describe the complicated meteorological and hydrological processes (Feng et al. 2022; Cacal et al. 2023). Without prior knowledge of hydrological systems, statistically-based or data-driven models, including machine learning, however, mathematically connect inputs and outputs and disregard intervening physical processes (Chu et al. 2021; Duarte et al. 2022). 
These empirical models are less data-intensive and can be easily applied, especially when the objective is to estimate missing or falsely measured flows in a streamflow dataset or to predict short-term future flows.
As more water-related data become available, the use of machine learning models to estimate historical or forecast future streamflows has gained popularity over the past two decades. These regression-based machine learning models find statistical relationships between input data and target for a past period, called training, and make predictions for desired periods. Several machine learning algorithms have been used to predict streamflow. Govindaraju (2000), Sit et al. (2020) and Khullar & Singh (2021) reviewed early and recent machine learning applications in hydrology and water resources. Commonly used algorithms are artificial neural networks (ANNs), support vector machines (SVMs), Random Forests (RF), extreme gradient boosting (XGBoost), and deep learning (DL). Hsu et al. (1995) used the ANN to model the rainfall–runoff process. Besaw et al. (2010) used the ANN to forecast streamflow for ungauged basins. Erdal & Karakurt (2013) estimated monthly streamflow with RF and SVM and concluded that RF yields promising outputs. Booker & Woods (2014) compared RF hydrology estimates with a physically-based model and concluded that RF shows good performance at estimating hydrological parameters. Noori & Kalin (2016) predicted daily streamflow for ungauged watersheds with the ANN model coupled with Soil and Water Assessment Tool (SWAT), a physically-based hydrological model. Petty & Dhingra (2018) used RF to predict streamflow for flood forecasting. Adnan et al. (2019, 2020) predicted daily and monthly streamflow, respectively, with an optimally pruned extreme learning machine (ELM). Dalkiliç & Hashimi (2020) predicted daily streamflow using ANN, wavelet neural networks (WNNs), and adaptive neuro-fuzzy inference system and concluded that WNN provides more accurate estimates. Li et al. (2020) showed that RF presents better and more stable streamflow prediction performance than other machine learning algorithms, namely SVM and XGBoost. Ni et al. (2020) developed a model to predict monthly streamflow, coupling XGBoost with the Gaussian mixture model. Shijun et al. (2020) used an RF algorithm to forecast medium and long-term runoff. Kumar et al. (2021) used SVM to model real-time streamflow using satellite inputs. Lin et al. (2021) developed a hybrid DL model to predict hourly streamflow. Feng et al. (2022) used ELM based on a sparrow search algorithm for forecasting runoff time series. Ghobadi & Kang (2022) developed a DL model for predicting long-term streamflow on a monthly time scale. Xu et al. (2022) used the DL method to predict monthly streamflow, including variables from general circulation models. Sayed et al. (2023) simulated the rainfall–runoff process with two hydrological models and two ML models and concluded that ML models are effective forecasting tools.
RF is a decision tree-based classification and regression algorithm. RF grows several decision trees throughout the model-building process, trained by a bootstrapped sample of the input dataset (Panahi et al. 2022). For regression problems, such as streamflow prediction, the final output is an ensemble average of all individual tree decisions in the forest (Breiman 2001).
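The ensemble averaging described above can be checked with a short Scikit-Learn sketch. The data are synthetic stand-ins rather than the paper's dataset, and the number of trees is an arbitrary choice made for illustration. The sketch fits a forest on bootstrapped samples and confirms that the forest prediction equals the mean of the individual tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data: 200 samples, 3 input variables (not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(200, 3))
y = 0.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 1.0, size=200)

# Each tree is trained on a bootstrap sample of the rows (bootstrap=True is the default).
forest = RandomForestRegressor(n_estimators=50, bootstrap=True, random_state=0)
forest.fit(X, y)

# Collect the prediction of every individual tree: shape (n_trees, n_samples).
tree_predictions = np.stack([tree.predict(X) for tree in forest.estimators_])

# For regression, the forest output is the average over all trees.
ensemble_mean = tree_predictions.mean(axis=0)
forest_prediction = forest.predict(X)
```

Comparing `ensemble_mean` with `forest_prediction` shows that the two agree to numerical precision.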
Selecting appropriate algorithms and input variables for data-driven models is important when estimating missing streamflows or forecasting future streamflows, yet little research has addressed it. In addition, most of the developed models are not transparent. The goal of this paper is to help researchers choose optimal input variables with minimum data requirements by comparing single- and multi-station datasets, and to determine the best algorithm for streamflow prediction. A further aim is to develop an open-source, flexible and easy-to-adapt model. All source code and data are shared online via GitHub (Dogan 2023). The model is built with Scikit-Learn, an open-source machine learning library in Python (Pedregosa et al. 2011). Daily streamflow prediction is employed for the Kızılırmak River, Turkey, but the developed model is independent of data resolution. Thus, depending on data availability, other time steps, such as hourly, weekly or monthly, can be used.
MATERIALS AND METHODS
Study area
Study area: Kızılırmak River Basin with selected streamflow stations.

(a) Observed daily flow of all stations and (b) observed daily flow of station #1543. The calibration and validation (historical) period is from October 1, 2010 to September 30, 2014. The future period is from October 1, 2014 to September 30, 2015.
RF model


Input variables of the single-station (SS) model and multi-station (MS) model are shown in Table 1, with data range, mean and standard deviation values for the training period from October 1, 2010 to September 30, 2014. For the SS model, variables are derived from the streamflow time-series dataset of station #1543. The training data are from October 1, 2010 to September 30, 2014 at a daily time step. The derived parameters for the SS model are the day of the month; the month of the year; the year of the date; the maximum, mean and minimum flow in a given month; and the maximum, mean and minimum flow in a given year. For the MS model, daily streamflow datasets from four stations (#1539, #1535, #1517 and #1501) are used from October 1, 2010 to September 30, 2014. All these stations are located in the Kızılırmak River Basin. Both SS and MS models are trained to predict the assumed missing streamflows (target variable) of station #1543.
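The nine SS inputs listed above can be derived from a single daily series with pandas. The sketch below is only an illustration under stated assumptions: the function name `derive_ss_features` and the column names are hypothetical, the flow series is synthetic, and "a given year" is taken to mean the calendar year, which the text does not specify.

```python
import numpy as np
import pandas as pd


def derive_ss_features(flow: pd.Series) -> pd.DataFrame:
    """Build nine single-station inputs from a daily flow series with a DatetimeIndex."""
    idx = flow.index
    # Time variables: day of the month, month of the year, year of the date.
    features = pd.DataFrame(
        {"day": idx.day.to_numpy(), "month": idx.month.to_numpy(), "year": idx.year.to_numpy()},
        index=idx,
    )
    # Statistical variables: max, mean and min flow per (year, month) and per year.
    groupings = {
        "month": flow.groupby([idx.year, idx.month]),
        "year": flow.groupby(idx.year),  # assumption: calendar year, not water year
    }
    for period, grouped in groupings.items():
        for stat in ("max", "mean", "min"):
            # transform() broadcasts the group statistic back to every day in the group.
            features[f"q_{stat}_{period}"] = grouped.transform(stat)
    return features


# Synthetic daily flows over the training window used in the paper.
dates = pd.date_range("2010-10-01", "2014-09-30", freq="D")
rng = np.random.default_rng(0)
flow = pd.Series(rng.gamma(2.0, 25.0, size=len(dates)), index=dates, name="q_1543")
features = derive_ss_features(flow)
```

Each row of `features` then carries the date parts of that day plus the statistics of the month and year it belongs to.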
Table 1. Input (independent) variables of training period (October 1, 2010–September 30, 2014) for single- and multi-station models

| Single-station model variable | Data range | Mean | Standard deviation |
|---|---|---|---|
| Day of the month | [1, 31] | – | – |
| Month of the year | [1, 12] | – | – |
| Year of the date | [2010, 2014] | – | – |
| Maximum flow in a given month | [9, 546] | 88 | 114 |
| Mean flow in a given month | [11, 178] | 54 | 50 |
| Minimum flow in a given month | [1, 149] | 34 | 40 |
| Maximum flow in a given year | [71, 546] | 358 | 182 |
| Mean flow in a given year | [19, 70] | 54 | 20 |
| Minimum flow in a given year | [1, 10] | 6 | 4 |
| Multi-station model variable | Data range | Mean | Standard deviation |
| | [1, 131] | 12 | 18 |
| | [2, 301] | 31 | 42 |
| | [1, 25] | 6 | 4 |
| | [6, 111] | 62 | 27 |
| Target variable | Data range | Mean | Standard deviation |
| Station #1543 streamflow | [1, 546] | 54 | 68 |
Target variable is station #1543 streamflow.









Cross-correlation of input variables of (a) SS and (b) MS models. Station #1543 streamflows are used in the SS model (Q).






An example regression decision tree with a depth of 2 and two input variables.
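A tree of the kind described in this caption can be grown and printed with Scikit-Learn. This is a minimal sketch on synthetic data. The variable names `x1` and `x2` are placeholders, since the original symbols are not preserved here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data with two input variables; the target depends strongly on x1.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 100.0, size=(300, 2))
y = np.where(X[:, 0] > 50.0, 80.0, 20.0) + 0.1 * X[:, 1]

# A depth of 2 allows at most two levels of splits, hence at most four leaves.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Text rendering of the split rules and leaf values.
rules = export_text(tree, feature_names=["x1", "x2"])
print(rules)

# Every prediction is the mean training target of the leaf the sample falls into.
leaf_values = np.unique(tree.predict(X))
```

An RF regressor grows many such trees, usually much deeper, and averages their leaf values.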
Model parameter calibration
Model parameter calibration for (a) SS and (b) MS models. Calibrated parameter numbers are shown under blue bars. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/wst.2023.171.
RESULTS
Daily streamflows are predicted with SS and MS models and compared to observed streamflows. Missing streamflows in historical periods are predicted and future streamflows are forecasted using SS and MS models. Model prediction performances are presented. The RF algorithm is compared to other commonly used ML algorithms.
Historical period streamflow estimation
Predicted and observed daily flow with (a) SS and (b) MS station data in the historical period (25% of October 1, 2010–September 30, 2014).
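The historical-period experiment withholds a random 25% of the records and treats them as missing. A minimal sketch of that setup follows, assuming synthetic inputs and an arbitrary forest size; it is not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for four years of daily inputs and target flows.
rng = np.random.default_rng(2)
n_days = 1461
X = rng.gamma(2.0, 15.0, size=(n_days, 4))
y = X @ np.array([0.8, 0.5, 1.5, 0.3]) + rng.normal(0.0, 2.0, size=n_days)

# Randomly withhold 25% of the days; these play the role of missing historical values.
X_train, X_missing, y_train, y_missing = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train on the remaining 75% and estimate the withheld flows.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
y_filled = model.predict(X_missing)
```

Because the withheld days are scattered inside the training window, the model interpolates between observed days and does not extrapolate in time.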
Future period streamflow forecast
2015 water year (October 1, 2014–September 30, 2015) SS and MS predictions with observed flows in the future period: (a) Daily and monthly flow time series and (b) daily and monthly average flow comparison.
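For the future period, the split is chronological rather than random. The sketch below illustrates an MS-style forecast under stated assumptions: the flows are synthetic, the column names are hypothetical, and the neighbouring stations' observations are assumed to be available for the 2015 water year.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic daily flows for four neighbouring stations and the target station.
dates = pd.date_range("2010-10-01", "2015-09-30", freq="D")
rng = np.random.default_rng(3)
seasonal = 40.0 + 30.0 * np.sin(2.0 * np.pi * dates.dayofyear.to_numpy() / 365.25)
neighbours = pd.DataFrame(
    {
        f"q_{station}": scale * seasonal + rng.gamma(2.0, 3.0, size=len(dates))
        for station, scale in (("1539", 0.3), ("1535", 0.8), ("1517", 0.2), ("1501", 1.5))
    },
    index=dates,
)
target = pd.Series(
    1.2 * seasonal + 0.4 * neighbours["q_1535"].to_numpy() + rng.normal(0.0, 2.0, size=len(dates)),
    index=dates,
    name="q_1543",
)

# Chronological split: train on the historical period, forecast the next water year.
historical = dates <= pd.Timestamp("2014-09-30")
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(neighbours.loc[historical], target.loc[historical])
forecast = pd.Series(model.predict(neighbours.loc[~historical]), index=dates[~historical])
```

Here `forecast` holds one predicted value per day of the 2015 water year.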
Prediction performances




Prediction performances of the SS and MS models for the historical and future periods are summarized in Table 2. The mean and standard deviation of daily observed streamflow in the training period are 54.80 and 69.79, respectively; these are the mean and standard deviation of the 75% of streamflow values used for training in the period from October 1, 2010 to September 30, 2014. For the historical period, the SS model has better prediction performance: its mean predicted streamflow (50.91) is closer to the mean observed streamflow (51.28). Its RMSE and absolute PBIAS are smaller, and its NSE and R2 are greater, than those of the MS model. For the future period, however, the MS model has better prediction performance in terms of the RMSE, NSE and R2 indicators, even though the SS model's mean predicted streamflow (54.63) is closer to the mean observed streamflow (56.32) than the mean predicted streamflow of the MS model (64.0). Moreover, the MS model has a greater absolute PBIAS than the SS model. The MS model's predictions are slightly more biased and the model tends to overpredict, while the SS model has a smaller PBIAS with a tendency to underpredict streamflows.
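The four indicators can be written compactly with NumPy. The paper's own formulas are not preserved here, so this sketch rests on two assumptions. First, PBIAS is taken as 100 times the sum of (observed minus predicted) divided by the sum of observed, which matches the convention in the text that a negative PBIAS means overprediction. Second, R2 is taken as the squared Pearson correlation, since the reported R2 and NSE values differ.

```python
import numpy as np


def rmse(obs: np.ndarray, pred: np.ndarray) -> float:
    """Root mean squared error, in the units of the flow data."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def nse(obs: np.ndarray, pred: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r2(obs: np.ndarray, pred: np.ndarray) -> float:
    """Squared Pearson correlation (assumed definition of R2)."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)


def pbias(obs: np.ndarray, pred: np.ndarray) -> float:
    """Percent bias (assumed sign convention: negative means overprediction)."""
    return float(100.0 * np.sum(obs - pred) / np.sum(obs))


# Tiny worked example: predictions that are uniformly 10% too high.
obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = 1.1 * obs
scores = {"RMSE": rmse(obs, pred), "NSE": nse(obs, pred), "R2": r2(obs, pred), "PBIAS": pbias(obs, pred)}
```

In this example the predictions are perfectly correlated with the observations, so R2 is 1 while NSE is below 1 and PBIAS is negative. The same pattern explains how a model can score well on R2 and still be biased.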
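Table 2. Performance indicators of single- and multi-station models for the historical and future periods

| Parameter | Single station | Multi-station |
|---|---|---|
| Historical period (October 1, 2010–September 30, 2014) | | |
| Mean of observed streamflow, training data | 54.80 | 54.80 |
| Standard deviation of observed streamflow, training data | 69.79 | 69.79 |
| Mean of observed streamflow, withheld data | 51.28 | 51.28 |
| Standard deviation of observed streamflow, withheld data | 63.83 | 63.83 |
| Mean of predicted streamflow | 50.91 | 52.43 |
| Standard deviation of predicted streamflow | 61.83 | 66.55 |
| RMSE | 8.54 | 15.51 |
| NSE | 0.98 | 0.94 |
| R2 | 0.98 | 0.95 |
| PBIAS | 0.70% | −2.24% |
| Future period (October 1, 2014–September 30, 2015) | | |
| Mean of observed streamflow | 56.32 | 56.32 |
| Standard deviation of observed streamflow | 57.93 | 57.93 |
| Mean of predicted streamflow | 54.63 | 64.0 |
| Standard deviation of predicted streamflow | 47.59 | 58.76 |
| RMSE | 24.71 | 17.65 |
| NSE | 0.82 | 0.91 |
| R2 | 0.83 | 0.93 |
| PBIAS | 3.01% | −13.64% |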
Comparison of RF to other algorithms
Distribution of errors (observed–predicted) with RF, XGBoost, SVR, and ANN algorithms for (a) the SS model and (b) the MS model in the historical period (25% of October 1, 2010–September 30, 2014).


Average prediction error, RMSE, and runtime comparison of RF, XGBoost, SVR, and ANN algorithms for (a) the SS model and (b) the MS model.
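A comparison of this kind can be sketched with Scikit-Learn alone. Everything here is an assumption made for illustration: the data are synthetic, the hyperparameters are arbitrary, and `GradientBoostingRegressor` is swapped in for XGBoost so that the example needs no additional package. The resulting numbers say nothing about the paper's results.

```python
import time

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic inputs and target (not the paper's dataset).
rng = np.random.default_rng(4)
X = rng.gamma(2.0, 15.0, size=(600, 4))
y = X @ np.array([0.8, 0.5, 1.5, 0.3]) + rng.normal(0.0, 2.0, size=600)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# SVR and the neural network are scale-sensitive, so they are wrapped with a scaler.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GB (stand-in for XGBoost)": GradientBoostingRegressor(random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR(C=100.0)),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    runtime = time.perf_counter() - start  # fit plus predict time, in seconds
    results[name] = {
        "mean_error": float(np.mean(y_test - pred)),  # observed minus predicted
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "runtime_s": runtime,
    }
```

Measured runtimes depend on hardware and hyperparameters, so they are best read as relative indications.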
DISCUSSION
The SS model with its derived input variables, including time variables, such as the day of the month and month of the year and statistical variables, such as maximum, mean and minimum values in a given month or year, from one streamflow station can successfully predict missing data scattered in its historical period. The MS model utilizes streamflows from nearby stations, and it can better forecast future streamflows. Adding more input variables and including more stations can improve the predictions of both models. When building ML models, it is better to start with many input variables and eliminate ones that have little impact on the prediction performance. Sharifi et al. (2017) discuss the optimal length of training and test sets and input combinations for ML applications in runoff prediction. Since machine learning algorithms require a large amount of data, the developed models may not be suitable for ungauged basins or basins with insufficient data.
CONCLUSIONS
RF-based machine learning models are developed to predict daily streamflows. The single-station (SS) model is built with variables derived from the time-series dataset of a single station (#1543). The multi-station (MS) model is built with variables corresponding to streamflow observations of four other nearby streamflow stations. Predictions are made for two periods: historical and future. The historical period is inside the training period, where a randomly selected 25% of the dataset is withheld. This portion represents missing historical values. The future period is one year of daily streamflow time series from October 1, 2014 to September 30, 2015. The MS model tends to overpredict streamflows, with PBIAS of −2.24 and −13.64% in the historical and future periods, respectively. Both models can successfully predict missing values in the historical period, where the SS model has an NSE and R2 of 0.98 and the MS model has an NSE of 0.94 and R2 of 0.95. The MS model is superior in the future period with its NSE of 0.91 and R2 of 0.93, compared to the SS model's NSE and R2 of 0.82 and 0.83, respectively. With its lower data requirements, the SS model is useful for estimating the missing streamflow of a single station, while the MS model is better suited to future periods because it follows flow trends better. Compared to the other ML algorithms, RF has a lower average prediction error and RMSE. Moreover, RF has a shorter runtime than the other compared algorithms, except for XGBoost. The developed algorithm is applied to the Kızılırmak River Basin. The open-source algorithm can also be easily applied to other basins to predict streamflow for desired locations and times.
ACKNOWLEDGEMENTS
The author acknowledges the helpful comments and suggestions from three anonymous reviewers to enhance the manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories at https://github.com/msdogan/Stream-flow-prediction.
CONFLICT OF INTEREST
The author declares there is no conflict.