The objective of this study is to develop a state-of-the-art method based on long short-term memory (LSTM), support vector machine (SVM), and random forest (RF) to predict streamflow in the Mekong Delta in Vietnam, an area crucial to Vietnam's food security. Water level and flow data from 2014 to 2018 at the Tan Chau station and Can Tho (on the Hau River) were used as the input data of the prediction model. Three different ranges of data – from the preceding 4, 8, and 12 days – were used to predict streamflow for both 1 and 7 days ahead, resulting in six individual predictions. Three statistical indices, namely root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2), were used to assess the predictive ability of the models. The results showed that the SVM and RF models were successful in improving the performance of the LSTM model, with R2 > 80%. For a prediction of 1 day ahead, the proposed models gave R2 values 2–5% higher than for a prediction of 7 days ahead. These results highlight that LSTM is a robust technique for characterizing and predicting time series behavior in hydrological applications.

  • Daily streamflow forecasting was done using hybrid machine learning approaches.

  • Model performance was evaluated using RMSE, R2, and MAE.

  • Developed models achieved high accuracy in daily streamflow forecasting.

Graphical Abstract


Streamflow is an important index that directly influences the quantification of available water resources for water supply projects and agricultural, hydroelectric, and other development (Malik et al. 2020; Hunt et al. 2022). Adnan et al. (2021) reported that approximately 20% of river flow is affected by human activities. Changes in land use and the construction of dams and reservoirs are the main factors influencing the trend of river flows (Adnan et al. 2021).

Early warning systems are installed in many watersheds around the world, providing real-time flow measurements of river systems for water resource management. However, these warning systems require significant investment and can encounter difficulties in poorer regions (Krajewski et al. 2017; Hussain & Khan 2020), so state-of-the-art methods must be developed that not only reduce investment capital but also have high accuracy and reliability.

Because river flow is a complex physical process, flow prediction can be approached with either physics-based or data-driven models. Many previous studies have demonstrated the efficiency of these models for hydrological predictions, especially those over the short term (Yadav et al. 2007). Physics-based models are built on flow generation processes, using mathematical formulas or parameterizations of physical processes (Paniconi & Putti 2015; Yan et al. 2021), which makes them complex and time-consuming to set up. In small catchments especially, flow formation is complex and nonlinear, so physics-based models are limited in their ability to predict flows accurately (Alizadeh et al. 2018). Physics-based models also require reliable data, which are lacking in most river basins worldwide (Alizadeh et al. 2018; Parisouj et al. 2020). In addition, model parameters derived from the physical characteristics of a basin are often associated with high uncertainty (Bulygina et al. 2011), and the very high cost of these models further limits their application. Together, these factors have restricted the use of physics-based models in many locations around the world. They therefore need to be developed further, or replaced with robust and automated methods that address these limitations.

To overcome these challenges, data-driven models have recently been receiving increased attention from scientific communities around the world due to their accurate prediction capability (Kratzert et al. 2019). They are categorized into two main types: statistical models and machine learning (ML) models. Statistical models are built on the assumption that the flow generation process follows a normal distribution, so such models have limited accuracy when predicting the characteristics of nonlinear and random flow processes (Ghimire et al. 2021). In the last two decades, ML has proven to be a successful and cost-effective solution in the field of water resources with its ability to measure and predict flooding (Islam et al. 2021; Nguyen 2022b), surface water (Acharya et al. 2019; Chen et al. 2020), water quality (Haghiabi et al. 2018; Ahmed et al. 2019), water salinity (Melesse et al. 2020; Jung et al. 2021), and groundwater (Sahoo et al. 2017; Singha et al. 2021). In recent years in particular, ML has been widely applied in streamflow prediction, with models focusing less on the physical characteristics of the hydrological cycle and more on using black box methods to establish optimal mathematical relationships between input and output data. Such models have been widely applied in flow prediction in small and medium watersheds (Jothiprakash & Magar 2012). These models include artificial neural network (ANN) (Zealand et al. 1999; Besaw et al. 2010), support vector machine (SVM) (Kisi & Cimen 2011; Huang et al. 2014), adaptive neuro-fuzzy inference system (ANFIS) (Chang & Chang 2006; Firat & Güngör 2007), long short-term memory (LSTM) (Feng et al. 2020; Ni et al. 2020), extreme learning machine (ELM) (Yaseen et al. 2016; Adnan et al. 2019), and fuzzy neural network (FNN) (Valença & Ludermir 2000; Deka & Chandramouli 2003).
The advantage of ML models is their ability to handle large datasets and accept datasets at different scales, while not being sensitive to missing data (Yaseen et al. 2018). The models find the optimal relationships between the input data and the output data for the prediction. In addition, ML models can simulate nonlinear and complicated dynamic streamflow systems with high accuracy, as has been recognized since the 1990s (Khosravi et al. 2021). ML modeling has also addressed the challenge of transferring a model from a basin with available flow data to another basin with similar characteristics (Contreras et al. 2019; Khosravi et al. 2021). Lin et al. (2021) used precipitation and runoff data from six meteorological and hydrological stations to develop a hybrid model based on the first-order difference (DIFF), the feedforward neural network (FFNN), and the LSTM to predict the hourly flows of the Andun Basin in China. Rahimzad et al. (2021) constructed four models based on linear regression (LR), multilayer perceptron (MLP), SVM, and LSTM to predict daily streamflows in the Kentucky watershed in the United States. The data used in the models include precipitation and discharge for the period 1986–2012. Adnan et al. (2021) developed hybrid models based on locally weighted learning (LWL), additive regression (AR), bagging (BG), dagging (DG), random subspace (RS), and rotation forest for forecasting monthly flows in the Jhelum River Basin, Pakistan. Monthly rainfall data from 1965 to 2012 at the Kohala station in the Jhelum River Basin were used to construct these models. Le et al. (2021) developed an LSTM model for 1-day and 2-day flow forecasting at Son Tay station in the Red River Basin in Vietnam. The authors used river flow data for 20 years (1995–2014) to train, validate, and evaluate the model. However, the literature suggests that there is no universal model, that is, one that can solve all problems in all regions.
Moreover, ML is associated with generalization issues: models have weak prediction ability when the training dataset is not long enough or when the validation data fall outside the range of the training data (Melesse et al. 2011). In addition, although individual models can perform well, their complex structures and parameter configurations make it very challenging to build a better individual model. Popular tuning methods, including trial-and-error and random search, have been widely applied; however, they have a low convergence rate and do not specifically consider interactions between parameters and hyperparameters. This is why recent studies have emphasized combining models: hybrid models can offset the weak points of individual models.

The objective of this study is to develop a state-of-the-art method based on LSTM, SVM, and RF to predict streamflow in the Mekong Delta in Vietnam. These three models are among the most popular and have been widely applied in previous studies to predict streamflow. They converge quickly, solve nonlinear problems, and generate highly accurate models in high-dimensional spaces. In addition, these models have effective memories. Finally, the RF model can automatically resolve missing values. This study differs from previous studies because it is the first time these models have been applied to predict streamflow in the Mekong River. Water resource management in the area (prediction, reservoir operation, and flood control) is carried out based on streamflow. In developing countries like Vietnam, which lack sufficient data to build water resource management strategies, streamflow prediction is important. The development of models for areas with limited data has received attention from scientific communities around the world. The results of this study will bring new understanding and improvements in streamflow modeling and prediction, and the findings can help decision-makers better manage water resources.

Study area and data

The Mekong River is regarded as one of the ten most important rivers in the world in terms of flow and sediment. Its 4,350 km-long journey begins in the Tibetan Plateau in China. It flows through six countries: China, Laos, Cambodia, Thailand, Myanmar, and Vietnam. Finally, the Mekong River empties into the East Sea through the Mekong Delta.

The Vietnamese Mekong Delta (VMD) is located downstream of the Mekong River and covers an area of approximately 39,400 km2 (Figure 1). This region is home to nearly 20 million people. Rice is the main crop in the VMD, covering an area of about 1.9 million ha and representing about 50% of the country's total rice production. The delta is relatively flat, with an average altitude of 0–2 m above mean sea level. The delta has a tropical climate with two main seasons: the dry season is from November to April, and the rainy season is from May to October. Average precipitation in the VMD ranges from 1,400 to 2,200 mm per year, 90–95% of which falls during the rainy season. The tides in the delta are very complicated and divided into two main regimes: semi-diurnal in the East Sea and diurnal in the West Sea. For the semi-diurnal tide, the high tide period lasts about 6 h, and the low tide period is about 7 h. The average magnitude of the tides in this region varies from 3 to 4 m and the maximum tide can reach 4.1 m. For the diurnal tide, there are two peaks and two troughs during the day, with a magnitude ranging from 0.8 to 1.2 m.
Figure 1

The location of the VMD.


Figure 1 shows that the hydrological network in the VMD is very dense (80 m/ha), with two main rivers: the Tien River and the Hau River. According to a report by the Ministry of Natural Resources and Environment, the annual flow in the delta is around 500 km3, of which approximately 23 km3 (4.6%) comes directly from precipitation. The remaining 477 km3 is from the flow of the upstream Mekong River. The flow at the Tan Chau station ranges from 5,000 to 17,000 m3/s. Between 70 and 80% of the annual flow occurs in the rainy season, which puts pressure on agricultural development in the dry season.

The VMD is thought to be particularly affected by climate change. Previous studies have predicted a sea-level rise of between 46 and 77 cm by the end of the 21st century in this region, which will aggravate drought and saltwater intrusion, especially during the dry season from November to April each year. The accurate prediction of streamflow in the delta, therefore, plays an important role in supporting those responsible for water resource management and the sustainable development of agriculture.

To meet the increasing demand for water resources, several small- and medium-sized water conservation facilities have been built in the upper river. Because information sharing is relatively delayed, downstream users cannot obtain upstream data in time. This harms socio-economic development in the delta, especially agricultural development. Figure 2 shows the water level at the Chau Doc station and the river flow at the Can Tho station on the Hau River from 2014 to 2018; these datasets were used to build the streamflow prediction models.
Figure 2

Water level at the Chau Doc station and flow at the Can Tho station in the Hau River from 2014 to 2018.


The training of ML models often encounters numerical difficulties because raw streamflow data are strongly nonlinear, which strongly influences the prediction model (Niu et al. 2020). Normalizing the data limits these problems. Because this study uses a neural network, all attributes were retained but their values were normalized to the range 0–1.
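The normalization step can be illustrated with a minimal sketch. The text states only that the data were scaled to 0–1; the min-max formula used here is an assumption, and the function names and flow values are hypothetical rather than taken from the study.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers linearly to the range [0, 1].

    Assumption: min-max scaling; the paper only states the 0-1 target range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every value to 0.0.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def denormalize(scaled, lo, hi):
    """Invert min_max_normalize to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical daily flows in m3/s (illustrative values, not study data).
flows = [5000.0, 8000.0, 11000.0, 17000.0]
scaled, lo, hi = min_max_normalize(flows)
restored = denormalize(scaled, lo, hi)
```

Keeping the minimum and maximum makes it possible to convert normalized predictions back to physical units when comparing them with observations.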

Methodology

Long short-term memory algorithm

LSTM is a type of recurrent neural network (RNN) with extended memory. In an RNN, the output of the previous step is fed as input to the current step. LSTM was designed to solve the long-term dependency problem of RNNs: an RNN gives accurate predictions from recent information but struggles to use information from far back in the sequence (Hochreiter & Schmidhuber 1997). LSTM allows RNNs to remember their inputs over a long period because it stores information in a memory from which it can read, write, and delete. LSTM is widely used for prediction based on time series data (Vojtek et al. 2021). The structure of LSTM includes three gates: input, forget, and output (Dong et al. 2020; Ghimire et al. 2021).

The forget gate deletes information that is no longer useful. The input Xt at time t and the previous cell output ht−1 are sent to the gate, where they are multiplied by the weight matrix and added to the bias. The result is passed through the sigmoid function, which returns values between 0 and 1: a value close to 0 means the information is discarded, while a value close to 1 means it is kept for the next step. In the standard LSTM formulation:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right)$$
where ft is the forget gate, σ is the sigmoid function, and Wf and bf are the weight and bias matrices of the forget gate, respectively.
The input gate adds useful information to the cell state. In this gate, the information is regulated using the sigmoid function, which filters the values to be stored using the input data Xt and ht−1, as with the forget gate. Then, the tanh function is used to generate a candidate vector with values from −1 to 1, built from the input data Xt and ht−1. The candidate vector and the regulated values are multiplied to obtain the useful information, which is added to the cell state:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, X_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
where it is the input gate, C̃t is the candidate vector, Ct−1 and Ct are the cell states at time t − 1 and t, respectively, and W and b are the respective weight matrices and biases.
The output gate extracts useful information from the current cell state to present as the output:
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right)$$
$$h_t = O_t \odot \tanh(C_t)$$
where Ot is the output gate and Wo and bo are the weight matrices and bias of the output gate, respectively.
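The gate operations described above can be sketched as a single time step in plain Python. This is a minimal illustration of the standard LSTM equations, not the implementation used in the study; the parameter layout, the helper names, and the all-zero weights in the example are assumptions made for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with the standard forget, input, and output gates.

    params maps "f", "i", "c", "o" to (W, b) pairs; each W acts on the
    concatenation [h_prev, x_t] (hypothetical layout chosen for this sketch).
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, X_t]
    f_t = [sigmoid(a) for a in affine(*params["f"][:1], z, params["f"][1])]
    i_t = [sigmoid(a) for a in affine(*params["i"][:1], z, params["i"][1])]
    c_hat = [math.tanh(a) for a in affine(*params["c"][:1], z, params["c"][1])]
    o_t = [sigmoid(a) for a in affine(*params["o"][:1], z, params["o"][1])]
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Hidden output: gated, squashed view of the cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Example: hidden size 1, two inputs (normalized water level and flow).
zero = ([[0.0, 0.0, 0.0]], [0.0])  # illustrative all-zero weights and bias
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([0.4, 0.6], [0.0], [1.0], params)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; trained weights would instead let the gates respond to the inputs.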

Support vector machine

SVM is a supervised learning algorithm that can solve classification and regression problems and was first proposed by Vapnik et al. (1995). SVM creates a hyperplane in an N-dimensional space to divide the data into two parts corresponding to their class. In two-dimensional space, this hyperplane is a line dividing the plane into two parts corresponding to the two classes, each class being located on one side of the line. For data that are not linearly separable, the linear model is applied after mapping the dataset into a feature space via a nonlinear function (Samantaray et al. 2022). It uses the principle of structural risk minimization and statistical learning to determine the boundary between the two opposite classes to improve the generalization capacity, thanks to the reduction of the generalization error as opposed to the training error (Christian et al. 2021; Essam et al. 2022). SVM works using a kernel function that converts data from the input feature space to a higher-dimensional feature space. This conversion helps determine complex input–output relationships in a relatively simple way (Christian et al. 2021). The SVM technique for solving the regression problem can be expressed as follows:
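$$f(x) = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^{*}\right) K(x_i, x) + b$$
where αi and αi* are the Lagrange multipliers, K(xi, x) is the kernel function, and b is the bias (shown here in the standard dual form of support vector regression).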

The performance of the SVM model depends on the kernel, C, and gamma parameters. The kernel can be linear, polynomial, radial basis function (RBF), sigmoid, or precomputed. C controls the trade-off between fitting the training data closely and tolerating errors such as outliers, while gamma sets how far the influence of a single training sample reaches under the RBF kernel. In this study, SVM was used to optimize the parameters of the LSTM algorithm.
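The role of gamma can be made concrete with a small sketch of the RBF kernel and a kernel-based regression prediction. This is an illustration under stated assumptions: the default gamma of 0.5 is the value reported later in the parameter optimization section, while the support vectors, coefficients, and bias are hypothetical, and the function names are not from any library.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + bias.

    dual_coefs plays the role of the combined Lagrange multipliers;
    all values passed in the example below are hypothetical.
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias


# Larger gamma makes the kernel decay faster with distance.
near = rbf_kernel([0.0], [2.0], gamma=0.5)
far = rbf_kernel([0.0], [2.0], gamma=2.0)
pred = svr_predict([1.0, 2.0], [[1.0, 2.0]], [0.3], bias=0.1)
```

A larger gamma means each training sample influences only its immediate neighborhood, which tends to produce a more flexible but less smooth regression function.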

Random forest

Random forest (RF) is a powerful supervised learning algorithm first proposed by Breiman (2001). It combines the predictions of decision trees to solve classification and regression problems. RF combines a large number of decision trees (weak models), built automatically and randomly, to produce results with higher accuracy than the individual trees (Zhang et al. 2019; Peng et al. 2020). The outputs of the sub-models (the decision trees) are combined by majority voting (Al-Abadi & Shahid 2016); for regression problems such as streamflow prediction, the tree outputs are averaged. RF works in three main steps (Tian et al. 2016). The first is to randomly select n samples from the dataset using the bootstrapping technique. The second is to build a decision tree on each sample; every tree is built with the decision tree algorithm on a different dataset and a different set of attributes. The third is to aggregate the predictions of the trees, by voting or averaging, and return the result. In this study, the dataset was divided into two parts: 80% of the data were used to train the models, while 20% were used to validate them.

RF can handle missing data by replacing missing values with the average of the adjacent values (Ziegler & König 2014). Also, a forest with more trees is less prone to overfitting. Although RF has high precision, it has limitations, for example when a dataset has a large number of variables (Arabameri et al. 2019): a decision tree of limited depth often misses important variables. The performance of the RF model is influenced by parameters such as max_features, n_estimators, and min_sample_leaf. In this study, RF was applied to compute the weights for each layer of LSTM.
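The three steps of RF (bootstrap sampling, tree building, and aggregation) can be sketched in a few lines. This is a deliberately simplified illustration, not the model used in the study: each "tree" is a depth-1 regression stump on a single feature, there is no random feature selection, and the data and function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: choose the split with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical: fall back to a constant prediction.
        const = mean(ys)
        return (float("inf"), const, const)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_estimators=25, seed=0):
    """Steps 1 and 2: draw bootstrap samples and fit one stump per sample."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Step 3: for regression, average the outputs of all trees."""
    return mean(predict_stump(stump, x) for stump in forest)


# Hypothetical data: low response for small x, high response for large x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
forest = fit_forest(xs, ys)
low, high = predict_forest(forest, 1.0), predict_forest(forest, 8.0)
```

Averaging over many bootstrap-trained trees smooths out the errors of any single tree, which is the reason RF is usually more accurate than its individual members.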

Performance assessment

In this study, various statistical indices were used to evaluate the performance of the prediction model, namely root-mean-square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2). Several previous studies have confirmed that these indices are widely used and reliable measures for the prediction problem.

RMSE is the most popular statistical index for evaluating the prediction ability of a model. It measures the difference between the predicted and observed values (Zhu et al. 2020). RMSE is non-negative and, because the data in this study were normalized to 0–1, it ranges from 0 to 1 here. The closer the RMSE value is to 0, the more accurate the model's prediction. RMSE is defined by the following equation:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_{pre,i} - Y_{obs,i}\right)^{2}}$$
where N is the number of samples, Ypre,i is the prediction value at point number i, and Yobs,i is the observation value at point number i.
MAE is the mean of the absolute errors between the predicted and observed values. It is calculated as the sum of the absolute errors divided by the sample size and is defined by the following equation (Legates & McCabe Jr 1999):
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_{pre,i} - Y_{obs,i}\right|$$
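R2 is considered one of the most popular measures to assess the level of fit of the model to the observational data. The value of R2 ranges from 0 to 1. The more efficient the model, the closer the value is to 1 (Kumari et al. 2021). R2 is defined by the following equation:
$$R^{2} = \left[\frac{\sum_{i=1}^{N}\left(Y_{obs,i} - \bar{Y}_{obs}\right)\left(Y_{pre,i} - \bar{Y}_{pre}\right)}{\sqrt{\sum_{i=1}^{N}\left(Y_{obs,i} - \bar{Y}_{obs}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(Y_{pre,i} - \bar{Y}_{pre}\right)^{2}}}\right]^{2}$$
where Ȳpre and Ȳobs are the mean values of the predicted and observed daily streamflow, respectively.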
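The three indices can be computed with a few lines of Python. The sketch below assumes the squared-correlation form of R2, which is consistent with the text's reference to the means of both the predicted and observed series; the function names and sample values are hypothetical.

```python
import math


def rmse(obs, pre):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pre)) / len(obs))


def mae(obs, pre):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(p - o) for o, p in zip(obs, pre)) / len(obs)


def r2(obs, pre):
    """Coefficient of determination as the squared Pearson correlation (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    mean_pre = sum(pre) / len(pre)
    cov = sum((o - mean_obs) * (p - mean_pre) for o, p in zip(obs, pre))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pre = sum((p - mean_pre) ** 2 for p in pre)
    return cov ** 2 / (var_obs * var_pre)


# Hypothetical normalized streamflow values (not study data).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.1, 0.5, 0.5, 0.9]
scores = (rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))
```

Because the inputs are normalized, RMSE and MAE are reported in normalized units and are comparable across input windows and lead times.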

Basic steps of modeling by LSTM

Figure 3 shows the methodology used to predict the streamflow in the study area. The method is divided into four main steps: collection and preparation of data; model building; model validation; and prediction. All the techniques presented in the previous sections were used to predict streamflow at different horizons. The key steps in the model-building process were as follows.
Figure 3

Flowchart of the proposed hybrid LSTM models.


Collection and preparation of data

Water level and flow data were collected at the Chau Doc and Can Tho stations on the Hau River from 2014 to 2018. After data collection, these data were normalized to use as the model input data.

Building of model

First, 80% of the data, corresponding to 1,460 days (4 years), were used to develop the models. The remaining 20% were used to compare and evaluate the performance of the proposed models. The underlying assumption is that streamflow is a dynamically responsive system that depends on past conditions, so future streamflow was predicted from past values of the water level and the streamflow. This is why data from different numbers of preceding days (4, 8, and 12 days) were tested. These window lengths were selected by trial and error. The selection is also related to statistical significance and depends on the sizes of the empirical models. Various studies have pointed out that using a greater number of previous days increases the computational demand of the model.
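The 80/20 division can be sketched as follows. The text does not state how the split was made; the sketch assumes a chronological split, in which the earliest 80% of days are used for training and the latest 20% for evaluation, and the function name is hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into a leading training part and a trailing test part.

    Assumption: the split respects time order, so no future data leak into training.
    """
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# 2014-2018 inclusive spans 1,826 days (2016 is a leap year).
days = list(range(1826))
train, test = chronological_split(days)
```

With 1,826 daily records this yields 1,460 training days, matching the figure quoted above, and leaves 366 days for evaluation.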

Second, LSTM was used to predict the daily streamflow at the Can Tho station on the Hau River. To improve the performance of the LSTM model, hybrid models were built by integrating SVM and RF into the LSTM network to resample the training dataset used to train the base LSTM model. The success of the hybrid models was assessed by comparing their performance with that of the nonhybrid model.

The structure of the LSTM-SVM and LSTM-RF models is shown in Figure 3, with reference to Guo et al. (2019). The prediction process was divided into two stages. The first stage increases the number of samples for training the models by using the historical samples of precipitation and discharge data to generate the steady series samples and combining them with temporal series samples. The second stage predicts the river discharges by integrating SVM/RF and LSTM. The outputs of LSTM and SVM/RF are X1 and X2, respectively, and their combination is the final result of the LSTM-SVM/RF model.

To explain the combination of SVR/RF and LSTM, the combination is designed for real-time prediction by the following equation:
where X1(t) is the LSTM prediction, X2(t) is the RF/SVR prediction, and X(t) is their combination,
where epsilon, theta, and gamma are coefficients. Three error terms from the previous time step are used:

  • degree of absolute error = |previous label – previous prediction|

  • degree of relative error = |previous label – previous prediction|/previous label

  • trend of recent error = previous label – previous prediction
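The three error terms listed above can be computed directly from the previous observation and the previous prediction. The sketch below covers only these terms; it does not reproduce the combination formula itself, and the function name and the handling of a zero label are assumptions.

```python
def recent_error_terms(previous_label, previous_prediction):
    """Return the absolute error, relative error, and error trend of the last step."""
    diff = previous_label - previous_prediction
    # Assumption: the relative error is undefined when the label is zero,
    # which can happen with data normalized to 0-1.
    relative = abs(diff) / previous_label if previous_label != 0 else None
    return {
        "absolute_error": abs(diff),
        "relative_error": relative,
        "error_trend": diff,  # the sign shows under- or over-prediction
    }


# Hypothetical normalized values: observed 0.5, predicted 0.4.
terms = recent_error_terms(0.5, 0.4)
```

A positive trend indicates that the previous prediction was below the observation, which is the kind of signal a real-time combination scheme can use to adjust the next forecast.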

To build the models, data from different numbers of preceding days (4, 8, and 12 days) were used to predict the streamflow 1 and 7 days ahead, with the following input sets:

  • (i)

    Ht−1, Ht−2, Ht−3, Ht−4, Qt−1, Qt−2, Qt−3, Qt−4.

  • (ii)

    Ht−1, Ht−2, Ht−3, Ht−4, Ht−5, Ht−6, Ht−7, Ht−8, Qt−1, Qt−2, Qt−3, Qt−4, Qt−5, Qt−6, Qt−7, Qt−8.

  • (iii)

    Ht−1, Ht−2, Ht−3, Ht−4, Ht−5, Ht−6, Ht−7, Ht−8, Ht−9, Ht−10, Ht−11, Ht−12, Qt−1, Qt−2, Qt−3, Qt−4, Qt−5, Qt−6, Qt−7, Qt−8, Qt−9, Qt−10, Qt−11, Qt−12.

where Ht−k and Qt−k are the water level and the streamflow k days earlier, respectively.
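The input sets (i) to (iii) can be generated with a sliding window over the two daily series. The sketch below is illustrative: the function name is hypothetical, and the indexing of the target (a lead of 1 predicts Qt, a lead of 7 predicts Qt+6) is an assumption about how "days ahead" is counted.

```python
def make_lagged_samples(levels, flows, n_lags, lead):
    """Build (features, target) pairs from aligned daily series.

    features = [H(t-1), ..., H(t-n_lags), Q(t-1), ..., Q(t-n_lags)]
    target   = Q(t + lead - 1)   (assumed indexing of the lead time)
    """
    samples = []
    for t in range(n_lags, len(flows) - lead + 1):
        h_lags = [levels[t - k] for k in range(1, n_lags + 1)]
        q_lags = [flows[t - k] for k in range(1, n_lags + 1)]
        samples.append((h_lags + q_lags, flows[t + lead - 1]))
    return samples


# Hypothetical 20-day series (not study data).
levels = [float(d) for d in range(20)]
flows = [100.0 + d for d in range(20)]
one_day = make_lagged_samples(levels, flows, n_lags=4, lead=1)
seven_day = make_lagged_samples(levels, flows, n_lags=4, lead=7)
```

Each sample has 2 × n_lags features, so the 4-, 8-, and 12-day windows give 8, 16, and 24 inputs, respectively, and longer lead times leave fewer usable samples.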

Model validation

In this study, RMSE, MAE, and R2 were the statistical indices used to validate the proposed models. The 20% of the data (river flows) held out for validation were used to assess the accuracy of the proposed models and were not used to train them. The accuracy of the proposed models differs depending on the number of preceding days and the lead time considered.

Prediction

The proposed models were used to predict the streamflow for 1 and 7 days ahead at the Can Tho station on the Hau River in the Mekong Delta.

Modeling parameter optimization in daily streamflow prediction

Each model has a set of parameters that affects its performance. The SVM model has three parameters: gamma, C, and epsilon. C is the penalty applied to errors on the training data: a large value of C gives low bias and high variance, while a small value gives higher bias and lower variance. Gamma is the parameter of the Gaussian kernel, which allows nonlinear relationships to be handled. The parameter values were optimized empirically: gamma was set to 0.5, C to 2, and epsilon to 0.05. The RF model has several parameters, including max_depth and min_sample_split, but it was most affected by max_depth, defined as the longest path between the root node and a leaf node of a tree. Based on experiments, the optimized value of this parameter was set to 20. The performance of the LSTM model depends on the number of layers, the number of hidden units, and the number of iterations, which were set to 2, 128, and 100, respectively. The parameter selections were based primarily on a trial-and-error process.

Figures 4 and 5 show the RMSE values of the LSTM model for the 4-, 8-, and 12-day datasets when predicting 1 and 7 days ahead. These results were used as reference values for comparison with the hybrid models (LSTM-SVM and LSTM-RF). For 1 day ahead, the RMSE decreased as the number of iterations increased and changed only slightly once the number of iterations exceeded 40. In other words, using more iterations to optimize the models increased their accuracy, but beyond a certain point the RMSE became stable and the process could be stopped. For 7 days ahead, the RMSE was unstable during the first 50 iterations; from the 50th iteration onward, it gradually decreased and stabilized. In the first 50 iterations, the search space in which the algorithms look for optimal solutions is still wide, so the RMSE is not stable. In the last 50 iterations, the search space has narrowed and the RMSE has approached its best value, so it is more stable than in the first 50 iterations. Moreover, the trend of the RMSE depends on the properties of the proposed algorithms.
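The observation that training can be stopped once the RMSE stabilizes can be expressed as a simple plateau check. This sketch is an illustration only: the study ran a fixed 100 iterations, and the function name, tolerance, patience, and RMSE history below are hypothetical.

```python
def first_stable_iteration(rmse_history, tolerance=1e-3, patience=5):
    """Return the index at which RMSE has failed to improve by more than
    `tolerance` for `patience` consecutive iterations, or None if it never does.
    """
    best = float("inf")
    stale = 0
    for i, value in enumerate(rmse_history):
        if best - value > tolerance:
            best = value  # meaningful improvement: reset the counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


# Hypothetical RMSE curve that drops quickly and then flattens.
history = [0.30, 0.20, 0.15, 0.12, 0.111, 0.1105, 0.1108, 0.1104, 0.1106, 0.1103]
stop_at = first_stable_iteration(history, tolerance=1e-3, patience=3)
```

A rule of this kind would end the 1-day ahead training soon after the curve flattens, whereas the noisier 7-day ahead curve would need a larger patience to avoid stopping during its unstable early phase.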
Figure 4

RMSE of 4, 8, and 12 previous days for 1 day ahead of LSTM.

Figure 5

RMSE of 4, 8, and 12 previous days for 7 days ahead of LSTM.


Evaluation of the number of previous days

Tables 1 and 2 show the results of predicting 1- and 7-day streamflows using the 4, 8, and 12 previous days. In general, as the number of previous days used as input increases, the performance of the prediction model also increases. In the case of the 1-day ahead forecast, for the LSTM model, the value of RMSE decreased from 0.111 to 0.105 and then to 0.075 for inputs of 4, 8, and 12 previous days, respectively. Similarly, the value of MAE decreased from 0.08 to 0.076 and then to 0.055, and the value of R2 increased from 0.826 to 0.844 and then to 0.921. For the LSTM-SVM model, the RMSE value decreased from 0.104 to 0.062 and then to 0.06, the MAE value decreased from 0.08 to 0.046 and then to 0.045, and the R2 value increased from 0.846 to 0.946 and then to 0.954. For the LSTM-RF model, the values of RMSE and MAE decreased from 0.103 to 0.061 and then to 0.059, and from 0.075 to 0.046 and then to 0.045, respectively. The value of R2 increased from 0.848 to 0.947 and then to 0.953.

Table 1

Performance of the models for 1-day ahead prediction using 4, 8, and 12 previous days

Model | Number of previous days | RMSE | MAE | R2
LSTM-SVM | 4 | 0.104 | 0.08 | 0.846
LSTM-SVM | 8 | 0.062 | 0.046 | 0.946
LSTM-SVM | 12 | 0.06 | 0.045 | 0.954
LSTM-RF | 4 | 0.103 | 0.075 | 0.848
LSTM-RF | 8 | 0.061 | 0.046 | 0.947
LSTM-RF | 12 | 0.059 | 0.045 | 0.953
LSTM | 4 | 0.111 | 0.08 | 0.826
LSTM | 8 | 0.105 | 0.076 | 0.844
LSTM | 12 | 0.075 | 0.055 | 0.921

Table 2

Performance of the models for 7-day ahead prediction using 4, 8, and 12 previous days

Model | Number of previous days | RMSE | MAE | R2
LSTM-SVM | 4 | 0.117 | 0.085 | 0.826
LSTM-SVM | 8 | 0.088 | 0.066 | 0.892
LSTM-SVM | 12 | 0.085 | 0.06 | 0.899
LSTM-RF | 4 | 0.111 | 0.084 | 0.828
LSTM-RF | 8 | 0.087 | 0.065 | 0.893
LSTM-RF | 12 | 0.086 | 0.06 | 0.898
LSTM | 4 | 0.116 | 0.081 | 0.81
LSTM | 8 | 0.09 | 0.078 | 0.84
LSTM | 12 | 0.089 | 0.062 | 0.889

In the case of the 7-day ahead prediction, the performance of the models also increased slightly as the amount of input data (the number of previous days) increased. For the LSTM model, with inputs of 4, 8, and 12 previous days, the values of RMSE and MAE decreased from 0.116 to 0.09 and then to 0.089, and from 0.081 to 0.078 and then to 0.062, respectively, while the value of R2 increased from 0.81 to 0.84 and then to 0.889. For the LSTM-SVM model, the values of RMSE and MAE decreased from 0.117 to 0.088 and then to 0.085, and from 0.085 to 0.066 and then to 0.06, respectively. The value of R2 increased from 0.826 to 0.892 and then to 0.899. For the LSTM-RF model, the values of RMSE and MAE decreased from 0.111 to 0.087 and then to 0.086, and from 0.084 to 0.065 and then to 0.06, respectively. The R2 value increased from 0.828 to 0.893 and then to 0.898.

Moreover, as shown in Tables 1 and 2, the hybrid models outperformed the individual model, and in general, the prediction ability of the LSTM-RF model was better than that of the LSTM-SVM model for predictions both 1 and 7 days ahead.

Evaluation of the 1 and 7 days ahead

For a definitive analysis of the models proposed in this study, two forecasting scenarios were used (1 and 7 days ahead). The models used in these predictions take streamflow data from up to the previous 12 days, because the 12-day input gave the best performance (Tables 1 and 2).

In general, as the lead time increased, model performance decreased. When 4 days of past data were used, the performance of the LSTM-SVM, LSTM-RF, and LSTM models decreased as the lead time increased from 1 to 7 days. Specifically, the value of RMSE increased from 0.104 to 0.117 for the LSTM-SVM model, from 0.103 to 0.111 for the LSTM-RF model, and from 0.111 to 0.116 for the LSTM model. Regarding MAE, the value increased from 0.08 to 0.085 for the LSTM-SVM model, from 0.075 to 0.084 for the LSTM-RF model, and from 0.08 to 0.081 for the LSTM model.

Similarly, when the preceding 8 days of data were used, the values of RMSE and MAE increased from 0.062 to 0.088 and from 0.046 to 0.066 for the LSTM-SVM model, from 0.061 to 0.087 and from 0.046 to 0.065 for the LSTM-RF model, and from 0.065 to 0.09 and from 0.076 to 0.078 for the LSTM model.

In the case of the 12-day dataset, the values of RMSE and MAE increased from 0.06 to 0.085 and from 0.045 to 0.06 for the LSTM-SVM model, from 0.059 to 0.086 and from 0.045 to 0.06 for the LSTM-RF model, and from 0.075 to 0.089 and from 0.055 to 0.062 for the LSTM model.

The R2 index was also used to evaluate performance. Figure 6 shows the R2 value of the LSTM, LSTM-SVM, and LSTM-RF models. With a 4-day dataset, as the lead time increased from 1 to 7 days, the R2 value decreased from 0.846 to 0.826, from 0.848 to 0.828, and from 0.826 to 0.81 for LSTM-SVM, LSTM-RF, and LSTM, respectively. With 8 days of data, R2 decreased from 0.946 to 0.892, from 0.947 to 0.893, and from 0.844 to 0.84 for LSTM-SVM, LSTM-RF, and LSTM, respectively. With 12 days of data, R2 decreased from 0.954 to 0.899, from 0.953 to 0.898, and from 0.921 to 0.889 for LSTM-SVM, LSTM-RF, and LSTM, respectively.
Figure 6

The value of R2 for predicted streamflow of the different forecasting times ahead (1 and 7 days ahead).
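
The R2 values quoted in the text for the 1- and 7-day ahead predictions can be compared directly. The following Python sketch tabulates those reported values and computes the decrease in R2 between the two lead times; the dictionary layout and the function name `r2_drop` are illustrative choices, not part of the original study.

```python
# R2 values as reported in the text: (1 day ahead, 7 days ahead),
# grouped by the number of preceding days used as input.
r2_reported = {
    4: {"LSTM-SVM": (0.846, 0.826), "LSTM-RF": (0.848, 0.828), "LSTM": (0.826, 0.810)},
    8: {"LSTM-SVM": (0.946, 0.892), "LSTM-RF": (0.947, 0.893), "LSTM": (0.844, 0.840)},
    12: {"LSTM-SVM": (0.954, 0.899), "LSTM-RF": (0.953, 0.898), "LSTM": (0.921, 0.889)},
}


def r2_drop(table):
    """Return the decrease in R2 from the 1-day to the 7-day ahead prediction."""
    return {
        (lookback, model): round(one_day - seven_day, 3)
        for lookback, models in table.items()
        for model, (one_day, seven_day) in models.items()
    }


drops = r2_drop(r2_reported)
for (lookback, model), drop in sorted(drops.items()):
    print(f"{lookback:>2}-day input, {model:<8}: R2 decrease = {drop:.3f}")
```

With these values, the decrease ranges from 0.004 (LSTM with 8 days of input) to 0.055 (LSTM-SVM and LSTM-RF with 12 days of input).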


The results showed that, overall, the LSTM-RF model has better generalization performance than the other two models (LSTM and LSTM-SVM) for both the 1- and 7-day ahead predictions. The LSTM-SVM model ranked second, followed by LSTM. The results again confirm the superiority of the hybrid models.

Figures 7 and 8 show the observed streamflow and the 1- and 7-day ahead predictions using the 4, 8, and 12 previous days for the LSTM, LSTM-SVM, and LSTM-RF models. The predicted streamflow was lower than the observed streamflow for both the 1- and 7-day ahead forecasts. Moreover, the streamflow predicted by the proposed models during the flood season tends to be underestimated (500 m3/s) compared to the observed streamflow. Because this study uses a data-driven approach, the accuracy of the models depends on the input data; however, the flood data in this study are limited (about two or three events per year), which is not enough to train the models. Several studies have applied different methods to reduce these problems, for example, data transformation techniques such as Fourier or wavelet decomposition to process the data before they are used as input to ML models (Wang et al. 2022).
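
As an illustration of the kind of data transformation mentioned above, the following Python sketch applies a single level of the Haar wavelet transform, which splits a series into a smooth approximation and a detail component that can be reconstructed exactly. This is a minimal stand-in chosen for brevity, not the procedure of Wang et al. (2022); the function names and the sample series are hypothetical.

```python
import math


def haar_decompose(signal):
    """One level of the Haar wavelet transform (signal length must be even)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]  # scaled pairwise sums (smooth part)
    detail = [(a - b) / s for a, b in pairs]  # scaled pairwise differences
    return approx, detail


def haar_reconstruct(approx, detail):
    """Invert haar_decompose and recover the original series."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


# Hypothetical normalized flows with a short flood peak in the middle.
flow = [0.2, 0.2, 0.3, 0.9, 0.8, 0.4, 0.3, 0.3]
approx, detail = haar_decompose(flow)
restored = haar_reconstruct(approx, detail)

print("approximation:", [round(v, 3) for v in approx])
print("detail:       ", [round(v, 3) for v in detail])
print("max reconstruction error:", max(abs(a - b) for a, b in zip(flow, restored)))
```

In a setup of this kind, the approximation and detail components would be supplied to the ML model as separate inputs, so that rapid changes such as flood peaks are represented explicitly.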
Figure 7

One-day ahead prediction using data from the 4, 8, and 12 previous days.

Figure 8

Seven-day ahead prediction using data from the 4, 8, and 12 previous days.


Accurate streamflow prediction plays an important role in water resource management and is one of the most challenging tasks in hydrology, especially in the context of climate change (Rasouli et al. 2012; Kilinc & Haznedar 2022). Although various models have been developed to predict streamflows in rivers around the world, the accuracy of these models is still a big challenge for the global scientific community. Moreover, each model can only solve the problems of a certain region. There is not yet a universal model to solve all problems in all regions, so it is necessary to develop new models. The objective of this study is the development of a state-of-the-art model to predict streamflow, based on LSTM, SVM, and RF. The Mekong River was selected as the area of study because it is the most important transboundary river in Asia, and the construction of upstream dams and climate change have had increasingly profound effects on the downstream streamflow.

We proposed an optimization framework that uses SVM and RF to determine the optimal hyperparameters of the LSTM model; both help the optimization process converge faster than traditional methods such as trial-and-error, grid search, or population-based training. Moreover, combining models such as SVM, RF, and LSTM provides advantages for time-series prediction problems such as river streamflow prediction. These advantages become clear once the optimization is completed: the proposed models have been trained on the river streamflow and can predict the streamflow value in the following days. This was verified by comparing the performance of the hybrid models with that of the individual model. In general, the hybrid models outperformed the individual model, in agreement with several previous studies reporting that hybrid models overcome the weak points of individual models. In this study, the LSTM-SVM and LSTM-RF models were better than the LSTM model. SVM requires less memory and can process large datasets, while RF, in addition to being easy to use, has advantages in dealing with overfitting. Both can therefore improve the performance of the LSTM model (Nguyen 2022a).

Several recent studies have used LSTM to predict river streamflow. Xu et al. (2020) used the LSTM model to predict the 10- and 1-day ahead streamflow in the Hun and Yangtze rivers, respectively. The results showed a value of R2 ranging from 64 to 75%. Girihagama et al. (2022) applied the LSTM model to predict the daily streamflow for ten different watersheds of the Ottawa River watershed. The value of R2 ranged from 50 to 86%. Li et al. (2021) used the LSTM model and its hybrids to predict streamflow at the Baozhusi Hydrological Station on the Jialing River, China. The results indicated that the accuracy of the proposed models varied from 70 to 80% for the value of R2. Qi et al. (2019) developed the LSTM and DEL-LSTM (decomposition-ensemble-learning model and LSTM neural network) models to predict daily inflow into the Ankang reservoir from the Han River in northern China. The results showed a value of R2 from 60 to 70%. In addition, for the same study area (the Mekong River delta), Nguyen et al. (2015) used three models, namely, Least Absolute Shrinkage and Selection Operator (LASSO), Random Forests, and Support Vector Regression (SVR), to predict the streamflow in the Mekong River. They reported a higher MAE value of 0.486. Although the study regions and methodologies differ, these previous studies generally used the same ML approach to predict streamflow. The accuracy of our models was similar to that of the models in previous studies, so we can conclude that the performance of the models proposed in this study was consistent with the performance of the models in the literature.

Several studies and methods have been applied effectively to predict streamflow in rivers around the world (Petty & Dhingra 2018; Adnan et al. 2021c). In recent years, however, hydrological processes have been strongly influenced by human activities such as dam construction, which makes streamflow prediction difficult, especially for extreme events such as floods or droughts (Ahn & Merwade 2014; Zuo et al. 2014; Nguyen et al. 2022). Many researchers have asked whether these problems can be solved by training the models with data related to human activities, and previous studies have substantiated this (Sun et al. 2014; Jalali et al. 2021; Shah et al. 2022). However, in the case of the Mekong River, data-sharing issues are seen as major challenges.

The global optimization problem is one of the largest obstacles to using ML in general and deep learning (DL) in particular; that is, whether the models can predict outside the scope of the training dataset. For example, multiple models may work well for predicting streamflow in the short term but cannot predict the long term. In theory, this would not be a significant challenge if the training dataset were sufficient and included all possible events. However, one of the disadvantages of using large datasets is the computation time needed to train the LSTM model. Xu et al. (2020) proposed two solutions to these problems: improving the computing capacity of computers, particularly graphics processing units (GPUs), and aggregating individual watershed datasets with similar characteristics to train the LSTM model, which can then work on a regional level to predict the streamflow in other watersheds. In several cases, the collection of sufficient training data is a significant challenge when using a data-driven model. In such cases, several authors have developed hybrid models by combining DL with a model capable of extrapolation (Kişi 2011) or with a physics-based model (Cho & Kim 2022). Cigizoglu (2003) demonstrated that the ANN model has advantages in solving the extrapolation problem and is better than traditional models such as the lognormal distribution.

The streamflow of the Mekong Delta in Vietnam is strongly influenced by human activities upstream of the river and climate change. Therefore, the results in this study play an important role in supporting decision-makers or planners in the creation of effective water resource management strategies for the development of agriculture and industry. Moreover, this article has significantly advanced the knowledge on the real applications of ML tools in earth science, which we believe is useful and necessary to solve problems in real life with new technologies. Although this study is applied to predict the streamflow in the Mekong Delta in Vietnam, its results can apply to other rivers around the world.

Although ML in general, and deep learning in particular, have proven effective in predicting streamflow, there are limitations to using the DL model. First, two sets of parameters had to be initialized when setting the hyperparameters of the DL model: the parameters of the LSTM model and the parameters of the two proposed optimization algorithms. Different tuning methods have been used, such as trial-and-error, grid search, and population-based training; however, these methods can be time-consuming and resource intensive. Second, this study uses the water level at the Tan Chau station to predict the streamflow at the Can Tho station, whereas in reality several other factors, such as precipitation and evaporation, influence the streamflow. In this study, the methodology was adapted to better predict the streamflow in the Mekong Delta. Third, although the proposed models were effective in predicting the streamflow, the predicted streamflow tended to be lower than the observed streamflow. This is related to the limitations of the model training data. To improve the generalizability of the models for extreme events, it is necessary to collect extreme event data. Other solutions that can be explored in the future to improve the prediction capacity include the use of other models such as ELM.
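
To give a sense of why exhaustive tuning is resource intensive, the following Python sketch counts the training runs that a plain grid search would need. The hyperparameter names and candidate values are hypothetical and were not taken from this study; only the six combinations of input length and lead time follow the experimental design described above.

```python
from itertools import product

# Hypothetical LSTM search space (values chosen for illustration only).
grid = {
    "hidden_units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
    "dropout": [0.0, 0.2, 0.5],
}

# Every combination of one value per hyperparameter.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

# Three input lengths (4, 8, 12 days) times two lead times (1 and 7 days).
n_cases = 3 * 2
total_runs = len(combos) * n_cases

print(f"{len(combos)} hyperparameter combinations per case")
print(f"{total_runs} training runs for all {n_cases} cases")
```

Each additional hyperparameter multiplies the number of runs, which is the cost that faster-converging search strategies aim to reduce.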

Accurate streamflow prediction plays an important role in water resource management and planning. Several physics-based and data-based models have been used to predict streamflow; however, each model has different limitations, and there is no universal method to solve all problems in all regions. The objective of this study was to develop a state-of-the-art method based on LSTM, SVM, and RF to predict the daily streamflow in the Mekong Delta of Vietnam. The results of this study can support decision-makers in the development and sustainable management of water resources.

The results were validated using various statistical indices and by comparison with the results of the individual LSTM model. Based on the results obtained, we can conclude that:

  • The hybrid models (LSTM-SVM and LSTM-RF) outperformed the individual model (LSTM) in predicting the daily streamflow.

  • The models proposed in this study predicted the daily streamflow in the Mekong Delta of Vietnam with high accuracy (R2 > 80%). The new models can be used to predict streamflow in any region, especially in data-limited regions.

  • The prediction results highlighted that the predicted streamflow was lower than the observed streamflow in the cases of both the 1- and 7-day ahead forecasts, particularly in the flood season (300–500 m3/s).

Although this study successfully built models to predict the streamflow 1 and 7 days ahead using prediction horizons of 4, 8, and 12 days in the Mekong Delta, the models should also predict streamflow over longer periods with high accuracy, and model construction needs to be faster to better support decision-makers or planners in water resource management.

Future research may extend this streamflow prediction approach with different ensemble-learning techniques, for example by combining physics-based models with data-based models. The aim of such model integration is to improve the predictive ability for complicated hydrological problems. Moreover, in this study, the previous values of water level and streamflow were used as the input data of the prediction model. In the future, streamflow prediction can consider more factors, such as rainfall, evaporation, and weir effects. The results of this study can be an effective tool for developing water resource management strategies in all regions of the world.

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by H.D.N. and Q.-H.N. The first draft of the manuscript was written by H.D.N., C.P.V., Q.-H.N., and Q.-T.B. All authors read and approved the final manuscript.

No funding was received for this study.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Adnan R. M., Liang Z., Trajkovic S., Zounemat-Kermani M., Li B. & Kisi O. 2019 Daily streamflow prediction using optimally pruned extreme learning machine. Journal of Hydrology 577, 123981.
Adnan R. M., Jaafari A., Mohanavelu A., Kisi O. & Elbeltagi A. 2021 Novel ensemble forecasting of streamflow using locally weighted learning algorithm. Sustainability 13 (11), 5877.
Ahmed A. N., Othman F. B., Afan H. A., Ibrahim R. K., Fai C. M., Hossain M. S., Ehteram M. & Elshafie A. 2019 Machine learning methods for better water quality prediction. Journal of Hydrology 578, 124084.
Alizadeh Z., Yazdi J., Kim J. H. & Al-Shamiri A. K. 2018 Assessment of machine learning techniques for monthly flow prediction. Water 10 (11), 1676.
Besaw L. E., Rizzo D. M., Bierman P. R. & Hackett W. R. 2010 Advances in ungauged streamflow prediction using artificial neural networks. Journal of Hydrology 386 (1–4), 27–37.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Chang F.-J. & Chang Y.-T. 2006 Adaptive neuro-fuzzy inference system for prediction of water level in reservoir. Advances in Water Resources 29 (1), 1–10.
Deka P. & Chandramouli V. 2003 A fuzzy neural network model for deriving the river stage–discharge relationship. Hydrological Sciences Journal 48 (2), 197–209.
Dong L., Fang D., Wang X., Wei W., Damaševičius R., Scherer R. & Woźniak M. 2020 Prediction of streamflow based on dynamic sliding window LSTM. Water 12 (11), 3032.
Essam Y., Huang Y., Ng J. L., Birima A., Najah A.-M. & El-Shafie A. 2022 Predicting streamflow in Peninsular Malaysia using support vector machine and deep learning algorithms. Scientific Reports 12, 3883.
Firat M. & Güngör M. 2007 River flow estimation using adaptive neuro fuzzy inference system. Mathematics and Computers in Simulation 75 (3–4), 87–96.
Ghimire S., Yaseen Z. M., Farooque A. A., Deo R. C., Zhang J. & Tao X. 2021 Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks. Scientific Reports 11 (1), 1–26.
Girihagama L., Naveed Khaliq M., Lamontagne P., Perdikaris J., Roy R., Sushama L. & Elshorbagy A. 2022 Streamflow modelling and forecasting for Canadian watersheds using LSTM networks with attention mechanism. Neural Computing and Applications 34, 1–21.
Guo J., Xie Z., Qin Y., Jia L. & Wang Y. 2019 Short-term abnormal passenger flow prediction based on the fusion of SVR and LSTM. IEEE Access 7, 42946–42955.
Haghiabi A. H., Nasrolahi A. H. & Parsaie A. 2018 Water quality prediction using machine learning methods. Water Quality Research Journal 53 (1), 3–13.
Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.
Huang S., Chang J., Huang Q. & Chen Y. 2014 Monthly streamflow prediction using modified EMD-based support vector machine. Journal of Hydrology 511, 764–775.
Hunt K. M., Matthews G. R., Pappenberger F. & Prudhomme C. 2022 Using a long short-term memory (LSTM) neural network to boost river streamflow forecasts over the western United States. Hydrology and Earth System Sciences Discussions 26 (21), 1–30.
Islam A. R. M. T., Talukdar S., Mahato S., Kundu S., Eibek K. U., Pham Q. B., Kuriqi A. & Linh N. T. T. 2021 Flood susceptibility modelling using advanced ensemble machine learning models. Geoscience Frontiers 12 (3), 101075.
Jalali J., Ahmadi A. & Abbaspour K. 2021 Runoff responses to human activities and climate change in an arid watershed of central Iran. Hydrological Sciences Journal 66 (16), 2280–2297.
Jung C., Ahn S., Sheng Z., Ayana E. K., Srinivasan R. & Yeganantham D. 2021 Evaluate river water salinity in a semi-arid agricultural watershed by coupling ensemble machine learning technique with SWAT model. JAWRA Journal of the American Water Resources Association 58 (6), 1175–1188.
Khosravi K., Golkarian A., Booij M. J., Barzegar R., Sun W., Yaseen Z. M. & Mosavi A. 2021 Improving daily stochastic streamflow prediction: comparison of novel hybrid data-mining algorithms. Hydrological Sciences Journal 66 (9), 1457–1474.
Kisi O. & Cimen M. 2011 A wavelet-support vector machine conjunction model for monthly streamflow forecasting. Journal of Hydrology 399 (1–2), 132–140.
Krajewski W. F., Ceynar D., Demir I., Goska R., Kruger A., Langel C., Mantilla R., Niemeier J., Quintero F. & Seo B.-C. 2017 Real-time flood forecasting and information system for the state of Iowa. Bulletin of the American Meteorological Society 98 (3), 539–554.
Kratzert F., Klotz D., Herrnegger M., Sampson A. K., Hochreiter S. & Nearing G. S. 2019 Toward improved predictions in ungauged basins: exploiting the power of machine learning. Water Resources Research 55 (12), 11344–11354.
Kumari N., Srivastava A., Sahoo B., Raghuwanshi N. S. & Bretreger D. 2021 Identification of suitable hydrological models for streamflow assessment in the Kangsabati River Basin, India, by using different model selection scores. Natural Resources Research 30 (6), 4187–4205.
Le X.-H., Nguyen D.-H., Jung S., Yeon M. & Lee G. 2021 Comparison of deep learning techniques for river streamflow forecasting. IEEE Access 9, 71805–71820.
Lin Y., Wang D., Wang G., Qiu J., Long K., Du Y., Xie H., Wei Z., Shangguan W. & Dai Y. 2021 A hybrid deep learning algorithm and its application to streamflow prediction. Journal of Hydrology 601, 126636.
Malik A., Tikhamarine Y., Souag-Gamane D., Kisi O. & Pham Q. B. 2020 Support vector regression optimized by meta-heuristic algorithms for daily streamflow prediction. Stochastic Environmental Research and Risk Assessment 34 (11), 1755–1773.
Melesse A., Ahmad S., McClain M., Wang X. & Lim Y. 2011 Suspended sediment load prediction of river systems: an artificial neural network approach. Agricultural Water Management 98 (5), 855–866.
Melesse A. M., Khosravi K., Tiefenbacher J. P., Heddam S., Kim S., Mosavi A. & Pham B. T. 2020 River water salinity prediction using hybrid machine learning models. Water 12 (10), 2951.
Nguyen H. D. 2022a Daily streamflow forecasting by machine learning in Tra Khuc River in Vietnam. Science of the Earth 45 (1), 82–97.
Nguyen T.-T., Huu Q. N. & Li M. J. 2015 Forecasting time series water levels on Mekong river using machine learning models. In: 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), Ho Chi Minh City, Vietnam. IEEE, New York, pp. 292–297.
Nguyen T. G., Nguyen H. D., Hoang T. T., Pham D. H. B., Tran N. A., Dang D. K. & Nguyen H. P. 2022 Assessment of upbasin dam impacts on streamflow at Chiang Saen gauging station during the period 1960–2020 in the context of statistical studies. River Research and Applications 38 (7), 1237–1253.
Ni L., Wang D., Singh V. P., Wu J., Wang Y., Tao Y. & Zhang J. 2020 Streamflow and rainfall forecasting by two long short-term memory-based models. Journal of Hydrology 583, 124296.
Paniconi C. & Putti M. 2015 Physically based modeling in catchment hydrology at 50: survey and outlook. Water Resources Research 51 (9), 7090–7129.
Peng F., Wen J., Zhang Y. & Jin J. 2020 Monthly streamflow prediction based on random forest algorithm and phase space reconstruction theory. Journal of Physics: Conference Series 1637, 012091.
Petty T. & Dhingra P. 2018 Streamflow hydrology estimate using machine learning (SHEM). JAWRA Journal of the American Water Resources Association 54 (1), 55–68.
Qi Y., Zhou Z., Yang L., Quan Y. & Miao Q. 2019 A decomposition-ensemble learning model based on LSTM neural network for daily reservoir inflow forecasting. Water Resources Management 33 (12), 4123–4139.
Rahimzad M., Moghaddam Nia A., Zolfonoon H., Soltani J., Danandeh Mehr A. & Kwon H.-H. 2021 Performance comparison of an LSTM-based deep learning model versus conventional machine learning algorithms for streamflow forecasting. Water Resources Management 35 (12), 4167–4187.
Rasouli K., Hsieh W. W. & Cannon A. J. 2012 Daily streamflow forecasting by machine learning methods with weather and climate inputs. Journal of Hydrology 414, 284–293.
Sahoo S., Russo T., Elliott J. & Foster I. 2017 Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resources Research 53 (5), 3878–3895.
Samantaray S., Sawan Das S., Sahoo A. & Prakash Satapathy D. 2022 Monthly runoff prediction at Baitarani river basin by support vector machine based on Salp swarm algorithm. Ain Shams Engineering Journal 13 (5), 101732.
Singha S., Pasupuleti S., Singha S. S., Singh R. & Kumar S. 2021 Prediction of groundwater quality using efficient machine learning technique. Chemosphere 276, 130265.
Sun A. Y., Wang D. & Xu X. 2014 Monthly streamflow forecasting using Gaussian process regression. Journal of Hydrology 511, 72–81.
Valença M. & Ludermir T. 2000 Monthly stream flow forecasting using an neural fuzzy network model. In: Proceedings. Vol. 1. Sixth Brazilian Symposium on Neural Networks, Rio de Janeiro, Brazil. IEEE, New York, pp. 117–119.
Vapnik V., Guyon I. & Hastie T. 1995 Support vector machines. Machine Learning 20 (3), 273–297.
Vojtek M., Vojteková J., Costache R., Pham Q. B., Lee S., Arshad A., Sahoo S., Linh N. T. T. & Anh D. T. 2021 Comparison of multi-criteria-analytical hierarchy process and machine learning-boosted tree models for regional flood susceptibility mapping: a case study from Slovakia. Geomatics, Natural Hazards and Risk 12 (1), 1153–1180.
Wang K., Band S. S., Ameri R., Biyari M., Hai T., Hsu C.-C., Hadjouni M., Elmannai H., Chau K.-W. & Mosavi A. 2022 Performance improvement of machine learning models via wavelet theory in estimating monthly river streamflow. Engineering Applications of Computational Fluid Mechanics 16 (1), 1833–1848.
Xu W., Jiang Y., Zhang X., Li Y., Zhang R. & Fu G. 2020 Using long short-term memory networks for river flow prediction. Hydrology Research 51 (6), 1358–1376.
Yan L., Chen C., Hang T. & Hu Y. 2021 A stream prediction model based on attention-LSTM. Earth Science Informatics 14 (2), 723–733.
Yaseen Z. M., Jaafar O., Deo R. C., Kisi O., Adamowski J., Quilty J. & El-Shafie A. 2016 Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq. Journal of Hydrology 542, 603–614.
Yaseen Z. M., Allawi M. F., Yousif A. A., Jaafar O., Hamzah F. M. & El-Shafie A. 2018 Non-tuned machine learning approach for hydrological time series forecasting. Neural Computing and Applications 30 (5), 1479–1491.
Zealand C. M., Burn D. H. & Simonovic S. P. 1999 Short term streamflow forecasting using artificial neural networks. Journal of Hydrology 214 (1–4), 32–48.
Zhang G., Wang M. & Liu K. 2019 Forest fire susceptibility modeling using a convolutional neural network for Yunnan province of China. International Journal of Disaster Risk Science 10 (3), 386–403.
Zhu S., Luo X., Yuan X. & Xu Z. 2020 An improved long short-term memory network for streamflow forecasting in the upper Yangtze River. Stochastic Environmental Research and Risk Assessment 34 (9), 1313–1329.
Ziegler A. & König I. R. 2014 Mining data with random forests: current options for real-world applications. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (1), 55–63.
Zuo D., Xu Z., Wu W., Zhao J. & Zhao F. 2014 Identification of streamflow response to climate change and human activities in the Wei River Basin, China. Water Resources Management 28 (3), 833–851.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).