The quantity and quality of streamflow are crucial components in the management and control of water resources, yet both are difficult to predict because of their nonstationarity and uncertainty. This paper presents an ensemble, data pre-processing-based machine learning (ML) approach to support decisions on water resource management and water pollution control at the watershed scale, motivated by the nonlinear behaviour of streamflow. In the proposed hybrid model, a new time–frequency analysis algorithm, variational mode decomposition (VMD), is used to deal with the nonlinearity and nonstationarity of the streamflow process. VMD decomposes the original water quality and quantity series into a set of intrinsic mode functions (IMFs) with different frequencies. An ensemble algorithm, bootstrap aggregating (bagging), is then coupled with two common ML models, reduced error pruning tree (REPT) and random forest (RF), to predict each of the decomposed modes; bagging is used to reduce the variance among the base learners. Finally, the predicted value of the original water quality and quantity series is obtained by summing the predictions of all the decomposed modes. The proposed hybrid decomposition–ensemble model was applied to two stations on the Karoon River, Iran. The results indicate that the proposed model can capture the nonlinear characteristics of the streamflow process in terms of water quality and quantity simultaneously, and thus provides more accurate predictions than models without frequency decomposition of the data.

  • This study applies machine learning (ML) techniques to model river water quality and quantity simultaneously.

  • A new ensemble ML technique has been developed to assess and predict two important water quality and quantity parameters.

  • The bagging ensemble algorithm is combined with ML techniques to raise model stability and improve accuracy.

  • A VMD data pre-processing technique is recommended to enhance the model's fidelity.

Graphical Abstract


The availability of water resources, in terms of both quantity and quality, is critical to the socio-economic development of watersheds (Dessu et al. 2014). Rapid population growth and the accompanying socio-economic development, however, are putting more strain on water supplies and posing challenges for watershed management (Black et al. 2014; Safavi et al. 2015). Natural factors such as temperature, geography, terrain, and geology, as well as human activities, affect water quantity and quality. In a dynamic global environment with an ever-growing population and major environmental contamination, a water crisis in both quantity and quality seems unavoidable (Erdlenbruch et al. 2014; Yu et al. 2016; Shen 2018).

In river basin water resource management, the coordinated operation of water quantity and quality is among the most successful approaches to water pollution control and environmental restoration (Banks et al. 2011; Jang et al. 2012; Shokri et al. 2014). Many attempts have been made in recent decades to manage river basin water resources by evaluating water quantity and quality simultaneously. Hydrological models have been extensively used to predict river water quantity, but owing to the intricacy of the processes controlling river water quality, current modelling attempts have had little success in reliably predicting water quality (He et al. 2011; Rezaie-Balf et al. 2019a; Melesse et al. 2020). The processes that govern pollutant transport differ depending on the kind of river pollution. For example, the pollutant load released by stormwater runoff into receiving waterways in an urban watershed varies with the location of the pollutant's initial source: pollutants that originate on the ground behave differently from those that originate in the storm sewer system. This is only one example of the complexity involved in building physically based river quality models.

Most water quantity and quality models used today (such as SWMM, HEC-RAS, MIKE, and others) are based on physical–empirical descriptions of river discharge and of pollutant formation and transport by the river. A calibration procedure estimates or adjusts those variables driving the processes that are unknown or cannot be monitored directly in the field. Furthermore, the chemical, physical, and biological mechanisms that influence the quality of stormwater runoff are not entirely understood. In addition, the information and data required by numerical models are rarely available, even in heavily instrumented watersheds, and the use of spatially varying data often leads to numerical convergence and instability problems (Tayfur 2002). Besides, physically based models involve uncertainties arising from the input information and the model structure, which matter when the results feed into a decision-making process. Because of the limited knowledge and complicated behaviour of such hydrological systems, further research is needed to improve the forecasting performance of quantity and quality modelling (Obropta & Kardos 2007; Xie et al. 2021).

Recently, machine learning (ML) approaches such as adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), model tree (MT), gene expression programming (GEP), support vector machine (SVM), and extreme learning machine (ELM) have been widely developed for solving various environmental engineering and water quality problems (Solomatine & Xue 2004; Anctil Lauzon & Filion 2008; Shiri et al. 2011; Najafzadeh et al. 2016; Yassin et al. 2016; Li et al. 2018; Rezaie-Balf & Kisi 2018; Kim et al. 2019; Mohammadi et al. 2020, 2021; Shamshirband et al. 2020; Meshram et al. 2021; Wang et al. 2021a, 2021b). For example, Chen & Chau (2016) developed a hybrid double feedforward neural network to estimate the daily suspended sediment load (SSL) of the Muddy Creek in Montana, USA, which provided an appropriate method for modelling a sediment transport process with nonlinear, fuzzy, and time-varying characteristics. Liu et al. (2013) applied a genetic algorithm-optimized support vector regression (SVR) to predict water temperature and dissolved oxygen (DO) concentration. Najah et al. (2013) used an ANN to predict various water quality parameters in Malaysia's Johor River, including DO, electrical conductivity (EC), total dissolved solids (TDS), and turbidity.

Ghavidel & Montaseri (2014) used ANNs and two different ANFIS variants (one with grid partition and one with subtractive clustering) to predict TDS in Iran's Zarinehroud basin. Emamgholizadeh et al. (2014) used the multi-layer perceptron (MLP), radial basis network (RBF), and ANFIS models to forecast DO, biochemical oxygen demand (BOD), and chemical oxygen demand (COD) in Karoon River water simultaneously. To decrease the impact of the initial weight parameter problem and of an imbalanced training dataset, Kim & Seo (2015) applied an ensemble modelling technique and several clustering methods to improve the performance of ANN modelling. Barzegar et al. (2016) employed ANN, ANFIS, and two further models that combine wavelet analysis with ANN and ANFIS to predict ion concentration in the Aji-Chay River (Iran). Olyaie et al. (2015) used four approaches, namely ANN, ANFIS, coupled wavelet and neural network (WANN), and the conventional sediment rating curve (SRC), to estimate the daily SSL at two gauging stations in the USA, and found that WANN performed best.

Heddam & Kisi (2017) predicted DO at eight sites monitored by the US Geological Survey (USGS) using the traditional ELM and improved variants such as online sequential ELM (OS-ELM) and RBF-based ELM (R-ELM). However, the study found that the prediction performance of ELM and its modified versions varies depending on the monitoring site and parameter, implying that the same model may work well for one site or parameter but not for another. Wang et al. (2021a, 2021b) implemented an ensemble hybrid forecasting model to predict annual runoff in China. They presented a dual-decomposition method consisting of extreme-point symmetric mode decomposition and wavelet packet decomposition to overcome the nonstationarity of the time–frequency dataset, and found that the proposed model outperformed the benchmark models in terms of four evaluation indexes. Ai et al. (2022) applied several ML models to forecast the medium- and long-term runoff of the Yalong River basin, China. Delay correlation analysis was used to find the optimum lag periods, and a combined medium- and long-term runoff forecasting model based on different lag periods was proposed. They concluded that the lag period of the physical factors can affect the accuracy of runoff forecasting.

Moreover, single ML models cannot directly identify water quality and quantity change patterns because of the complicated nonlinearity, multiscale variability, and random distribution of such time-series datasets. For this reason, pre-processing algorithms such as wavelet transform (WT), empirical mode decomposition (EMD), and ensemble EMD (EEMD) have attracted the attention of many scholars as a way of removing the most nonlinear and disorderly noise from the original series (Huang et al. 2014; Mouatadid et al. 2019; Rezaie-Balf et al. 2020; Wang et al. 2020; Liu et al. 2022). However, most of these algorithms suffer from drawbacks in different aspects of actual signal decomposition.

The literature review shows that many scholars have applied single ML models to water quality or quantity, even in recent years. Because of the nonlinearity and nonstationarity inherent in hydrological datasets, single ML models cannot fully capture the characteristics of the time series and can easily become trapped in local optima (Liu et al. 2018; Rezaie-Balf et al. 2019b). Another point that has received little attention in the literature is reducing the variance of a decision tree classifier. The main sources of learning error are noise, bias, and variance (Koohestani et al. 2019). This problem can be addressed by ensemble techniques such as bagging and boosting, in which a set of weak learners is combined to create a strong learner that performs better than any single one.

Although single ML techniques and their upgraded versions have obtained outstanding results in water quality prediction, two flaws remain. (1) According to the relevant literature, natural water quality and quantity data exhibit a long-term tendency along with complicated nonlinearity, extreme irregularity, and multiscale variability (Sun et al. 2022). However, the capacity of standard shallow models to describe long-term dependency is limited, and their shallow structure makes it difficult to learn complicated nonlinear relationships (Shang et al. 2014; Jiang et al. 2016).

(2) In real-world water-monitoring applications, the low stability of a single model in forecasting diverse water quality and quantity indices may result in a high percentage of false negatives and false alarms (Huang et al. 2017). Combining multiple models into an ensemble can therefore improve the overall accuracy of ML models: in an ensemble architecture, the inputs are passed to each weak learner and the individual predictions are then collected (Khosravi et al. 2021).

A review of the literature shows that studies applying ML techniques to model river water quality and quantity simultaneously are very limited. Thus, in this study, a new ensemble ML technique is developed to assess and predict two important water quality and quantity parameters (i.e., TDS and river discharge) in the Karoon basin. The bagging ensemble algorithm is combined with ML techniques to raise model stability, improve accuracy, and reduce variance, which helps mitigate overfitting. To deal with the seasonality and nonlinearity of the time-series records, the variational mode decomposition (VMD) data pre-processing technique is used to enhance the models' fidelity and performance.

In this study, two ML algorithms, reduced error pruning tree (REPT) and RF, were developed to simulate and predict the water discharge and TDS at two stations on the Karoon River on a monthly scale. To improve model stability and accuracy, bootstrap aggregating is used to reduce variance within a noisy dataset, which also helps avoid overfitting. In addition, to reduce the effect of noise in the raw data on the prediction results, a data pre-processing step is needed: the VMD algorithm decomposes each series into IMFs to handle the nonstationarity and complexity of the hydrological dataset. The REPT, RF, bagging, and VMD methods are briefly described in this section. The flowchart of the proposed models is depicted in Figure 1.

Figure 1

Step-by-step framework of the present study for combining VMD and bagging algorithm with REPT and RF models.
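The decompose, predict, and sum structure of the framework can be illustrated with a minimal Python sketch. Everything in it is a stand-in chosen for brevity and is not the method used in this study: `moving_average_split` is a toy additive decomposition in place of VMD, and `persistence_forecast` is a placeholder for the bagged REPT or RF learner. Only the final aggregation step, summing the per-mode predictions, mirrors the description in the text.

```python
from typing import Callable, List

Series = List[float]


def moving_average_split(series: Series, window: int = 3) -> List[Series]:
    """Toy additive decomposition (NOT VMD): trailing moving-average trend plus residual."""
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        trend.append(sum(chunk) / len(chunk))
    residual = [s - t for s, t in zip(series, trend)]
    return [trend, residual]


def persistence_forecast(mode: Series) -> float:
    """Placeholder one-step forecaster (hypothetical): repeat the last value of the mode."""
    return mode[-1]


def hybrid_forecast(
    series: Series,
    decompose: Callable[[Series], List[Series]],
    forecaster: Callable[[Series], float],
) -> float:
    """Forecast each decomposed mode separately, then add the per-mode forecasts."""
    modes = decompose(series)
    return sum(forecaster(mode) for mode in modes)


series = [1.0, 2.0, 4.0, 8.0]  # made-up values, for illustration only
print(hybrid_forecast(series, moving_average_split, persistence_forecast))
```

Because the toy decomposition is additive, the modes sum back to the original series, which is the property that makes the final summation step meaningful.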


Reduced error pruning tree

The REPT method, one of the fast decision tree learning methods, is based on the notion of processing information using entropy and error reduction (Witten et al. 2005). Entropy is a popular criterion for finding the best variable(s) to split on when building a decision tree. The technique applies regression tree logic, generates multiple trees, and then selects the best of them for use in calculations (Pham et al. 2021). It has a reasonable capacity to simplify the modelling process by using training datasets when the output is large, and it minimizes the complexity of tree topologies (Mohamed et al. 2012). The pruning procedure addresses overfitting by working backward through the tree, using a post-pruning strategy to produce the smallest version of the best tree (Figure 2). The technique's effectiveness relies heavily on the information obtained from entropy, variance reduction, and error pruning (Srinivasan & Mekala 2014). Based on the entropy function, the information gain (IG) values were calculated as follows:
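$$\mathrm{IG}(N, X) = \mathrm{Entropy}(N) - \sum_{v \in \mathrm{values}(X)} \frac{|N_v|}{|N|}\,\mathrm{Entropy}(N_v) \tag{1}$$
where N is the training dataset, X is a predictor, and $N_v$ is the subset of N for which X takes the value v.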
Figure 2

Reduced error pruning tree diagram.


In successive pruning steps, the IG is evaluated for all predictors in the training dataset (N). REPT helps reduce the complexity of a decision tree by deleting those branches and leaves that would otherwise cause overfitting and lower the interpretability of the model (Khosravi et al. 2018).
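As a hedged illustration of the split criteria mentioned above, the sketch below computes entropy-based information gain for a categorical target and the analogous variance reduction for a numeric target. The function names and the tiny made-up inputs are illustrative assumptions; this is not the REPT implementation used in the study.

```python
import math
from collections import Counter
from statistics import pvariance


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent node minus the size-weighted entropy of its child nodes."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted


def variance_reduction(parent, subsets):
    """Regression analogue: parent variance minus size-weighted child variance."""
    n = len(parent)
    weighted = sum(len(s) / n * pvariance(s) for s in subsets)
    return pvariance(parent) - weighted


# A split that separates the two classes perfectly recovers all of the parent entropy.
print(information_gain(list("aabb"), [list("aa"), list("bb")]))
# A split that leaves both children as mixed as the parent gains nothing.
print(information_gain(list("aabb"), [list("ab"), list("ab")]))
```

A tree learner would evaluate such a score for every candidate split and keep the one with the largest gain; pruning then removes splits that do not reduce the error on held-out data.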

Random forest

Random forest (RF) is a state-of-the-art hierarchical method proposed by Breiman (2001) to model the relationship between input and output variables. RF is a tree-based ensemble learning technique that grows several decision trees during model building, each of which is trained on a bootstrapped sample of the original input dataset. The output estimate is the mean of the estimates of the individual trees. The RF algorithm proceeds as follows (Svetnik et al. 2003; Cootes et al. 2012):

  1. Using the bootstrap resampling approach, randomly draw k bootstrap samples from the original training dataset X (N samples) to create k regression trees. The probability that a given sample is not drawn is p = (1 − 1/N)^N. As N tends to infinity, p tends to 1/e ≈ 0.37, indicating that around 37% of the samples in the initial training dataset X are not drawn; these data are referred to as out-of-bag (OOB) data. The OOB data can be used as test samples.

  2. An unpruned regression tree is constructed from each of the k bootstrap samples. During the tree-growing process, a attributes are chosen at random from all A attributes at each internal node (a < A). Then, using the minimum Gini index criterion, the optimal attribute among them is selected as the split variable to grow the branch.

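  3. The k regression trees produced make up the final RF regression model. Two indices, the OOB mean square error ($\mathrm{MSE}_{\mathrm{OOB}}$) and the coefficient of determination ($R^2_{\mathrm{RF}}$), are used to assess the model estimation performance:
    $$\mathrm{MSE}_{\mathrm{OOB}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$
    $$R^2_{\mathrm{RF}} = 1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\hat{\sigma}_y^2} \tag{3}$$
    where n denotes the total number of OOB samples, $y_i$ and $\hat{y}_i$ represent the observed and predicted target values, and $\hat{\sigma}_y^2$ denotes the OOB variance of the predicted target. The process diagram of the RF is illustrated in Figure 3.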

Figure 3

RF diagram.
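The out-of-bag quantities described in the steps above can be checked numerically with a short sketch. It evaluates p = (1 − 1/N)^N, estimates the OOB fraction from one simulated bootstrap draw, and computes the two OOB indices. The function names are illustrative, and taking the variance in the coefficient of determination as the population variance of the observed OOB targets is an assumption made here, not a detail stated in the text.

```python
import math
import random
from statistics import pvariance


def oob_probability(n: int) -> float:
    """Probability that a given sample is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def simulated_oob_fraction(n: int, seed: int = 0) -> float:
    """Fraction of the n samples left out of a single simulated bootstrap sample."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n


def mse_oob(observed, predicted) -> float:
    """Mean square error over the OOB samples."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def r2_rf(observed, predicted) -> float:
    """One minus the OOB MSE divided by the variance of the targets (assumed: observed, population)."""
    return 1.0 - mse_oob(observed, predicted) / pvariance(observed)


for n in (10, 100, 10_000):
    print(n, oob_probability(n), simulated_oob_fraction(n))
print("limit 1/e =", math.exp(-1))
```

For large N both the analytical and the simulated values settle near 1/e, which is why roughly a third of the data is available for OOB testing of each tree.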

Variational mode decomposition

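The VMD algorithm concurrently decomposes a complex signal into several band-limited intrinsic modes (Sun et al. 2022) and was introduced as a quasi-orthogonal, adaptive signal decomposition approach (Dragomiretskiy & Zosso 2013). VMD decomposes a signal f(t) into a discrete number K of sub-signals or modes $u_k$, each of which is compact around its centre frequency $\omega_k$. VMD is formulated as the following constrained optimization problem (Dragomiretskiy & Zosso 2013):
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{4}$$
where $\{u_k\} = \{u_1, \ldots, u_K\}$ and $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are shorthand notations for the set of all modes and their centre frequencies, respectively. Furthermore, * denotes convolution and δ(t) the Dirac distribution. To turn this constrained optimization problem into an unconstrained one, a quadratic penalty term with weight α and a Lagrangian multiplier λ(t) are introduced (Wang et al. 2015):
$$\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{5}$$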

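The alternating direction method of multipliers (ADMM) can be used to solve Equation (5), which is optimized by alternating between two updates:

  1. Minimization with respect to $u_k$:
    $$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2} \tag{6}$$
  2. Minimization with respect to $\omega_k$:
    $$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{7}$$
    where $\hat{u}_k^{n+1}(\omega)$, $\hat{u}_i(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ represent the Fourier transforms of $u_k^{n+1}(t)$, $u_i(t)$, $f(t)$, and $\lambda(t)$, respectively, α is the quadratic penalty factor, and n is the iteration number.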

    When compared to the EEMD technique, VMD causes no residual noise in the modes and may prevent duplicate modes. VMD is also an adaptive signal decomposition approach that non-recursively decomposes a multi-component signal into several quasi-orthogonal IMFs (Rezaie-Balf et al. 2019a).
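To make the two VMD updates more concrete, the following sketch applies one Wiener-filter-style mode update and one centre-frequency update to the spectrum of a pure cosine. It is a simplified, single-mode illustration under stated assumptions and not a full VMD implementation: frequencies are expressed in cycles per sample rather than as angular frequencies, the spectrum comes from a plain discrete Fourier transform written with the standard library, there is no iteration loop or multiplier update, and all function names are hypothetical.

```python
import cmath
import math


def half_spectrum(signal):
    """Non-negative-frequency half of the DFT, with frequencies in cycles per sample."""
    n = len(signal)
    freqs = [k / n for k in range(n // 2 + 1)]
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n // 2 + 1)
    ]
    return freqs, spectrum


def update_mode(f_hat, others_hat, lam_hat, freqs, omega_k, alpha):
    """Mode update: residual spectrum filtered around the current centre frequency omega_k."""
    return [
        (f - o + lam / 2) / (1 + 2 * alpha * (w - omega_k) ** 2)
        for f, o, lam, w in zip(f_hat, others_hat, lam_hat, freqs)
    ]


def centre_frequency(u_hat, freqs):
    """Centre-frequency update: centre of gravity of the mode's power spectrum."""
    power = [abs(u) ** 2 for u in u_hat]
    return sum(w * p for w, p in zip(freqs, power)) / sum(power)


# Toy signal: 8 cycles over 64 samples, i.e. 0.125 cycles per sample.
signal = [math.cos(2 * math.pi * 8 * t / 64) for t in range(64)]
freqs, f_hat = half_spectrum(signal)
zeros = [0j] * len(f_hat)

# Start from a deliberately wrong centre-frequency guess of 0.1.
u_hat = update_mode(f_hat, zeros, zeros, freqs, omega_k=0.1, alpha=1000)
print(centre_frequency(u_hat, freqs))
```

In a full decomposition these two updates would be repeated for each of the K modes, together with an update of the multiplier, until the modes stop changing.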

Bootstrap aggregation (bagging)

One of the drawbacks of decision trees is that single tree models suffer from high variance. Although pruning helps reduce this variance, there are alternative methods that exploit the variability of single trees in a way that can significantly improve performance over that of a single tree. Bootstrap aggregation (bagging), a meta-algorithm commonly applied to decision trees and initially presented by Breiman (1996), is a popular and effective ensemble learning approach. It can increase classification accuracy and help generalize data patterns (Roshan & Asadi 2020). Bootstrap samples are generated at random from the training set; each is obtained by sampling, with replacement, the same number of elements as the original set. Because a bootstrap sample may leave out part of the noisy data, the classifiers trained on these samples can perform better than one trained on the original set. As a result, bagging helps build a better classifier for training sets that include noisy data. Bagging can be broken down into three steps (Dong et al. 2020):

  1. m bootstrap samples are drawn at random from the training dataset. The bootstrapped samples may differ somewhat from one another, but each should follow the same distribution as the full training set.

  2. According to Breiman (1996), the classifier for each bootstrap sample should be created by training a single, unpruned regression tree rather than individually pruned trees.

  3. The final prediction is obtained by averaging the individual predictions of these classifiers.
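The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not the configuration used in the study: the base learner is a one-split regression stump on a single input (a stand-in for an unpruned regression tree), the data are made up, and the names `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data and return it as a predictor function."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i][0] + pairs[i - 1][0]) / 2, lm, rm)
    if best is None:  # every x is identical: fall back to a constant predictor
        const = sum(ys) / len(ys)
        return lambda x: const
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def bagging_fit(xs, ys, m=25, seed=0):
    """Steps 1 and 2: draw m bootstrap samples and fit one base learner on each."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Step 3: average the individual predictions."""
    return sum(model(x) for model in models) / len(models)


# Made-up step-shaped data, for illustration only.
xs = [float(x) for x in range(20)]
ys = [0.0 if x < 10 else 10.0 for x in xs]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 2.0), bagging_predict(models, 17.0))
```

Each bootstrap sample places the split at a slightly different threshold; averaging the resulting predictors is what smooths out the variance of any single tree.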

The Karoon River is Iran's largest and only navigable river, with a length of 950 km and the highest discharge. It rises in the Zard Kuh Mountains of the Zagros range in the Bakhtiari area. Before passing through Ahwaz, the capital of Khuzestan Province, it receives various tributaries, the most important of which is the Dez River (Emamgholizadeh et al. 2014). Above the Khorramshahr delta, the Karoon splits into two main branches before entering the Persian Gulf: the Haffar and the Bahmanshir (still called Karoon). After that, the branches join the Arvand Rud, which empties into the Persian Gulf. The watershed of the Karoon River spans two provinces and covers 65,230 km2. The river's average discharge is 575 m3/s (Aminiyan et al. 2018; Ebadati & Hooshmandzadeh 2019). Water quality zoning is required in this system in order to control water quality more effectively. Monthly water quantity and quality records are used to develop ensemble models for estimating TDS and water discharge at the Molasani (latitude 31°35′01″, longitude 48°52′40″) and Farsiat (latitude 31°07′25″, longitude 48°24′00″) gauging stations on the Karoon River (Figure 4). Time-series data were available for 1968–2020 at Molasani and for 1985–2020 at Farsiat. Data from 1968 to 2007 for Molasani and from 1985 to 2011 for Farsiat were used to calibrate the proposed models, and data from 2008 to 2020 for Molasani and from 2012 to 2020 for Farsiat were used to validate them. The Khuzestan Water and Power Authority measured the following variables at the Farsiat and Molasani stations:

Figure 4

Map of the Karun River as the case study.


TDS, pH, sulphates (SO₄²⁻), bicarbonates (HCO₃⁻), chlorides (Cl⁻), potassium (K⁺), sodium (Na⁺), calcium (Ca²⁺), and magnesium (Mg²⁺), along with river discharge (m3/s).

Table 1 lists the statistical characteristics of the river water quantity and quality records at the monitoring stations used in this research (i.e., Farsiat and Molasani). Based on Table 1, TDS had the highest concentration among the physicochemical variables in the Karoon River, with maxima of 2,125 mg/l at Molasani and 2,473 mg/l at Farsiat. Furthermore, the standard deviation of TDS is much larger than those of the input variables, showing that the TDS records are spread over a wider range of values.

Table 1

Statistical indices for the studied stations located in the Karoon basin

| Station | Variable | Minimum | Average | Maximum | Variance | Skewness | Standard deviation |
|---|---|---|---|---|---|---|---|
| Molasani | Discharge (m3/s) | 90.00 | 681.45 | 6,873.50 | 677,317.32 | 3.89 | 822.99 |
| Molasani | pH | 6.90 | 7.89 | 8.90 | 0.08 | −0.36 | 0.29 |
| Molasani | HCO3 | 0.58 | 2.90 | 4.66 | 0.22 | −0.30 | 0.47 |
| Molasani | Cl | 1.45 | 7.42 | 22.10 | 14.86 | 1.01 | 3.86 |
| Molasani | SO4 | 0.90 | 4.42 | 19.14 | 6.27 | 1.41 | 2.50 |
| Molasani | Ca | 1.47 | 4.69 | 19.14 | 3.31 | 1.93 | 1.82 |
| Molasani | Mg | 0.55 | 2.54 | 7.18 | 1.20 | 1.11 | 1.10 |
| Molasani | Na | 1.35 | 7.51 | 22.24 | 15.52 | 0.98 | 3.94 |
| Molasani | K | 0.01 | 0.25 | 5.00 | 0.91 | 4.79 | 0.95 |
| Molasani | TDS (mg/l) | 278.00 | 933.00 | 2,125.00 | 140,203.95 | 0.77 | 374.44 |
| Farsiat | Discharge (m3/s) | 60.00 | 524.45 | 3,016.00 | 242,284.21 | 2.71 | 492.22 |
| Farsiat | pH | 6.01 | 7.88 | 8.90 | 0.13 | −0.52 | 0.36 |
| Farsiat | HCO3 | 0.70 | 3.01 | 4.51 | 0.35 | −0.77 | 0.59 |
| Farsiat | Cl | 2.70 | 9.25 | 26.80 | 16.58 | 0.94 | 4.07 |
| Farsiat | SO4 | 1.34 | 5.90 | 18.30 | 7.69 | 0.91 | 2.77 |
| Farsiat | Ca | 1.80 | 5.42 | 17.40 | 3.82 | 1.28 | 1.96 |
| Farsiat | Mg | 0.34 | 3.26 | 7.77 | 1.69 | 0.63 | 1.30 |
| Farsiat | Na | 2.80 | 9.52 | 25.04 | 18.06 | 0.87 | 4.25 |
| Farsiat | K | 0.01 | 0.07 | 0.25 | 0.00 | 1.45 | 0.03 |
| Farsiat | TDS (mg/l) | 425.00 | 1,146.21 | 2,473.00 | 161,692.71 | 0.58 | 402.11 |

The acceptability and reliability of the proposed models need to be evaluated. To compare the performance of the ensemble VMD-based models with that of the original REPT and RF models, several error indicators were applied, defined as follows:

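  1. Coefficient of determination (R2):
    $$R^2 = \frac{\left[\sum_{i=1}^{N}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)\right]^2}{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{N}\left(P_i - \bar{P}\right)^2} \tag{8}$$
  2. Root mean square error (RMSE):
    $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right)^2} \tag{9}$$
  3. Mean absolute error (MAE):
    $$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|O_i - P_i\right| \tag{10}$$
  4. Nash–Sutcliffe efficiency (NSE):
    $$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(O_i - P_i\right)^2}{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2} \tag{11}$$
  5. Willmott's index of agreement (WI):
    $$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{N}\left(O_i - P_i\right)^2}{\sum_{i=1}^{N}\left(\left|P_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2} \tag{12}$$
  6. Legates–McCabe's index (LMI):
    $$\mathrm{LMI} = 1 - \frac{\sum_{i=1}^{N}\left|O_i - P_i\right|}{\sum_{i=1}^{N}\left|O_i - \bar{O}\right|} \tag{13}$$
    where $O_i$ and $P_i$ are the observed and forecasted values, respectively; $\bar{O}$ and $\bar{P}$ represent the means of the observed and predicted values, respectively; and N is the number of time-series data.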
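The six indicators can be written compactly in Python. The sketch below uses the standard textbook definitions of these metrics (squared Pearson correlation for R2), which is an assumption about the exact forms used in the study; the function names and the four-point example are made up for illustration.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    o_bar, p_bar = _mean(obs), _mean(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    so = sum((o - o_bar) ** 2 for o in obs)
    sp = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (so * sp)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    o_bar = _mean(obs)
    return 1 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / sum((o - o_bar) ** 2 for o in obs)


def wi(obs, pred):
    o_bar = _mean(obs)
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1 - sum((o - p) ** 2 for o, p in zip(obs, pred)) / den


def lmi(obs, pred):
    o_bar = _mean(obs)
    return 1 - sum(abs(o - p) for o, p in zip(obs, pred)) / sum(abs(o - o_bar) for o in obs)


obs = [1.0, 2.0, 3.0, 4.0]   # made-up observations
pred = [1.5, 2.0, 3.0, 3.5]  # made-up predictions
for metric in (r2, rmse, mae, nse, wi, lmi):
    print(metric.__name__, round(metric(obs, pred), 4))
```

Note that R2 defined this way can stay high while NSE drops sharply when predictions are well correlated with the observations but biased, which is one reason to report several indicators side by side.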

Data analysis and predictor selection

One of the most important steps in developing a model architecture is to determine the best input variables. The co-variability of TDS with the physicochemical variables SO4, Cl, pH, HCO3, Ca, K, Mg, and Na is investigated using the Pearson correlation coefficient, which varies between −1 and +1 and measures the linear dependency between two variables. The correlations for the Molasani and Farsiat stations are shown as a bar plot (Figure 5). As illustrated, monthly TDS is highly correlated with monthly Na (0.96) and Cl (0.95) at both the Molasani and Farsiat stations. Three physicochemical variables, pH, HCO3, and K, show the lowest correlation with TDS at both stations and are therefore excluded from TDS modelling with the ensemble ML techniques. Accordingly, the following variables are selected as the optimum inputs for modelling:
$$\mathrm{TDS} = f\left(\mathrm{SO_4}, \mathrm{Cl}, \mathrm{Ca}, \mathrm{Mg}, \mathrm{Na}\right) \tag{14}$$
Figure 5

TDS correlation versus other physicochemical parameters at Molasani and Farsiat stations.
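A correlation-based screening of candidate predictors, as described above, might look like the following sketch. The numbers are invented purely to exercise the code and are not measurements from the Karoon River; `pearson` and `rank_predictors` are illustrative names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = math.sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)


def rank_predictors(target, candidates):
    """Return (name, r) pairs sorted by absolute correlation with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented toy values, for illustration only.
tds = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "Na": [2.0, 4.0, 6.0, 8.0, 10.0],  # moves in step with the target
    "pH": [7.0, 8.0, 7.0, 8.0, 7.0],   # unrelated to the target in this toy set
}
for name, r in rank_predictors(tds, candidates):
    print(name, round(r, 2))
```

Variables at the bottom of such a ranking are the natural candidates for removal, which is the reasoning applied to pH, HCO3, and K in the text.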


The original (non-decomposed) river discharge series, together with its statistically significant lagged values determined by correlation analysis, is used as input for developing the models.

The dependency of the river discharge on its antecedent values is shown in Figure 6, in which the vertical axis indicates the time delay (lag number) and the horizontal axis shows the correlation coefficient. In this study, lags with R > 0.5 are selected as the influential inputs for Karoon River discharge at each station. The lags applied to the models are marked in all diagrams. According to Figure 6, three antecedent values are important for modelling streamflow at the Molasani station, whereas two lags are selected as input variables for the Farsiat station.

Figure 6

Cross-correlation between river discharge and its antecedent values at Molasani and Farsiat stations.
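The lag-selection rule (keep lags whose correlation with the current value exceeds 0.5) and the construction of the lagged input matrix can be sketched as follows. The threshold comes from the text; the function names, the use of the Pearson correlation between the series and its shifted copy, and the toy series are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = math.sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)


def select_lags(series, max_lag, threshold=0.5):
    """Keep every lag whose correlation with the current value exceeds the threshold."""
    return [
        lag
        for lag in range(1, max_lag + 1)
        if pearson(series[lag:], series[:-lag]) > threshold
    ]


def build_lagged(series, lags):
    """Build the input rows [Q(t - lag) for each lag] and the matching targets Q(t)."""
    start = max(lags)
    inputs = [[series[t - lag] for lag in lags] for t in range(start, len(series))]
    targets = series[start:]
    return inputs, targets


# Toy series, for illustration only.
inputs, targets = build_lagged([1.0, 2.0, 3.0, 4.0, 5.0], [1, 2])
print(inputs)
print(targets)
```

Each input row holds the antecedent values for one month, so the first `max(lags)` months are lost as targets.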

In addition to correlation analysis, partial autocorrelation function (PACF) values are depicted in Figure 7 to illustrate the number of antecedent months used as input variables to the ML models at the Molasani and Farsiat stations. According to Figure 7, lags whose PACF values fall outside the ±1.96/√n bounds, where n is the number of observations, are considered effective input variables for streamflow forecasting. The results for both stations are almost the same as those of the correlation analysis: the selected numbers of lags are 3 and 2 for the Molasani and Farsiat stations, respectively. After analysing the available input variables for the two stations, the input structures that gave an acceptable level of accuracy with the ensemble models are expressed as follows for the Molasani and Farsiat stations, respectively:
$$Q_t = f\left(Q_{t-1}, Q_{t-2}, Q_{t-3}\right) \tag{15}$$
$$Q_t = f\left(Q_{t-1}, Q_{t-2}\right) \tag{16}$$
where $Q_t$ is the river discharge in month t.
Figure 7

The PACF graphs of the monthly streamflow for Molasani and Farsiat stations.
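For readers who want to reproduce this kind of PACF screening, the sketch below estimates the partial autocorrelations with the Durbin-Levinson recursion and flags the lags that fall outside ±1.96/√n. It is an illustrative implementation on a synthetic first-order autoregressive series, not the software or data used in the study, and the function names are hypothetical.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
        else:
            num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
            phi_kk = num / den
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
    return out


def confidence_bound(n):
    """Approximate 95% bound for a zero partial autocorrelation."""
    return 1.96 / math.sqrt(n)


def significant_lags(series, max_lag):
    bound = confidence_bound(len(series))
    return [k for k, value in enumerate(pacf(series, max_lag), start=1) if abs(value) > bound]


# Synthetic AR(1) series with coefficient 0.8, for illustration only.
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
print(significant_lags(series, 5))
```

For a first-order process only the first partial autocorrelation is expected to be clearly outside the bounds, which is the pattern one looks for when reading a PACF plot such as Figure 7.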


Prediction results at the Molasani station

This section describes the water quantity and quality predictions for the Molasani gauging station on the Karoon River. Before presenting the evaluation results, it should be noted that the REPT hyperparameters, namely the minimum number of instances per leaf and the maximum tree depth, were set to 1 and 3, respectively. For RF, the bag size percent and the number of leaves were set to 150 and 8, respectively. The number of iterations for both models was 200. The VMD parameters were set to 4 (number of decomposed modes) and 1,000 (penalty factor). Tables 2 and 3 summarize the predictive abilities of the standalone and ensemble RF and REPT models for TDS and streamflow prediction on both the training and validation datasets. For TDS prediction at the Molasani station, the standalone RF model outperformed REPT during training, with LMI, WI, NSE, and R2 of 0.68, 0.97, 0.903, and 0.91, respectively; during validation, RF (0.53, 0.95, 0.802, and 0.83) performed on a par with or slightly below REPT (Table 2). Applying bagging ensemble learning improved the performance of both standalone models in predicting monthly TDS, markedly so for RF. In the validation period, the B-RF and B-REPT ensembles reduced the RMSE of TDS prediction by 38.77 and 1.5%, respectively, relative to the standalone models.

Table 2

Evaluation metrics of the proposed models in the training and validating periods at the Molasani station for TDS prediction

| Stage | Model | R2 | RMSE (mg/l) | NSE | MAE (mg/l) | WI | LMI |
|---|---|---|---|---|---|---|---|
| Training | REP | 0.82 | 246.961 | 0.427 | 211.361 | 0.91 | 0.18 |
| Training | RF | 0.91 | 101.745 | 0.903 | 81.253 | 0.97 | 0.68 |
| Training | B-REP | 0.9 | 102.41 | 0.901 | 78.65 | 0.97 | 0.69 |
| Training | B-RF | 0.95 | 72.032 | 0.951 | 47.684 | 0.98 | 0.81 |
| Training | VMD-B-REP | 0.96 | 65.53 | 0.96 | 49.983 | 0.99 | 0.81 |
| Training | VMD-B-RF | 0.99 | 35.223 | 0.988 | 23.028 | 0.99 | 0.91 |
| Validation | REP | 0.83 | 158.013 | 0.814 | 123.529 | 0.95 | 0.59 |
| Validation | RF | 0.83 | 163.014 | 0.802 | 140.635 | 0.95 | 0.53 |
| Validation | B-REP | 0.84 | 155.746 | 0.817 | 116.154 | 0.95 | 0.62 |
| Validation | B-RF | 0.93 | 99.815 | 0.926 | 80.34 | 0.98 | 0.73 |
| Validation | VMD-B-REP | 0.9 | 125.369 | 0.883 | 110.756 | 0.97 | 0.63 |
| Validation | VMD-B-RF | 0.96 | 74.417 | 0.959 | 57.331 | 0.99 | 0.81 |

Table 3

Predictive capabilities of the standalone and ensemble models during training and validating at the Molasani station for streamflow prediction

| Stage | Model | R2 | RMSE (m3/s) | NSE | MAE (m3/s) | WI | LMI |
|---|---|---|---|---|---|---|---|
| Training | REP | 0.53 | 648.564 | 0.489 | 327.286 | 0.78 | 0.38 |
| Training | RF | 0.65 | 556.835 | 0.624 | 254.506 | 0.86 | 0.52 |
| Training | B-REP | 0.65 | 565.496 | 0.612 | 328.406 | 0.83 | 0.38 |
| Training | B-RF | 0.68 | 526.466 | 0.664 | 312.822 | 0.87 | 0.41 |
| Training | VMD-B-REP | 0.89 | 366.362 | 0.837 | 161.722 | 0.94 | 0.69 |
| Training | VMD-B-RF | 0.91 | 303.340 | 0.888 | 151.445 | 0.96 | 0.71 |
| Validating | REP | 0.59 | 251.056 | 0.542 | 160.131 | 0.85 | 0.26 |
| Validating | RF | 0.70 | 213.299 | 0.670 | 133.770 | 0.87 | 0.38 |
| Validating | B-REP | 0.64 | 261.823 | 0.502 | 214.629 | 0.82 | 0.21 |
| Validating | B-RF | 0.72 | 227.161 | 0.625 | 192.364 | 0.87 | 0.31 |
| Validating | VMD-B-REP | 0.83 | 192.515 | 0.731 | 163.271 | 0.92 | 0.54 |
| Validating | VMD-B-RF | 0.89 | 154.035 | 0.828 | 133.841 | 0.95 | 0.68 |

Table 2 shows that, for TDS prediction, B-RF improved on B-REPT in terms of LMI by 17.4 and 17.8% in the training and validating phases, respectively. Comparing VMD-B-REPT with VMD-B-RF, the VMD-B-REPT strategy consistently yields larger RMSE and MAE values and a lower overall accuracy. The results also show that VMD enhances prediction performance: both VMD-based models reach higher accuracy and lower TDS prediction error than their counterparts without decomposition.
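The percentage gains quoted from Table 2 are relative changes between two table entries. As a minimal sketch of that arithmetic (the helper name `percent_change` is hypothetical; the LMI values are copied from Table 2):

```python
def percent_change(baseline, improved):
    """Relative change of `improved` with respect to `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# LMI for TDS prediction at Molasani, copied from Table 2.
lmi = {
    "training": {"B-REP": 0.69, "B-RF": 0.81},
    "validating": {"B-REP": 0.62, "B-RF": 0.73},
}

# Gain of B-RF over B-REPT in each stage.
gains = {stage: percent_change(row["B-REP"], row["B-RF"])
         for stage, row in lmi.items()}

for stage, gain in gains.items():
    print(f"{stage}: {gain:.2f}%")
```

With the rounded table entries this gives roughly 17.4% for training and 17.7% for validating; small differences from the figures quoted in the text are to be expected when the inputs are rounded to two decimals.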

Based on Table 3, for river discharge prediction at the Molasani station, the VMD-B-RF and VMD-B-REPT ensembles outperform the solo RF/REPT and ensemble B-RF/B-REPT models. In regression problems, the RF method has greater flexibility and capacity than the REPT algorithm, so ensembles that use RF as the base learner are expected to be more accurate. The models also perform acceptably on the testing datasets, which indicates that they can generalize to unseen data. With the VMD pre-processing technique, moving from the B-REPT to the B-RF base model raised the validation WI from 0.92 to 0.95 and reduced MAE and RMSE by 29.43 and 38.47 m3/s, respectively.

Figures 8 and 9 compare observed and predicted TDS values for the solo RF and REPT, ensemble-based RF and REPT, and decomposition-based RF and REPT models. With the standalone models, predicted and observed TDS values show little dispersion about the ideal line, and the data points cluster mainly around it. As the TDS prediction method moved from the standalone models to the ensemble-based and decomposition-based models, the points drew closer still to the ideal line.

Figure 8

Scatter plots of predicted TDS values using standalone and ensemble models against their corresponding values of TDS in Molasani in the validating stage.

Figure 9

Comparison of the standalone and ensemble models for TDS prediction using time variation graphs at the Molasani station.

Figure 10 compares the observed discharge with the values predicted by the various techniques during the testing phase, to check the efficiency of the ensemble decomposition-based RF and REPT approaches. The decomposition-based ML models track the monthly discharge dynamics well, which supports the feasibility of the forecasting models as a whole.

Figure 10

Scatter plots of predicted streamflow values using standalone and ensemble models against their corresponding values of streamflow in Molasani in the validating stage.

As shown in Figure 11, the models learn the behaviour of the data and produce predictions that are consistent with the observed values. The strong performance achieved in the training stage is maintained in the test phase, indicating that the ensemble process is reliable. This can be explained by the ability of ML models to accommodate nonlinearities and to represent complicated interactions between independent and dependent variables, such as the correlations between water quantity and quality in hydrological research.

Figure 11

Comparison of the standalone and ensemble models for streamflow prediction using time variation graphs at the Molasani station.

Prediction results at the Farsiat station

At the Farsiat station, the parameter tuning of REPT, RF, and VMD was the same as at the Molasani station, except that the maximum tree depth of REPT was set to 5. Table 4 shows that the B-REPT ensemble model predicts TDS at the Farsiat station more accurately than a single REPT model. In terms of LMI and R2, the B-REPT training predictions reach 0.86 and 0.97, respectively. Because the standalone RF and REPT models already predicted well, the benefit of the bagging algorithm was not noticeable in the TDS prediction at this station during the training phase, and the solo and bagging-based models performed similarly. The difference between the base learners, RF and REPT, becomes more pronounced in the validation step, and the overall ensemble has greater generalization capacity, so prediction performance can be improved. Coupling the VMD decomposition technique with the B-RF ensemble model decreases the TDS prediction error by 33.6 and 37.5% in terms of MAE and RMSE, respectively. The VMD-based ensemble thus enhances the prediction performance of both RF and REPT. Because the TDS sequences are numerous and the information hidden in them is difficult to uncover, the prediction method must perform well.
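The bagging step discussed here can be illustrated with a minimal sketch: draw bootstrap samples, fit one base learner per sample, and average the predictions. The base learner below is a one-split regression stump, used only as a simple stand-in for REPT or RF; the function names, the toy data, and the mapping of the 200 iterations to `n_estimators` are assumptions for illustration, not the study's implementation.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (hypothetical weak base learner)."""
    best = None
    for threshold in sorted(set(xs))[1:]:   # both sides of each split are non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        mean_left, mean_right = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_left, mean_right)
    if best is None:                        # all x identical: predict the mean
        mean_all = statistics.fmean(ys)
        return lambda x: mean_all
    _, threshold, mean_left, mean_right = best
    return lambda x: mean_left if x < threshold else mean_right


def fit_bagging(xs, ys, n_estimators=200, seed=0):
    """Bootstrap aggregating: average base learners fitted on resampled data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean(model(x) for model in models)


# Toy data: a step from 0 to 1 at x = 10.
xs = list(range(20))
ys = [0.0] * 10 + [1.0] * 10
bagged = fit_bagging(xs, ys, n_estimators=200, seed=42)
```

Averaging over many resampled learners is what reduces the variance of the individual trees; a weaker or noisier base learner generally leaves more room for bagging to help, which is consistent with the small gains reported when the standalone models are already accurate.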

Table 4

Evaluation metrics of the proposed models in the training and validating periods at the Farsiat station for TDS prediction

| Stage | Model | R2 | RMSE (mg/l) | NSE | MAE (mg/l) | WI | LMI |
|---|---|---|---|---|---|---|---|
| Training | REP | 0.97 | 86.45 | 0.95 | 71.19 | 0.99 | 0.76 |
| Training | RF | 0.97 | 64.16 | 0.97 | 41.59 | 0.99 | 0.86 |
| Training | B-REP | 0.97 | 59.42 | 0.97 | 40.31 | 0.99 | 0.86 |
| Training | B-RF | 0.98 | 56.05 | 0.98 | 39.48 | 0.99 | 0.87 |
| Training | VMD-B-REP | 0.98 | 45.93 | 0.98 | 27.45 | 1.00 | 0.91 |
| Training | VMD-B-RF | 0.99 | 28.39 | 0.99 | 19.55 | 1.00 | 0.93 |
| Validating | REP | 0.94 | 121.09 | 0.89 | 101.92 | 0.97 | 0.65 |
| Validating | RF | 0.95 | 83.18 | 0.95 | 65.84 | 0.99 | 0.77 |
| Validating | B-REP | 0.95 | 82.04 | 0.95 | 64.55 | 0.99 | 0.78 |
| Validating | B-RF | 0.95 | 83.90 | 0.95 | 65.09 | 0.99 | 0.78 |
| Validating | VMD-B-REP | 0.96 | 70.42 | 0.96 | 56.51 | 0.99 | 0.81 |
| Validating | VMD-B-RF | 0.98 | 52.17 | 0.98 | 43.20 | 0.99 | 0.85 |

For streamflow modelling at the Farsiat station, coupling with the bagging ensemble technique, before any VMD decomposition is applied, raises the validation R2 of the REPT model by 31.9% and that of the RF model by 18.7% (see Table 5). When the standalone, ensemble-based, and decomposition-based models are compared, the VMD-B-RF model remains the best streamflow prediction model, with validation LMI, WI, and NSE of 0.62, 0.94, and 0.78, respectively. As a result, the VMD-B-RF model was chosen in this article as the best model for both water quantity and quality.

Table 5

Predictive capabilities of the standalone and ensemble models during training and validating at the Farsiat station for streamflow prediction

| Stage | Model | R2 | RMSE (m3/s) | NSE | MAE (m3/s) | WI | LMI |
|---|---|---|---|---|---|---|---|
| Training | REP | 0.59 | 349.15 | 0.55 | 223.92 | 0.81 | 0.34 |
| Training | RF | 0.67 | 315.31 | 0.63 | 227.36 | 0.86 | 0.33 |
| Training | B-REP | 0.65 | 331.32 | 0.60 | 217.46 | 0.83 | 0.36 |
| Training | B-RF | 0.81 | 244.15 | 0.78 | 135.57 | 0.93 | 0.60 |
| Training | VMD-B-REP | 0.87 | 222.41 | 0.82 | 118.17 | 0.94 | 0.65 |
| Training | VMD-B-RF | 0.93 | 170.33 | 0.89 | 111.00 | 0.97 | 0.68 |
| Validating | REP | 0.47 | 264.53 | 0.40 | 192.74 | 0.72 | 0.11 |
| Validating | RF | 0.64 | 239.39 | 0.51 | 194.27 | 0.80 | 0.21 |
| Validating | B-REP | 0.62 | 238.05 | 0.51 | 187.81 | 0.81 | 0.24 |
| Validating | B-RF | 0.76 | 182.69 | 0.71 | 134.00 | 0.89 | 0.31 |
| Validating | VMD-B-REP | 0.76 | 181.25 | 0.72 | 127.13 | 0.89 | 0.55 |
| Validating | VMD-B-RF | 0.83 | 159.83 | 0.78 | 132.34 | 0.94 | 0.62 |

Figures 12 and 13 compare the measured and predicted TDS levels throughout the training and testing phases for the four models. As shown in Figure 12, the standalone, decomposition-based, and ensemble-based models all match the extreme high and low values while following the overall trend of the observed TDS.

Figure 12

Scatter plots of predicted TDS values using standalone and ensemble models against their corresponding values of TDS in Farsiat in the validating stage.

Figure 13

Comparison of the standalone and ensemble models for TDS prediction using time variation graphs at the Farsiat station.

The VMD-based ensemble B-RF and B-REPT models (Figure 13) gave the best approximation of the observed series and reproduced almost the entire pattern of measured TDS. The main weakness at this station is the prediction of TDS peak values by the standalone RF and REPT models: as seen in Figure 13, these solo models could not reproduce high TDS levels and performed poorly. It can therefore be inferred that combining the ensemble approach with time-series decomposition may significantly increase the accuracy of the learning algorithms. Figures 14 and 15 show scatter plots and hydrographs for streamflow prediction at the Farsiat station, illustrating the forecasting performance of the RF, REPT, B-RF, B-REPT, VMD-B-RF, and VMD-B-REPT models.

Figure 14

Scatter plots of predicted streamflow values using standalone and ensemble models against their corresponding values of streamflow in Farsiat in the validating stage.

Figure 15

Comparison of the standalone and ensemble models for streamflow prediction using time variation graphs at the Farsiat station.

The ensemble B-RF and B-REPT methods combined with the VMD data pre-processing algorithm outperform the other ensemble and standalone models (RF, REPT, B-RF, and B-REPT), indicating the importance of data pre-processing and decomposition for enhancing the efficiency of the original ML models. The suggested hybrid/ensemble techniques outperform all other approaches in approximating the monthly streamflow and TDS fluctuation tendencies, demonstrating that the combined components improve the forecasting model's generalization power. The suggested hybrid and ensemble approaches can therefore be regarded as an effective tool for forecasting monthly streamflow and TDS series at the two sites along the Karoon River.
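The decomposition-ensemble idea (split the series into modes, predict each mode separately, and add the mode predictions back together) can be sketched as follows. This is not VMD: a centred moving average and its residual serve as a two-mode stand-in so the example stays self-contained, and a lag-1 least-squares fit replaces the bagged RF/REPT predictors. All function names and the toy series are hypothetical.

```python
def moving_average(series, window):
    """Centred moving average; windows shrink at the two ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=5):
    """Stand-in for VMD: a smooth mode plus a residual mode that sum to the series."""
    smooth = moving_average(series, window)
    residual = [s - m for s, m in zip(series, smooth)]
    return [smooth, residual]


def fit_lag1(mode):
    """Least-squares fit of mode[t] = a + b * mode[t-1]; returns a one-step predictor."""
    x, y = mode[:-1], mode[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    b = sxy / sxx if sxx else 0.0           # constant mode: fall back to its mean
    a = mean_y - b * mean_x
    return lambda last: a + b * last


def forecast_next(series, window=5):
    """Predict each mode one step ahead, then sum the mode predictions."""
    modes = decompose(series, window)
    return sum(fit_lag1(mode)(mode[-1]) for mode in modes)


# Toy monthly-like series: a 12-step cycle on top of a slow trend.
series = [float(t % 12) + 0.1 * t for t in range(36)]
modes = decompose(series)
next_value = forecast_next(series)
```

The point of the design is that each mode is smoother or more regular than the raw series and so is easier for a simple learner to fit; the final forecast is recovered by summation because the modes add up to the original signal.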

To compare VMD-B-RF with the M5P and RBF-SVR models proposed by Nouraki et al. (2021), the performance of the models in predicting TDS was evaluated in terms of R2 and RMSE. Among the four stations that Nouraki et al. (2021) used to model and predict TDS, the one most similar to Molasani was the Valiabad station. The results show that, at the monthly time-scale, VMD-B-RF with R2 = 0.96 outperformed M5P (R2 = 0.9) and was similar to RBF-SVR (R2 = 0.97). In terms of RMSE, the quadratic mean of the differences between observed and predicted TDS, the proposed RF coupled with VMD and the bagging ensemble (74.42 mg/l) compared favourably with M5P and RBF-SVR (363.74 and 172.31 mg/l, respectively). Consequently, the hybrid method (VMD-B-RF) outperformed the standalone ML models proposed by Nouraki et al. (2021) in the prediction of monthly TDS.

This work investigated the effectiveness of an ensemble model combining VMD, RF, and bagging for forecasting monthly water quantity (streamflow) and quality (TDS) in the Karoon River, Iran. To test its efficacy and applicability, the suggested ensemble model was applied to two stations, Molasani and Farsiat. The specific goals were to develop and assess strategies for increasing the accuracy of water quantity and quality prediction, evaluating the models with several efficiency indices: RMSE, R2, NSE, WI, MAE, and LMI. Based on the comparisons and analyses, the RF-based models outperform the comparable REPT-based models, and the B-RF ensemble approach improves the performance of RF models for water quantity and quality prediction. Comparisons of models with and without the VMD method show that using the decomposition approach with RF and REPT models may greatly increase their accuracy in forecasting streamflow and TDS. Furthermore, models with the two-stage decomposition–ensemble method achieve better prediction results than models without a pre-processing algorithm, demonstrating that using two successive processing methods is an effective way to improve the prediction accuracy of ML models. Overall, the findings show that the suggested VMD-B-RF model might be a useful tool for forecasting nonstationary water quantity and quality. The suggested decomposition approach may significantly reduce the unpredictability of the original water quantity and quality time series, while the B-RF/B-REPT models can improve weak learners' prediction performance.

Although the proposed models achieved acceptable accuracy, future studies could apply metaheuristic algorithms such as the Arithmetic Optimization Algorithm (AOA), Gradient-based Optimizer (GBO), Harris Hawks Optimization (HHO), and Equilibrium Optimizer (EO), as well as hybrid optimizers such as the adaptive hybrid differential evolution and particle swarm optimization algorithm (A-DEPSO), to find optimum hyperparameter values for the RF and REPT models. This may yield more accurate models for water quality and quantity prediction. As another avenue for future research, the uncertainty associated with the input/output variables and the models can be investigated to obtain more reliable predictive models, in particular how input and parameter uncertainty affect the water quality and quantity prediction results. The target prediction also depends on the appropriate selection of the water quality and quantity conditioning factors. In this sense, feature selection methods such as pointwise mutual information, mutual information, relief-based algorithms, and minimum redundancy maximum relevance are suggested in order to reduce computational cost and improve the models' capability.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ai P., Song Y., Xiong C., Chen B. & Yue Z. 2022 A novel medium- and long-term runoff combined forecasting model based on different lag periods. Journal of Hydroinformatics 24 (2), 367–387.
Aminiyan M. M., Aitkenhead-Peterson J. & Aminiyan F. M. 2018 Evaluation of multiple water quality indices for drinking and irrigation purposes for the Karoon river, Iran. Environmental Geochemistry and Health 40 (6), 2707–2728.
Anctil F., Lauzon N. & Filion M. 2008 Added gains of soil moisture content observations for streamflow predictions using neural networks. Journal of Hydrology 359 (3–4), 225–234.
Barzegar R., Adamowski J. & Moghaddam A. A. 2016 Application of wavelet-artificial intelligence hybrid models for water quality prediction: a case study in Aji-Chay River, Iran. Stochastic Environmental Research and Risk Assessment 30 (7), 1797–1819.
Black D. C., Wallbrink P. J. & Jordan P. W. 2014 Towards best practice implementation and application of models for analysis of water resources management scenarios. Environmental Modelling & Software 52, 136–148.
Breiman L. 1996 Bagging predictors. Machine Learning 24 (2), 123–140.
Breiman L. 2001 Random forests. Machine Learning 45 (1), 5–32.
Chen X. Y. & Chau K. W. 2016 A hybrid double feedforward neural network for suspended sediment load estimation. Water Resources Management 30 (7), 2179–2194.
Cootes T. F., Ionita M. C., Lindner C. & Sauer P. 2012 Robust and accurate shape model fitting using random forest regression voting. In: R. Nugent, ed. European Conference on Computer Vision. Springer, Berlin, Heidelberg, pp. 278–291.
Dessu S. B., Melesse A. M., Bhat M. G. & McClain M. E. 2014 Assessment of water resources availability and demand in the Mara River Basin. Catena 115, 104–114.
Dong R. H., Yan H. H. & Zhang Q. Y. 2020 An intrusion detection model for wireless sensor network based on information gain ratio and bagging algorithm. International Journal of Network Security 22 (2), 218–230.
Dragomiretskiy K. & Zosso D. 2013 Variational mode decomposition. IEEE Transactions on Signal Processing 62 (3), 531–544.
Emamgholizadeh S., Kashi H., Marofpoor I. & Zalaghi E. 2014 Prediction of water quality parameters of Karoon River (Iran) by artificial intelligence-based models. International Journal of Environmental Science and Technology 11 (3), 645–656.
Erdlenbruch K., Tidball M. & Zaccour G. 2014 Quantity–quality management of a groundwater resource by a water agency. Environmental Science & Policy 44, 201–214.
Ghavidel S. Z. Z. & Montaseri M. 2014 Application of different data-driven methods for the prediction of total dissolved solids in the Zarinehroud basin. Stochastic Environmental Research and Risk Assessment 28 (8), 2101–2118.
He J., Valeo C., Chu A. & Neumann N. F. 2011 Prediction of event-based stormwater runoff quantity and quality by ANNs developed using PMI-based input selection. Journal of Hydrology 400 (1–2), 10–23.
Huang S., Chang J., Huang Q. & Chen Y. 2014 Monthly streamflow prediction using modified EMD-based support vector machine. Journal of Hydrology 511, 764–775.
Khosravi K., Pham B. T., Chapi K., Shirzadi A., Shahabi H., Revhaug I., Prakash I. & Bui D. T. 2018 A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Science of the Total Environment 627, 744–755.
Khosravi K., Golkarian A., Booij M. J., Barzegar R., Sun W., Yaseen Z. M. & Mosavi A. 2021 Improving daily stochastic streamflow prediction: comparison of novel hybrid data-mining algorithms. Hydrological Sciences Journal 66 (9), 1457–1474.
Kim S., Seo Y., Rezaie-Balf M., Kisi O., Ghorbani M. A. & Singh V. P. 2019 Evaluation of daily solar radiation flux using soft computing approaches based on different meteorological information: peninsula vs continent. Theoretical and Applied Climatology 137 (1), 693–712.
Koohestani A., Abdar M., Khosravi A., Nahavandi S. & Koohestani M. 2019 Integration of ensemble and evolutionary machine learning algorithms for monitoring diver behavior using physiological signals. IEEE Access 7, 98971–98992.
Li X., Sha J., Li Y. M. & Wang Z. L. 2018 Comparison of hybrid models for daily streamflow prediction in a forested basin. Journal of Hydroinformatics 20 (1), 191–205.
Liu S., Tai H., Ding Q., Li D., Xu L. & Wei Y. 2013 A hybrid approach of support vector regression with genetic algorithm optimization for aquaculture water quality prediction. Mathematical and Computer Modelling 58 (3–4), 458–465.
Melesse A. M., Khosravi K., Tiefenbacher J. P., Heddam S., Kim S., Mosavi A. & Pham B. T. 2020 River water salinity prediction using hybrid machine learning models. Water 12 (10), 2951.
Meshram S. G., Meshram C., Santos C. A. G., Benzougagh B. & Khedher K. M. 2022 Streamflow prediction based on artificial intelligence techniques. Iranian Journal of Science and Technology, Transactions of Civil Engineering 46 (3), 2393–2403.
Mohamed W. N. H. W., Salleh M. N. M. & Omar A. H. 2012 A comparative study of reduced error pruning method in decision tree algorithms. In: 2012 IEEE International Conference on Control System, Computing and Engineering. IEEE, pp. 392–397.
Mohammadi B., Linh N. T. T., Pham Q. B., Ahmed A. N., Vojteková J., Guan Y., Abba S. I. & El-Shafie A. 2020 Adaptive neuro-fuzzy inference system coupled with shuffled frog leaping algorithm for predicting river streamflow time series. Hydrological Sciences Journal 65 (10), 1738–1751.
Mohammadi B., Moazenzadeh R., Christian K. & Duan Z. 2021 Improving streamflow simulation by combining hydrological process-driven and artificial intelligence-based models. Environmental Science and Pollution Research 28 (46), 65752–65768.
Mouatadid S., Adamowski J. F., Tiwari M. K. & Quilty J. M. 2019 Coupling the maximum overlap discrete wavelet transform and long short-term memory networks for irrigation flow forecasting. Agricultural Water Management 219, 72–85.
Najafzadeh M., Rezaie Balf M. & Rashedi E. 2016 Prediction of maximum scour depth around piers with debris accumulation using EPR, MT, and GEP models. Journal of Hydroinformatics 18 (5), 867–884.
Najah A., El-Shafie A., Karim O. A. & El-Shafie A. H. 2013 Application of artificial neural networks for water quality prediction. Neural Computing and Applications 22 (1), 187–201.
Nouraki A., Alavi M., Golabi M. & Albaji M. 2021 Prediction of water quality parameters using machine learning models: a case study of the Karun River, Iran. Environmental Science and Pollution Research 28 (40), 57060–57072.
Obropta C. C. & Kardos J. S. 2007 Review of urban stormwater quality models: deterministic, stochastic, and hybrid approaches. JAWRA Journal of the American Water Resources Association 43 (6), 1508–1523.
Pham B. T., Jaafari A., Nguyen-Thoi T., Van Phong T., Nguyen H. D., Satyam N., Masroor M., Rehman S., Sajjad H., Sahana M., Le H. V. & Prakash I. 2021 Ensemble machine learning models based on Reduced Error Pruning Tree for prediction of rainfall-induced landslides. International Journal of Digital Earth 14 (5), 575–596.
Rezaie-Balf M., Fani Nowbandegani S., Samadi S. Z., Fallah H. & Alaghmand S. 2019a An ensemble decomposition-based artificial intelligence approach for daily streamflow prediction. Water 11 (4), 709.
Rezaie-Balf M., Maleki N., Kim S., Ashrafian A., Babaie-Miri F., Kim N. W., Chung I.-M. & Alaghmand S. 2019b Forecasting daily solar radiation using CEEMDAN decomposition-based MARS model trained by crow search algorithm. Energies 12 (8), 1416.
Rezaie-Balf M., Attar N. F., Mohammadzadeh A., Murti M. A., Ahmed A. N., Fai C. M., Nabipour N., Alaghmand S. & El-Shafie A. 2020 Physicochemical parameters data assimilation for efficient improvement of water quality index prediction: Comparative assessment of a noise suppression hybridization approach. Journal of Cleaner Production 271, 122576.
Shamshirband S., Hashemi S., Salimi H., Samadianfard S., Asadi E., Shadkani S., Kargar K., Mosavi A., Nabipour N. & Chau K. W. 2020 Predicting standardized streamflow index for hydrological drought using machine learning models. Engineering Applications of Computational Fluid Mechanics 14 (1), 339–350.
Shang C., Yang F., Huang D. & Lyu W. 2014 Data-driven soft sensor development based on deep learning technique. Journal of Process Control 24 (3), 223–233.
Shokri A., Haddad O. B. & Mariño M. A. 2014 Multi-objective quantity–quality reservoir operation in sudden pollution. Water Resources Management 28 (2), 567–586.
Srinivasan D. B. & Mekala P. 2014 Mining social networking data for classification using REPTree. International Journal of Advance Research in Computer Science and Management Studies 2, 10.
Sun X., Zhang H., Wang J., Shi C., Hua D. & Li J. 2022 Ensemble streamflow forecasting based on variational mode decomposition and long short term memory. Scientific Reports 12 (1), 1–19.
Svetnik V., Liaw A., Tong C., Culberson J. C., Sheridan R. P. & Feuston B. P. 2003 Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences 43 (6), 1947–1958.
Tayfur G. 2002 Artificial neural networks for sheet sediment transport. Hydrological Sciences Journal 47 (6), 879–892.
Wang Y., Markert R., Xiang J. & Zheng W. 2015 Research on variational mode decomposition and its application in detecting rub-impact fault of the rotor system. Mechanical Systems and Signal Processing 60, 243–251.
Wang J., Wang X., Hui Lei X., Wang H., hua Zhang X., Jun You J., Tan Q. F. & Lian Liu X. 2020 Teleconnection analysis of monthly streamflow using ensemble empirical mode decomposition. Journal of Hydrology 582, 124411.
Wang W. C., Du Y. J., Chau K. W., Xu D. M., Liu C. J. & Ma Q. 2021a An ensemble hybrid forecasting model for annual runoff based on sample entropy, secondary decomposition, and long short-term memory neural network. Water Resources Management 35 (14), 4695–4726.
Witten I. H., Frank E., Hall M. A., Pal C. J. & DATA M. 2005 Practical machine learning tools and techniques. Data Mining 2, 4.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).