Reliable river streamflow (RSF) forecasting is important because of its impact on the planning and operation of water resources systems. In this study, hybrid artificial intelligence methods, combined with the Lower Upper Bound Estimation (LUBE) approach, were used for point and interval prediction of monthly RSF. Two states, based on each station's own data and on upstream stations' data, were considered for RSF modeling of the Housatonic River during the period 1941–2018. Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD) were used to enhance the accuracy of streamflow point forecasting, and Prediction Intervals (PIs) were constructed to account for forecast uncertainty. Results showed that in state 1, the error criterion value for the superior model decreased from 0.155 to 0.090 and 0.082 for the EEMD- and VMD-based models, respectively. Generally, hybrid models increased the modeling accuracy by 20% to 40%. Via the integrated approaches, the upstream stations' data were successfully used for streamflow prediction at stations without data. In this state, the PI coverage probability values for the VMD-based model were approximately 12% higher than those of the single model. Overall, the VMD-based model led to more desirable results because of its higher PI coverage probability and lower mean PI width values.

  • AI methods were applied to model the RSF in successive hydrometric stations.

  • To obtain a model with higher efficiency, EEMD and VMD techniques were applied.

  • Interval prediction was applied for providing more details for practical operation decisions.

Graphical Abstract

Accurate prediction of river streamflow (RSF) is of great importance for utilization and management of sustainable water resources. Reliable river discharge prediction is particularly important for hydrological operations and hydro-environmental management. So far, various and complex relationships have been proposed to predict river discharge. The methods of river discharge prediction are divided into two categories: (i) theoretical models and (ii) regression- and artificial-intelligence-based models (Garg & Jothiprakash 2013). According to Talei et al. (2010) and Yaseen et al. (2019), for the conventional models hydro-meteorological variables such as topography, land cover, precipitation, soil, etc. are needed. However, the collection of the mentioned data is not easy for all locations or basins (Koycegiz & Buyukyildiz 2019). On the other hand, a quick solution with high accuracy is needed for water management.

The accurate simulation of river discharge is difficult due to the nonlinearity and complexity of the process. So far, researchers have developed numerous models to enhance the efficiency of streamflow prediction methods. Regression, time series, data processing, and artificial intelligence models are some examples (Mishra et al. 2007; Madadgar & Moradkhani 2013; Mehr et al. 2014; Geng et al. 2016; Roushangar & Ghasempour 2018, 2019). In recent decades, Artificial Intelligence (AI) methods such as Artificial Neural Networks (ANNs), Neuro-Fuzzy models (NF), Genetic Programming (GP), Kernel Extreme Learning Machine (KELM), Support Vector Machine (SVM), Feed Forward Neural Network (FFNN), and Gaussian Process Regression (GPR) have been used to investigate complex hydraulic and hydrological phenomena (Hipni et al. 2013). Sediment transport modeling (Ebtehaj et al. 2016), rainfall–runoff modeling (Chadalawada et al. 2017), relative energy dissipation modeling (Saghebian 2019), monthly pan evaporation estimation (Chen et al. 2019), and groundwater level prediction (Sakizadeh et al. 2019) are some examples.

Sudheer et al. (2014) stated that streamflow forecasts have to deal with complex and highly nonlinear data patterns. They used a hybrid SVM method to forecast monthly streamflow. An SVM model with various input structures was constructed, and the best structure was determined using various statistical performances. The results showed that the hybrid model had a high degree of accuracy. Adnan et al. (2017) investigated the ability of two soft computing methods including ANN and SVM models in modeling monthly streamflow and indicated that the SVM model could be successful in predicting monthly streamflow. Li et al. (2020) used a multi-model integration method for monthly streamflow prediction. The obtained results showed that the machine-learning-based models had the potential for monthly streamflow forecasting.

Also, most hydrological time series are non-stationary, with trends or seasonal fluctuations; therefore, in the past few decades signal pre-processing methods have been commonly used by researchers (Kumar & Foufoula-Georgiou 1993; Adamowski et al. 2009). These methods have been used to decompose complex, periodic, and irregular hydrological time series and to extract information from them. The Empirical Mode Decomposition (EMD) and Variational Mode Decomposition (VMD) methods have been applied recently for the decomposition of time series. These methods are suitable for decomposing nonlinear and non-stationary signals (Huang et al. 1998). According to Abdoos (2016), the variational mode makes it possible to find the correlation frequency band and modal components; so, the VMD has better anti-noise performance than the EMD.

Because of the complexity of the streamflow process, river flow forecasting has remained, over the past few decades, one of the most important tasks for the scientific planning and management of water resources systems. In the long run, accurate river flow forecasts can help operators and managers maximize the potential comprehensive benefits in many respects, such as power generation, peak operation, ecological restoration, flood control, and water supply. Owing to the nonlinearity and uncertainties of streamflow, the existing regression methods often do not achieve the desired accuracy. Consequently, the application of many of these methods is limited to the special cases for which they were developed, and they do not give uniform results under different conditions. With the advantages of easy implementation and high flexibility, artificial intelligence methods have been widely employed to address complex hydrological forecasting problems. However, conventional AI methods often suffer from practical defects such as slow convergence and convergence to local minima. Among AI models, kernel-based approaches such as KELM and GPR are relatively new and important methods that use different kernel types and are grounded in statistical learning theory. Such models are capable of adapting themselves to predict any variable of interest given sufficient inputs. They train quickly, have high accuracy, and are less prone to overtraining. To enhance AI model performance, this study proposed hybrid monthly river streamflow forecasting methods that integrate the novel EEMD and VMD pre-processing methods into the learning process of two artificial intelligence methods, namely KELM and GPR.
In the proposed models, the EEMD and VMD methods were first used to decompose the original series into several subseries named Intrinsic Mode Functions (IMFs); then, based on the IMFs' energy values, the subseries were fed into the AI methods to produce point forecasts of the RSF series. In addition, because accurate point prediction is difficult, and in order to investigate the uncertainty of the predictions, Prediction Intervals (PIs) were constructed. PIs characterize the accuracy of the predicted values relative to the measured values (Khosravi et al. 2011). For managing critical issues arising from climate change, decision makers are expected to provide design plans. Because PIs take the uncertainties into account, they yield a bound on the forecast, which is important for decision making and management. In this research, the Lower and Upper Bound Estimation (LUBE) approach was used for constructing PIs for GPR-based modeling of streamflow series. To assess the accuracy of the developed models, data from three successive stations on the Housatonic River during the period 1941–2018 were used.

Study area

The Housatonic River is approximately 149 miles (240 km) long and lies in western Massachusetts and western Connecticut in the United States. It flows south to southeast, and drains about 5,100 km² of southwestern Connecticut into Long Island Sound. In this study, monthly streamflow data from three successive stations were used; the distance between stations was approximately 50 km. The first station was located near Great Barrington, Massachusetts. The second and third stations were located in Connecticut. The basin areas of the stations were 282, 634, and 966 km², respectively. Data for the period 1941–2018 were employed for the river streamflow modeling. To compare the performance of the models, each dataset was divided chronologically into training, validation, and testing parts: the first 70% of each dataset was used to train the developed models, and the last 30% was split between validation (15%) and testing (15%). Figure 1 shows the selected study area and the monthly streamflow time series of the stations.
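The chronological 70/15/15 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `chronological_split` and the synthetic placeholder series are assumptions, and the exact rounding used at the split boundaries in the study is not stated.

```python
def chronological_split(series, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered sequence into train/validation/test parts.

    The data are not shuffled: training uses the earliest values and
    testing the most recent ones, as described in the text.
    """
    n = len(series)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


# 1941-2018 gives 78 years of monthly values; the values below are
# synthetic placeholders, not Housatonic River observations.
monthly_flow = [float(i) for i in range(78 * 12)]
train, val, test = chronological_split(monthly_flow)
print(len(train), len(val), len(test))
```

Keeping the split chronological avoids training on months that come after the ones being tested.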

Figure 1

The study area location and monthly streamflow time series of the selected stations for the 1941–2018 period.


Pre-processing approaches

Empirical Mode Decomposition (EMD) is a popular and widely used method for time series processing (Wang et al. 2015). This approach can be applied to nonlinear and non-stationary signals, decomposing them into several Intrinsic Mode Function (IMF) components. EMD is able to determine the instantaneous frequency of the signal. According to Lei et al. (2009), when a signal is decomposed into its frequency components, the components with higher frequency are separated first, and the process continues until the lowest-frequency component remains. The EEMD is based on EMD and solves the EMD mode-mixing issue via the noise-added data analysis approach (Wu & Huang 2009). This method can be summarized as below:

  • (1) Random white noise is added to the signal $x(t)$, and the new signal is then decomposed into several IMF series.

  • (2) Step 1 is repeated, with a new white-noise realization each time, until the number of noise-added trials is greater than or equal to the trial number.

  • (3) The mean of the corresponding IMFs over all trials, $I_j(t)$, is calculated in order to obtain the ensemble IMF.

  • (4) Finally, the original signal can be formed as $x(t) = \sum_{j=1}^{n} I_j(t)$.
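The ensemble-averaging logic of steps (1) to (4) can be sketched as below. This is only an illustration under stated assumptions: `toy_decompose` is a hypothetical stand-in that splits a signal into a fast and a slow part with a moving average, not a real EMD sifting routine, and `trials`, `noise_std`, and `seed` are illustrative parameters. A real EMD can also return a different number of IMFs per trial, which this sketch does not handle.

```python
import math
import random


def toy_decompose(signal, window=5):
    """Stand-in for EMD (assumption): return [fast part, slow part].

    The only property it shares with EMD is that the parts sum to the input.
    """
    n = len(signal)
    half = window // 2
    slow = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        slow.append(sum(signal[lo:hi]) / (hi - lo))
    fast = [s - m for s, m in zip(signal, slow)]
    return [fast, slow]


def ensemble_decompose(signal, decompose, trials=200, noise_std=0.1, seed=0):
    """Average the components of many noise-added copies of the signal."""
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):  # step 2: repeat for the chosen number of trials
        # Step 1: add white noise, then decompose the noisy signal.
        noisy = [x + rng.gauss(0.0, noise_std) for x in signal]
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in comps]
        for j, comp in enumerate(comps):
            for t, value in enumerate(comp):
                sums[j][t] += value
    # Step 3: ensemble mean of each component over all trials.
    return [[value / trials for value in comp] for comp in sums]


x = [math.sin(2 * math.pi * t / 12) + 0.01 * t for t in range(120)]
imfs = ensemble_decompose(x, toy_decompose)
# Step 4: summing the ensemble components approximately recovers x(t).
recon = [sum(comp[t] for comp in imfs) for t in range(len(x))]
max_err = max(abs(a - b) for a, b in zip(x, recon))
```

The reconstruction is approximate rather than exact because the averaged white noise shrinks with the number of trials but does not vanish.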

Also, Variational Mode Decomposition (VMD) as a new time series processing method is developed to decompose time series into a sequence of discrete sub-modules with special frequency bands (He et al. 2019). In the VMD method, each IMF is supposed as a finite bandwidth with a different center frequency and the aim is minimization of the estimated bandwidths sum for each IMF. The detailed description is as follows:
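$$\min_{\{u_k\},\{w_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_k(t) \right] e^{-j w_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$
where $K$, $\partial_t$, $\delta(t)$, $j$, $\otimes$, $u_k(t)$, $w_k$, and $f(t)$ are the mode number, the partial derivative with respect to time $t$, the Dirac distribution, the imaginary unit, the convolution operator, the $t$th data in the $k$th decomposed mode, the center frequency of the $k$th decomposed mode, and the $t$th data in the original signal, respectively. In general, due to complex features (such as convexity or concavity) in the objective and constraint functions in Equation (1), directly finding the solution of the constrained optimization problem is difficult. To reduce this difficulty, He et al. (2019) combined the quadratic penalty function, which guarantees reconstruction fidelity, with the Lagrange multiplier, which strictly enforces the constraint, to transform the problem into an unconstrained optimization one as below:
$$Z(\{u_k\},\{w_k\},\lambda) = a \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) \otimes u_k(t) \right] e^{-j w_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
where $Z$, $a$, and $\lambda$ are the augmented Lagrange function, the penalty parameter, and the Lagrange multiplier, respectively. To optimize the modified unconstrained problem in Equation (2), the alternating direction method of multipliers can be used (He et al. 2019). The corresponding solution can be described as below:
$$\hat{u}_k^{n+1}(w) = \frac{\hat{f}(w) - \sum_{i \neq k} \hat{u}_i(w) + \hat{\lambda}(w)/2}{1 + 2a\,(w - w_k^{n})^2} \tag{3}$$
where $\hat{f}(w)$, $\hat{u}_i(w)$, $\hat{\lambda}(w)$, and $\hat{u}_k^{n+1}(w)$ are the Fourier transforms of $f(t)$, $u_i(t)$, $\lambda(t)$, and $u_k^{n+1}(t)$, respectively, and $n$ is the number of iterations.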

In this study, the capability of pre-processing methods in enhancing the accuracy of the models was investigated. In this regard, at first, the RSF series were decomposed using the EEMD and VMD methods. According to Wu & Huang (2009), the basis of the EEMD method is decomposition of the signal into several IMF series and a residual series, which in combination, form the original signal. The IMF formation is done via subtracting the basic function from the original signal. This process will continue until the residual signal remains approximately constant. In the VMD method, the signal is decomposed to several IMFs.

Kernel-Based (KB) approaches

Among data-driven techniques, Kernel-Based (KB) methods such as Gaussian Process Regression (GPR) and Kernel Extreme Learning Machine (KELM) are relatively new and significant techniques that rely on various kernel types and on statistical learning theory. These models can adapt themselves to predict any parameter of interest given adequate inputs. Furthermore, they can model nonlinear decision boundaries, and numerous kernels are available for this purpose. These methods are also robust against overfitting, particularly in high-dimensional spaces. Nevertheless, proper selection of the kernel type is the most essential step in the GPR and KELM methods because of its direct effect on training and on prediction accuracy. Available kernel functions include the linear, radial basis, and polynomial kernels in the KELM and the exponential, squared exponential, rational quadratic, ARD rational quadratic, etc. in the GPR.

Extreme Learning Machine (ELM) is an approach based on the Single Layer Feed Forward Neural Network (SLFFNN). According to Huang et al. (2006), the SLFFNN is a straightforward structure in which the input weights connect the inputs to the hidden neurons. The input weights and hidden-layer biases are chosen randomly, while the output weights are determined analytically. According to Huang et al. (2006), this method is fast to execute and more convenient than earlier learning methods because, unlike traditional methods that have many parameters to tune, it requires little human intervention to reach suitable parameters when modeling complex problems. An ELM design based on a kernel function is known as the Kernel Extreme Learning Machine (KELM). Also, deriving from the Bayesian framework, GPR treats the target as a random process and carries out nonparametric regression with the Gaussian process (Rasmussen & Nickisch 2010). In many prediction problems, GPR is preferred owing to its flexibility in representing uncertainty. In GPR models, it is assumed that nearby observations convey information about each other. A Gaussian process is a way of describing a distribution over functions. It is a natural generalization of the Gaussian distribution, for which the covariance and mean are a matrix and a vector, respectively. Whereas the Gaussian distribution is over vectors, the Gaussian process is over functions. Accordingly, given the prior knowledge about the data and the functional dependencies, generalization requires no validation process. Moreover, GP regression models can provide the predictive distribution corresponding to test inputs (Rasmussen & Williams 2006). A GP is a collection of random variables, any finite number of which have a joint multivariate Gaussian distribution.
For assessing the KB methods' capability, two criteria were used: Determination Coefficient (DC) and Root Mean Square Error (RMSE).
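As an illustration of a kernel-based fit and of the two evaluation criteria, the sketch below trains a small KELM-style model and scores it with RMSE and DC. It rests on several assumptions not stated in the text: the commonly cited KELM closed form, in which the output weights solve $(\Omega + I/C)\,\alpha = y$ with $\Omega$ the kernel matrix; an RBF kernel with illustrative values for `C` and `gamma`; and DC taken in the Nash–Sutcliffe form $1 - \sum (o - p)^2 / \sum (o - \bar{o})^2$. All function names are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel between two input vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def kelm_fit(X, y, C=1000.0, gamma=1.0):
    """Output weights alpha from (Omega + I/C) alpha = y (assumed KELM form)."""
    n = len(X)
    A = [[rbf_kernel(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(A, y)


def kelm_predict(X_train, alpha, x_new, gamma=1.0):
    """Prediction is the kernel-weighted sum over the training inputs."""
    return sum(a * rbf_kernel(x_new, xi, gamma) for a, xi in zip(alpha, X_train))


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def dc(obs, pred):
    """Determination coefficient, assumed here in Nash-Sutcliffe form."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Synthetic one-dimensional example, not streamflow data.
X = [[0.5 * i] for i in range(10)]
y = [math.sin(x[0]) for x in X]
alpha = kelm_fit(X, y)
fitted = [kelm_predict(X, alpha, x) for x in X]
print(rmse(y, fitted), dc(y, fitted))
```

A lower RMSE and a DC closer to 1 indicate a better fit; a DC of 0 corresponds to predicting the mean of the observations.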

Lower bound and upper bound estimation (LUBE)

According to Krupnick et al. (2006), a model's uncertainty includes both modeling uncertainty and prediction uncertainty. Modeling uncertainty arises from the inputs, parameters, and structures, which generally produce the uncertainty of estimation (Shrestha & Solomatine 2008). One of the most commonly used methods for quantifying model prediction uncertainty is the PI. Generally, traditional interval prediction builds the PI from the point forecast: the lower and upper bounds are computed from the forecast values and the confidence level, so the accuracy of the point forecast plays a key role in the accuracy of the PI. The lower bound and upper bound estimation (LUBE) method instead attempts to approximate the upper and lower bounds of the PIs directly from the set of inputs. According to Meade & Islam (1995), a PI expresses the estimation uncertainty of a future realization of a random parameter and accounts for more sources of uncertainty.

Two criteria, namely PI coverage probability (PICP) and mean PI width (MPIW), were used for quantitative evaluation of the constructed PIs (Kasiviswanathan et al. 2013). For comparing the PIs obtained via the various methods, the normalized MPIW (i.e. NMPIW = MPIW/R, where R is the maximum minus the minimum of the target values) was used.
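The three interval criteria can be computed as in the sketch below. The definitions follow the text (coverage as the fraction of observations inside the interval, width as the mean of upper minus lower bound, and normalization by the target range R); the function names and the small example values are illustrative assumptions.

```python
def picp(obs, lower, upper):
    """PI coverage probability: fraction of observations inside [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    return inside / len(obs)


def mpiw(lower, upper):
    """Mean PI width: average of (upper - lower)."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


def nmpiw(obs, lower, upper):
    """Normalized MPIW: MPIW divided by R = max(target) - min(target)."""
    target_range = max(obs) - min(obs)
    return mpiw(lower, upper) / target_range


# Illustrative values only, not results from the study.
obs = [0.2, 0.5, 0.9, 0.4]
lower = [0.1, 0.3, 0.95, 0.2]
upper = [0.3, 0.6, 1.10, 0.5]
print(picp(obs, lower, upper), mpiw(lower, upper), nmpiw(obs, lower, upper))
```

In this example the third observation falls below its lower bound, so three of the four observations are covered. A good PI combines a high PICP with a low NMPIW, since very wide intervals cover the data trivially.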

Proposed modeling process

In this paper, hybrid models were developed for point and interval prediction of streamflow series based on data pre-processing, AI, and LUBE approaches. Input parameters have a significant effect on the efficiency of the developed models. In this study, two types of modeling were assessed for RSF estimation: one based on the data from each station and one based on the data from upstream stations. According to Table 1, for predicting the next month's RSF, the previous values of the RSF series during the 1941–2018 period were applied as inputs. In this table, t represents the monthly time step, RSF(t) represents the streamflow value at the current time step, and RSF(t − 1), RSF(t − 2), and RSF(t − 3) represent the streamflow values at times (t − 1), (t − 2), and (t − 3), respectively.

Table 1

Developed models in the study

Modeling based on each station's data (Stations 1, 2, and 3)

| Model | Structure |
| --- | --- |
| M1 | RSF(t) = f[RSF(t − 1)] |
| M2 | RSF(t) = f[RSF(t − 1), RSF(t − 2)] |
| M3 | RSF(t) = f[RSF(t − 1), RSF(t − 2), RSF(t − 3)] |

Modeling based on previous stations' data

| Model (Station 2) | Structure | Model (Station 3) | Structure |
| --- | --- | --- | --- |
| M4 | RSF2(t) = f[RSF1(t)] | M7 | RSF3(t) = f[RSF1(t), RSF2(t)] |
| M5 | RSF2(t) = f[RSF1(t), RSF1(t − 1)] | M8 | RSF3(t) = f[RSF1(t − 1), RSF2(t − 1)] |
| M6 | RSF2(t) = f[RSF1(t), RSF1(t − 1), RSF1(t − 2)] | M9 | RSF3(t) = f[RSF2(t), RSF2(t − 1)] |

The KELM and GPR approaches were used with the aim of point forecasting. Directly selecting the original time series as the input parameter may affect the capability of the single predicting models. Different types of factors affect hydrological time series and provide the nonlinearity and uncertainty properties of time series. This causes the time series to have different resolution components. Therefore, in this study, an attempt was made to develop practical hybrid pre-processing-based AI models to improve streamflow forecasting efficiency. In this regard, two data processing models including Ensemble Empirical Mode Decomposition (EEMD) and Variational Mode Decomposition (VMD) were applied. The original RSF series were decomposed into subseries with obvious complexity difference to overcome the disadvantage of the weak generalization ability of single artificial intelligence methods. Then, the IMFs' energy values were computed and the IMFs with higher energy values were selected as AI model inputs. In the following step, hybrid models were used for interval prediction based on the LUBE method. For eliminating the dimensions of input and output parameters the data were normalized into [0, 1]. In Figure 2 the considered modeling states are represented.
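The input construction for a lagged model such as M3, together with the normalization into [0, 1], can be sketched as follows. The helper names and the six sample values are illustrative assumptions. The text does not say whether the minimum and maximum were taken over the whole record or over the training part only; the sketch uses whatever list it is given.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def lagged_samples(series, lags=(1, 2, 3)):
    """Build (inputs, targets) where inputs are [RSF(t-1), RSF(t-2), RSF(t-3)]
    and the target is RSF(t), as in model M3 of Table 1."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - k] for k in lags])
        targets.append(series[t])
    return inputs, targets


# Placeholder monthly values, not observations from the study.
flow = [10.0, 14.0, 30.0, 22.0, 18.0, 12.0]
scaled = min_max_normalize(flow)
X, y = lagged_samples(scaled)
print(X[0], y[0])
```

Models M1 and M2 correspond to `lags=(1,)` and `lags=(1, 2)`. In practice, computing the minimum and maximum on the training part alone avoids passing information from the test period into the inputs.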

Figure 2

The modeling states considered in this study.


The results of AI models

In this part, the capability of the AI methods was assessed in predicting the river streamflow using the original series (i.e. without using data processing methods). To obtain the desired forecasting results, the specific parameters of each artificial intelligence approach must be set and their optimal values determined. In the design of kernel-based approaches, an appropriate kernel function must be selected. According to Gill et al. (2006), various kernel functions are available. In this study, the model M3 was run via the GPR and KELM methods for streamflow prediction considering different kernel types. Figure 3 shows the RMSE values obtained with the different kernel functions for this model. According to Figure 3, the best kernel function was the RBF kernel in the KELM model and the squared exponential kernel in the GPR model.

Figure 3

Performance of the GPR and KELM methods based on the RMSE via different kernel functions.


The single AI models' results are shown in Figure 4. Both single methods showed almost the same accuracy in predicting the RSF at the selected stations. According to Figure 4, in state 1, the model M1 gave the lowest performance, because it uses only one parameter in the input layer. The model M3, with RSF(t − 1), RSF(t − 2), and RSF(t − 3) as inputs, performed more successfully than the other developed models. The results indicated that including RSF(t − 2) and RSF(t − 3) in the input combinations enhanced the efficiency of the models. For state 2, in station 2, the M6 model with input parameters RSF1(t), RSF1(t − 1), and RSF1(t − 2) led to better outcomes, and in station 3, the M7 model with RSF1(t) and RSF2(t) was chosen as the best model. From the results, it can be deduced that the models developed based on the upstream flow data significantly improved the performance of the applied methods. This suggests that the sub-basins between the selected consecutive stations had significant physical effects on the downstream station flow regime. Therefore, via the KELM and GPR kernel-based approaches, the previous stations' data could be used when the stations' own data were unavailable. Artificial intelligence methods are very powerful tools, and they can be used successfully when the interrelationships among the relevant variables are difficult to understand and conventional mathematical analysis methods cannot provide analytical solutions. Adnan et al. (2017) evaluated the use of ANN and SVM models to estimate RSF based on hydro-meteorological data. They indicated that the SVM, as a kernel-based approach, provides a high degree of accuracy and reliability and can be a helpful tool in basins where hydro-meteorological data are readily available.
In Figure 5 the scatterplots between the observed vs predicted streamflow for the GPR model are shown. This figure shows that the RSF maximum and minimum amounts were not well predicted.

Figure 4

The obtained results for the RSF modeling for two states.

Figure 5

Scatterplots of RSF modeling via GPR superior models for both states.


The results of hybrid EEMD-, VMD-AI models for point predictions

The EEMD and VMD methods were used to enhance the accuracy of the developed models. Figure 6(a) shows the decomposition results of the two pre-processing techniques for the RSF(t − 1) series of station 1. The results indicate that the high-frequency IMFs decomposed via the VMD method had a relatively stable trend, which was conducive to prediction. It should be noted that the final forecast of the VMD method was an accumulation of the forecasts for each IMF; therefore, the characteristics of the VMD result helped to enhance the prediction accuracy, while end-point effects (large fluctuations) occurred during the EMD decomposition process and affected all IMFs continuously. Therefore, the EMD prediction result had a large error. In the second step, because signal decomposition via the EEMD and VMD increased the number of candidate inputs, the energy values of the subseries were computed. Then, the subseries with higher energy values (i.e. higher than the average energy) were used as inputs in the RSF modeling process. The energy values for the RSF(t − 1) series of station 1 are shown in Figure 6(b). From Figure 6(b), IMFs 4, 5, and 6 obtained via the EEMD and IMFs 6, 7, and 8 obtained via the VMD were chosen as suitable IMFs because of their higher energy compared with the other IMFs.
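The energy-based selection rule (keep the subseries whose energy exceeds the average energy) can be sketched as follows. The text does not give the energy formula, so the sum of squared values is assumed here; the function names and the three toy components are illustrative.

```python
def energy(component):
    """Energy of a subseries, assumed here to be the sum of squared values."""
    return sum(v * v for v in component)


def select_high_energy(components):
    """Return the indices of components whose energy exceeds the mean energy."""
    energies = [energy(c) for c in components]
    threshold = sum(energies) / len(energies)
    keep = [i for i, e in enumerate(energies) if e > threshold]
    return keep, energies


# Toy components standing in for IMFs; not the decomposition of Figure 6.
components = [
    [0.1, -0.1, 0.1, -0.1],   # low-energy, noise-like component
    [1.5, -1.5, 1.5, -1.5],   # oscillatory component
    [2.0, 2.0, 2.0, 2.0],     # slowly varying component
]
keep, energies = select_high_energy(components)
print(keep, energies)
```

Only the retained components would then be passed to the KELM or GPR model as inputs, which keeps the input dimension manageable after decomposition.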

Figure 6

(a) Decomposed RSF(t − 1) time series for station 1 and (b) energy values of the obtained IMFs via EEMD and VMD.


Table 2 shows the results of the hybrid data-processing models. Comparing the results presented in Figure 4 and Table 2 shows that signal processing enhanced the prediction accuracy of the RSF series; the hybrid methods were more efficient than the single AI approaches. As can be seen, for station 1 in state 1 the RMSE value of the GPR model for model M3 was 0.155; the error criterion value was reduced to 0.090 and 0.082 for the EEMD-GPR and VMD-GPR, respectively. Therefore, using the hybrid methods, the RSF prediction was performed with higher accuracy, and these approaches were more successful in RSF modeling in the study region. Also, the predictive performance of the VMD-GPR method was significantly better than that of the others. Freire et al. (2019) designed and evaluated a wavelet-ANN model for RSF forecasting. They showed that the integrated models (wavelet-ANN) provide acceptable predictions of the RSF. It was found that the wavelet transform is a powerful tool with a great ability to extract useful information from time series; consequently, it increases the ANN models' performance significantly. In this study, data pre-processing (VMD and EEMD) generally increased the modeling accuracy by 20% to 40% in the testing sets. The results indicate that developing models based on the previous stations' data can be considered a reliable option for predicting RSF values in regions without observational data. Based on Figure 7, the RSF extreme values were predicted more accurately by the VMD method.

Table 2

Statistical parameters of the data-processing-AI methods for two states

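State 1 (each station is reported as validation DC, validation RMSE, test DC, test RMSE)

| Method | Model | St. 1 Val. DC | St. 1 Val. RMSE | St. 1 Test DC | St. 1 Test RMSE | St. 2 Val. DC | St. 2 Val. RMSE | St. 2 Test DC | St. 2 Test RMSE | St. 3 Val. DC | St. 3 Val. RMSE | St. 3 Test DC | St. 3 Test RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VMD-KELM | M1 | 0.785 | 0.066 | 0.797 | 0.094 | 0.770 | 0.061 | 0.784 | 0.082 | 0.863 | 0.059 | 0.797 | 0.091 |
| EEMD-KELM | M1 | 0.727 | 0.088 | 0.693 | 0.102 | 0.713 | 0.081 | 0.682 | 0.089 | 0.799 | 0.079 | 0.693 | 0.099 |
| VMD-KELM | M2 | 0.853 | 0.064 | 0.836 | 0.089 | 0.815 | 0.060 | 0.807 | 0.078 | 0.914 | 0.059 | 0.838 | 0.089 |
| EEMD-KELM | M2 | 0.790 | 0.085 | 0.727 | 0.097 | 0.755 | 0.080 | 0.702 | 0.085 | 0.846 | 0.078 | 0.729 | 0.097 |
| VMD-KELM | M3 | 0.881 | 0.061 | 0.935 | 0.086 | 0.858 | 0.054 | 0.843 | 0.076 | 0.961 | 0.052 | 0.890 | 0.086 |
| EEMD-KELM | M3 | 0.816 | 0.081 | 0.813 | 0.094 | 0.794 | 0.078 | 0.733 | 0.083 | 0.890 | 0.076 | 0.774 | 0.094 |
| VMD-GPR | M1 | 0.816 | 0.062 | 0.821 | 0.087 | 0.800 | 0.058 | 0.814 | 0.078 | 0.907 | 0.056 | 0.831 | 0.086 |
| EEMD-GPR | M1 | 0.756 | 0.083 | 0.714 | 0.095 | 0.741 | 0.077 | 0.708 | 0.085 | 0.840 | 0.074 | 0.723 | 0.093 |
| VMD-GPR | M2 | 0.888 | 0.060 | 0.858 | 0.086 | 0.848 | 0.059 | 0.838 | 0.075 | 0.947 | 0.054 | 0.884 | 0.083 |
| EEMD-GPR | M2 | 0.822 | 0.080 | 0.746 | 0.093 | 0.785 | 0.075 | 0.729 | 0.081 | 0.877 | 0.072 | 0.769 | 0.090 |
| VMD-GPR | M3 | 0.922 | 0.057 | 0.921 | 0.082 | 0.918 | 0.052 | 0.915 | 0.072 | 0.964 | 0.048 | 0.937 | 0.071 |
| EEMD-GPR | M3 | 0.854 | 0.076 | 0.835 | 0.090 | 0.826 | 0.074 | 0.796 | 0.078 | 0.893 | 0.071 | 0.815 | 0.088 |

State 2

| Method | Model (St. 2) | St. 2 Val. DC | St. 2 Val. RMSE | St. 2 Test DC | St. 2 Test RMSE | Model (St. 3) | St. 3 Val. DC | St. 3 Val. RMSE | St. 3 Test DC | St. 3 Test RMSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VMD-KELM | M4 | 0.863 | 0.059 | 0.897 | 0.077 | M7 | 0.944 | 0.044 | 0.949 | 0.058 |
| EEMD-KELM | M4 | 0.799 | 0.078 | 0.780 | 0.084 | M7 | 0.874 | 0.058 | 0.825 | 0.063 |
| VMD-KELM | M5 | 0.890 | 0.052 | 0.920 | 0.069 | M8 | 0.902 | 0.073 | 0.828 | 0.098 |
| EEMD-KELM | M5 | 0.824 | 0.069 | 0.800 | 0.075 | M8 | 0.835 | 0.097 | 0.720 | 0.106 |
| VMD-KELM | M6 | 0.888 | 0.053 | 0.929 | 0.068 | M9 | 0.644 | 0.090 | 0.570 | 0.124 |
| EEMD-KELM | M6 | 0.822 | 0.071 | 0.808 | 0.074 | M9 | 0.596 | 0.120 | 0.496 | 0.135 |
| VMD-GPR | M4 | 0.936 | 0.047 | 0.944 | 0.060 | M7 | 0.978 | 0.038 | 0.954 | 0.047 |
| EEMD-GPR | M4 | 0.867 | 0.062 | 0.847 | 0.065 | M7 | 0.886 | 0.050 | 0.883 | 0.051 |
| VMD-GPR | M5 | 0.967 | 0.039 | 0.953 | 0.058 | M8 | 0.940 | 0.073 | 0.868 | 0.093 |
| EEMD-GPR | M5 | 0.895 | 0.052 | 0.870 | 0.063 | M8 | 0.870 | 0.097 | 0.755 | 0.101 |
| VMD-GPR | M6 | 0.962 | 0.041 | 0.957 | 0.055 | M9 | 0.671 | 0.084 | 0.588 | 0.116 |
| EEMD-GPR | M6 | 0.891 | 0.054 | 0.877 | 0.060 | M9 | 0.621 | 0.112 | 0.511 | 0.126 |
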
Figure 7

Scatterplots of modeling via VMD-GPR superior models for both states.


Constructed PIs using LUBE methods for the RSF series

To assess the capability of the LUBE method in constructing PIs for the RSF series, the IMF subseries were used as inputs of the GPR model to project the RSF values (point predictions) and the related PIs. Since the forecasting results of the single GPR model were somewhat better than those of the KELM, the GPR method was used for PI prediction. The confidence level associated with all PIs was set to 95%. The two criteria, PICP and NMPIW, were used to determine the optimum GPR structure for the constructed PIs (seeking the maximum PICP and the minimum NMPIW). Table 3 and Figure 8 show the obtained results. As can be seen, the PICP values for the hybrid VMD-GPR model were up to 18%, 21%, and 16% higher than those of the single GPR, while the NMPIW values were up to 29%, 33%, and 30% less than those of the single GPR for stations 1, 2, and 3 in state 1, respectively. In state 2, the PICP values for the VMD-GPR model were up to 13% and 10% higher than those of the GPR, while the NMPIW values were up to 40% and 33% less than those of the GPR for stations 2 and 3, respectively. Therefore, the hybrid VMD-GPR model led to better outcomes in both point and interval predictions. Figure 8 shows the PIs obtained for the superior model (VMD-GPR) of station 1 in state 1. According to Figure 8, the lower and upper bounds were close to each other. This narrow band corresponds to the low value of the NMPIW parameter.

Table 3

Constructed PI results for two states of LUBE method

Model        Station 1        Station 2        Station 3
             PICP    NMPIW    PICP    NMPIW    PICP    NMPIW
State 1
  GPR        0.77    2.33     0.75    2.57     0.78    2.55
  EEMD-GPR   0.91    2.08     0.90    2.22     0.89    2.23
  VMD-GPR    0.94    1.65     0.95    1.72     0.93    1.76
State 2
  GPR        -       -        0.84    0.98     0.86    0.85
  EEMD-GPR   -       -        0.91    0.79     0.91    0.77
  VMD-GPR    -       -        0.97    0.51     0.96    0.54
Figure 8

Results of the PI construction by the LUBE method for station 1 of state 1.


Validation of proposed best models using Arkansas River data

To verify the efficiency of the applied methods, datasets for the Arkansas River were used. The Arkansas River is a major tributary of the Mississippi River. It generally flows to the east and southeast as it traverses the US states of Colorado, Kansas, Oklahoma, and Arkansas. At 2,364 km, it is the sixth-longest river in the United States, the second-longest tributary in the Mississippi–Missouri system, and the 45th longest river in the world. Its drainage basin covers nearly 440,000 km². Its volume is much smaller than that of the Missouri and Ohio rivers, with a mean discharge of about 40,000 ft³/s. For each state (i.e. modeling based on the station's own data and on the previous stations' data), the superior model was run using the single and integrated models and the results were compared with each other. The obtained results are listed in Table 4. As can be seen from Table 4, the integrated models reached the desired accuracy, and the VMD-GPR was more efficient than the EEMD-GPR and GPR. Based on the PI results, it can be deduced that the VMD-GPR method had an allowable degree of uncertainty in the RSF modeling.
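The point-prediction criteria reported for the testing period can be sketched as follows. This is a minimal illustration that assumes DC denotes a determination coefficient of the Nash–Sutcliffe form and that RMSE is computed on the (possibly normalised) series; both definitions are given earlier in the paper and are only assumed here. The function names and toy values are hypothetical.

```python
import numpy as np

def dc(observed, predicted):
    """Determination coefficient, assumed here as 1 - SSE / total sum of squares."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / total)

def rmse(observed, predicted):
    """Root mean square error between observed and predicted series."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Toy values for illustration only (not data from the study)
obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.0, 2.0, 3.0, 5.0])

print("DC  :", dc(obs, pred))
print("RMSE:", rmse(obs, pred))
```

Under these assumed definitions, a DC closer to 1 and an RMSE closer to 0 indicate a better fit, which is the sense in which the models are ranked in Table 4.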

Table 4

Statistical parameters of the applied models for the superior models of each state via Arkansas River data

Model        Testing DC   Testing RMSE   PICP    NMPIW
State 1, Station 1
  GPR        0.684        0.123          0.75    2.24
  EEMD-GPR   0.835        0.090          0.90    2.10
  VMD-GPR    0.921        0.082          0.92    1.54
State 1, Station 2
  GPR        0.695        0.112          0.74    2.67
  EEMD-GPR   0.796        0.078          0.91    2.12
  VMD-GPR    0.915        0.072          0.93    1.84
State 1, Station 3
  GPR        0.701        0.128          0.74    2.61
  EEMD-GPR   0.815        0.088          0.88    2.13
  VMD-GPR    0.937        0.071          0.94    1.84
State 2, Station 2
  GPR        0.745        0.094          0.85    0.101
  EEMD-GPR   0.877        0.060          0.93    0.87
  VMD-GPR    0.957        0.055          0.96    0.63
State 2, Station 3
  GPR        0.784        0.086          0.85    0.98
  EEMD-GPR   0.883        0.051          0.92    0.81
  VMD-GPR    0.954        0.047          0.95    0.64

Monthly streamflow prediction with high and stable performance is of great strategic significance and application value for the rational allocation and optimal operation of water resources. In this study, in order to improve the capability of AI models, hybrid methods based on EEMD and VMD were used for point and interval streamflow forecasting. First, the original time series were fed directly into the AI methods. Next, the time series were decomposed into several IMF subseries using EEMD and VMD. The energy values of the IMFs were computed, and the subseries selected on the basis of their energy values were used as inputs of the AI methods.

According to the obtained statistical indicators, the single AI approaches led to poor predictions, whereas the EEMD and VMD data processing approaches enhanced the prediction accuracy by approximately 20% to 40%. The obtained results also revealed that the VMD method performed better than the EEMD method. In addition, the integrated models predicted the maximum and minimum values of river streamflow with high accuracy.

Point predictions via AI methods do not convey any details about the uncertainty of the prediction; therefore, PI can be an essential index to quantify the reliability of AI-based modeling of hydrological time series. In this study, PIs of a GPR-based model were obtained using the LUBE method. For the stations without data, the streamflow was successfully predicted based only on the streamflow data of the previous stations. As the most important conclusion, data processing based on VMD resulted in significant improvements in predictions. The VMD-GPR led to the most desirable results: its PICP was approximately 15% higher and its NMPIW approximately 33% lower than those of the GPR method.

The VMD-GPR model was successfully applied to three hydrological stations on the Housatonic River to predict river streamflow time series. To verify the efficiency of the applied methods, datasets for the Arkansas River were used. The results supported the premise that hybrid data-processing models have considerable potential as an alternative approach for river streamflow forecasting in different river basins. These results are relevant for flood prevention and river flow assessment, so the applied methods could be useful for robust water resources engineering and management. It should, however, be noted that AI models are data-driven and therefore data-sensitive, so studies on more rivers are required to conclusively prove the advantages of the proposed models in estimating the RSF series. In addition, the efficiency and adaptability of the proposed models with other input variables and optimization algorithms could be further investigated.
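The energy-based selection of subseries summarised above can be sketched in Python. This is only an illustration under stated assumptions: the energy of a subseries is taken as its sum of squares, and a subseries is kept when its share of the total energy reaches a threshold. The threshold `min_share`, the function names, and the synthetic modes are hypothetical; the selection rule actually used in the study may differ, and the decomposition step itself (EEMD or VMD) is not reproduced here.

```python
import numpy as np

def energy_share(subseries):
    """Share of total energy (sum of squares) carried by each subseries (row)."""
    subseries = np.asarray(subseries, dtype=float)
    energy = np.sum(subseries ** 2, axis=1)
    return energy / energy.sum()

def select_by_energy(subseries, min_share=0.05):
    """Indices of subseries whose energy share is at least min_share (assumed rule)."""
    share = energy_share(subseries)
    return [k for k, s in enumerate(share) if s >= min_share]

# Synthetic stand-ins for decomposed modes of a 120-month series (not study data)
t = np.arange(120)
modes = np.vstack([
    2.0 * np.sin(2 * np.pi * t / 12),   # strong annual cycle
    0.5 * np.sin(2 * np.pi * t / 6),    # weaker semi-annual cycle
    0.05 * np.sin(2 * np.pi * t / 3),   # very low-energy component
])

selected = select_by_energy(modes)
print("energy shares:", np.round(energy_share(modes), 4))
print("selected modes:", selected)
```

In this toy case the third component carries a negligible share of the energy and is dropped, while the two dominant components would be passed on as model inputs.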

All relevant data are included in the paper or its Supplementary Information.

Adamowski K., Prokoph A. & Adamowski J. 2009 Development of a new method of wavelet aided trend detection and estimation. Hydrological Processes 23 (18), 2686–2696.
Adnan R. M., Yuan X., Kisi O. & Yuan Y. 2017 Streamflow forecasting using artificial neural network and support vector machine models. American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS) 29 (1), 286–294.
Chadalawada J., Havlicek V. & Babovic V. 2017 A genetic programming approach to system identification of rainfall–runoff models. Water Resources Management 31 (12), 3975–3992.
Chen J. L., Yang H., Lv M. Q., Xiao Z. L. & Wu S. J. 2019 Estimation of monthly pan evaporation using support vector machine in Three Gorges Reservoir Area, China. Theoretical and Applied Climatology 138 (1), 1095–1107.
Ebtehaj I., Bonakdari H. & Zaji A. H. 2016 A nonlinear simulation method based on a combination of multilayer perceptron and decision trees for predicting non-deposition sediment transport. Water Science and Technology: Water Supply 16 (5), 1198–1206.
Freire P. K. d. M. M., Santos C. A. G. & da Silva G. B. L. 2019 Analysis of the use of discrete wavelet transforms coupled with ANN for short-term streamflow forecasting. Applied Soft Computing 80, 494–505.
Garg V. & Jothiprakash V. 2013 Evaluation of reservoir sedimentation using data driven techniques. Applied Soft Computing 13 (8), 3567–3581.
Geng G., Wu J., Wang Q., Lei T., He B., Li X., Mo X., Luo H., Zhou H. & Liu D. 2016 Agricultural drought hazard analysis during 1980–2008: a global perspective. International Journal of Climatology 36 (1), 389–399.
Gill M. K., Asefa T., Kemblowski M. W. & Makee M. 2006 Soil moisture prediction using Support Vector Machines. JAWRA Journal of the American Water Resources Association 42 (4), 1033–1046.
Hipni A., El-Shafie A., Najah A., Karim O. A., Hussain A. & Mukhlisin M. 2013 Daily forecasting of dam water levels: comparing a support vector machine (SVM) model with adaptive neuro fuzzy inference system (ANFIS). Water Resources Management 27 (10), 3803–3823.
Huang N. E., Shen Z., Long S. R., Wu M. C., Shih H. H., Zheng Q., Yen N. C., Tung C. C. & Liu H. H. 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454 (1971), 903–995.
Huang G. B., Zhu Q. Y. & Siew C. K. 2006 Extreme learning machine: theory and applications. Neurocomputing 70 (1–3), 489–501.
Kasiviswanathan K. S., Cibin R., Sudheer K. P. & Chaubey I. 2013 Constructing prediction interval for artificial neural network rainfall runoff models based on ensemble simulations. Journal of Hydrology 499, 275–288.
Khosravi A., Nahavandi S., Creighton D. & Atiya A. F. 2011 Comprehensive review of neural network-based prediction intervals and new advances. IEEE Transactions on Neural Networks 22 (9), 1341–1356.
Krupnick A., Morgenstern R., Batz M., Nelson P., Burtraw D., Shih J.-S. & McWilliams M. 2006 Not a Sure Thing: Making Regulatory Choices under Uncertainty. Resources for the Future, Washington, DC, USA.
Lei Y., He Z. & Zi Y. 2009 Application of the EEMD method to rotor fault diagnosis of rotating machinery. Mechanical Systems and Signal Processing 23 (4), 1327–1338.
Li Y., Liang Z., Hu Y., Li B., Xu B. & Wang D. 2020 A multi-model integration method for monthly streamflow prediction: modified stacking ensemble strategy. Journal of Hydroinformatics 22 (2), 310–326.
Madadgar S. & Moradkhani H. 2013 A Bayesian framework for probabilistic seasonal drought forecasting. Journal of Hydrometeorology 14 (6), 1685–1705.
Meade N. & Islam T. 1995 Prediction intervals for growth curve forecasts. Journal of Forecasting 14, 413–430.
Mehr A. D., Kahya E. & Özger M. 2014 A gene–wavelet model for long lead time drought forecasting. Journal of Hydrology 517, 691–699.
Mishra A. K., Desai V. R. & Singh V. P. 2007 Drought forecasting using a hybrid stochastic and neural network model. Journal of Hydrologic Engineering 12 (6), 626–638.
Rasmussen C. E. & Nickisch H. 2010 Gaussian processes for machine learning (GPML) toolbox. The Journal of Machine Learning Research 11, 3011–3015.
Rasmussen C. E. & Williams C. K. I. 2006 Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, USA.
Shrestha D. L. & Solomatine D. P. 2008 Data-driven approaches for estimating uncertainty in rainfall–runoff modelling. International Journal of River Basin Management 6, 109–122.
Sudheer C., Maheswaran R., Panigrahi B. K. & Mathur S. 2014 A hybrid SVM-PSO model for forecasting monthly streamflow. Neural Computing and Applications 24 (6), 1381–1389.
Talei A., Chua L. H. C. & Quek C. 2010 A novel application of a neuro-fuzzy computational technique in event-based rainfall–runoff modelling. Expert Systems with Applications 37, 7456–7468.
Wu Z. H. & Huang N. E. 2009 Ensemble empirical mode decomposition: a noise assisted data analysis method. Advances in Adaptive Data Analysis 1, 1–41.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).