Sediment transport and the accurate estimation of its rate are significant issues for river engineers and researchers. In this study, the capability of kernel based approaches including Kernel Extreme Learning Machine (KELM) and Gaussian Process Regression (GPR) was assessed for predicting the river daily Suspended Sediment Discharge (SSD). For this aim, the Mississippi River, with three consecutive hydrometric stations, was selected as the case study. Based on the sediment and flow characteristics during the period of 2005–2008, several models were developed and tested under two scenarios (i.e. modeling based on each station's own data or the previous stations' data). Two pre-processing techniques, namely Wavelet Transform (WT) and Ensemble Empirical Mode Decomposition (EEMD), were used for enhancing the SSD modeling capability. Also, data post-processing was done using Simple Linear Averaging (SLAM) and Nonlinear Kernel Extreme Learning Machine Ensemble (NKELME) methods. Obtained results indicated that the integrated models resulted in more accurate outcomes. Data processing enhanced the models' capability by up to 35%. It was found that SSD modeling based on the station's own data led to better results; however, using the integrated approaches, the previous station's data could be applied successfully for the SSD modeling when a station's own data were not available.

  • Merging the advantages of pre- and post-processing and kernel based techniques for suspended sediment discharge prediction.

  • Two states of modeling based on a station's own data or the previous stations’ data were investigated.

  • Integrated hybrid techniques outperformed the single meta-model approaches.

Graphical Abstract

Sediment transport is one of the most important issues in river engineering. So far, various complex relationships, such as velocity and critical shear stress based equations, have been proposed to predict the suspended sediment transport rate. However, the complex nature of sediment transport and the lack of validated models make it difficult to model the suspended sediment concentration and suspended sediment discharge carried by rivers. Bhattacharya et al. (2004) stated that it is difficult to express the transport process through a deterministic mathematical framework. Based on laboratory experiments, Vongvisessomjai et al. (2010) studied non-cohesive sediment transport in uniform flow at a non-deposition state. Harrington & Harrington (2013) evaluated the efficiency of the Sediment Rating Curve (SRC) method in modeling the suspended sediment load of the Bandon and Owenabue rivers in Ireland. Rajaee et al. (2009) and Chen & Chau (2016) indicated that the sediment rating curve and the auto-regressive integrated moving average model are inadequate to predict suspended sediment concentration under extreme hyperconcentrated flow conditions. Although the mentioned models led to promising results in sediment transport prediction, the importance of sediment transport and its impact on hydraulic structures make it necessary to use other methods with higher efficiency (Morianou et al. 2017).

In recent years, intelligence techniques such as Artificial Neural Networks (ANNs), Neuro-Fuzzy (NF), Genetic Programming (GP), Multivariate Adaptive Regression Splines (MARS), Kernel Extreme Learning Machine (KELM), and Gaussian Process Regression (GPR) have been used in assessing complex hydraulic and hydrological phenomena (Roushangar & Ghasempour 2018), such as estimation of reference evapotranspiration (Yin et al. 2017), daily suspended sediment concentration modeling (Kaveh et al. 2017), bedload discharge modeling (Roushangar & Ghasempour 2018), side weir discharge coefficient modeling (Azamathulla et al. 2017), relative energy dissipation prediction (Saghebian 2019), roughness coefficient prediction in sewer pipes (Roushangar et al. 2020), and modeling the form resistance coefficient of movable bed channels (Saghebian et al. 2020). In general, the task of a machine learning algorithm can be described as follows: given a set of input variables and the associated output variable(s), the objective is to learn a functional relationship for the input-output variables set. It should be noted that artificial intelligence models typically do not represent the physics of the modeled process; they are devices used to capture relationships between the relevant input and output variables. However, when the interrelationships among the relevant variables are poorly understood, finding the size and shape of the ultimate solution is difficult, and conventional mathematical analysis methods do not (or cannot) provide analytical solutions, these methods can predict the variable of interest with greater accuracy.

On the other hand, hybrid models based on signal decomposition can be effective in increasing the efficiency of time series prediction methods (Pachori et al. 2015). Wavelet analysis is one of the most common approaches used by researchers for this aim. Additionally, the Empirical Mode Decomposition (EMD) method, which is suitable for nonlinear and non-stationary time series (Huang et al. 1998), has been used recently. Unlike wavelet decomposition, empirical mode decomposition extracts the oscillatory mode components of the data without determining the basis functions or the level of decomposition a priori (Labate et al. 2013). Also, according to Partalas et al. (2008) and Cloke & Pappenberger (2009), there is no unique model that is superior to the others in all cases, and the performance of different models may vary according to the condition of each intended parameter. Therefore, combining the outputs of different models through an ensemble method may lead to more accurate results. Integrating different models using an ensemble model as a post-processing step can represent different aspects of the underlying patterns more accurately (Zhang 2003).

Assessment of suspended sediment discharge is not a trivial matter, due to the multitude of factors influencing this parameter (e.g. flow and sediment conditions, bed material, bedforms, prophase sediment concentration, etc.). Due to the complex nature of sediment transport, the main aim of this study was the temporal and spatial investigation of the SSD for successive hydrometric stations. Therefore, the Kernel Extreme Learning Machine (KELM) and Gaussian Process Regression (GPR) were used as kernel based approaches. Kernel based approaches, which rely on the quadratic optimization of a convex function, can easily switch from linear to nonlinear separation; this is realized by nonlinear mapping using so-called kernel functions. KELM and GPR are relatively new methods built on different kernel types. Such models are based on statistical learning theory and are capable of adapting themselves to predict any variable of interest given sufficient inputs. Their training is fast and accurate, and the probability of over-training is lower. Also, the Discrete Wavelet Transform (DWT) and Ensemble Empirical Mode Decomposition (EEMD) were used as pre-processing methods to improve the models' efficiency. In this regard, temporal features of the time series were decomposed via DWT, and further breaking down was done via EEMD to obtain features with higher stationarity; decomposing the time series into various periodicity scales with the WT and further decomposing the sub-series with EEMD can lead to more stationary sub-series. Then, the sub-series energy values were calculated, and the sub-series with higher energy were used as inputs in the kernel based approaches to produce a multi-scale model of the SSD time series. In the next step, the impacts of linear and nonlinear ensemble methods as post-processing approaches for improving the models' efficiency were assessed.
For this aim, the Simple Linear Averaging (SLAM) model was used as a linear ensemble method and the Nonlinear Kernel Extreme Learning Machine Ensemble (NKELME) model as a nonlinear ensemble method; of the GPR and KELM models, the KELM method was used for the nonlinear ensembling. For evaluating the applied methods, daily sediment and flow data of the Mississippi River over the period of 2005–2008 were used, and various models were developed under two states: first, the intended parameter of each station was estimated using the station's own data, and then the SSD was estimated using the previous stations' data.

Study area

The Mississippi River is the second longest river and chief river of the second-largest drainage system on the North American continent. From its traditional source of Lake Itasca in northern Minnesota, it flows generally south for 3,730 km to the Mississippi River Delta in the Gulf of Mexico. In the current study, daily data of the Mississippi River during the period of 2005–2008 were used. The Mississippi River basin has been repeatedly impacted by glacial ice during the last 2.5–3.0 million years. The continental glaciers eroded and deposited great masses of sediment, and in many instances they were responsible for major adjustments in the direction and patterns of river drainage. Data on daily sediment load were available for areas on the USGS site where sufficient information on the sediment concentration and water discharge had been obtained. The database of sediment records includes daily streamflow, daily mean concentration of suspended sediment, and daily suspended-sediment discharge. Three consecutive stations, namely station 1 (07010000), station 2 (07020500), and station 3 (07022000), were selected and the suspended sediment discharge was investigated. During modeling, data from previous days at a determined point were used to predict the current suspended sediment discharge values and compare them with observed data. Although information about changes in the study area (i.e. land use changes, erosion or deposition of the river bed, sediment conditions, etc.) was not available, the impact of these factors is reflected in the datasets and thus influenced the modeling process. Table 1 shows the statistical characteristics of the selected stations. In this table, the parameters Cs, Qs, and Wd are the suspended sediment concentration, suspended sediment discharge, and flow discharge, respectively. Also, Figure 1 shows the location of the selected stations.

Table 1

Characteristics of the Mississippi River consecutive hydrometric stations for the 2005–2008 period

Hydrometric station | Cs (mg/l) Max | Cs (mg/l) Min | Wd (ft³/s) Max | Wd (ft³/s) Min | Qs (ton/day) Max | Qs (ton/day) Min
Station 1 (07010000) | 1,510 | 59.6 | 716,000 | 63,000 | 200,000 | 12,600
Station 2 (07020500) | 1,650 | 44.8 | 695,000 | 64,100 | 2,150,000 | 10,900
Station 3 (07022000) | 1,260 | 40.3 | 710,000 | 68,300 | 1,740,000 | 11,300
Figure 1

The location of the selected consecutive stations of the Mississippi River.


Pre-processing approaches

One of the most popular approaches in time series processing is the Wavelet Transform (WT) (Farajzadeh & Alizadeh 2017). The WT uses a flexible window function (mother wavelet) in signal processing; the window function can be changed over time according to the signal's shape and compactness (Mehr et al. 2013). Applying the WT decomposes the signal into two components: an approximation (large-scale or low-frequency) component and a detail (small-scale) component. An illustration of a three-level WT is shown in Figure 2. In the first level, the original signal (x) is decomposed into an approximation (A1) and a detail (D1) component. In the second level, A1 is again decomposed into approximation (A2) and detail (D2) components. Finally, in the third level, A2 is decomposed into the A3 approximation and D3 detail components. The sum of all detail sub-series and the approximation series obtained from the third level reproduces the original signal (i.e. X = D1 + D2 + D3 + A3). The other approach for time series processing is Empirical Mode Decomposition (EMD). The EMD method behaves as an effective self-adaptive dyadic filter bank when applied to white noise (a random signal with equal intensity at different frequencies). By applying this method, each signal can be decomposed into a number of Intrinsic Mode Functions (IMFs), which can be used to process nonlinear and non-stationary signals. One of the advantages of this method is the ability to determine the instantaneous frequency of the signal. At each step of the decomposition into frequency components, the highest-frequency component is separated first, and the process continues until only the component with the lowest frequency remains (see Lei et al. 2009 for more details). EEMD was developed based on EMD. The main benefit of EEMD is that it solves the mode mixing problem of EMD by determining the true IMFs as the mean of an ensemble of trials (Wu & Huang 2009).
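The multi-level splitting into approximation and detail components can be sketched in code. The following is a minimal illustration using the Haar wavelet, whose coefficients keep the arithmetic readable (the study itself uses the db4 mother wavelet, typically via a wavelet library); the signal values are made up for demonstration.

```python
# Minimal three-level DWT sketch with the Haar wavelet: at each level
# the running approximation is split into a new approximation
# (low-pass) and a detail (high-pass) component, as in Figure 2.
import math

def haar_step(signal):
    """One DWT level: return (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels=3):
    """Decompose into ([D1, D2, ..., D_levels], A_levels)."""
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # toy signal, length 8
details, a3 = haar_dwt(x, levels=3)
# The Haar transform is orthonormal, so energy is preserved per level.
a1, d1 = haar_step(x)
assert abs(sum(v * v for v in a1 + d1) - sum(v * v for v in x)) < 1e-9
```

Because each level halves the series, an 8-sample signal yields detail lengths of 4, 2, and 1 plus a single A3 coefficient.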
The EEMD algorithm can be described as follows: (1) for a given signal x(t), random white noise is added to the signal; (2) the noise-added signal is decomposed using EMD to obtain the IMF series; (3) steps 1 and 2 are repeated for the specified number of trials, each time with a different white noise realization; (4) the ensemble IMFs $I_j(t)$ are obtained by averaging the corresponding IMFs over all trials; and (5) the original signal is recovered as $x(t) = \sum_{j=1}^{n} I_j(t) + r_n(t)$, where $r_n(t)$ is the residual. For selecting the most effective IMFs and using them as inputs in the modeling process, their energy values can be calculated and the IMFs with higher energy can be used as inputs.
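The ensemble loop described above can be sketched as follows. The inner EMD here is a deliberately simplified stand-in (a moving-average "mean envelope" instead of spline envelopes of extrema) so that the ensemble-averaging logic of steps 1–4 stays short and readable; a real implementation would use a full sifting EMD, and all parameter values are illustrative.

```python
# EEMD sketch: add white noise, decompose each noisy trial, and
# average the corresponding components across trials.
import random

def crude_emd(signal, n_imfs=3, window=3):
    """Stand-in EMD: peel off oscillatory components by subtracting a
    moving-average trend; returns (imfs, residual)."""
    residual, imfs = list(signal), []
    half = window // 2
    for _ in range(n_imfs):
        trend = [sum(residual[max(0, i - half):i + half + 1]) /
                 len(residual[max(0, i - half):i + half + 1])
                 for i in range(len(residual))]
        imfs.append([r - t for r, t in zip(residual, trend)])
        residual = trend
    return imfs, residual

def eemd(signal, n_trials=50, noise_std=0.1, n_imfs=3):
    """Steps 1-4: noise-added trials, EMD per trial, ensemble average."""
    rng = random.Random(42)  # fixed seed for reproducibility
    acc = [[0.0] * len(signal) for _ in range(n_imfs)]
    for _ in range(n_trials):
        noisy = [v + rng.gauss(0.0, noise_std) for v in signal]
        imfs, _ = crude_emd(noisy, n_imfs)
        for j in range(n_imfs):
            acc[j] = [a + b for a, b in zip(acc[j], imfs[j])]
    return [[v / n_trials for v in row] for row in acc]  # ensemble IMFs

x = [float(i % 4) for i in range(32)]  # toy periodic series
ensemble_imfs = eemd(x)
```

Averaging over many noisy trials is what cancels the added noise and mitigates the mode-mixing problem of plain EMD.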

Figure 2

The steps of a time series decomposition into detail (D) and approximation (A) sub-series.


Kernel based approaches

Kernel based (KB) approaches are new methods used for classification and regression purposes (Roushangar et al. 2019). Two important kernel based approaches are Gaussian Process Regression (GPR) and Kernel Extreme Learning Machine (KELM), which work with different kernel types, such as Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid functions in KELM, and Exponential, Squared Exponential, Rational Quadratic, ARD Rational Quadratic, and so on in GPR. Kernel based approaches are based on statistical learning theory and can be used for modeling complex and nonlinear phenomena. The kernel type affects the training and classification precision; therefore, the most important step in kernel based approaches is the appropriate selection of the kernel type. These methods are memory intensive and trickier to tune, owing to the importance of picking the right kernel (Babovic 2009). The aim of the KB approaches is to determine a function that has at most a small deviation from the actual target vectors for all given training data (Roushangar et al. 2020).
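The kernel types mentioned above can be written out explicitly. A minimal sketch of three of them follows; the hyperparameter values (gamma, degree, coef0) are illustrative, not values used in the study.

```python
# Common kernel functions evaluated for two sample vectors.
import math

def linear_kernel(x, y):
    # Plain dot product.
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # (x . y + c)^d
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

u, v = [1.0, 2.0], [2.0, 0.0]
assert linear_kernel(u, v) == 2.0
assert polynomial_kernel(u, v) == 9.0       # (2 + 1)^2
assert abs(rbf_kernel(u, u) - 1.0) < 1e-12  # identical inputs -> 1
```

The RBF kernel measures similarity through distance, which is why it is a common default for nonlinear regression problems like the one studied here.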

Extreme Learning Machine (ELM) is a training method for Single Layer Feed Forward Neural Networks (SLFFNN) initially introduced by Huang et al. (2006). In this framework, the input weights connecting the inputs to the hidden neurons and the hidden layer biases are selected randomly, while the weights between the hidden and output nodes are determined analytically. The method trains faster and often performs better than earlier learning methods (Huang et al. 2006) because, unlike traditional techniques that involve numerous variables to set up, modeling a complex problem with ELM does not need much human intervention to achieve optimal parameters. The standard single-layer network with N training samples $(x_i, y_i)$ (where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}^m$), M hidden neurons, and activation function f(a) is written as:

$\sum_{i=1}^{M} \beta_i f(w_i \cdot x_j + c_i) = O_j, \qquad j = 1, \ldots, N$ (1)

where $w_i$ is the weight vector that joins the input layer to the hidden layer, $\beta_i$ is the weight vector that joins the hidden layer to the target layer, and $c_i$ denotes the hidden neuron biases. The general SLFFNN aim is minimizing the difference between the predicted ($O_j$) and target ($T_j$) values, which can be expressed as below:

$\sum_{j=1}^{N} \lVert O_j - T_j \rVert = 0$ (2)

Equation (2) can be summarized as:

$H\beta = T$ (3)

where

$H = \begin{bmatrix} f(w_1 \cdot x_1 + c_1) & \cdots & f(w_M \cdot x_1 + c_M) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot x_N + c_1) & \cdots & f(w_M \cdot x_N + c_M) \end{bmatrix}_{N \times M}$ (4)

$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_M^T \end{bmatrix}_{M \times m}, \qquad T = \begin{bmatrix} T_1^T \\ \vdots \\ T_N^T \end{bmatrix}_{N \times m}$ (5)

The matrix T is the target matrix of the neural network, and H is the hidden layer output matrix. Huang et al. (2012) also introduced kernel functions into the design of ELM; a number of kernel functions are used for this purpose, such as linear, radial basis, normalized polynomial, and polynomial kernels. A kernel function based ELM design is called Kernel Extreme Learning Machine (KELM). For more detail about KELM, readers are referred to Huang et al. (2012).
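A minimal KELM regression sketch follows, assuming the usual kernelized solution $\beta = (I/C + K)^{-1} t$ with an RBF kernel, where C is a regularization constant; the toy data, C, and gamma values are illustrative only, and a plain Gaussian elimination stands in for a library linear solver to keep the example dependency-free.

```python
# Sketch of KELM regression: train by solving (K + I/C) beta = t,
# predict with f(x) = sum_i beta_i * K(x, x_i).
import math

def rbf(x, y, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def solve(A, b):
    """Gaussian elimination with partial pivoting for A z = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def kelm_train(X, t, C=1000.0, gamma=1.0):
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (1.0 / C if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, t)  # beta

def kelm_predict(X, beta, x, gamma=1.0):
    return sum(b * rbf(x, xi, gamma) for b, xi in zip(beta, X))

# Toy 1-D regression: learn y = x^2 on a few points.
X = [[0.0], [1.0], [2.0], [3.0]]
t = [0.0, 1.0, 4.0, 9.0]
beta = kelm_train(X, t)
pred = kelm_predict(X, beta, [2.0])
assert abs(pred - 4.0) < 0.2  # close to the training target
```

Note that no hidden-layer weights are drawn at all: the random feature map of ELM is replaced entirely by the kernel matrix, which is what makes KELM training a single linear solve.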

Also, deriving from the Bayesian framework, GPR can be considered a random process approach for carrying out nonparametric regression with the Gaussian process (Rasmussen & Nickisch 2010). In most prediction problems, GPR is preferred owing to its flexibility in providing uncertainty representations.

Post-processing approaches

Ensembling the outputs of several artificial intelligence models can be considered as a useful method to increase the accuracy of time series prediction (Makridakis & Winkler 1983). In this study, in order to increase the models' efficiency two methods (i.e. the Simple Linear Averaging Method (SLAM) and the Nonlinear Kernel Extreme Learning Machine Ensemble (NKELME)) were used. The simple linear average can be expressed as follows:
$\bar{g}(t) = \frac{1}{D}\sum_{i=1}^{D} g_i(t)$ (6)

where $\bar{g}(t)$ is the simple ensemble model output, $g_i(t)$ is the output of the i-th individual model (i.e. the outputs of KELM and GPR), and D is the number of single models (i.e. D = 2).
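In code, the SLAM combination of Equation (6) is just an element-wise arithmetic mean of the individual model outputs; the output values below are made up for illustration.

```python
# SLAM ensemble: average the outputs of D individual models at each
# time step (here D = 2, standing for the KELM and GPR outputs).
def slam(model_outputs):
    n_models = len(model_outputs)
    return [sum(vals) / n_models for vals in zip(*model_outputs)]

kelm_out = [10.0, 12.0, 14.0]  # hypothetical KELM predictions
gpr_out = [12.0, 10.0, 18.0]   # hypothetical GPR predictions
assert slam([kelm_out, gpr_out]) == [11.0, 11.0, 16.0]
```

The nonlinear NKELME ensemble differs only in that the individual outputs are fed as inputs to a second KELM model instead of being averaged.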

Performance criteria

In the current study, the proposed model's efficiency was assessed via Correlation Coefficient (R), Determination Coefficient (DC), and Root Mean Square Errors (RMSE) criteria as follows:
$R = \frac{\sum_{i=1}^{N}(O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i - \bar{O})^2 \sum_{i=1}^{N}(P_i - \bar{P})^2}}, \quad DC = 1 - \frac{\sum_{i=1}^{N}(O_i - P_i)^2}{\sum_{i=1}^{N}(O_i - \bar{O})^2}, \quad RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(O_i - P_i)^2}$ (7)

where $O_i$, $P_i$, $\bar{O}$, $\bar{P}$, and N are the observed values, the predicted values, the mean observed value, the mean predicted value, and the number of data samples, respectively. Evidently, values of R and DC close to one and a small value of RMSE indicate high model efficiency. It should be noted that in this study, all input variables were scaled between 0 and 1 in order to eliminate the dimensions of the input and output variables.
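The three criteria can be computed directly from the observed and predicted series; here DC is taken as the Nash–Sutcliffe-style determination coefficient, which is the common reading of this criterion.

```python
# R, DC and RMSE of Equation (7) for paired observed/predicted series.
import math

def metrics(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    r = cov / (so * sp)
    dc = 1.0 - sse / sum((o - mo) ** 2 for o in obs)
    rmse = math.sqrt(sse / n)
    return r, dc, rmse

obs = [1.0, 2.0, 3.0, 4.0]
r, dc, rmse = metrics(obs, obs)  # perfect prediction
assert abs(r - 1.0) < 1e-12 and abs(dc - 1.0) < 1e-12 and rmse == 0.0
```

A perfect model gives R = DC = 1 and RMSE = 0; note that DC, unlike R, can go negative when the model is worse than predicting the observed mean.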

Model development

Selection of appropriate variables as inputs is the most important step in modeling by means of machine learning methods. Based on Rajaee (2011), Himanshu et al. (2017), Nourani et al. (2019), and Salih et al. (2020), the most effective parameters for suspended sediment discharge are Cs, Qs, and Wd. Therefore, in this research, previous values of the daily suspended sediment discharge, suspended sediment concentration, and flow discharge over the period of 2005–2008 were used as inputs to model the SSD values. In the modeling process, two states were considered: in the first state, the SSD parameter of the selected stations was predicted based on each station's own data, and in the second state, modeling was done based on the data from previous stations. Also, the impact of data pre-processing on improving the models' accuracy was assessed using the WT and EEMD methods. According to Aussem et al. (1998), the minimum decomposition level in the WT method can be obtained as follows:

$L = \mathrm{int}[\log(N)]$ (8)

where L is the decomposition level and N is the length of the time series (Farajzadeh & Alizadeh 2017). In this study, L = 3 was used as the decomposition level. The developed input combinations for the SSD prediction are shown in Table 2, and the considered modeling process is shown in Figure 3. The WT is computed by successively passing a signal through high-pass and low-pass filters, producing detail (D) and approximation (A) coefficients. Since the input series were non-stationary, and since the first and second detail sub-series (high-frequency sub-series) had a weak correlation with the target data, EEMD was used to decompose these sub-series again. Therefore, the high-frequency detail sub-series were further decomposed into a number of IMFs. This further breakdown by EEMD yielded sub-series with more stationary properties and less noise. Since decomposing at two levels led to a large number of sub-series (D3, A3, and several IMFs), feeding them all into the meta-models might have caused over-training and other issues. Therefore, the energy values of the sub-series were calculated using Equation (9) to decrease the number of inputs and select the dominant sub-series:

$E_j = \sum_{n=1}^{N} \lvert S_j(n) \rvert^2$ (9)

where $S_j$ is the j-th sub-series of the SSD time series decomposed via EEMD, n is the time step of the series, and $E_j$ denotes its energy value. In the next step, the linear and nonlinear post-processing methods were applied, and the results were compared using the three performance criteria. To compare the performance of the applied pre- and post-processing models, the total data were divided into three sets: training, validation, and testing. The first 70% of the data were used for training the models and the last 30% for validating and testing them (15% for validation and 15% for testing).
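Two helper steps from this section can be sketched together, under stated assumptions: the Aussem et al. (1998) minimum decomposition level is read here as L = int(log10(N)) (the logarithm base is not spelled out in the text, but base 10 reproduces L = 3 for four years of daily data), and the energy criterion of Equation (9) is taken as the sum of squared sub-series values.

```python
# Decomposition-level rule and energy-based sub-series selection.
import math

def decomposition_level(n_samples):
    """Minimum WT decomposition level, assuming a base-10 logarithm."""
    return int(math.log10(n_samples))

def energy(sub_series):
    """Equation (9): sum of squared values of one sub-series."""
    return sum(v * v for v in sub_series)

def top_energy_subseries(sub_series_list, k=2):
    """Keep the k sub-series with the highest energy as model inputs."""
    return sorted(sub_series_list, key=energy, reverse=True)[:k]

# Four years of daily data (~1,461 samples) gives L = 3, as used here.
assert decomposition_level(1461) == 3

# Toy sub-series: the middle one clearly carries the most energy.
subs = [[0.1, -0.1, 0.2], [5.0, -4.0, 3.0], [1.0, 1.0, -1.0]]
assert top_energy_subseries(subs, k=1) == [[5.0, -4.0, 3.0]]
```

Ranking by energy is what keeps the input count manageable after the two-stage WT–EEMD decomposition multiplies the number of sub-series.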
Table 2

KELM and GPR developed models

Modeling based on each station's own data:

Model | Input variables | Output variable
(I) | Qs(t-1) | Qs(t)
(II) | Qs(t-1), Qs(t-2) | Qs(t)
(III) | Cs(t-1), Cs(t-2), Qs(t-1), Wd(t-1) | Qs(t)

Modeling based on the previous stations' data:

Model | Input variables | Output variable
P(I) | Qs1(t-1) | Qs2(t)
P(II) | Qs1(t-1), Qs1(t-2) | Qs2(t)
P(III) | Qs1(t), Qs1(t-1) | Qs2(t)
S(I) | Qs2(t-1) | Qs3(t)
S(II) | Qs2(t-1), Qs2(t-2) | Qs3(t)
S(III) | Qs2(t), Qs2(t-1) | Qs3(t)
S(IV) | Qs1,2(t-1), Qs1,2(t-2), Qs1,2(t-3) | Qs3(t)

Note: In parameter Qsi, i shows the station number.

Figure 3

Considered modeling process in the study.


Kernel based models' development

It should be noted that each artificial intelligence method has its own parameters for achieving the desired results, and the optimized values of these parameters should be determined. For example, designing the KELM and GPR approaches requires selecting an appropriate kernel function, and various kernel functions can be used depending on the nature of the studied phenomenon. In this research, to determine the best performance of the GPR and KELM models and to select the best kernel function, model (III) was tested for the SSD prediction in station 1 with various kernel functions. Figure 4 indicates the statistical parameters of the different kernel functions for this model. According to Figure 4, the RBF kernel in the KELM model and the squared exponential kernel in the GPR model were found to be the best kernel functions. Therefore, the RBF and squared exponential kernels were selected as the core tools of KELM and GPR, respectively, and were applied for the rest of the models.

Figure 4

The statistical parameters of KELM and GPR methods with different kernel functions for test series.


The results of the SSD modeling

Modeling based on raw data

Accurate prediction of the suspended sediment discharge in rivers or streams is crucial for sustainable water resources and environmental systems. In this study, the suspended sediment discharge at the selected stations was assessed via the KELM and GPR kernel based approaches, with the previous values of the flow and suspended sediment parameters used for model development. The results obtained from the developed GPR and KELM models are listed in Table 3 and shown in Figure 5. The results indicated that, in the state of modeling based on each station's own data, model (III), with input parameters CS(t-1), CS(t-2), Qs(t-1), and Wd(t-1), performed more successfully than the other models. Kisi et al. (2012) showed that in suspended sediment modeling via genetic programming, the model whose inputs were the current water discharge together with one previous water discharge and sediment load performed best; the models whose inputs were the current and one previously recorded water discharge together with one and two previous sediment loads, and the model whose inputs were the current water discharge and one previous sediment load, ranked second and third, respectively. In the present study, the obtained results showed that model (II), with the two input parameters Qs(t-1) and Qs(t-2), also yielded the desired accuracy; therefore, the sediment discharge could be predicted using only the sediment discharge of the previous one and two days. In the second state, among the selected stations, the SSD modeling of the third station based on data from both the first and second stations performed more successfully. A comparison between the results of the two considered states showed that modeling based on each station's own data led to more desirable results. However, using the previous stations' data in the modeling process yielded relatively accurate results; therefore, via the kernel based approaches, the previous stations' data could be used when a station's own data were unavailable. Artificial intelligence methods are powerful tools: when the interrelationships among the relevant variables are difficult to understand and conventional mathematical analysis methods cannot provide analytical solutions, these methods can be used successfully. Choubin et al. (2018) evaluated the use of a Classification and Regression Tree (CART) model to estimate the SSD based on hydro-meteorological data and indicated that CART, as an artificial intelligence model, can be a helpful tool in basins where hydro-meteorological data are readily available. In this study, the applied methods had desirable efficiency in the SSD modeling under both considered states. It should be noted that, in the second state, an attempt was made to investigate whether the sub-basins existing between consecutive stations have noticeable impacts on the flow regime of the downstream station. In this state, if there are special conditions between the stations (such as diversion dams, intake structures, tributaries, etc.), the relationship between the upstream and downstream flow regimes may be affected. In this study, there was a tributary between stations 2 and 3. However, since artificial intelligence methods are black-box methods that do not represent the physics of the phenomenon, and since the effect of the tributary is reflected in all the data, a comparison of the two states of SSD modeling (i.e. modeling station 2 using station 1 data versus modeling station 3 using station 2 data) suggests that the effect of the tributary on the modeling process cannot be accurately expressed.

Table 3

Statistical parameters of the KELM and GPR models for the SSD modeling in states 1 and 2 (without data processing)

State 1 – Station 1:

Model | Method | R (Verif.) | DC (Verif.) | RMSE (Verif.) | R (Test) | DC (Test) | RMSE (Test)
(I) | GPR | 0.877 | 0.818 | 0.050 | 0.867 | 0.788 | 0.059
(I) | KELM | 0.891 | 0.830 | 0.048 | 0.880 | 0.803 | 0.057
(II) | GPR | 0.887 | 0.847 | 0.039 | 0.877 | 0.818 | 0.058
(II) | KELM | 0.900 | 0.860 | 0.038 | 0.890 | 0.830 | 0.056
(III) | GPR | 0.906 | 0.867 | 0.033 | 0.896 | 0.827 | 0.056
(III) | KELM | 0.920 | 0.880 | 0.032 | 0.910 | 0.841 | 0.054

State 1 – Station 2:

Model | Method | R (Verif.) | DC (Verif.) | RMSE (Verif.) | R (Test) | DC (Test) | RMSE (Test)
(I) | GPR | 0.847 | 0.827 | 0.047 | 0.837 | 0.778 | 0.067
(I) | KELM | 0.861 | 0.842 | 0.045 | 0.852 | 0.790 | 0.065
(II) | GPR | 0.867 | 0.837 | 0.032 | 0.857 | 0.808 | 0.060
(II) | KELM | 0.882 | 0.854 | 0.031 | 0.873 | 0.822 | 0.058
(III) | GPR | 0.867 | 0.857 | 0.029 | 0.857 | 0.827 | 0.055
(III) | KELM | 0.888 | 0.871 | 0.028 | 0.870 | 0.844 | 0.053

State 1 – Station 3:

Model | Method | R (Verif.) | DC (Verif.) | RMSE (Verif.) | R (Test) | DC (Test) | RMSE (Test)
(I) | GPR | 0.877 | 0.837 | 0.043 | 0.847 | 0.808 | 0.064
(I) | KELM | 0.890 | 0.850 | 0.042 | 0.861 | 0.820 | 0.062
(II) | GPR | 0.906 | 0.867 | 0.036 | 0.867 | 0.837 | 0.054
(II) | KELM | 0.921 | 0.881 | 0.035 | 0.882 | 0.850 | 0.052
(III) | GPR | 0.926 | 0.887 | 0.029 | 0.887 | 0.847 | 0.050
(III) | KELM | 0.943 | 0.905 | 0.028 | 0.905 | 0.862 | 0.048

State 2 – Station 2-1:

Model | Method | R (Verif.) | DC (Verif.) | RMSE (Verif.) | R (Test) | DC (Test) | RMSE (Test)
P(I) | GPR | 0.857 | 0.768 | 0.039 | 0.827 | 0.719 | 0.097
P(I) | KELM | 0.872 | 0.780 | 0.038 | 0.841 | 0.733 | 0.094
P(II) | GPR | 0.857 | 0.798 | 0.039 | 0.818 | 0.729 | 0.095
P(II) | KELM | 0.873 | 0.810 | 0.038 | 0.832 | 0.742 | 0.092
P(III) | GPR | 0.877 | 0.827 | 0.033 | 0.837 | 0.739 | 0.093
P(III) | KELM | 0.890 | 0.840 | 0.032 | 0.854 | 0.751 | 0.089

State 2 – Station 3-2-1:

Model | Method | R (Verif.) | DC (Verif.) | RMSE (Verif.) | R (Test) | DC (Test) | RMSE (Test)
S(I) | GPR | 0.857 | 0.798 | 0.035 | 0.847 | 0.758 | 0.068
S(I) | KELM | 0.871 | 0.811 | 0.034 | 0.862 | 0.771 | 0.066
S(II) | GPR | 0.867 | 0.818 | 0.033 | 0.847 | 0.788 | 0.064
S(II) | KELM | 0.880 | 0.833 | 0.032 | 0.861 | 0.801 | 0.062
S(III) | GPR | 0.877 | 0.837 | 0.034 | 0.857 | 0.827 | 0.057
S(III) | KELM | 0.891 | 0.850 | 0.033 | 0.870 | 0.831 | 0.055
S(IV) | GPR | 0.887 | 0.867 | 0.032 | 0.877 | 0.835 | 0.054
S(IV) | KELM | 0.914 | 0.881 | 0.031 | 0.891 | 0.843 | 0.052

Note: In state 2, 3-2-1 means that the SSD values of station 3 are predicted based on the data of stations 1 and 2. Bold values indicate superior models.

Figure 5

(a) Comparison of the observed and predicted SSD for superior KELM model and (b) relative significance of each of the input parameters of the best models.


In the next step, to evaluate the impact of the independent parameters, a sensitivity analysis was performed for the best model of each state. In this regard, the input variables of the best models were omitted one by one and the KELM model, which had higher efficiency than the GPR model, was rerun. Based on the results in Figure 5(b), it could be deduced that the parameter Qs(t-1) in state 1, and in state 2 the parameter Qs1(t) for station 2 and the parameter Qs1,2(t-1) for station 3, had the most significant impact on the SSD prediction.
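The omit-one-input procedure described above can be sketched generically: drop each input column in turn, refit, and see how much the test error degrades. A simple 1-nearest-neighbour regressor stands in for the KELM model purely to keep the example self-contained, and the toy data are invented for illustration.

```python
# Leave-one-input-out sensitivity analysis: the input whose removal
# degrades the test RMSE the most is the most influential one.
import math

def one_nn_predict(X_train, y_train, x):
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def sensitivity(X_train, y_train, X_test, y_test):
    """Map each input index to the test RMSE obtained without it."""
    n_inputs = len(X_train[0])
    scores = {}
    for omit in range(n_inputs):
        Xtr = [[v for j, v in enumerate(r) if j != omit] for r in X_train]
        Xte = [[v for j, v in enumerate(r) if j != omit] for r in X_test]
        preds = [one_nn_predict(Xtr, y_train, x) for x in Xte]
        scores[omit] = rmse(y_test, preds)
    return scores

# Toy data where only input 0 matters: y equals the first feature.
X_tr = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
y_tr = [0.0, 1.0, 2.0, 3.0]
X_te = [[0.5, 5.0], [2.5, 5.0]]
y_te = [0.5, 2.5]
scores = sensitivity(X_tr, y_tr, X_te, y_te)
assert scores[0] > scores[1]  # omitting input 0 hurts more
```

Ranking the inputs by this degradation reproduces the kind of relative-significance comparison shown in Figure 5(b).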

Modeling based on pre-processed data

In this section, the effect of time series pre-processing on the models' accuracy was investigated. The time series were decomposed via the WT and EEMD methods. In decomposition by WT, the mother wavelet most similar to the signal should be selected. In this study, the Daubechies (db2, db4) and Symlets (sym2, sym4) mother wavelets were tested for the decomposition of Qs(t-1) in station 1. According to Figure 6(a), the db4 mother wavelet led to the best outcomes and was therefore used for the time series decomposition. In the second step, the data were further decomposed via EEMD. The principle of EEMD is the decomposition of a signal into several IMFs and one residual signal, whose sum reproduces the original signal. Each IMF is formed by subtracting a basic function from the original signal, and this process continues until the residual signal remains almost constant. In this study, the time series were decomposed into 10 IMFs and one residual signal, and the detail 1 and detail 2 sub-series obtained from the WT were each further decomposed into nine IMFs and one residual signal. This further decomposition by EEMD yields sub-series that are more stationary and less noisy.
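As an illustration of the WT step, a one-level periodised Daubechies decomposition can be written in plain NumPy. This is only a sketch: the study selected the db4 mother wavelet (normally applied through standard wavelet software such as PyWavelets), whereas the filter below is the shorter db2, whose four coefficients have a simple closed form. Because the filters are orthogonal, the sub-series retain the energy of the original signal.

```python
import numpy as np

# Daubechies db2 analysis filters (4 taps); the study's db4 has 8 taps
_s3 = np.sqrt(3.0)
H = np.array([1 + _s3, 3 + _s3, 3 - _s3, 1 - _s3]) / (4 * np.sqrt(2.0))  # lowpass
G = np.array([H[3], -H[2], H[1], -H[0]])          # highpass via alternating flip

def dwt_level(x):
    """One periodised DWT level: returns the approximation and detail halves."""
    x = np.asarray(x, dtype=float)
    idx = (2 * np.arange(len(x) // 2)[:, None] + np.arange(4)[None, :]) % len(x)
    return x[idx] @ H, x[idx] @ G

def wavedec(x, levels):
    """Repeatedly split the approximation: returns [d1, d2, ..., dL, aL]."""
    subs = []
    for _ in range(levels):
        x, d = dwt_level(x)
        subs.append(d)
    subs.append(x)
    return subs
```

Each detail sub-series produced this way (e.g. detail 1 and detail 2) can then be handed to EEMD for the further decomposition described above.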

Figure 6

(a) The RMSE values for decomposed Qs(t-1) using different mother wavelets for station 1, (b) decomposed time series using the db4-EEMD (6 sub-series) for station 1.


Since the number of inputs increased after time series decomposition, the energy values of the sub-series were calculated and only the sub-series with higher energy were selected as inputs for the kernel based methods. In Figure 6(b), the Qs(t-1) sub-series decomposed by db4-EEMD are shown for station 1. According to this figure, IMFs 2 and 3 from detail 1 and IMFs 4 and 5 from detail 2 were selected as the appropriate IMFs on account of their higher energy values in comparison with the other IMFs.
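The energy-based screening of the decomposed sub-series can be sketched as below, taking the energy of a sub-series as its sum of squares; the function name and the number of retained sub-series are illustrative.

```python
import numpy as np

def select_by_energy(subseries, keep=4):
    """Rank sub-series by energy (sum of squares) and keep the top `keep` ones."""
    energy = np.array([float((np.asarray(s) ** 2).sum()) for s in subseries])
    top = np.argsort(energy)[::-1][:keep]
    return sorted(top.tolist())  # indices of the retained sub-series
```

Applied to the IMFs of Figure 6(b), this kind of screening keeps only the most energetic components as model inputs.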

The selected sub-series were then used as inputs to the KELM and GPR models to predict the SSD. The results of the integrated pre-processing models are listed in Table 4. Comparing Tables 3 and 4 shows that data pre-processing significantly improved the accuracy of the results, and the integrated models were more accurate than the single methods. In fact, the use of WT and the further decomposition of the detail series improved the outcomes. For example, for station 3 in state 1, the RMSE values of the KELM and GPR models for the testing sets of model (III) were 0.048 and 0.050, respectively, whereas for the WT-EEMD-KELM and WT-EEMD-GPR models they decreased to 0.028 and 0.033. In general, data pre-processing increased the models' accuracy by 10-15% for the calibration sets and 9-20% for the verification sets. Hazarika et al. (2020) used machine learning methods in conjunction with wavelets and indicated that wavelet transformation helps to analyze the time and frequency information for sediment load estimation by decomposing the data over several phases; their hybrid models based on the Coiflet wavelet performed well. Also, Nourani et al. (2019) applied a hybrid Wavelet-M5 model to the suspended sediment discharge of two different rivers and showed its better performance in comparison with individual ANN and M5 models. In the present study, the integrated methods achieved desirable accuracy in both states, and the previous stations' data proved to be a reliable basis for predicting the SSD values at stations lacking observational data.

Table 4

Statistical parameters of the WT-EEMD-kernel based models for the SSD modeling in states 1 and 2 (with data pre-processing). Bold values indicate superior models

State 1

Station 1
Model   Method  Verification          Testing
                R      DC     RMSE    R      DC     RMSE
(I)     GPR     0.950  0.930  0.036   0.930  0.890  0.042
        KELM    0.974  0.953  0.034   0.953  0.912  0.040
(II)    GPR     0.960  0.940  0.030   0.940  0.910  0.040
        KELM    0.984  0.964  0.029   0.964  0.933  0.038
(III)   GPR     0.980  0.960  0.024   0.950  0.940  0.037
        KELM    0.985  0.984  0.023   0.974  0.964  0.035

Station 2
(I)     GPR     0.920  0.890  0.034   0.900  0.870  0.040
        KELM    0.934  0.912  0.032   0.923  0.883  0.038
(II)    GPR     0.930  0.920  0.024   0.920  0.900  0.031
        KELM    0.944  0.943  0.023   0.943  0.914  0.029
(III)   GPR     0.968  0.960  0.021   0.950  0.940  0.024
        KELM    0.988  0.984  0.020   0.974  0.954  0.023

Station 3
(I)     GPR     0.950  0.930  0.031   0.910  0.910  0.046
        KELM    0.974  0.953  0.029   0.933  0.933  0.044
(II)    GPR     0.980  0.950  0.028   0.950  0.930  0.038
        KELM    0.988  0.974  0.027   0.974  0.953  0.036
(III)   GPR     0.990  0.970  0.021   0.974  0.960  0.033
        KELM    0.998  0.987  0.019   0.982  0.981  0.028

State 2

Station 2-1
P(I)    GPR     0.930  0.890  0.045   0.890  0.810  0.067
        KELM    0.953  0.912  0.027   0.912  0.830  0.064
P(II)   GPR     0.930  0.920  0.030   0.880  0.810  0.066
        KELM    0.944  0.943  0.029   0.902  0.830  0.063
P(III)  GPR     0.980  0.930  0.024   0.950  0.840  0.060
        KELM    0.988  0.953  0.023   0.974  0.861  0.051

Station 3-2-1
S(I)    GPR     0.930  0.890  0.025   0.910  0.850  0.047
        KELM    0.953  0.912  0.024   0.933  0.871  0.045
S(II)   GPR     0.940  0.920  0.024   0.910  0.870  0.046
        KELM    0.964  0.943  0.023   0.942  0.892  0.044
S(III)  GPR     0.980  0.940  0.025   0.950  0.920  0.037
        KELM    0.995  0.964  0.024   0.974  0.934  0.035
S(IV)   GPR     0.970  0.960  0.022   0.930  0.944  0.028
        KELM    0.985  0.984  0.021   0.944  0.968  0.027

In the next part, the results of the pre-processing-kernel based methods were combined in order to evaluate the simultaneous impact of pre- and post-processing on the accuracy of the outputs. The obtained results are listed in Table 5 and shown in Figure 7. According to the results, the combined methods yielded more accurate results than both the single and the pre-processing-kernel based approaches. Both the simple and the nonlinear ensemble methods increased the applied models' efficiency, by about 5-10% for the verification sets and 5-15% for the testing sets in comparison with the pre-processing models. Based on the results listed in Tables 4 and 5, it seems that the pre-processing methods were more successful in enhancing the predictions' accuracy than the post-processing methods. It should be noted that determining the extreme values of the SSD is a crucial issue in river engineering; therefore, the performance of a model in estimating the minimum and maximum values of the time series should be taken into account. According to Figure 7, the extreme values of the SSD series were calculated more correctly by the pre-post-processing methods. The results also showed that, in the SSD series prediction, the nonlinear ensemble method performed better than the linear one. This indicates that the efficiency of the linear method depends on the performance of the individual models: for artificial intelligence models with weak performance, a linear ensemble would not produce desirable results, whereas a nonlinear combination pattern leads to more appropriate outcomes.
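The two combination schemes can be reconstructed in outline as follows; this is an illustrative sketch, not the authors' implementation. SLAM is taken as a plain arithmetic average of the individual model outputs, and NKELME feeds those outputs, as inputs, to a second-stage kernel ELM whose RBF kernel and hyper-parameters are assumptions.

```python
import numpy as np

def slam(preds):
    """Simple Linear Averaging: arithmetic mean of the individual model outputs."""
    return np.mean(np.asarray(preds), axis=0)

def nkelme(preds_tr, y_tr, preds_te, C=100.0, gamma=1.0):
    """Nonlinear KELM ensemble: a second-stage kernel ELM is trained with the
    individual models' predictions as its inputs."""
    X_tr = np.asarray(preds_tr, dtype=float).T   # (n_samples, n_models)
    X_te = np.asarray(preds_te, dtype=float).T
    K = np.exp(-gamma * ((X_tr[:, None] - X_tr[None]) ** 2).sum(-1))
    beta = np.linalg.solve(K + np.eye(len(X_tr)) / C, np.asarray(y_tr, dtype=float))
    K_te = np.exp(-gamma * ((X_te[:, None] - X_tr[None]) ** 2).sum(-1))
    return K_te @ beta
```

Because the nonlinear stage can correct systematic biases of the individual models, it is less sensitive to a weak member than the simple average, consistent with the behaviour reported above.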

The RMSE criterion was used to compare the performance of the single and integrated methods graphically. The results are shown in Figure 8. As can be seen, the RMSE values of the integrated methods were smaller in the SSD modeling process, and the WT-EEMD-NKELME model led to the most accurate results.
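For completeness, the three performance criteria used throughout the tables can be computed as follows. The paper does not spell out its formulas in this excerpt, so the DC here is assumed to be the Nash-Sutcliffe-type determination coefficient, the usual choice in this literature.

```python
import numpy as np

def criteria(obs, pred):
    """Correlation coefficient (R), determination coefficient (DC) and RMSE."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(obs, pred)[0, 1]
    # DC (assumed Nash-Sutcliffe form): 1 - sum((O-P)^2) / sum((O-mean(O))^2)
    dc = 1.0 - ((obs - pred) ** 2).sum() / ((obs - obs.mean()) ** 2).sum()
    rmse = np.sqrt(((obs - pred) ** 2).mean())
    return r, dc, rmse
```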

Table 5

Statistical parameters of the integrated pre-post-processing-kernel based approaches for selected stations

State 1

Station 1
Model   Method          Verification          Testing
                        R      DC     RMSE    R      DC     RMSE
(III)   WT-EEMD-SLAM    0.994  0.989  0.021   0.979  0.981  0.033
        WT-EEMD-NKELME  0.998  0.991  0.019   0.980  0.984  0.025

Station 2
(III)   WT-EEMD-SLAM    0.989  0.986  0.018   0.979  0.961  0.021
        WT-EEMD-NKELME  0.998  0.994  0.017   0.983  0.974  0.019

Station 3
(III)   WT-EEMD-SLAM    0.994  0.990  0.018   0.979  0.988  0.025
        WT-EEMD-NKELME  0.998  0.994  0.016   0.980  0.992  0.023

State 2

Station 2-1
P(III)  WT-EEMD-SLAM    0.989  0.955  0.021   0.989  0.863  0.043
        WT-EEMD-NKELME  0.990  0.963  0.019   0.990  0.870  0.040

Station 3-2-1
S(IV)   WT-EEMD-SLAM    0.989  0.986  0.019   0.989  0.970  0.025
        WT-EEMD-NKELME  0.992  0.994  0.018   0.990  0.978  0.023
Figure 7

Comparison of the observed and predicted SSD for the superior WT-EEMD-NKELME model.

Figure 8

Comparison of the values of the RMSE criterion for the superior models of the methods used in the SSD modeling.


The accurate prediction of river SSD values is an important factor in improving water management. This study assessed the capability of time series pre-processing methods for SSD modeling. In the first step, the time series were fed, without any processing, into the single kernel based methods. The time series were then decomposed into several sub-series using WT, and further decomposition was performed via EEMD. Finally, the results obtained from the pre-processing methods were combined via linear and nonlinear post-processing methods in order to enhance the forecasting efficiency. According to the results, using integrated methods increased the models' accuracy: the applied pre-processing enhanced the performance of the KELM and GPR models by approximately 10-20%. SSD modeling based on each station's own data led to more desirable results; in this state, the model with inputs CS(t-1), CS(t-2), Qs(t-1), and Wd(t-1) was the superior model. However, using the integrated approaches, the previous stations' data can be used when a station's own data are unavailable. The obtained results also revealed that the pre-processing methods were more successful than the SLAM and NKELME post-processing methods. The simple and nonlinear averaging techniques improved the efficiency of the applied models by about 5-15% compared with the pre-processing models, and the performance of the NKELME was higher than that of the SLAM. In addition, the maximum and minimum values of the SSD variables were well predicted by the integrated models. Therefore, the integration of hybrid pre- and post-processing techniques can be useful for daily SSD modeling.

All relevant data are available from an online repository or repositories at https://waterdata.usgs.gov/nwis/sw.

References

Aussem, A., Campbell, J. & Murtagh, F. 1998 Wavelet-based feature extraction and decomposition strategies for financial forecasting. Journal of Computational Finance 6(2), 5–12.

Azamathulla, H. M., Haghiabi, A. H. & Parsaie, A. 2017 Prediction of side weir discharge coefficient by support vector machine technique. Water Supply 16(4), 1002–1016.

Babovic, V. 2009 Introducing knowledge into learning based on genetic programming. Journal of Hydroinformatics 11(3–4), 181–193.

Bhattacharya, B., Price, R. K. & Solomatine, D. P. 2004 A data mining approach modeling sediment transport. In: 6th International Conference on Hydroinformatics, pp. 1663–1670.

Chen, X. Y. & Chau, K. W. 2016 A hybrid double feedforward neural network for suspended sediment load estimation. Water Resources Management 30, 2179–2194.

Choubin, B., Darabi, H., Rahmati, O., Sajedi-Hosseini, F. & Kløve, B. 2018 River suspended sediment modelling using the CART model: a comparative study of machine learning techniques. Science of the Total Environment 615, 272–281.

Cloke, H. L. & Pappenberger, F. 2009 Ensemble flood forecasting: a review. Journal of Hydrology 375(3–4), 613–626.

Huang, N. E., Shen, Z., Long, S. R., Wu, M. C., Shih, H. H., Zheng, Q., Yen, N. C., Tung, C. C. & Liu, H. H. 1998 The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 454, 903–995.

Huang, G. B., Zhu, Q. Y. & Siew, C. K. 2006 Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501.

Huang, G. B., Zhou, H., Ding, X. & Zhang, R. 2012 Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42(2), 513–529.

Kisi, O., Dailr, A. H., Cimen, M. & Shiri, J. 2012 Suspended sediment modeling using genetic programming and soft computing techniques. Journal of Hydrology 450, 48–58.

Labate, D., La Foresta, F., Occhiuto, G., Morabito, F. C., Lay-Ekuakille, A. & Vergallo, P. 2013 Empirical mode decomposition vs. wavelet decomposition for the extraction of respiratory signal from single-channel ECG: a comparison. IEEE Sensors Journal 13(7), 2666–2674.

Lei, Y., He, Z. & Zi, Y. 2009 Application of the EEMD method to rotor fault diagnosis of rotating machinery. Mechanical Systems and Signal Processing 23(4), 1327–1338.

Makridakis, S. & Winkler, R. L. 1983 Average of forecasts: some empirical results. Management Science 29(9), 987–996.

Morianou, G. G., Kourgialas, N. N., Karatzas, G. P. & Nikolaidis, N. P. 2017 River flow and sediment transport simulation based on a curvilinear and rectilinear grid modelling approach – a comparison study. Water Supply 17(5), 1325–1334.

Nourani, V., Molajou, A., Tajbakhsh, A. D. & Najafi, H. 2019 A wavelet based data mining technique for suspended sediment load modeling. Water Resources Management 33(5), 1769–1784.

Pachori, R. B., Avinash, P., Shashank, K., Sharma, R. & Acharya, U. R. 2015 Application of empirical mode decomposition for analysis of normal and diabetic RR-interval signals. Expert Systems with Applications 42(9), 4567–4581.

Partalas, I., Tsoumakas, G., Hatzikos, E. V. & Vlahavas, I. 2008 Greedy regression ensemble selection: theory and an application to water quality prediction. Information Sciences 178(20), 3867–3879.

Rajaee, T., Mirbagheri, S. A., Zounemat-Kermani, M. & Nourani, V. 2009 Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models. Science of the Total Environment 407, 4916–4927.

Rasmussen, C. E. & Nickisch, H. 2010 Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research 11, 3011–3015.

Roushangar, K., Matin, G. N., Ghasempour, R. & Saghebian, S. M. 2019 Evaluation of the effective parameters on energy losses of rectangular and circular culverts via kernel-based approaches. Journal of Hydroinformatics 21(6), 1014–1029.

Saghebian, S. M., Roushangar, K., Ozgur Kirca, V. S. & Ghasempour, R. 2020 Modeling total resistance and form resistance of movable bed channels via experimental data and a kernel-based approach. Journal of Hydroinformatics 22(3), 528–540.

Salih, S. Q., Sharafati, A., Khosravi, K., Faris, H., Kisi, O., Tao, H., Ali, M. & Yaseen, Z. M. 2020 River suspended sediment load prediction based on river discharge information: application of newly developed data mining models. Hydrological Sciences Journal 65(4), 624–637.

Vongvisessomjai, N., Tingsanchali, T. & Babel, M. S. 2010 Non-deposition design criteria for sewers with part-full flow. Urban Water Journal 7(1), 61–77.

Wu, Z. H. & Huang, N. E. 2009 Ensemble empirical mode decomposition: a noise assisted data analysis method. Advances in Adaptive Data Analysis 1, 1–41.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).