## Abstract

Reasonable prediction of suspended sediment concentration is of great significance for river regulation and for the scheduling of water conservancy projects in the upper and middle reaches. To reduce the influence of the nonlinear and non-stationary characteristics of sediment sequences on the prediction results and to improve prediction accuracy, a sediment prediction model based on CEEMDAN-GRU was constructed. The monthly suspended sediment concentration data of Huayuankou hydrological station from 1960 to 2014 were ‘decomposed-predicted-reconstructed’, and the results were compared with those of single GRU, SVM and LSTM models. The results reveal that the CEEMDAN-GRU coupling model is a superior alternative to the single models: its determination coefficients (DC) on the training set and testing set are at least 0.71, its qualified rate (QR) reaches 80%, and its mean absolute error (MAE), mean absolute percentage error (MAPE) and mean square error (MSE) are 1.2811, 0.9675 and 4.3560, respectively. The CEEMDAN-GRU model therefore performs better and can be used for the mid- and long-term prediction of nonlinear and non-stationary suspended sediment concentration series.

## HIGHLIGHTS

The study used 660 months of suspended sediment concentration data, covering both flood and dry periods.

A suspended sediment concentration forecast model using a CEEMDAN-GRU hybrid approach is proposed.

One hydrological station is used to build the CEEMDAN-GRU model, and two other stations are used to verify the stability and representativeness of the model.

The CEEMDAN-GRU model outperforms the GRU, SVM, and LSTM models.

## INTRODUCTION

The Yellow River has the highest sediment concentration of any river in the world, and the sediment problem has always been an important issue in its management and development (Wang *et al.* 2016). Reasonable and accurate prediction of the suspended sediment concentration of the Yellow River is of great significance to the regulation of the lower reaches and to the scheduling of water conservancy projects in the upper and middle reaches (Xia *et al.* 2016). At present, methods for predicting suspended sediment concentration fall into two categories: the hydrodynamic method (Wang *et al.* 2019) and the simplified hydrologic method (Jin 2007). However, these traditional methods require a large amount of input data, take a long time to compute, and cannot fully describe the complex nonlinear characteristics of natural systems. In addition, many factors affect the suspended sediment concentration, and mature, accurate models and methods are still lacking. Therefore, applying artificial intelligence methods with strong self-adaptability and high fault tolerance to hydrological forecasting (Moosavi *et al.* 2013; Humphrey *et al.* 2016; Sharifi *et al.* 2018) has become a new research topic. Lafdani *et al.* (2013) studied the daily suspended sediment content of the Doiraj River in Iran based on the ANN-SVM model. Olyaie *et al.* (2015) compared the performance of various artificial intelligence approaches for estimating the suspended sediment load of river systems. Khan *et al.* (2018) used an ANN to simulate and predict the suspended sediment of the Ramganga River in the Ganga Basin of India. However, these models presuppose stationary data, which is inconsistent with the non-stationary and nonlinear character of suspended sediment concentration (Karthikeyan & Kumar 2013), so their predictive performance is limited.

Empirical mode decomposition (EMD), a decomposition method proposed by Huang *et al.* (1998), has been widely used in hydrology (Huang *et al.* 2009; Shabri & Samsudin 2014; Sun *et al.* 2016). However, mode mixing (modal aliasing) occurs frequently during signal processing and destroys the physical significance of the intrinsic mode functions (IMFs). To overcome this problem, ensemble empirical mode decomposition (EEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) were proposed by Wu & Huang (2009) and María *et al.* (2011), respectively. Compared with EEMD, CEEMDAN effectively reduces the number of iterations, increases the reconstruction accuracy, and is more suitable for nonlinear signal analysis. The gated recurrent unit (GRU) (Cho *et al.* 2014) is one of the most effective variants of the recurrent neural network (RNN). It can accurately extract structural features of the data, learns quickly, needs few iterations, offers good prediction performance, and is well suited to nonlinear signal analysis. This research constructs a ‘decomposition-prediction-reconstruction’ model based on CEEMDAN-GRU. Taking the suspended sediment concentration data of Huayuankou hydrological station as an example, the pattern of water and sediment change in the lower Yellow River is analyzed and predicted, providing theoretical support for sediment control and water conservancy project operation as well as a new means of hydrological prediction.

## METHODS

### CEEMDAN

CEEMDAN is an enhancement of EMD (Huang *et al.* 1998; Huang & Wu 2008) and EEMD (Wu & Huang 2009). The white noise added by EEMD is not completely cancelled out after averaging over multiple trials. Although the error can be reduced by increasing the number of averaging trials, the running time of the algorithm inevitably increases. By adding adaptive white noise a limited number of times at each stage, CEEMDAN reduces the reconstruction error of the signal while requiring fewer averaging trials.

In CEEMDAN, each intrinsic mode function obtained by decomposition is denoted $\mathrm{IMF}_n$. The method relies on three ingredients: an operator that returns the c-th intrinsic mode component generated by the EMD algorithm, a white-noise signal following the N(0,1) distribution, and the original signal. The specific steps of the CEEMDAN method are as follows.
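The steps can be sketched using the commonly cited CEEMDAN formulation. The symbols below are assumptions introduced for illustration and are not necessarily those of the original derivation: $x(t)$ is the original signal, $w^{i}(t)$ is the $i$-th white-noise realization ($i = 1, \dots, I$), $E_k(\cdot)$ is the operator returning the $k$-th EMD mode, $\varepsilon_k$ is the noise amplitude at stage $k$, $r_k(t)$ is the $k$-th residue, and $R(t) = r_K(t)$ is the final residue once no further mode can be extracted.

```latex
\begin{aligned}
\mathrm{IMF}_1(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\big(x(t) + \varepsilon_0\, w^{i}(t)\big), \\
r_1(t) &= x(t) - \mathrm{IMF}_1(t), \\
\mathrm{IMF}_{k+1}(t) &= \frac{1}{I}\sum_{i=1}^{I} E_1\!\big(r_k(t) + \varepsilon_k\, E_k\big(w^{i}(t)\big)\big), \qquad k = 1, \dots, K-1, \\
r_{k+1}(t) &= r_k(t) - \mathrm{IMF}_{k+1}(t), \\
x(t) &= \sum_{k=1}^{K} \mathrm{IMF}_k(t) + R(t).
\end{aligned}
```

The last line is the property that makes reconstruction possible: the modes and the final residue sum back to the original signal.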

### GRU
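As a rough illustration of how a GRU updates its hidden state, the following minimal sketch implements one scalar GRU step in plain Python. It assumes the widely used gate formulation in which the new state is `(1 - z) * h_prev + z * h_cand`; some references swap the roles of `z` and `1 - z`. The parameter names (`wz`, `uz`, `bz`, and so on) are hypothetical and chosen only for this example.

```python
import math
from typing import Dict


def sigmoid(value: float) -> float:
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-value))


def gru_step(x: float, h_prev: float, p: Dict[str, float]) -> float:
    """One scalar GRU update (assumed standard formulation, hypothetical parameter names)."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])
    # Reset gate: how much of the old state feeds the candidate state.
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])
    # Candidate state computed from the input and the reset-scaled old state.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # Blend old state and candidate state.
    return (1.0 - z) * h_prev + z * h_cand


# With all parameters at zero, both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zero_params = {name: 0.0 for name in
               ["wz", "uz", "bz", "wr", "ur", "br", "wh", "uh", "bh"]}
h_next = gru_step(x=1.0, h_prev=0.8, p=zero_params)
```

A practical model would use vector-valued states and learned weight matrices; the scalar form only shows how the two gates control what is kept and what is overwritten.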

### CEEMDAN-GRU

To improve the prediction accuracy for suspended sediment concentration, the CEEMDAN-GRU model is proposed in this study, which follows a ‘decomposition-prediction-reconstruction’ process. CEEMDAN is used to decompose the suspended sediment concentration data into a finite set of subsequences ordered from high to low frequency. The subsequences comprise the local characteristics of the original data at different time scales and a residual term, and they are relatively stationary and closer to linear than the original series. GRU is then used to predict each subsequence separately, and the component predictions are recombined to obtain the final forecast. The model structure is shown in Figure 2.
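The ‘decomposition-prediction-reconstruction’ flow can be illustrated with a minimal sketch. It assumes that reconstruction means summing the component forecasts, which follows from the additive nature of the decomposition. The function names and the `persistence` predictor are hypothetical stand-ins: in the study, each component would be predicted by a trained GRU, and the components would come from CEEMDAN rather than from the toy lists used here.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]


def forecast_by_components(
    components: List[Series],
    predict_component: Callable[[Series], List[float]],
) -> List[float]:
    """Predict every subsequence separately, then sum the predictions."""
    predictions = [predict_component(component) for component in components]
    horizon = len(predictions[0])
    if any(len(p) != horizon for p in predictions):
        raise ValueError("all component forecasts must have the same length")
    # Reconstruction (assumed): the forecast of the original series is the
    # sum of the forecasts of the IMFs and the residual.
    return [sum(p[t] for p in predictions) for t in range(horizon)]


def persistence(series: Series) -> List[float]:
    """Hypothetical stand-in predictor: repeat the last observed value once."""
    return [series[-1]]


# Toy components standing in for two IMFs and a residual.
toy_components = [[0.5, -0.5, 0.25], [1.0, 2.0, 1.5], [10.0, 9.0, 8.0]]
combined = forecast_by_components(toy_components, persistence)
```

Keeping the predictor as a plain callable makes it easy to swap the stand-in for a separately trained model per component.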

*n* is the total number of predicted samples, and *m* is the total number of qualified predicted samples.
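The evaluation indices used in this study (DC, QR, MAE, MAPE and MSE) can be computed as in the following sketch. Several points are assumptions rather than definitions taken from the text: DC is computed in the Nash-Sutcliffe form, a forecast counts as qualified when its relative error is within 20%, and MAPE is returned as a fraction rather than a percentage. The function name `evaluate_forecast` is hypothetical.

```python
from typing import Dict, Sequence


def evaluate_forecast(
    observed: Sequence[float],
    predicted: Sequence[float],
    tolerance: float = 0.2,  # assumed qualification threshold (20% relative error)
) -> Dict[str, float]:
    """Return DC, QR (%), MAE, MAPE (as a fraction) and MSE for one forecast."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")

    errors = [p - o for o, p in zip(observed, predicted)]
    mean_observed = sum(observed) / n

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE and QR divide by the observation, so observations must be non-zero.
    relative_errors = [abs(e) / abs(o) for e, o in zip(errors, observed)]
    mape = sum(relative_errors) / n

    # DC (assumed Nash-Sutcliffe form): 1 minus residual sum of squares over
    # the sum of squared deviations of the observations from their mean.
    # The observations must not all be equal, or the denominator is zero.
    total_sum_of_squares = sum((o - mean_observed) ** 2 for o in observed)
    dc = 1.0 - sum(e * e for e in errors) / total_sum_of_squares

    # QR = m / n * 100, where m counts forecasts within the tolerance.
    m = sum(1 for rel in relative_errors if rel <= tolerance)
    qr = 100.0 * m / n

    return {"DC": dc, "QR": qr, "MAE": mae, "MAPE": mape, "MSE": mse}


scores = evaluate_forecast([2.0, 4.0, 6.0, 8.0], [2.0, 5.0, 6.0, 7.0])
```

In this toy case three of the four forecasts fall within the assumed tolerance, so `m = 3` and `n = 4`.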

## CASE STUDY

### Data sources

Huayuankou hydrological station, located in the north of Zhengzhou City, Henan Province, was established in July 1938. Its catchment area is 730,000 km², accounting for 97% of the total area of the Yellow River basin. It is not only the most important water and sediment control station on the Yellow River, but also the reference station for flood control and water resources regulation in the lower reaches of the Yellow River. It undertakes the important tasks of water resources development and utilization, river management, and hydrological information and forecasting.

Huayuankou hydrological station holds a large body of timely and accurate hydrological observations with good data integrity. In this study, 660 months of suspended sediment concentration data from January 1960 to December 2014 at Huayuankou station are taken as the research data. The time series is divided into a training set (1960/01–2003/12) and a testing set (2004/01–2014/12) (Figure 3).
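The chronological split can be checked with a short sketch. The helper `month_labels` is a hypothetical name used only for this illustration; the date ranges are those stated above.

```python
from typing import List


def month_labels(start_year: int, end_year: int) -> List[str]:
    """Return labels such as '1960/01' for every month in the inclusive year range."""
    return [
        f"{year}/{month:02d}"
        for year in range(start_year, end_year + 1)
        for month in range(1, 13)
    ]


months = month_labels(1960, 2014)
# Split by time, not at random, so the testing set lies strictly after the training set.
split_index = months.index("2004/01")
train_months = months[:split_index]
test_months = months[split_index:]
```

With these date ranges, the series has 660 months, of which 528 fall in the training set and 132 in the testing set, an 80/20 split.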

### Multiscale decomposition of original data

The monthly suspended sediment concentration data of Huayuankou hydrological station from January 1960 to December 2014 is decomposed by CEEMDAN. The decomposed results include seven IMF components and one residual, as shown in Figure 4.

As can be seen from Figure 4, the monthly suspended sediment concentration series is decomposed into seven IMF components and one residual term with different time scales, ordered from high to low frequency and from large to small amplitude. IMF1 has the highest frequency, the largest fluctuation, and the shortest wavelength. From IMF2 to IMF7, the frequency gradually decreases, the fluctuation weakens, the wavelength increases, and the periodicity becomes increasingly pronounced. The residual represents the long-term trend of monthly suspended sediment concentration at Huayuankou hydrological station from 1960 to 2014, and it shows an overall declining trend over the period.

### Normalization

### Parameter selection of CEEMDAN-GRU model

The GRU model is trained on the normalized training set. The monthly suspended sediment concentration at time t is assumed to depend on the concentrations of the previous five months, so the time step is 5. After repeated tests and adjustments, the activation function is set to ReLU, the number of hidden units to 50, the loss function to mean absolute error, and the optimizer to Adam.
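A time step of 5 means that each training sample pairs five consecutive monthly values with the value of the following month. The sketch below builds such samples in plain Python. The function name `make_windows` and the toy series are hypothetical, and the resulting lists would still need to be converted to the array shape expected by whichever deep learning library is used.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], time_step: int = 5
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `time_step` consecutive values with the value that follows."""
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(time_step, len(series)):
        inputs.append(list(series[t - time_step:t]))  # the previous five months
        targets.append(series[t])                     # the month to be predicted
    return inputs, targets


# Toy normalized series with seven monthly values.
demo_series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
demo_inputs, demo_targets = make_windows(demo_series)
```

A series of length N yields N − 5 samples, so the first five months of each set serve only as inputs.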

### Prediction results of CEEMDAN-GRU model

Table 1 lists the prediction performance of the CEEMDAN-GRU model for each component and for the overall monthly suspended sediment concentration. The prediction results for each IMF component on the testing set are shown in Figure 5.

| Component | Training set DC | Training set QR/% | Testing set DC | Testing set QR/% |
|---|---|---|---|---|
| IMF1 | 0.69 | 29 | 0.69 | 80 |
| IMF2 | 0.99 | 73 | 0.98 | 91 |
| IMF3 | 0.99 | 90 | 0.99 | 94 |
| IMF4 | 0.99 | 99 | 0.99 | 99 |
| IMF5 | 0.99 | 99 | 0.99 | 99 |
| IMF6 | 0.99 | 99 | 0.99 | 99 |
| IMF7 | 0.99 | 99 | 0.99 | 99 |
| Residual | 0.99 | 100 | 0.97 | 97 |
| Overall | 0.82 | 83 | 0.71 | 80 |

Table 1 and Figure 5 show that the model performs only moderately on the IMF1 component. A likely reason is that IMF1, the highest-frequency component, still retains some of the nonlinear and non-stationary characteristics of the original data, which makes the GRU model harder to train and limits its prediction performance. The prediction results for the remaining components are better. For the observed and predicted values of the original data on the testing set, DC is 0.71 and QR reaches 80%.

### Comparison of prediction results

The errors of the CEEMDAN-GRU model are compared with those of the GRU, SVM and LSTM models, and the results are shown in Table 2 and Figure 6. The predictions of the CEEMDAN-GRU model are in line with expectations, and their cycle and trend are highly consistent with the measured series. Table 2 shows that the errors of the CEEMDAN-GRU model are smaller than those of the single prediction models. Specifically, MAE is 35.06% lower than that of SVM, 59.81% lower than that of LSTM and 48.22% lower than that of GRU. MAPE is 26.85% lower than that of SVM, 65.86% lower than that of LSTM, and 50.92% lower than that of GRU, which means the coupling model has the highest prediction accuracy on the stationary part of the sequence. MSE is 71.88% lower than that of SVM, 85.33% lower than that of LSTM and 84.18% lower than that of GRU, which means the coupling model has the best performance on the extreme part of the sequence. Compared with the single prediction models, the CEEMDAN-GRU model substantially improves the prediction accuracy of monthly suspended sediment concentration, and the subsequences obtained by CEEMDAN are more suitable for model training. In addition, compared with the study of Li *et al.* (2005) (DC = −0.61), the data series selected in this research is longer and more comprehensive, and so better reflects the characteristics of water and sediment change in the lower Yellow River. The DC of CEEMDAN-GRU is 0.71, which is closer to 1, indicating that the prediction result is better and more reliable.

| Model | MAE | MAPE | MSE |
|---|---|---|---|
| SVM | 1.9728 | 1.3227 | 15.4887 |
| LSTM | 3.1880 | 2.8336 | 29.6938 |
| GRU | 2.4741 | 1.9712 | 27.5356 |
| CEEMDAN-GRU | 1.2811 | 0.9675 | 4.3560 |
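The relative error reductions can be recomputed directly from the values in Table 2. In the sketch below, the reduction is taken as (baseline − hybrid) / baseline × 100, which is an assumption about how the percentages were derived; the variable and function names are hypothetical.

```python
from typing import Dict

# Error values copied from Table 2.
table2: Dict[str, Dict[str, float]] = {
    "SVM": {"MAE": 1.9728, "MAPE": 1.3227, "MSE": 15.4887},
    "LSTM": {"MAE": 3.1880, "MAPE": 2.8336, "MSE": 29.6938},
    "GRU": {"MAE": 2.4741, "MAPE": 1.9712, "MSE": 27.5356},
    "CEEMDAN-GRU": {"MAE": 1.2811, "MAPE": 0.9675, "MSE": 4.3560},
}


def reduction_percent(baseline: float, hybrid: float) -> float:
    """Percentage by which the hybrid error is lower than the baseline error."""
    return 100.0 * (baseline - hybrid) / baseline


hybrid_errors = table2["CEEMDAN-GRU"]
reductions = {
    model: {
        metric: round(reduction_percent(value, hybrid_errors[metric]), 2)
        for metric, value in errors.items()
    }
    for model, errors in table2.items()
    if model != "CEEMDAN-GRU"
}

for model, values in reductions.items():
    print(model, values)
```

Recomputing the percentages this way is a quick consistency check between the table and the figures quoted in the text.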

To further illustrate the stability and representativeness of the CEEMDAN-GRU model, the Gaocun and Lijin stations were used to verify the model. MAE, MAPE and MSE were 1.2348, 0.5147 and 2.8697, respectively, at Gaocun station and 1.2438, 0.8196 and 2.7725 at Lijin station. The results showed that the CEEMDAN-GRU model has the best performance at both Gaocun and Lijin stations. The prediction results for these two stations have further verified the good stability and high consistency of the CEEMDAN-GRU model.

## DISCUSSION

The residual component decomposed by CEEMDAN shows that the suspended sediment concentration at Huayuankou hydrological station generally followed a downward trend, and that the decline was slow at first and then accelerated, especially after the 1980s. On the one hand, with the implementation of soil and water conservation policies and ecological restoration projects, soil and water loss has been effectively controlled (Wang *et al.* 2005). On the other hand, the completion and operation of Xiaolangdi reservoir has greatly improved the water and sediment conditions in the lower Yellow River, and the effect of the water and sediment regulation project is significant (Zhang *et al.* 2016).

The relative error of the model predictions was large in September of every year. The deviation between predicted and measured values was high, and the relative error of individual predicted values exceeded 13%. Because of the heavy rainfall in summer, a large amount of sediment enters the river, and both the peak of the suspended sediment concentration and its amplitude of variation increase. This makes the model harder to train and reduces the reliability of its predictions. The relative error of the model was also large around February in some years. On the one hand, January and February are the dry season of the Yellow River, and the suspended sediment concentration is lower than in other months. On the other hand, the suspended sediment concentration in the dry period is related to changes in climate factors, and its annual trend is relatively slow, which makes it difficult for the model to predict accurately.

The non-stationary and nonlinear characteristics of the sediment concentration data affect the training of a single prediction model and lower the accuracy of its predictions. The CEEMDAN method effectively reduces the number of iterations, increases the reconstruction accuracy, and is more suitable for the analysis of nonlinear signals. However, it still cannot completely capture the complex characteristics of the original data, so some data features may be lost during signal decomposition. Using GRU as the prediction model can effectively address the long-term dependence in monthly suspended sediment concentration data. Applying the two methods in combination to the prediction of suspended sediment concentration has obvious advantages over a traditional single prediction model. In addition, owing to the limitations of the GRU neural network, the influence of the river bed boundary on the sediment concentration process is not considered, which limits the accuracy of the model. Therefore, further research is needed.

## CONCLUSION

In this research, a coupling model based on CEEMDAN-GRU was built to conduct training and prediction on the monthly suspended sediment concentration data of Huayuankou hydrological station from 1960 to 2014. The results indicated that the monthly suspended sediment concentration at Huayuankou hydrological station decreased over the study period, and that the period and trend of the predictions were consistent with the measured data. The determination coefficient (DC) is 0.71 and the qualified rate (QR) reaches 80%, which indicates that the prediction results are reliable.

The CEEMDAN-GRU model has the best performance compared with the single GRU, SVM and LSTM models, achieving the lowest MAE, MAPE and MSE. MAE is 1.2811, which means the difference between the predicted and observed values is small. MAPE is 0.9675 and MSE is 4.3560, which means the coupling model has the best performance for the stationary part and the extreme part of the sequence, respectively. CEEMDAN has obvious advantages in dealing with nonlinear and non-stationary time series data. Coupling the CEEMDAN method with the GRU method is feasible for mid- and long-term prediction of suspended sediment concentration.

The forecast performance of the CEEMDAN-GRU model in the flood season of some years was only moderate. The internal relationship between suspended sediment concentration change and the river bed boundary, precipitation, runoff, temperature and other meteorological factors is not considered, and this needs further study.

## ACKNOWLEDGEMENTS

The authors wish to thank the National Natural Science Foundation of China (51609087, 51709114), the National Key Research and Development Plan Project (2017YFC1501201), the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) (17A570004), and the Collaborative Innovation Center of Water Resources Efficient Utilization and Protection Engineering, Henan Province.