Abstract
Runoff series exhibit strong randomness, so the accuracy of flood forecasting still needs to be improved. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) can deal with mode mixing, and its endpoint effect can be suppressed using the mutual information criterion. Orthogonal triangular matrix decomposition (QR) was used to increase the computational efficiency of broad learning (BL). Building on the fast calculation of the BL output layer, an improved coupled CEEMDAN-QRBL flood forecasting model was constructed and applied to the prediction of daily runoff at Xiaolangdi Reservoir. The results indicate that QRBL is 28.92% more computationally efficient than the BL model and that the reconstruction error of CEEMDAN is decreased by 48.22%. Compared with the EMD-LSTM and CEEMDAN-GRU models, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and the Ens is improved by 8.81% and 3.96%, respectively. Nonparametric kernel density estimation (NPKDE) gives the predicted values of the CEEMDAN-QRBL model a reasonable fluctuation range, which may serve as a useful reference for the allocation of regional water resources.
HIGHLIGHTS
A novel CEEMDAN-QRBL model for flood forecasting was constructed.
Orthogonal triangular matrix decomposition was used to enhance the computational efficiency of broad learning.
The mutual information criterion was used to suppress the endpoint effect of CEEMDAN.
Nonparametric kernel density estimation was used to analyse the confidence level of flood forecasting.
NOMENCLATURE LIST
- CEEMDAN
Complete ensemble empirical mode decomposition with adaptive noise
- BL
Broad learning model
- QR
Orthogonal triangular matrix decomposition
- QRBL
Improved broad learning model using orthogonal triangular matrix decomposition
- EMD
Empirical mode decomposition
- LSTM
Long short-term memory networks
- GRU
Gated recurrent unit
- NPKDE
Nonparametric kernel density estimation
- VMD
Variational mode decomposition
- IMF
Intrinsic mode function
- ANNs
Artificial neural networks
- SVD
Singular value decomposition
- Ens
Nash efficiency index
- R
Pearson correlation coefficient
- MAE
Mean absolute error
- RE
Relative error
- ELM
Extreme learning machine
- CNN-LSTM
Convolutional long short-term memory network
- x(t)
runoff observation series (m³·s⁻¹)
- ε
variance of the noise
- n(t)
Gaussian white noise
- k
number of iterations
average value of IMF1(n)
- Rn
residual signal
- X
original input matrix
- H
output matrix of the enhancement layer
the weight of the output layer
training set
sigmoid activation function
- w
enhancement-layer weight of BL
- b
enhancement-layer bias of BL
- h(x)
row vector of the matrix
- U and V
orthogonal unitary matrix
- Σ
diagonal matrix
- A
objective matrix to be decomposed
- Q
orthogonal matrix
- R
upper triangular matrix
- and
a reversible matrix
- θ
confidence level of the flood prediction error
- Ii(W0,Wi)
mutual information
- μ
mean
- σ
standard deviation
- g(x)
Gaussian kernel function
- N
sample size
bandwidth coefficient
- e
flood prediction error
- pre
predicted value
- true
flood observation value
- H(Wi)
entropy of Wi
- t
number of iterations of PSO
- r1 and r2
uniformly distributed random numbers
- c1 and c2
acceleration constants of PSO
average observed values
observed values
predicted values
average predicted values
difference between the upper and lower limits of the confidence interval
- fno-interval
flood condition without intervals
- finterval
flood condition with intervals
interval peak discharge of the reservoir during the period t
- qt
peak discharge of the reservoir during the period t
INTRODUCTION
River runoff is an important component of the hydrologic cycle, and predicting it accurately is central to reliable flood forecasting. The strong randomness and uncertainty exhibited by runoff signals make accurate flood prediction a major challenge (Yao et al. 2019). Therefore, developing a more stable, accurate, and reliable flood forecasting model is an urgent task for hydrological researchers.
Existing flood forecasting models can be divided into two categories: mechanism-driven hydrological models and data-driven statistical models (Awol et al. 2021). Hydrological models usually use physical variables such as weather, runoff and rainfall to construct mathematical models in which hydrological equations describe the correlation between flood indices (Yue et al. 2020); however, because hydrological evolution is dynamic and non-stationary, the multi-step prediction accuracy of such models still needs to be improved (Hadid et al. 2020). Distributed hydrological models combine physical factors such as rainfall and subsurface conditions to design reservoir flood forecasting schemes (Li et al. 2020); however, their applicability has been constrained by the construction of a large number of small- and medium-sized reservoirs (Pham et al. 2020). In contrast to mechanism-driven models, data-driven models, which are frequently used in hydrologic forecasting, do not require knowledge of the underlying physical processes and can capture nonlinear correlations between input and output data. For instance, artificial neural networks used to forecast runoff from the Winooski River in western Vermont, USA, produced more precise forecasts (Besaw et al. 2010). Runoff simulations for the Little River in Georgia and Reynolds Creek in Idaho demonstrated that a data-driven model could more accurately quantify the uncertainty in runoff simulations (Zhang et al. 2009). In comparison with conventional neural networks, the broad learning (BL) structure has greater feature-learning power thanks to its enhancement nodes. A broad learning assimilation model was used to predict precipitation in Southeast China, and the findings indicated that the BL-based framework could produce more precise daily precipitation forecasts (Zhou et al. 2022).
A single model's ability to capture nonlinear trends and high-frequency abrupt change points in the runoff signal is frequently constrained, whereas a hybrid model that uses signal-processing techniques to decompose the runoff data into relatively smooth components can achieve higher prediction accuracy. For instance, a multi-scale flood forecasting model was developed using wavelet analysis combined with a fuzzy weighted Markov model, and the results showed that the model exhibited high overall prediction accuracy (J. Zhang et al. 2021). Unlike wavelet decomposition, empirical mode decomposition (EMD) can effectively handle nonlinear and non-stationary signals (Xie et al. 2019). A recent study applied EMD and LSTM to long-term flood forecasting, and the experimental results showed that the EMD-based LSTM model had a better fit (Liu et al. 2020). Variational mode decomposition (VMD) was combined with LSTM for short-term flood forecasting and evaluation, and the results showed that VMD-LSTM had better prediction accuracy, computational efficiency and stability (Zuo et al. 2020). CEEMDAN was combined with a gated recurrent unit (GRU) for water level prediction, and the results showed that CEEMDAN-GRU had stronger prediction performance (Tao et al. 2021). However, some issues remain in current research methods. In wavelet analysis it is frequently difficult to determine the wavelet basis, and the choice of wavelet basis has a significant impact on the results (Wang et al. 2020). EMD methods suffer from endpoint effects and mode mixing (Shan et al. 2022), and VMD requires two parameters, the bandwidth limit and the number of decompositions, to be set in advance (Lv et al. 2022).
Compared with EMD, CEEMDAN adds white noise to the signal, which effectively solves the mode-mixing problem; compared with VMD, it does not require manual parameter setting and has a strong adaptive capability.
With additional enhancement-layer nodes, the broad learning (BL) architecture, a novel non-parametric machine learning technique, offers stronger feature-learning capability and is less sensitive to parameters than conventional neural networks. BL calculates the results of the output layer directly and does not require repeated iterations, which gives it a clear efficiency advantage over deep learning models such as LSTM and GRU. This paper builds an improved CEEMDAN-QRBL flood forecasting model by suppressing the endpoint effect in CEEMDAN, which reduces its reconstruction error, and by using orthogonal triangular (QR) decomposition to speed up the BL solution. These improvements help to further increase the accuracy and operational efficiency of flood forecasting. Because the prediction model has intrinsic uncertainty, nonparametric kernel density estimation (NPKDE) was also employed to quantify that uncertainty (Li et al. 2017). This can help to improve the real-time performance and dependability of the flood prediction model.
RESEARCH METHODS
CEEMDAN
x(t): runoff observation series (m³·s⁻¹),
ε: variance of the noise,
n(t): Gaussian white noise,
k: number of iterations.
: average value of IMF1(n).
Rn: residual signal.
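For orientation, the commonly cited form of CEEMDAN can be written as follows. This is a sketch of the standard formulation rather than a reproduction of the equations used in this study. The operator $E_j(\cdot)$ (the $j$th mode returned by EMD), the realisation index $i$, the ensemble size $I$, the stage-wise noise coefficients $\varepsilon_j$, and the intermediate residuals $r_j(t)$ are notational assumptions introduced here.

$$
\overline{\mathrm{IMF}}_1(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(x(t) + \varepsilon_0\, n^{i}(t)\big), \qquad r_1(t) = x(t) - \overline{\mathrm{IMF}}_1(t)
$$

$$
\overline{\mathrm{IMF}}_{j+1}(t) = \frac{1}{I}\sum_{i=1}^{I} E_1\big(r_j(t) + \varepsilon_j\, E_j(n^{i}(t))\big), \qquad r_{j+1}(t) = r_j(t) - \overline{\mathrm{IMF}}_{j+1}(t)
$$

$$
x(t) = \sum_{j=1}^{n} \overline{\mathrm{IMF}}_j(t) + R_n(t), \qquad R_n(t) = r_n(t)
$$

The last line states the completeness property: the modes and the final residual sum back to the original series, which is the quantity against which a reconstruction error can be measured.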
QRBL
X: original input matrix,
H: output matrix of the enhancement layer,
: the weight of the output layer,
: training set.
: sigmoid activation function,
w: enhancement-layer weight,
b: enhancement-layer bias,
h(x): row vector of the matrix H.
[X|H]+: the Moore–Penrose generalised inverse of matrix [X|H].
U and V: orthogonal (unitary) matrices,
Σ: diagonal matrix.
A: objective matrix to be decomposed,
Q: orthogonal matrix,
R: upper triangular matrix.
and : a reversible matrix.
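The QR route to the output-layer weights can be illustrated with a small NumPy sketch. It assumes, as the symbol list suggests, that the weights are the least-squares solution obtained from the generalised inverse of [X|H]. The function names, the target vector `Y`, the random data, and the use of `np.linalg.solve` in place of a dedicated triangular solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def build_design_matrix(X, w, b):
    """Concatenate the inputs X with the enhancement-layer output H = sigmoid(Xw + b)."""
    H = sigmoid(X @ w + b)
    return np.hstack([X, H])

def output_weights_pinv(A, Y):
    """Reference solution: W = A^+ Y via the Moore-Penrose pseudoinverse (SVD-based)."""
    return np.linalg.pinv(A) @ Y

def output_weights_qr(A, Y):
    """QR route: factor A = QR, then solve the square system R W = Q^T Y."""
    Q, R = np.linalg.qr(A)              # reduced QR: Q is (n, p), R is (p, p)
    # R is upper triangular; a general solver is used here for simplicity.
    return np.linalg.solve(R, Q.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 11))          # 128 samples, 11 input features (synthetic data)
w = rng.normal(size=(11, 20))           # 20 enhancement nodes (hypothetical weights)
b = rng.normal(size=(1, 20))
Y = rng.normal(size=(128, 1))           # hypothetical training targets

A = build_design_matrix(X, w, b)        # A plays the role of [X|H]
W_pinv = output_weights_pinv(A, Y)
W_qr = output_weights_qr(A, Y)
print(np.allclose(W_pinv, W_qr, atol=1e-6))
```

When [X|H] has full column rank, both routes give the same least-squares weights; the QR version avoids forming a singular value decomposition, which is where a speed-up would come from.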
NPKDE and confidence interval
μ: mean,
σ: standard deviation,
g(x): Gaussian kernel function.
N: sample size,
: bandwidth coefficient.
e: flood prediction error,
pre: predicted value,
true: flood observation value.
elow: lower confidence limit,
eup: upper confidence limit,
θ: confidence level of the flood prediction error.
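A minimal sketch of how a confidence interval for the prediction error can be obtained from a Gaussian kernel density estimate is given below. It assumes that the error is e = pre − true and that the interval encloses the central θ share of the estimated density. The bandwidth rule (Silverman's rule of thumb), the grid construction, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kde_pdf(errors, grid, bandwidth):
    """Kernel density estimate with a Gaussian kernel, evaluated at each grid point."""
    u = (grid[:, None] - errors[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (errors.size * bandwidth)

def error_interval(errors, theta=0.95, grid_size=2001):
    """Return (e_low, e_up) enclosing the central `theta` probability mass of the KDE."""
    errors = np.asarray(errors, dtype=float)
    # Silverman's rule of thumb: an assumed bandwidth choice, not the paper's.
    bandwidth = 1.06 * errors.std(ddof=1) * errors.size ** (-0.2)
    grid = np.linspace(errors.min() - 4 * bandwidth,
                       errors.max() + 4 * bandwidth, grid_size)
    pdf = gaussian_kde_pdf(errors, grid, bandwidth)
    cdf = np.cumsum(pdf) * (grid[1] - grid[0])
    cdf /= cdf[-1]                      # remove small numerical-integration error
    alpha = (1.0 - theta) / 2.0
    e_low = np.interp(alpha, cdf, grid)
    e_up = np.interp(1.0 - alpha, cdf, grid)
    return e_low, e_up

rng = np.random.default_rng(1)
errors = rng.normal(loc=0.0, scale=50.0, size=500)   # synthetic errors e = pre - true
e_low, e_up = error_interval(errors, theta=0.95)
coverage = np.mean((errors >= e_low) & (errors <= e_up))
print(e_low, e_up, coverage)
```

Under the assumed sign convention e = pre − true, a prediction interval for the observed runoff at a given step follows by subtracting the two error limits from the point prediction.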
FLOOD PREDICTION AND UNCERTAINTY ANALYSIS
Boundary extension method based on mutual information criterion
- (1)
In Figure 2, we assume that an original signal comprises m maximum points and p minimum points. Taking the first extreme point at the left boundary as an example, the sub-waveform between the left endpoint and the second extreme point is selected as the reference waveform, denoted W0.
- (2)
For each candidate extreme point elsewhere in the signal, a sub-waveform Wi of the same length as W0 is extracted so that the position of the candidate extreme point within Wi matches the position of the first extreme point within W0.
- (3)
Then, the mutual information between W0 and each Wi is calculated and used as the matching coefficient of that sub-waveform with W0. The sub-waveform with the largest matching coefficient is taken as the best match for W0, and the data immediately preceding the matched sub-waveform are used to extend the signal to the left of W0. The mutual information is defined as (Tang et al. 2013):
Ii(W0,Wi) = H(Wi) − H(Wi|W0)
where
Ii(W0,Wi): mutual information between W0 and Wi,
H(Wi): entropy of Wi,
H(Wi|W0): conditional entropy of Wi when W0 is known.
The stronger the correlation between W0 and Wi, the smaller the conditional entropy H(Wi|W0) and the larger the mutual information Ii(W0,Wi).
- (4)
The same principle is used to extend the right boundary of the original signal.
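The matching step of the boundary extension can be sketched as follows. The example estimates mutual information from a joint histogram and picks, among a few candidate segments, the one that shares the most information with the boundary segment. The histogram estimator, the number of bins, the candidate positions, and the function names are illustrative assumptions; the paper does not state how the entropies are estimated.

```python
import numpy as np

def mutual_information(w0, wi, bins=8):
    """Histogram estimate of I(W0, Wi) = H(Wi) - H(Wi | W0), in nats."""
    joint, _, _ = np.histogram2d(w0, wi, bins=bins)
    p_joint = joint / joint.sum()
    p0 = p_joint.sum(axis=1, keepdims=True)   # marginal distribution of W0
    pi = p_joint.sum(axis=0, keepdims=True)   # marginal distribution of Wi
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / (p0 @ pi)[mask])))

def best_matching_start(signal, template_len, candidate_starts, bins=8):
    """Return the candidate start whose segment best matches the left-boundary segment."""
    w0 = signal[:template_len]
    scores = [mutual_information(w0, signal[s:s + template_len], bins)
              for s in candidate_starts]
    return candidate_starts[int(np.argmax(scores))], scores

t = np.arange(50)
period = np.sin(2.0 * np.pi * t / 50.0 + 0.3)
signal = np.tile(period, 4)                   # exactly periodic toy signal, 200 samples
best_start, scores = best_matching_start(signal, template_len=40,
                                         candidate_starts=[20, 50, 80])
print(best_start, scores)
```

In this toy signal the segment starting one full period later is an exact copy of the boundary segment, so it receives the largest matching coefficient; the samples just before that segment would then be copied to the left of the boundary.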
Flood forecast and uncertainty analysis
- (1)
By using the daily runoff data collected from the Xiaolangdi Reservoir in the middle section of the Yellow River Basin from 2002 to 2019 as the research object, we predicted the future runoff of Xiaolangdi. The combined use of Sanmenxia Reservoir, Luhun Reservoir, Guxian Reservoir, and Xiaolangdi Reservoir in the middle section of the Yellow River Basin and the use of Dongping Lake for flood diversion can raise the flood control standard of the lower Yellow River to the level required to handle storms likely to occur only once in a thousand years, and it can thus essentially eliminate the threat of flooding in the lower Yellow River. In this study, we first performed pre-processing operations, such as filtering and interpolation, on runoff data, after which we normalised the filtered data.
- (2)
After normalising the runoff sequence, we decomposed it using the CEEMDAN model to obtain several intrinsic mode functions (IMF1–IMFn), which converts the non-stationary sequence into approximately stationary components. We then used several consecutive values of each IMF as input vectors and the values that follow them as output vectors.
- (3)
The model extracted the temporal characteristics of the original signal using CEEMDAN and predicted the time series using the QRBL model. To obtain the runoff trend, the data were divided into a training set and a test set and fed into the QRBL model, which predicted the runoff for the next 2, 5, 10, 20, 25, and 30 days. The sample size, number of QRBL input features, and model input time-step were fixed at 128, 11, and 9, respectively, and the hidden and output layers used the sigmoid activation function. The PSO algorithm was used to optimise QRBL, with the training error of QRBL as the fitness function. The population size was 30, the maximum number of iterations was 50, and the search range for the number of neurons was [1, 64]; the optimal number of QRBL neurons was determined to be 20.
- The particle population consists of n particles in d-dimensional space. The position of the ith particle is denoted xi = (xi1, xi2, …, xid), i = 1, 2, …, n; its velocity is vi = (vi1, vi2, …, vid); the best solution found by the ith particle is denoted pi; and the global best solution for the whole population is denoted g. The particle velocity and position updates are given by (Marini & Walczak 2015):
vi(t+1) = vi(t) + c1 r1 [pi − xi(t)] + c2 r2 [g − xi(t)]
xi(t+1) = xi(t) + vi(t+1)
where
t: number of iterations,
r1 and r2: uniformly distributed random numbers,
c1 and c2: acceleration constants.
- (4)
We used the Nash index (Ens), Pearson correlation coefficient (R), mean absolute error (MAE), and relative error (RE) as criteria for evaluating the credibility and accuracy of the CEEMDAN-QRBL model. The value of Ens lies in (−∞, 1]: the closer Ens is to 1, the higher the model's credibility, and the lower its value, the lower the credibility. The correlation coefficient R is a statistical indicator that reflects the closeness of the correlation between variables. RE and MAE were used to evaluate the real-time error and overall error of the model, respectively. The formulas are defined as (Anaraki et al. 2021):
where
: average observed values,
: observed values,
: predicted values,
: average predicted values.
- (5)
We used NPKDE to quantitatively analyse the uncertainty of the CEEMDAN-QRBL model on different time-scales. NPKDE estimates the probability density function of the prediction error, from which the confidence interval is established to obtain the upper and lower limits of the runoff prediction interval. The evaluation criteria for interval prediction include coverage and interval width. The interval width is an indicator of forecasting effectiveness: provided that coverage is ensured, the smaller the interval width, the better the forecast. The interval width is defined as (He et al. 2021):
where
: difference between the upper and lower limits of the confidence interval where the runoff prediction value of the ith step is located.
- (6)
The flood control criteria for reservoirs include the maximum peak-shaving criterion, minimum flood disaster criterion, and shortest disaster criterion. In this study, we adopt the maximum peak-shaving criterion as the flood control criterion for the reservoir. This criterion uses the flood storage of the reservoir to reduce the peak discharge of floods entering the reservoir and to meet the downstream flood control requirements. The peak-shaving effect is typically measured using the peak-shaving rate, which is defined as (Xin et al. 2020):
where
fno-interval: flood condition without intervals,
finterval: flood condition with intervals,
: interval peak discharge of the reservoir during the period t,
qt: peak discharge of the reservoir during the period t.
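The construction of input and output samples described in step (2) can be sketched with a sliding window. The version below is a simplification that maps a window of consecutive values of one component to a single target value a fixed number of steps ahead; the function name, the single-target output, and the toy series are assumptions for illustration only.

```python
import numpy as np

def make_windows(series, n_lags, horizon):
    """Build (inputs, targets): each input row holds `n_lags` consecutive values,
    and its target is the value `horizon` steps after the last input value."""
    series = np.asarray(series, dtype=float)
    n_samples = series.size - n_lags - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window and horizon")
    idx = np.arange(n_lags)[None, :] + np.arange(n_samples)[:, None]
    inputs = series[idx]
    first_target = n_lags + horizon - 1
    targets = series[first_target:first_target + n_samples]
    return inputs, targets

imf = np.arange(20, dtype=float)        # stand-in for one decomposed component
X, y = make_windows(imf, n_lags=9, horizon=2)
print(X.shape, y.shape, X[0], y[0])
```

With a window of 9 values and a horizon of 2, the first sample uses values 0 to 8 to predict value 10. Applying the same construction to every component and summing the component forecasts is one common way to recover a forecast of the original series.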
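The particle swarm search described in step (3) can be sketched as below, using the population size of 30, the limit of 50 iterations, and the search range [1, 64] quoted in the text. The inertia weight, the acceleration constants, the clipping to the search range, and the toy fitness function are assumptions; in the study the fitness is the QRBL training error, which is not reproduced here. Setting `inertia=1.0` recovers the basic update without an inertia term.

```python
import numpy as np

def pso_minimise(fitness, lower, upper, n_particles=30, n_iter=50,
                 c1=1.5, c2=1.5, inertia=0.7, seed=0):
    """Minimise `fitness` over a box using a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                                     # velocities
    pbest = x.copy()
    pbest_val = np.array([fitness(p) for p in x])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))                  # uniform random numbers
        r2 = rng.random((n_particles, dim))
        v = inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)                     # keep particles in range
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Hypothetical fitness with its minimum at 20; it stands in for a training error.
best, best_val = pso_minimise(lambda p: (p[0] - 20.0) ** 2, lower=[1.0], upper=[64.0])
print(best, best_val)
```

Because the number of neurons is an integer, the continuous position returned by the search would be rounded before the model is built.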
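The evaluation criteria named in steps (4) and (5) can be sketched as follows. Ens, R, and MAE follow their usual textbook definitions; the per-step relative error, the coverage, and the mean interval width are written in commonly used forms that are assumed here, since the exact expressions are not recoverable from the text. All function names and the sample numbers are illustrative.

```python
import numpy as np

def nash_efficiency(obs, pred):
    """Nash efficiency index: 1 - SSE / total variation of the observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    do, dp = obs - obs.mean(), pred - pred.mean()
    return np.sum(do * dp) / np.sqrt(np.sum(do ** 2) * np.sum(dp ** 2))

def mean_absolute_error(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.mean(np.abs(obs - pred))

def relative_error(obs, pred):
    """Per-step relative error (assumed form: |pred - obs| / |obs|)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return np.abs(pred - obs) / np.abs(obs)

def interval_coverage(obs, lower, upper):
    """Share of observations that fall inside the prediction interval."""
    obs = np.asarray(obs, float)
    return np.mean((obs >= np.asarray(lower)) & (obs <= np.asarray(upper)))

def mean_interval_width(lower, upper):
    """Average difference between the upper and lower interval limits."""
    return np.mean(np.asarray(upper, float) - np.asarray(lower, float))

obs = np.array([100.0, 200.0, 300.0, 400.0])      # made-up observations
pred = np.array([110.0, 190.0, 310.0, 390.0])     # made-up predictions
lower, upper = pred - 15.0, pred + 15.0           # made-up interval limits

ens = nash_efficiency(obs, pred)
r = pearson_r(obs, pred)
mae = mean_absolute_error(obs, pred)
re = relative_error(obs, pred)
cov = interval_coverage(obs, lower, upper)
width = mean_interval_width(lower, upper)
print(ens, r, mae, re, cov, width)
```

For these made-up numbers every observation lies inside its interval, so the coverage is 1 and the comparison between models would rest on the interval width alone.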
RESULTS AND ANALYSIS
| Evaluation index | Model | Next 2 days | Next 5 days | Next 10 days | Next 20 days | Next 25 days | Next 30 days |
|---|---|---|---|---|---|---|---|
| Ens | CEEMDAN-QRBL | 0.92995 | 0.89613 | 0.83446 | 0.68627 | 0.67293 | 0.64185 |
| | CEEMDAN-GRU | 0.93233 | 0.88474 | 0.80308 | 0.67626 | 0.58678 | 0.60085 |
| | EMD-LSTM | 0.91792 | 0.81953 | 0.75980 | 0.63578 | 0.59019 | 0.56071 |
| | GRU | 0.77714 | 0.58919 | 0.36316 | 0.17778 | 0.17150 | 0.14938 |
| | CNN-LSTM | 0.75308 | 0.55925 | 0.29823 | 0.10570 | 0.08489 | 0.04680 |
| | ELM | 0.75757 | 0.52179 | 0.28338 | 0.15724 | 0.15368 | 0.13787 |
| MAE | CEEMDAN-QRBL | 148.78073 | 184.99992 | 225.25434 | 288.91577 | 247.09619 | 264.00577 |
| | CEEMDAN-GRU | 152.21632 | 193.29254 | 251.24709 | 311.77681 | 357.34991 | 358.06181 |
| | EMD-LSTM | 149.37752 | 201.61919 | 239.57221 | 302.44978 | 320.76064 | 336.93696 |
| | GRU | 228.35998 | 302.40872 | 386.01204 | 302.27619 | 445.81442 | 460.83995 |
| | CNN-LSTM | 234.54698 | 319.59541 | 411.02853 | 480.96564 | 494.52684 | 516.92310 |
| | ELM | 235.23004 | 311.64285 | 380.87185 | 434.17141 | 443.32723 | 451.64001 |
| R | CEEMDAN-QRBL | 15.14280 | 15.49406 | 17.31916 | 15.45331 | 14.39027 | 14.24100 |
| | CEEMDAN-GRU | 14.95629 | 15.38689 | 16.51420 | 16.46498 | 13.57475 | 15.52629 |
| | EMD-LSTM | 12.93059 | 13.18191 | 14.08978 | 13.83061 | 14.16406 | 15.33817 |
| | GRU | 12.85936 | 12.67823 | 12.81807 | 13.09732 | 13.07744 | 12.91972 |
| | CNN-LSTM | 12.60618 | 12.01063 | 11.43171 | 11.97262 | 12.08955 | 11.62218 |
| | ELM | 12.37492 | 11.89630 | 12.42819 | 11.69514 | 13.46951 | 12.84219 |
Table 2 shows the IMF reconstruction error under different boundary treatments and the running times of the different models. According to Table 2, extending the boundary using mutual information reduced the IMF reconstruction error by 48.22%, and QRBL was 28.92% more computationally efficient than the BL model. These enhancements enable the CEEMDAN-QRBL model to predict with greater accuracy and confidence.
| Evaluation index | Scheme | Calculation result |
|---|---|---|
| IMF reconstruction error | Endpoint extreme value unknown | 17.238152 |
| | Mutual information extension | 11.630431 |
| Run time (s) | BL | 0.832979 |
| | QRBL | 0.591433 |
| | Serial CEEMDAN-QRBL | 23.026205 |
| | Parallel CEEMDAN-QRBL | 0.075796 |
CONCLUSION
- (1)
To enhance the performance of a single model, CEEMDAN divided the original, highly random sequence into a number of smoother component sequences. By using orthogonal triangular decomposition, the BL model was enhanced, resulting in a 28.92% increase in computational efficiency and a faster solution of the BL output layer. The computational efficiency of the CEEMDAN-QRBL model was increased by these improvements.
- (2)
Existing studies have generally paid little attention to improving EMD-type methods. The prediction accuracy of the hybrid model can be further increased by suppressing the endpoint effect of CEEMDAN, and the reconstruction error is decreased by 48.22% when using the improved CEEMDAN with the mutual information criterion. Compared with EMD-LSTM and CEEMDAN-GRU, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and Ens is improved by 8.81% and 3.96%, respectively, which indicates higher prediction accuracy and credibility.
- (3)
A reasonable range of predicted fluctuations of the model was provided by the non-parametric kernel density estimation calculations, which can be used as a guide for management choices. It should be noted that the article analyses and forecasts runoff using time series data without taking sediment, rainfall, or weather-related factors into account. The developed model performs well for short-term predictions, but the R-values for longer predictions need to be increased. Medium- and long-term predictions have limitations as well, so more effective techniques should be investigated for in-depth research.
AUTHOR CONTRIBUTIONS
Conceptualisation: Yang Liu, Shuai Bing Du, Li Hu Wang.
Data curation: Yang Liu, Li Hu Wang.
Formal analysis: Yang Liu.
Funding acquisition: Yang Liu.
Methodology: Shuai Bing Du, Li Hu Wang.
Writing – original draft: Yang Liu, Shuai Bing Du, Li Hu Wang.
Writing – review & editing: Shuai Bing Du, Li Hu Wang.
ACKNOWLEDGEMENTS
This work was supported by the National Key Research and Development Project (Strategic Research Projects in Key Area 16) and the Water Conservancy Science and Technology Research Project in Henan Province (Grant GG202042).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.