The strong randomness of runoff series means that the accuracy of flood forecasting still needs to be improved. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) can deal with mode mixing, and the mutual information criterion can suppress the endpoint effect of CEEMDAN. To increase the computational efficiency of broad learning (BL), orthogonal triangular matrix decomposition (QR) was used. Exploiting the fast calculation of the model output layer, a novel improved coupled CEEMDAN-QRBL flood forecasting model was created and applied to the prediction of daily runoff in Xiaolangdi Reservoir. The findings indicate that the enhanced QRBL is 28.92% more computationally efficient than the BL model, and that the reconstruction error of CEEMDAN is decreased by 48.22%. Compared with the EMD-LSTM and CEEMDAN-GRU models, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and the Ens is improved by 8.81% and 3.96%, respectively. Nonparametric kernel density estimation (NPKDE) gives the predicted values of the CEEMDAN-QRBL model a reasonable fluctuation range, which might serve as a useful reference for the allocation of regional water resources.

  • A novel CEEMDAN-QRBL model for flood forecasting was constructed.

  • Orthogonal triangular matrix decomposition was used to improve broad learning to enhance its computational efficiency.

  • The mutual information criterion was used to improve CEEMDAN to suppress the endpoint effect of CEEMDAN.

  • Nonparametric kernel density estimation was used to analyse the confidence level of flood forecasting.

Graphical Abstract

  • CEEMDAN: Complete ensemble empirical mode decomposition with adaptive noise
  • BL: Broad learning model
  • QR: Orthogonal triangular matrix decomposition
  • QRBL: Improved broad learning model using orthogonal triangular matrix decomposition
  • EMD: Empirical mode decomposition
  • LSTM: Long short-term memory networks
  • GRU: Gated recurrent unit
  • NPKDE: Nonparametric kernel density estimation
  • VMD: Variational mode decomposition
  • IMF: Intrinsic mode function
  • ANNs: Artificial neural networks
  • SVD: Singular value decomposition
  • Ens: Nash efficiency index
  • R: Pearson correlation coefficient
  • MAE: Mean absolute error
  • RE: Relative error
  • ELM: Extreme learning machine
  • CNN-LSTM: Convolutional long short-term memory network

  • x(t): runoff observations series (m³·s⁻¹)
  • ε: variance of the noise
  • n(t): Gaussian white noise
  • k: number of iterations
  • average value of IMF1(n)
  • Rn: residual signal
  • X: original input matrix
  • H: output matrix of the enhancement layer
  • the weight of the output layer
  • training set
  • sigmoid activation function
  • w: enhancement-layer weight of BL
  • b: enhancement-layer bias of BL
  • h(x): row vector of the matrix
  • U and V: orthogonal unitary matrix
  • Σ: diagonal matrix
  • A: objective matrix to be decomposed
  • Q: orthogonal matrix
  • R: upper triangular matrix
  • and : a reversible matrix
  • θ: confidence level of the flood prediction error
  • Ii(W0,Wi): mutual information
  • μ: mean
  • σ: standard deviation
  • g(x): Gaussian kernel function
  • N: sample size
  • bandwidth coefficient
  • e: flood prediction error
  • pre: predicted value
  • true: flood observation value
  • H(Wi): entropy of Wi
  • t: number of iterations of PSO
  • r1 and r2: uniformly distributed random numbers
  • c1 and c2: acceleration constants of PSO
  • average observed values
  • observed values
  • predicted values
  • average predicted values
  • difference between the upper and lower limits of the confidence interval
  • fno-interval: flood condition without intervals
  • finterval: flood condition with intervals
  • interval peak discharge of the reservoir during the period t
  • qt: peak discharge of the reservoir during the period t

River runoff, as an important component of the hydrologic cycle, plays an important role in accurate and reliable flood prediction. The strong randomness and uncertainty exhibited by runoff signals make accurate flood prediction a major challenge (Yao et al. 2019). Therefore, developing a more stable, accurate, and reliable flood forecasting model is an urgent task for hydrological researchers.

Existing flood forecasting models can be divided into two categories: mechanism-driven hydrological models and data-driven statistical models (Awol et al. 2021). Hydrological models usually take physical quantities such as weather, runoff and rainfall as inputs and use hydrological equations to describe the correlation between flood indices (Yue et al. 2020); however, because hydrological evolution is dynamic and non-stationary, their multi-step prediction accuracy still needs to be improved (Hadid et al. 2020). Distributed hydrological models combine physical factors such as rainfall and subsurface conditions to design reservoir flood forecasting schemes (Li et al. 2020); however, their applicability has been constrained by the construction of a large number of small- and medium-sized reservoirs (Pham et al. 2020). In contrast to mechanism-driven models, data-driven models, which are frequently used in hydrologic forecasting, do not require knowledge of the underlying physical processes and can capture nonlinear correlations between input and output data. For instance, artificial neural networks produced more precise runoff forecasts for the Winooski River in western Vermont, USA (Besaw et al. 2010). Runoff simulations for the Little River in Georgia and Reynolds Creek in Idaho demonstrated that the model applied could more accurately quantify the uncertainty in runoff simulations (Zhang et al. 2009). In comparison with conventional neural networks, the broad learning (BL) structure has greater feature-learning power thanks to its enhancement nodes. A broad learning assimilation model was used to predict precipitation in Southeast China, and the findings indicated that the BL-based framework could produce more precise daily precipitation forecasts (Zhou et al. 2022).
A single model's ability to capture the nonlinear trends and high-frequency abrupt change points of the runoff signal is frequently limited, whereas a hybrid model that uses signal-processing techniques to break the runoff data down into relatively smooth components can achieve higher prediction accuracy. For instance, a multi-scale flood forecasting model developed by combining wavelet analysis with a fuzzy weighted Markov model exhibited high overall prediction accuracy (J. Zhang et al. 2021). Unlike wavelet decomposition, empirical mode decomposition (EMD) can effectively handle nonlinear and non-stationary signals (Xie et al. 2019). A recent study applied EMD and LSTM to long-term flood forecasting, and the EMD-based LSTM model achieved a better fit (Liu et al. 2020). Variational mode decomposition (VMD) combined with LSTM for short-term flood forecasting showed better prediction accuracy, computational efficiency and stability (Zuo et al. 2020), and CEEMDAN combined with a gated recurrent unit (GRU) showed stronger performance in water level prediction (Tao et al. 2021). However, some issues remain in current research methods. In wavelet analysis, it is frequently difficult to determine the wavelet basis, and the choice of wavelet basis has a significant impact on the results (Wang et al. 2020). EMD methods suffer from endpoint effects and mode mixing (Shan et al. 2022), and VMD requires two parameters, the bandwidth limit and the number of decompositions, to be set in advance (Lv et al. 2022). Compared with EMD, CEEMDAN adds white noise to the signal, which effectively mitigates the mode-mixing problem; compared with VMD, it does not require manual parameter setting and has a strong adaptive capability.

With additional enhancement-layer nodes, the broad learning (BL) architecture, a non-parametric machine learning technique, offers higher feature-learning capability and is less sensitive to parameter settings than conventional neural networks. BL calculates the results of the output layer directly and does not require repeated iterations, which gives it a clear efficiency advantage over deep learning models such as LSTM and GRU. This paper builds an improved CEEMDAN-QRBL flood forecasting model by suppressing the endpoint effect of CEEMDAN, which reduces its reconstruction error, and by using orthogonal triangular (QR) decomposition to speed up the BL solution. These improvements help to further increase the accuracy and operational efficiency of flood forecasting. Because the prediction model has intrinsic uncertainty, nonparametric kernel density estimation (NPKDE) is also employed to quantify that uncertainty (Li et al. 2017). This can help to improve the real-time performance and dependability of the flood prediction model.

CEEMDAN

EMD was proposed by Huang et al. (1998); it overcomes the inability of the basis function in the wavelet method to adapt to the signal. Ensemble EMD (EEMD) adds Gaussian white noise to the signal before EMD, which effectively mitigates mode mixing during decomposition. However, because the added white noise cannot be completely eliminated, CEEMDAN was proposed. CEEMDAN adds noise adaptively according to the signal characteristics, effectively reducing the residual noise. CEEMDAN can be described as (Mousavi et al. 2020):
(1)
where
  • x(t): runoff observations series (m3·s−1),

  • ε: variance of the noise,

  • n(t): Gaussian white noise,

  • k: number of iterations.

We used the EMD method to decompose the signal in Equation (1) and obtain the first intrinsic mode function (IMF), IMF1, after which we calculated its average value, which is defined as (X. Zhang et al. 2021):
(2)
where
  • : average value of IMF1(n).

When the remaining components cannot be decomposed further (i.e. the residual signal changes monotonically), the iteration stops. The k modal components IMF1 to IMFk are obtained, and the result of the decomposition can be defined as (Fan et al. 2021):
(3)
where
  • Rn: residual signal.

Finally, the original signal is expressed as (Rezaie-Balf et al. 2019):
$$x(t) = \sum_{i=1}^{k} \mathrm{IMF}_i(t) + R_n(t) \tag{4}$$

QRBL

The broad learning model was proposed by Chen & Liu (2017). The method uses a random vector functional-link neural network as a carrier and expands the designed network horizontally by increasing the number of neural nodes. While preserving generalisation ability, the BL method avoids the drawback of an excessively long training time. Compared with traditional artificial neural networks (ANNs), BL has stronger generalisation and higher processing efficiency. Figure 1 shows the basic principle.
Figure 1

Topological structure of broad learning.

Any N different training samples are multiplied by a set of random weights, and random biases are added in the enhancement layer. These weights remain unchanged in the subsequent process. After the data are activated, the matrix H is obtained. Finally, the original input matrix X of the input layer and the output matrix H of the enhancement layer are merged. Assuming that a BL network has N input neurons and M enhancement-layer neurons, the mathematical model is expressed as (Ali et al. 2022):
(5)
where
  • X: original input matrix,

  • H: output matrix of the enhancement layer,

  • : the weight of the output layer,

  • : training set.

The matrix H is defined as (Sheng et al. 2020):
(6)
where
  • : sigmoid activation function,

  • w: enhancement-layer weight,

  • b: enhancement-layer bias,

  • h(x): row vector of the matrix H.

The enhancement-layer neuron parameters are randomly generated from any continuous probability distribution. Once the training samples are provided, the output matrix H of the enhancement layer is obtained, and solving Equation (5) becomes a least-squares problem whose solution is defined as follows (Xu et al. 2018):
(7)
where
  • [X|H]+: the Moore–Penrose generalised inverse of the matrix [X|H].

In the traditional BL model, the singular value decomposition (SVD) method is usually employed to determine the output of the enhancement layer. For the m×n matrix H, the following SVD can be performed (Zuo et al. 2022):
$$H = U \Sigma V^{T} \tag{8}$$
where
  • U and V: orthogonal (unitary) matrices,

  • Σ: diagonal matrix.

To determine the output of the enhancement layer of the BL model, the QR method was used to redefine the solution. Compared with the SVD method, the QR method is simpler to calculate and more efficient. The basic form of QR decomposition can be defined as (Abdelfattah et al. 2021):
$$A = QR \tag{9}$$
where
  • A: objective matrix to be decomposed,

  • Q: orthogonal matrix,

  • R: upper triangular matrix.

This is related to the theory of the partitioned matrix (Rao 2020):
(10)
where
  • and : a reversible matrix.

Hence (Liu et al. 2021):
(11)
Then, QRBL is defined as:
(12)
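As a rough illustration of the idea in this section, the following NumPy sketch builds a random sigmoid enhancement layer, merges it with the input as [X|H], and solves for the output weights either through a QR factorisation or through the SVD-based pseudoinverse. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `fit_bl` and `predict_bl`, the Gaussian initialisation of the weights and the assumption that [X|H] has full column rank are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_bl(X, Y, n_enhance=20, solver="qr", seed=0):
    """Random enhancement layer plus closed-form output weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((X.shape[1], n_enhance))  # fixed random enhancement weights
    b = rng.standard_normal(n_enhance)                # fixed random enhancement biases
    H = sigmoid(X @ w + b)                            # enhancement-layer output
    A = np.hstack([X, H])                             # merged matrix [X|H]
    if solver == "qr":
        Q, R = np.linalg.qr(A)                        # reduced QR: A = Q R
        beta = np.linalg.solve(R, Q.T @ Y)            # solve R beta = Q^T Y (assumes full column rank)
    else:
        beta = np.linalg.pinv(A) @ Y                  # SVD-based Moore-Penrose pseudoinverse
    return w, b, beta

def predict_bl(X, w, b, beta):
    """Apply the fitted enhancement layer and output weights to new inputs."""
    return np.hstack([X, sigmoid(X @ w + b)]) @ beta

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((128, 9))                 # toy data, not the Xiaolangdi series
    Y = np.sin(X.sum(axis=1))
    w, b, beta = fit_bl(X, Y, solver="qr")
    print(np.abs(predict_bl(X, w, b, beta) - Y).mean())
```

Both solvers return the same least-squares weights when [X|H] is well conditioned; the QR route only needs one factorisation and a triangular system, which is the source of the efficiency gain discussed above.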

NPKDE and confidence interval

NPKDE directly uses historical samples to calculate the probability density without assuming the theoretical distribution of the sample and without parameter estimation. Therefore, it can avoid the error caused by the misjudgement of the overall distribution of the traditional parameter estimation to a certain extent. In this study, we used the Gaussian kernel function as the kernel function of NPKDE. The Gaussian kernel function is defined as (Zheng et al. 2020):
(13)
where
  • μ: mean,

  • σ: standard deviation,

  • g(x): Gaussian kernel function.

The probability density equation of NPKDE is then defined as (Kamalov 2020):
(14)
where
  • N: sample size,

  • : bandwidth coefficient.

After obtaining the probability density distribution of the flood prediction errors through NPKDE, the confidence interval can be used to quantitatively calculate the density distribution. The flood prediction error is the difference between the predicted value and the observed value at a certain moment and is defined as (Loveridge & Rahman 2021):
$$e = \mathrm{pre} - \mathrm{true} \tag{15}$$
where
  • e: flood prediction error,

  • pre: predicted value,

  • true: flood observation value.

The confidence level of the flood prediction error e is expressed as (Latif & Mustafa 2020):
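$$P(e_{low} \le e \le e_{up}) = \theta \tag{16}$$
where
  • elow: lower confidence limit,

  • eup: upper confidence limit,

  • θ: confidence level of the flood prediction error,

and P(elow ≤ e ≤ eup) is the probability of the flood prediction error e lying in the interval [elow, eup]. The confidence interval for the runoff prediction follows by applying these error limits to the predicted value.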
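The procedure described in this section can be sketched in a few lines of NumPy: estimate the error density with a Gaussian kernel, integrate it numerically, and read off the limits that enclose a probability θ. This is a minimal sketch under assumptions. The helper names, the equal-tail choice for the limits and the normal-reference bandwidth rule are illustrative and are not stated in the source.

```python
import numpy as np

def gaussian_kde_pdf(errors, grid, h):
    """f(x) = 1/(N h) * sum_i g((x - e_i) / h), with g the standard Gaussian kernel."""
    u = (grid[:, None] - errors[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(errors) * h * np.sqrt(2.0 * np.pi))

def error_limits(errors, theta=0.95, h=None, n_grid=2001):
    """Return (e_low, e_up) so that the estimated P(e_low <= e <= e_up) is about theta."""
    errors = np.asarray(errors, dtype=float)
    if h is None:
        # Normal-reference rule of thumb; an assumption, the bandwidth rule is not given in the text
        h = 1.06 * errors.std(ddof=1) * len(errors) ** (-0.2)
    grid = np.linspace(errors.min() - 4 * h, errors.max() + 4 * h, n_grid)
    pdf = gaussian_kde_pdf(errors, grid, h)
    cdf = np.cumsum(pdf) * (grid[1] - grid[0])      # numerical integration of the density
    cdf /= cdf[-1]                                  # remove the small discretisation error
    alpha = (1.0 - theta) / 2.0                     # equal probability in each tail (assumption)
    e_low = np.interp(alpha, cdf, grid)
    e_up = np.interp(1.0 - alpha, cdf, grid)
    return e_low, e_up

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errors = rng.normal(0.0, 1.0, 1000)             # synthetic errors for illustration only
    print(error_limits(errors, theta=0.95))
```

With e defined as the predicted value minus the observed value, an observation is then expected to lie between pre − e_up and pre − e_low at the chosen confidence level.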

Boundary extension method based on mutual information criterion

In the CEEMDAN decomposition process, the upper and lower envelopes are obtained from the local maximum and minimum values of the signal through the cubic spline interpolation function. Because it is impossible to guarantee the signal endpoint is the maximum or minimum point, the envelope loses its physical meaning, which severely affects the signal decomposition result. The mutual information criterion is used to develop an endpoint extension method based on waveform matching to solve the boundary effect of CEEMDAN. Figure 2 is a schematic diagram of boundary extension based on the mutual information criterion, and its basic realisation is as follows.
  • (1)

    In Figure 2, we assume that an original signal comprises m maximum points and p minimum points. The first extreme point of the left boundary is taken as an example; then, the wavelet between the left endpoint and the second extreme point is selected as the research object, which is defined as W0.

  • (2)

    is taken as the corresponding point to the position of , and the wavelet of the same length as is intercepted to ensure that the timing position of in relative to is consistent with the timing position of relative to in .

  • (3)
    Then, the mutual information value between W0 and each candidate sub-waveform Wi is calculated and used as the matching coefficient of that sub-waveform with W0. The wavelet with the largest matching coefficient is considered the best matching waveform of W0, and the data preceding that wavelet are used to extend the signal to the left of W0. The mutual information is defined as (Tang et al. 2013):
    $$I_i(W_0, W_i) = H(W_i) - H(W_i \mid W_0) \tag{17}$$
    where
  • Ii(W0,Wi): mutual information of W0 and Wi,

  • H(Wi): entropy of Wi,

  • H(Wi|W0): conditional entropy of Wi when W0 is known.

Figure 2

Boundary extension based on mutual information criterion.


The stronger the correlation between W0 and Wi, the smaller the conditional entropy H(Wi|W0) and the larger the mutual information Ii(W0,Wi).

  • (4)

    The same principle is used to extend the right boundary of the original signal.
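The matching steps above can be sketched as follows. This is a simplified illustration rather than the authors' procedure: mutual information is estimated from a two-dimensional histogram, candidate segments are taken at every sample offset instead of being aligned on extreme points, and the names `mutual_information` and `extend_left` as well as the bin count are hypothetical choices.

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Histogram estimate of I(a, b) = H(b) - H(b|a), in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)           # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)           # marginal of b, shape (1, bins)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())

def extend_left(x, seg_len, ext_len, bins=8):
    """Prepend the ext_len samples that precede the segment best matching x[:seg_len]."""
    x = np.asarray(x, dtype=float)
    w0 = x[:seg_len]                                # boundary segment to be matched
    best_start, best_mi = None, -np.inf
    # Candidates must not overlap w0 and need ext_len samples in front of them
    for start in range(max(seg_len, ext_len), len(x) - seg_len + 1):
        mi = mutual_information(w0, x[start:start + seg_len], bins)
        if mi > best_mi:
            best_start, best_mi = start, mi
    extension = x[best_start - ext_len:best_start]  # data preceding the best match
    return np.concatenate([extension, x]), best_start

if __name__ == "__main__":
    t = np.arange(400)
    x = np.sin(2 * np.pi * t / 50) + 0.3 * np.sin(2 * np.pi * t / 17)  # synthetic signal
    extended, start = extend_left(x, seg_len=50, ext_len=25)
    print(len(x), len(extended), start)
```

The right boundary can be handled with the same function by reversing the signal, extending it, and reversing the result.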

Flood forecast and uncertainty analysis

In this study, we combine the advantages of the CEEMDAN and QRBL models to develop a CEEMDAN-QRBL hybrid model that expresses time, space, and non-stationary characteristics simultaneously. On this basis, the NPKDE is used to further analyse the error distribution law, after which we establish the distribution model of the flood prediction error, obtain the probability density distribution function, and establish the flood prediction confidence interval. Under a confidence level, the uncertain flood prediction of the CEEMDAN-QRBL model is transformed into a certain flood interval prediction. Figure 3 shows the data flow chart of the flood prediction model, and its basic realisation is as follows.
  • (1)

    By using the daily runoff data collected from the Xiaolangdi Reservoir in the middle section of the Yellow River Basin from 2002 to 2019 as the research object, we predicted the future runoff of Xiaolangdi. The combined use of Sanmenxia Reservoir, Luhun Reservoir, Guxian Reservoir, and Xiaolangdi Reservoir in the middle section of the Yellow River Basin and the use of Dongping Lake for flood diversion can raise the flood control standard of the lower Yellow River to the level required to handle storms likely to occur only once in a thousand years, and it can thus essentially eliminate the threat of flooding in the lower Yellow River. In this study, we first performed pre-processing operations, such as filtering and interpolation, on runoff data, after which we normalised the filtered data.

  • (2)

    After normalising the runoff sequence, we decomposed it using the CEEMDAN model to obtain several intrinsic mode components (IMF1 to IMFn), which converts the non-stationary sequence into relatively stationary components. We then used several adjacent values of IMF1 to IMFn as input vectors and the values that follow them as output vectors.

  • (3)

    The model extracted the temporal characteristics of the original signal using CEEMDAN and predicted the time series using the QRBL model. To obtain the runoff trend, the divided test set and training set were fed into the QRBL model, which predicted the runoff for the next two, five, ten, 20, 25, and 30 days. The sample size, QRBL input feature, and model input time-step were fixed at 128, 11, and 9, and the hidden and output layers used the sigmoid activation function. The PSO algorithm was used to optimise QRBL, with the training error of QRBL as the fitness function. The population size was set to 30, the maximum number of iterations to 50, and the search range for the number of neurons to [1, 64]; the number of QRBL neurons was determined to be 20.

  • The particle population consists of n particles in d-dimensional space. The position of the ith particle is denoted as $x_i = (x_{i1}, \ldots, x_{id})$, $i = 1, \ldots, n$; the velocity of the ith particle is $v_i = (v_{i1}, \ldots, v_{id})$; the optimal solution found by the ith particle is $p_i$; and the global optimal solution for the whole population is $g$. The particle velocity and position updates are given by (Marini & Walczak 2015):
    $$v_i^{t+1} = v_i^{t} + c_1 r_1 (p_i - x_i^{t}) + c_2 r_2 (g - x_i^{t}) \tag{18}$$
    $$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{19}$$
    where
  • t: number of iterations,

  • r1 and r2: uniformly distributed random numbers,

  • c1 and c2: acceleration constants.

  • (4)
    We used the Nash index (Ens), Pearson correlation coefficient (R), mean absolute error (MAE), and relative error (RE) as criteria for evaluating the credibility and accuracy of the CEEMDAN-QRBL model. The value range of Ens is (−∞, 1]: the closer Ens is to 1, the higher the model's credibility, and the further it falls below 1, the lower the credibility. The correlation coefficient R is a statistical indicator of the closeness of the correlation between variables. RE and MAE were used to evaluate the real-time error and overall error of the model, respectively. The formulas are defined as (Anaraki et al. 2021):
    (20)
    (21)
    (22)
    (23)
    where
  • : average observed values,

  • : observed values,

  • : predicted values,

  • : average predicted values.

  • (5)
    We used NPKDE to quantitatively analyse the uncertainty of the CEEMDAN-QRBL model on different time-scales. NPKDE calculates the probability density function, from which the confidence interval is established to obtain the upper and lower limits of the runoff prediction interval. The evaluation criteria for interval prediction include coverage and interval width. The interval width is an indicator of forecasting effectiveness: provided that coverage is ensured, the smaller the interval width, the better the forecasting effect. The interval width is defined as (He et al. 2021):
    (24)
    where
  • : difference between the upper and lower limits of the confidence interval where the runoff prediction value of the ith step is located.

  • (6)
    The flood control criteria for reservoirs include the maximum peak-shaving criterion, the minimum flood disaster criterion, and the shortest disaster criterion. In this study, we adopt the maximum peak-shaving criterion as the flood control criterion for the reservoir. This criterion uses the flood storage of the reservoir to reduce the peak discharge of floods entering the reservoir and to meet the downstream flood control requirements. The peak-shaving effect is typically measured using the peak-shaving rate, which is defined as (Xin et al. 2020):
    (25)
    (26)
    where
  • fno-interval: flood condition without intervals,

  • finterval: flood condition with intervals,

  • : interval peak discharge of the reservoir during the period t,

  • qt: peak discharge of the reservoir during the period t.
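The point-forecast criteria of step (4) and the interval criteria of step (5) can be computed as in the sketch below. It uses the standard textbook definitions of the Nash efficiency, Pearson correlation and mean absolute error. The signed form of the relative error, the use of the mean interval width and the function names are assumptions made for illustration, since the exact formulas are not recoverable from the text.

```python
import numpy as np

def evaluate(obs, pred):
    """Point-forecast criteria for observed and predicted runoff (hypothetical helper)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ens = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)  # Nash efficiency
    r = np.corrcoef(obs, pred)[0, 1]                                         # Pearson correlation
    mae = np.mean(np.abs(obs - pred))                                        # mean absolute error
    re = (pred - obs) / obs              # per-step relative error; assumes non-zero observations
    return {"Ens": ens, "R": r, "MAE": mae, "RE": re}

def interval_scores(obs, lower, upper):
    """Coverage and mean width of a prediction interval (mean width is an assumed summary)."""
    obs, lower, upper = (np.asarray(v, dtype=float) for v in (obs, lower, upper))
    coverage = np.mean((obs >= lower) & (obs <= upper))
    mean_width = np.mean(upper - lower)
    return coverage, mean_width

if __name__ == "__main__":
    obs = np.array([100.0, 200.0, 300.0, 400.0])    # toy values, not reservoir data
    pred = np.array([110.0, 190.0, 310.0, 390.0])
    print(evaluate(obs, pred))
    print(interval_scores(obs, pred - 15.0, pred + 15.0))
```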

Figure 3

Flood prediction and analysis.

As the only integrated water conservancy hub responsible for flood control in the lower reaches of the Yellow River, Xiaolangdi Reservoir plays a crucial role in controlling runoff there, and this study uses its runoff data from 2002 to 2019 as the research object. This paper compares various benchmark models, including the extreme learning machine (ELM), convolutional long short-term memory network (CNN-LSTM), and gated recurrent unit (GRU). To reflect the accuracy improvement of the model studied in this paper, the EMD-LSTM and CEEMDAN-GRU models mentioned in the introduction are also compared with the CEEMDAN-QRBL model. The dropout value is 0.1, and the hidden layers of the LSTM and GRU are set to 64 layers. We use Ens, MAE, RE, and R as evaluation indicators to validate the overall performance of the proposed model (CEEMDAN-QRBL). Figure 4 depicts the RE levels of various models at different time-scales. Figures 4(a)–4(f) show the maximum, minimum, median, and upper and lower quartiles of the prediction errors of the various models over the different forecast periods. CEEMDAN-QRBL had the lowest error level and the best accuracy across all forecast periods.
Figure 4

Relative errors of different models and different forecast periods: (a) prediction of the relative error for the next two days, (b) forecast of the relative error for the next five days, (c) prediction of the relative error for the next ten days, (d) forecast of the relative error for the next 20 days, (e) prediction of the relative error for the next 25 days, (f) prediction of the relative error for the next 30 days.

Figure 5 and Table 1 show that the R value of the CEEMDAN-QRBL model does not reach the optimal level at two forecast horizons. When the prediction period is 20 days, the R value of the CEEMDAN-GRU model is 16.46498, whereas that of the CEEMDAN-QRBL model is 15.45331. When the prediction period is 30 days, the R values of the EMD-LSTM and CEEMDAN-GRU models are 15.33817 and 15.52629, whereas that of the CEEMDAN-QRBL model is 14.24100. Apart from these cases and the two-day Ens value, where CEEMDAN-GRU is marginally higher (0.93233 versus 0.92995), the CEEMDAN-QRBL model has the best evaluation index results in each prediction period. In comparison with the EMD-LSTM model, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and Ens is improved by 8.81%, suggesting that CEEMDAN, by avoiding the mode mixing of EMD, helps the model to predict with greater accuracy. The use of the mutual information criterion significantly reduces the endpoint effect of CEEMDAN, lowers its reconstruction error, and further improves the prediction accuracy and credibility of the model. Compared with the CEEMDAN-GRU model, the MAE of the improved CEEMDAN-QRBL model is reduced by 16.31%, and Ens is improved by 3.96%.
Table 1

Comparison of numerical results of various evaluation indicators

Index  Model         Next 2 days  Next 5 days  Next 10 days  Next 20 days  Next 25 days  Next 30 days
Ens    CEEMDAN-QRBL  0.92995      0.89613      0.83446       0.68627       0.67293       0.64185
       CEEMDAN-GRU   0.93233      0.88474      0.80308       0.67626       0.58678       0.60085
       EMD-LSTM      0.91792      0.81953      0.75980       0.63578       0.59019       0.56071
       GRU           0.77714      0.58919      0.36316       0.17778       0.17150       0.14938
       CNN-LSTM      0.75308      0.55925      0.29823       0.10570       0.08489       0.04680
       ELM           0.75757      0.52179      0.28338       0.15724       0.15368       0.13787
MAE    CEEMDAN-QRBL  148.78073    184.99992    225.25434     288.91577     247.09619     264.00577
       CEEMDAN-GRU   152.21632    193.29254    251.24709     311.77681     357.34991     358.06181
       EMD-LSTM      149.37752    201.61919    239.57221     302.44978     320.76064     336.93696
       GRU           228.35998    302.40872    386.01204     302.27619     445.81442     460.83995
       CNN-LSTM      234.54698    319.59541    411.02853     480.96564     494.52684     516.92310
       ELM           235.23004    311.64285    380.87185     434.17141     443.32723     451.64001
R      CEEMDAN-QRBL  15.14280     15.49406     17.31916      15.45331      14.39027      14.24100
       CEEMDAN-GRU   14.95629     15.38689     16.51420      16.46498      13.57475      15.52629
       EMD-LSTM      12.93059     13.18191     14.08978      13.83061      14.16406      15.33817
       GRU           12.85936     12.67823     12.81807      13.09732      13.07744      12.91972
       CNN-LSTM      12.60618     12.01063     11.43171      11.97262      12.08955      11.62218
       ELM           12.37492     11.89630     12.42819      11.69514      13.46951      12.84219
Figure 5

Comparison of evaluation indicators for different models at different times: (a) prediction of the MAE value for two, five, and ten days, (b) prediction of the MAE value for 20, 25, and 30 days, (c) prediction of the R value for two, five, and ten days, (d) prediction of the R value for 20, 25, and 30 days, (e) prediction of the Ens value for two, five, and ten days, (f) prediction of the Ens value for 20, 25, and 30 days.


Table 2 shows the reconstruction error of IMF in different scenarios and the running time between different models. Based on Table 2, we reduced the IMF reconstruction error by 48.22% by using mutual information to extend the boundary; compared with the BL model, we increased the calculation efficiency of QRBL by 28.92%. These enhancements enable the CEEMDAN-QRBL model to predict with greater accuracy and confidence.
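The percentages in this paragraph depend on which value is taken as the reference. The short sketch below recomputes them from the Table 2 entries; the helper name `percent_change` is hypothetical, and the reading of which denominator lies behind each quoted figure is an inference from the arithmetic, not something stated in the text.

```python
def percent_change(before, after, reference):
    """Percentage difference (before - after) relative to a chosen reference value."""
    return 100.0 * (before - after) / reference

# Values copied from Table 2
err_plain, err_mi = 17.238152, 11.630431   # IMF reconstruction error without / with extension
t_bl, t_qrbl = 0.832979, 0.591433          # run time of BL and QRBL in seconds

err_vs_baseline = percent_change(err_plain, err_mi, err_plain)  # relative to the unextended error
err_vs_improved = percent_change(err_plain, err_mi, err_mi)     # relative to the extended error
time_vs_baseline = percent_change(t_bl, t_qrbl, t_bl)           # relative to the BL run time

print(f"{err_vs_baseline:.2f} {err_vs_improved:.2f} {time_vs_baseline:.2f}")
```

Relative to the extended-boundary error, the difference in reconstruction error is about 48.22%, which matches the quoted figure; relative to the unextended baseline, the same two values give a reduction of about 32.53%. The run-time saving relative to BL comes out at about 29.0%, close to the quoted 28.92%.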

Table 2

Comparison of numerical results of different schemes

Evaluation index          Scheme                          Calculation result
IMF reconstruction error  Endpoint extreme value unknown  17.238152
                          Mutual information extension    11.630431
Run time (s)              BL                              0.832979
                          QRBL                            0.591433
                          Serial CEEMDAN-QRBL             23.026205
                          Parallel CEEMDAN-QRBL           0.075796

The daily runoff predictions and associated prediction errors from the improved CEEMDAN-QRBL model were studied, and the nonparametric kernel density was calculated. Figure 6(a) shows the prediction error of the CEEMDAN-QRBL model as a histogram of the probability density distribution; the error curve approximately follows a normal distribution, demonstrating the strong robustness of the constructed model. Figure 6(b) shows the daily runoff interval prediction results at confidence levels of 98%, 95%, 92%, and 90%, with the predicted values of the CEEMDAN-QRBL model superimposed on the error fluctuation intervals. Figure 6(b) illustrates that the prediction interval covers most of the measured values, quantifies the uncertainty of the predicted daily runoff values with some degree of accuracy, and captures the daily runoff fluctuation range well. The width of the confidence interval grows with the confidence level: the wider the confidence interval, the greater the likelihood that it will contain the measured value, which supports the validity of the proposed method.
Figure 6

Confidence level of flood prediction based on CEEMDAN-QRBL: (a) CEEMDAN-QRBL error probability density distribution, (b) CEEMDAN-QRBL interval prediction with different confidence levels.

  • (1)

    CEEMDAN decomposed the original, highly random runoff sequence into several smoother component sequences, which improves on the performance of a single model. Improving the BL model with orthogonal triangular (QR) decomposition sped up the solution of the BL output layer and raised computational efficiency by 28.92%, which in turn increased the computational efficiency of the CEEMDAN-QRBL model.

  • (2)

    Existing studies generally fall short in improving EMD-type methods. Suppressing the endpoint effect of CEEMDAN further increases the prediction accuracy of the hybrid model: with the mutual information criterion, the reconstruction error of the improved CEEMDAN is decreased by 48.22%. Compared with EMD-LSTM and CEEMDAN-GRU, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and Ens is improved by 8.81% and 3.96%, respectively, giving higher prediction accuracy and credibility.

  • (3)

    The nonparametric kernel density estimation provided a reasonable fluctuation range for the model predictions, which can guide management decisions. Note that this article analyses and forecasts runoff from time series data alone, without taking sediment, rainfall, or weather-related factors into account. The developed model performs well for short-term predictions, but the R-values for longer predictions need to be improved. Medium- and long-term predictions remain limited, so more effective techniques should be investigated in further research.
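
The QR-based solution of the BL output layer referred to in conclusion (1) can be illustrated with a small, dependency-free sketch. It assumes that the output weights w solve the least-squares problem A w ≈ y, where A holds the stacked feature-node and enhancement-node outputs; the paper's exact QRBL formulation is not reproduced here, and the function names and toy data are hypothetical. Factorising A = QR reduces the solve to a triangular back-substitution, which avoids forming a pseudoinverse explicitly. In practice a LAPACK-backed routine such as `numpy.linalg.qr` would replace the hand-written factorisation.

```python
import math

def thin_qr(A):
    """Thin QR of an m x n matrix (list of rows, full column rank)
    by modified Gram-Schmidt. Returns Q as a list of columns and R."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Project the partially orthogonalised vector onto q_k and remove it.
            R[k][j] = sum(q * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, Q[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[j][j] for x in v])
    return Q, R

def solve_output_weights(A, y):
    """Least-squares weights from R w = Q^T y via back-substitution."""
    Q, R = thin_qr(A)
    n = len(R)
    qty = [sum(q * yi for q, yi in zip(Q[k], y)) for k in range(n)]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = qty[i] - sum(R[i][j] * w[j] for j in range(i + 1, n))
        w[i] = s / R[i][i]
    return w

# Toy node-output matrix (bias column plus one feature) and targets y = 1 + 2x.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w = solve_output_weights(A, y)
print([round(v, 6) for v in w])  # approximately [1.0, 2.0]
```

The sketch assumes A has full column rank; a rank-deficient node matrix would need column pivoting or regularisation, neither of which is shown.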

Conceptualisation: Yang Liu, Shuai Bing Du, Li Hu Wang.

Data curation: Yang Liu, Li Hu Wang.

Formal analysis: Yang Liu.

Funding acquisition: Yang Liu.

Methodology: Shuai Bing Du, Li Hu Wang.

Writing – original draft: Yang Liu, Shuai Bing Du, Li Hu Wang.

Writing – review & editing: Shuai Bing Du, Li Hu Wang.

This work was supported by the National Key Research and Development Project under Grant Strategic Research Projects in Key Area 16 and by the Water Conservancy Science and Technology Research Project in Henan Province (Grant GG202042).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).