## Abstract

The strong randomness exhibited by runoff series means that the accuracy of flood forecasting still needs to be improved. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) can deal with mode mixing, and the mutual information criterion can suppress the endpoint effect of CEEMDAN. Orthogonal triangular matrix decomposition (QR) was used to increase the computational efficiency of broad learning (BL). Building on the fast calculation of the model output layer, a novel improved coupled CEEMDAN-QRBL flood forecasting model was constructed and applied to the prediction of daily runoff at Xiaolangdi Reservoir. The findings indicate that the enhanced QRBL is 28.92% more computationally efficient than the BL model, and that the reconstruction error of CEEMDAN is decreased by 48.22%. Compared with the EMD-LSTM and CEEMDAN-GRU models, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and the *E*_{ns} is improved by 8.81% and 3.96%, respectively. Nonparametric kernel density estimation (NPKDE) gives the predicted values of the CEEMDAN-QRBL model a reasonable fluctuation range, which might serve as a useful reference for the distribution of regional water resources.

## HIGHLIGHTS

A novel CEEMDAN-QRBL model for flood forecasting was constructed.

Orthogonal triangular matrix decomposition was used to improve broad learning to enhance its computational efficiency.

The mutual information criterion was used to improve CEEMDAN to suppress the endpoint effect of CEEMDAN.

Nonparametric kernel density estimation was used to analyse the confidence level of flood forecasting.

## NOMENCLATURE LIST

- CEEMDAN: Complete ensemble empirical mode decomposition with adaptive noise
- BL: Broad learning model
- QR: Orthogonal triangular matrix decomposition
- QRBL: Improved broad learning model using orthogonal triangular matrix decomposition
- EMD: Empirical mode decomposition
- LSTM: Long short-term memory networks
- GRU: Gated recurrent unit
- NPKDE: Nonparametric kernel density estimation
- VMD: Variational mode decomposition
- IMF: Intrinsic mode function
- ANNs: Artificial neural networks
- SVD: Singular value decomposition
- *E*_{ns}: Nash efficiency index
- *R*: Pearson correlation coefficient
- MAE: Mean absolute error
- RE: Relative error
- ELM: Extreme learning machine
- CNN-LSTM: Convolutional long short-term memory network
- *x*(*t*): runoff observations series (m^{3}·s^{−1})
- *ε*: variance of the noise
- *n*(*t*): Gaussian white noise
- *k*: number of iterations
- Average value of *IMF*_{1}(*n*)
- *R*_{n}: residual signal
- *X*: original input matrix
- *H*: output matrix of the enhancement layer
- Weight of the output layer
- Training set
- Sigmoid activation function
- *w*: enhancement-layer weight of BL
- *b*: enhancement-layer bias of BL
- *h*(*x*): row vector of the matrix *H*
- *U* and *V*: orthogonal unitary matrices
- *Σ*: diagonal matrix
- *A*: objective matrix to be decomposed
- *Q*: orthogonal matrix
- *R*: upper triangular matrix
- A reversible matrix
- *θ*: confidence level of the flood prediction error
- *I*_{i}(*W*_{0}, *W*_{i}): mutual information
- *μ*: mean
- *σ*: standard deviation
- *g*(*x*): Gaussian kernel function
- *N*: sample size
- Bandwidth coefficient
- *e*: flood prediction error
- Subscript _{pre}: predicted value
- Subscript _{true}: flood observation value
- *H*(*W*_{i}): entropy of *W*_{i}
- *t*: number of iterations of PSO
- *r*_{1} and *r*_{2}: uniformly distributed random numbers
- *c*_{1} and *c*_{2}: acceleration constants of PSO
- Average observed values
- Observed values
- Predicted values
- Average predicted values
- Difference between the upper and lower limits of the confidence interval
- *f*_{no-interval}: flood condition without intervals
- *f*_{interval}: flood condition with intervals
- Interval peak discharge of the reservoir during the period *t*
- *q*_{t}: peak discharge of the reservoir during the period *t*

## INTRODUCTION

River runoff, as an important component of the hydrologic cycle, plays an important role in accurate and reliable flood prediction. The strong randomness and uncertainty exhibited by runoff signals make accurate flood prediction a major challenge (Yao *et al.* 2019). Therefore, developing a more stable, accurate, and reliable flood forecasting model is an urgent task for hydrological researchers.

Existing flood forecasting models can be divided into two categories: mechanism-driven hydrological models and data-driven statistical models (Awol *et al.* 2021). Hydrological models usually use physical parameters such as weather, runoff and rainfall to construct mathematical models that describe the correlation between flood indices through hydrological equations (Yue *et al.* 2020); however, because hydrological evolution is dynamic and non-stationary, the multi-step prediction accuracy of such models still needs to be improved (Hadid *et al.* 2020). Distributed hydrological models combine physical factors such as rainfall and subsurface to design reservoir flood forecasting schemes (Li *et al.* 2020); however, their applicability has been constrained by the construction of a large number of small- and medium-sized reservoirs (Pham *et al.* 2020). In contrast to mechanism-driven models, data-driven models, which are frequently used in hydrologic forecasting, do not require knowledge of the underlying physical processes and can capture nonlinear correlations between input and output data. For instance, using artificial neural networks to forecast runoff from the Winooski River in western Vermont, USA, has resulted in more precise forecasts (Besaw *et al.* 2010). Runoff from the Little River in Georgia and Reynolds Creek in Idaho was simulated, and the results demonstrated that the model could more accurately quantify the uncertainty in runoff simulations (Zhang *et al.* 2009). In comparison with conventional neural networks, the broad learning (BL) structure has greater feature-learning power thanks to its additional nodes. Southeast China's precipitation was predicted using a broad learning assimilation model, and the findings indicated that the BL-based framework could produce more precise daily precipitation forecasts (Zhou *et al.* 2022).
A single model's ability to capture the nonlinear trends and high-frequency abrupt change points of the runoff signal is frequently constrained, whereas a hybrid model that first uses signal-processing techniques to break the runoff data down into relatively smooth components can achieve higher prediction accuracy. For instance, a multi-scale flood forecasting model was developed using wavelet analysis combined with a fuzzy weighted Markov model, and the results showed that the model exhibited high overall prediction accuracy (J. Zhang *et al.* 2021). Unlike wavelet decomposition, empirical mode decomposition (EMD) can effectively handle nonlinear and non-stationary signals (Xie *et al.* 2019). A recent study applied EMD and LSTM to long-term flood forecasting, and the experimental results showed that the EMD-based LSTM model had a better fit (Liu *et al.* 2020). Variational mode decomposition (VMD) was combined with LSTM for short-term flood forecasting and evaluation, and the results showed that VMD-LSTM has better prediction accuracy, computational efficiency and stability (Zuo *et al.* 2020). CEEMDAN was combined with a gated recurrent unit (GRU) for water level prediction, and the results showed that CEEMDAN-GRU has stronger prediction performance (Tao *et al.* 2021). However, some issues remain in current research methods: in wavelet analysis it is frequently difficult to determine the wavelet basis, and the choice of wavelet basis has a significant impact on the results (Wang *et al.* 2020); EMD methods suffer from endpoint effects and mode mixing (Shan *et al.* 2022); and VMD requires two parameters, the bandwidth limit and the number of decompositions, to be set in advance (Lv *et al.* 2022). Compared with EMD, CEEMDAN adds white noise to the signal, which effectively solves the mode-mixing problem; compared with VMD, it does not require manual parameter setting and has a strong adaptive capability.

With additional enhancement-layer nodes, broad learning (BL) architecture, a revolutionary non-parametric machine learning technique, offers higher feature-learning capabilities and is less influenced by parameters than conventional neural networks. BL can calculate the results of the output layer directly and does not require repeated iterations, which has apparent efficiency advantages over deep learning models like LSTM and GRU. The paper builds a new and improved CEEMDAN-QRBL flood forecasting model by suppressing the endpoint effect that exists in CEEMDAN, reducing the reconstruction error of CEEMDAN, and improving the BL model using orthogonal triangular decomposition (QR) to speed up the BL solution. These improvements help to further increase the accuracy and operational efficiency of flood forecasting. Because the prediction model has intrinsic uncertainty, the research also employed nonparametric kernel density estimation (NPKDE) to quantify the uncertainty of the prediction model (Li *et al.* 2017). This can help to improve the flood prediction model's real-time performance and dependability.

## RESEARCH METHODS

### CEEMDAN

Empirical mode decomposition (EMD) was proposed by Huang *et al.* (1998), and this method overcomes the problem that the basis function of the wavelet method cannot be adapted. Ensemble EMD (EEMD) introduces Gaussian white noise based on EMD, which effectively solves the problem of mode mixing in the process of EMD decomposition. However, because the introduced white noise cannot be eliminated, CEEMDAN was proposed. CEEMDAN realises the adaptive addition of noise according to signal characteristics, effectively reducing the noise residue. CEEMDAN can be described as (Mousavi *et al.* 2020):

$$x^{i}(t) = x(t) + \varepsilon\, n^{i}(t), \quad i = 1, 2, \ldots, k \tag{1}$$

where *x*(*t*): runoff observations series (m^{3}·s^{−1}), *ε*: variance of the noise, *n*(*t*): Gaussian white noise, *k*: number of iterations.

Each noise-added signal is decomposed by EMD to obtain its first modal component *IMF*_{1}, after which we calculated the average value, which is defined as (X. Zhang *et al.* 2021):

$$\overline{IMF_{1}}(t) = \frac{1}{k}\sum_{i=1}^{k} IMF_{1}^{i}(t) \tag{2}$$

where $\overline{IMF_{1}}(t)$: average value of *IMF*_{1}.

The procedure is repeated until *k* modal components *IMF*_{1}–*IMF*_{k} are obtained, and the result of the decomposition can be defined as (Fan *et al.* 2021):

$$R_{n}(t) = x(t) - \sum_{j=1}^{k} \overline{IMF_{j}}(t) \tag{3}$$

where *R*_{n}: residual signal.
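The noise-assisted ensemble idea behind CEEMDAN can be illustrated with a minimal sketch. It deliberately omits the EMD sifting step and simply averages the noisy copies themselves, only to show that the contribution of the added white noise shrinks as the number of realisations grows. The function names, the test signal, and the treatment of `eps` as a noise amplitude are assumptions of this sketch, not part of the original method description.

```python
import math
import random


def noisy_realisations(x, eps, k, seed=0):
    """Return k copies of x, each with independent Gaussian white noise added.

    eps scales unit-variance noise; treating it as an amplitude is an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    return [[value + eps * rng.gauss(0.0, 1.0) for value in x] for _ in range(k)]


def ensemble_mean(realisations):
    """Average the realisations point by point."""
    k = len(realisations)
    return [sum(column) / k for column in zip(*realisations)]


# Hypothetical test signal standing in for a normalised runoff series.
signal = [math.sin(2 * math.pi * t / 25) for t in range(50)]
copies = noisy_realisations(signal, eps=0.2, k=400)
averaged = ensemble_mean(copies)

# Largest pointwise gap between the ensemble mean and the clean signal.
max_deviation = max(abs(a - s) for a, s in zip(averaged, signal))
```

In a full CEEMDAN implementation the averaging is applied to the first IMF extracted from each noisy copy rather than to the copies directly; the sketch only isolates the averaging step.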

*et al.*2019):

### QRBL

In BL, *N* different training samples are multiplied by a set of random weights, and random biases are added in the enhancement layer. The weights remain unchanged in the subsequent process. After the data are activated, the matrix *H* is obtained. Finally, the original input matrix *X* of the input layer and the output matrix *H* of the enhancement layer are merged. Assuming that a BL network has *N* input neurons and *M* enhancement-layer neurons, the definition of the mathematical model is expressed as (Ali *et al.* 2022):

$$[X \mid H]\,\beta = T \tag{5}$$

where *X*: original input matrix, *H*: output matrix of the enhancement layer, *β*: the weight of the output layer, *T*: training set.

The output of the enhancement layer is given by (*et al.* 2020):

$$h(x) = \varphi(w \cdot x + b) \tag{6}$$

where *φ*: sigmoid activation function, *w*: enhancement-layer weight, *b*: enhancement-layer bias, *h*(*x*): row vector of the matrix *H*.

After the output matrix *H* of the enhancement layer is obtained, the problem is transformed into finding the least-squares solution of (5), which is defined as follows (Xu *et al.* 2018):

$$\beta = [X \mid H]^{+}\,T \tag{7}$$

where [*X*|*H*]^{+}: the Moore–Penrose generalised inverse of matrix [*X*|*H*].

For an *m*×*n* order matrix *H*, the following SVD decomposition can be performed (Zuo *et al.* 2022):

$$H = U \Sigma V^{T} \tag{8}$$

where *U* and *V*: orthogonal unitary matrices, *Σ*: diagonal matrix.

The QR decomposition is expressed as (*et al.* 2021):

$$A = QR \tag{9}$$

where *A*: objective matrix to be decomposed, *Q*: orthogonal matrix, *R*: upper triangular matrix.

and : a reversible matrix.

*et al.*2021):
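As an illustration only, and not the formula cited above, the following sketch shows one way the output weights of a BL network could be obtained from a QR factorisation instead of a generalised inverse. It assumes the merged matrix *A* = [*X*|*H*] has full column rank, in which case the least-squares weights satisfy *R*·*β* = *Q*^{T}·*T* and can be found by back-substitution. All function names, the random enhancement weights, and the toy targets are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def enhancement_layer(X, weights, biases):
    """H[i][j] = sigmoid(x_i . w_j + b_j) for each sample x_i and node j."""
    return [[sigmoid(sum(xv * wv for xv, wv in zip(row, w)) + b)
             for w, b in zip(weights, biases)] for row in X]


def qr_decompose(A):
    """Thin QR by modified Gram-Schmidt; A (m x n) must have full column rank."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(Q):
            R[i][j] = sum(qv * vv for qv, vv in zip(q, v))
            v = [vv - R[i][j] * qv for vv, qv in zip(v, q)]
        R[j][j] = math.sqrt(sum(vv * vv for vv in v))
        Q.append([vv / R[j][j] for vv in v])
    return Q, R  # Q is stored as a list of orthonormal columns


def solve_output_weights(A, t):
    """Least-squares weights from R beta = Q^T t, solved by back-substitution."""
    Q, R = qr_decompose(A)
    n = len(R)
    rhs = [sum(qv * tv for qv, tv in zip(q, t)) for q in Q]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(R[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (rhs[i] - tail) / R[i][i]
    return beta


# Toy data: 30 samples, 2 inputs, 3 enhancement nodes (all values hypothetical).
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(30)]
weights = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(3)]
biases = [rng.uniform(-1, 1) for _ in range(3)]
H = enhancement_layer(X, weights, biases)
A = [x_row + h_row for x_row, h_row in zip(X, H)]  # merged matrix [X | H]
targets = [sum(row) for row in A]                  # toy targets the model can fit exactly
beta = solve_output_weights(A, targets)
fitted = [sum(a * b for a, b in zip(row, beta)) for row in A]
```

The back-substitution on the triangular factor avoids forming an explicit inverse, which is the usual motivation for preferring QR over an SVD-based pseudo-inverse when only the least-squares solution is needed.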

### NPKDE and confidence interval

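The Gaussian kernel function is defined as (*et al.* 2020):

$$g(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where *μ*: mean, *σ*: standard deviation, *g*(*x*): Gaussian kernel function.

The kernel density estimate built from the samples *x*_{i} is:

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} g\left(\frac{x - x_{i}}{h}\right)$$

where *N*: sample size, *h*: bandwidth coefficient.

The flood prediction error is:

$$e = y_{pre} - y_{true}$$

where *e*: flood prediction error, *y*_{pre}: predicted value, *y*_{true}: flood observation value.

The confidence interval of *e* is expressed as (Latif & Mustafa 2020):

$$P\left(e_{low} \le e \le e_{up}\right) = \theta$$

where *e*_{low}: lower confidence limit, *e*_{up}: upper confidence limit, *θ*: confidence level of the flood prediction error, that is, the probability that *e* lies in the interval [*e*_{low}, *e*_{up}]. The confidence interval for runoff prediction is obtained by applying these limits to the predicted value.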
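A minimal sketch of how an error interval could be derived from a Gaussian kernel density estimate is given below. The error samples are hypothetical, the bandwidth uses Silverman's rule of thumb, and the interval is taken with equal tail probabilities; all three choices are assumptions of this sketch, since the text does not state how the bandwidth or the interval limits were selected.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h):
    """Kernel density estimate at x with bandwidth h."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (n * h)


def kde_cdf(x, samples, h):
    """CDF of the Gaussian KDE: mean of normal CDFs centred on each sample."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)


def quantile(p, samples, h, tol=1e-9):
    """Invert the KDE CDF by bisection."""
    lo = min(samples) - 10.0 * h
    hi = max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def error_interval(samples, h, theta):
    """Equal-tailed interval [e_low, e_up] holding probability theta."""
    tail = (1.0 - theta) / 2.0
    return quantile(tail, samples, h), quantile(1.0 - tail, samples, h)


# Hypothetical prediction errors (predicted minus observed runoff).
errors = [-42.0, -30.5, -18.0, -9.5, -3.0, 0.5, 4.0, 11.0, 19.5, 27.0, 36.0, 55.0]
n = len(errors)
mean = sum(errors) / n
std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
h = 1.06 * std * n ** (-0.2)  # Silverman's rule of thumb (assumption)
e_low, e_up = error_interval(errors, h, theta=0.9)
```

Because the CDF of a Gaussian KDE has a closed form in terms of the error function, the interval limits can be found by bisection without numerical integration of the density.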

## FLOOD PREDICTION AND UNCERTAINTY ANALYSIS

### Boundary extension method based on mutual information criterion

- (1)
In Figure 2, we assume that an original signal comprises *m* maximum points and *p* minimum points. Taking the first extreme point of the left boundary as an example, the wavelet between the left endpoint and the second extreme point is selected as the research object, which is defined as *W*_{0}.

- (2)
Another extreme point of the signal is taken as the point corresponding to the first extreme point, and a wavelet *W*_{i} of the same length as *W*_{0} is intercepted, such that the timing position of this extreme point in *W*_{i} is consistent with the timing position of the first extreme point in *W*_{0}.

- (3)
Then, the mutual information value between *W*_{0} and *W*_{i} is calculated and used as the matching coefficient of each sub-waveform with *W*_{0}. The wavelet with the largest matching coefficient is considered the best matching waveform of *W*_{0}, and the data preceding that wavelet are extended to the left of the original signal. The mutual information is defined as (Tang *et al.* 2013):

$$I_{i}(W_{0}, W_{i}) = H(W_{i}) - H(W_{i} \mid W_{0})$$

where *I*_{i}(*W*_{0}, *W*_{i}): mutual information between *W*_{0} and *W*_{i}, *H*(*W*_{i}): entropy of *W*_{i}, *H*(*W*_{i}|*W*_{0}): conditional entropy of *W*_{i} when *W*_{0} is known.

The stronger the correlation between *W*_{0} and *W*_{i}, the smaller the conditional entropy *H*(*W*_{i}|*W*_{0}) and the larger the mutual information *I*_{i}(*W*_{0}, *W*_{i}).

- (4)
The same principle is used to extend the right boundary of the original signal.
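The matching step can be sketched as follows, assuming the waveforms are discretised into equal-width bins so that entropies can be estimated from counts. The bin count, the function names, and the candidate waveforms are hypothetical; the sketch uses the identity *H*(*W*_{i}|*W*_{0}) = *H*(*W*_{0}, *W*_{i}) − *H*(*W*_{0}) to obtain the conditional entropy.

```python
import math
from collections import Counter


def discretise(wave, bins, lo, hi):
    """Map each sample to a bin index on a shared [lo, hi] grid."""
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in wave]


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(w0, wi, bins=8):
    """I(W0, Wi) = H(Wi) - H(Wi | W0), with H(Wi | W0) = H(W0, Wi) - H(W0)."""
    lo = min(min(w0), min(wi))
    hi = max(max(w0), max(wi))
    a = discretise(w0, bins, lo, hi)
    b = discretise(wi, bins, lo, hi)
    conditional = entropy(list(zip(a, b))) - entropy(a)
    return entropy(b) - conditional


def best_match(w0, candidates, bins=8):
    """Return the index of the candidate with the largest mutual information."""
    scores = [mutual_information(w0, c, bins) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__), scores


# Hypothetical boundary waveform and three candidate sub-waveforms.
w0 = [math.sin(2 * math.pi * t / 20) for t in range(40)]
flat = [0.3] * 40
square = [1.0 if t % 10 < 5 else -1.0 for t in range(40)]
candidates = [flat, square, list(w0)]
best_index, scores = best_match(w0, candidates)
```

A constant candidate carries no information about *W*_{0}, so its score is zero, while an identical waveform reaches the upper bound *H*(*W*_{0}); real candidates fall between these two extremes.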

### Flood forecast and uncertainty analysis

- (1)
By using the daily runoff data collected from the Xiaolangdi Reservoir in the middle section of the Yellow River Basin from 2002 to 2019 as the research object, we predicted the future runoff of Xiaolangdi. The combined use of Sanmenxia Reservoir, Luhun Reservoir, Guxian Reservoir, and Xiaolangdi Reservoir in the middle section of the Yellow River Basin and the use of Dongping Lake for flood diversion can raise the flood control standard of the lower Yellow River to the level required to handle storms likely to occur only once in a thousand years, and it can thus essentially eliminate the threat of flooding in the lower Yellow River. In this study, we first performed pre-processing operations, such as filtering and interpolation, on runoff data, after which we normalised the filtered data.

- (2)
After normalising the runoff sequence, we decomposed it using the CEEMDAN model to obtain several intrinsic mode function components (*IMF*_{1}–*IMF*_{n}), turning the non-stationary sequence into relatively stationary components. We then used several adjacent data of *IMF*_{1}–*IMF*_{n} as input vectors and several connected data as output vectors.

- (3)
The model extracted the temporal characteristics of the original signal using CEEMDAN and predicted the time series using the QRBL model. To obtain the runoff trend, the divided test set and training set were fed into the QRBL model, which predicted the runoff for the next 2, 5, 10, 20, 25, and 30 days. The sample size, QRBL input feature, and model input time-step were fixed at 128, 11, and 9, and the hidden and output layers used the sigmoid activation function. The particle swarm optimisation (PSO) algorithm was used to optimise QRBL with the training error of QRBL as the fitness function; the population size was set to 30, the maximum number of iterations to 50, and the search range for the number of neurons to [1, 64], and the number of QRBL neurons was determined to be 20.

The particle population consists of *n* particles in *d*-dimensional space. For the *i*th particle, the position $x_{i}$, the velocity $v_{i}$, and the optimal solution found so far $p_{i}$ are *d*-dimensional vectors, and the global optimal solution for the whole population is $g$. The particle velocity and position updates are given by (Marini & Walczak 2015):

$$v_{i}^{t+1} = v_{i}^{t} + c_{1} r_{1}\left(p_{i} - x_{i}^{t}\right) + c_{2} r_{2}\left(g - x_{i}^{t}\right)$$

$$x_{i}^{t+1} = x_{i}^{t} + v_{i}^{t+1}$$

where *t*: number of iterations, *r*_{1} and *r*_{2}: uniformly distributed random numbers, *c*_{1} and *c*_{2}: acceleration constants.

- (4)
We used the Nash index (*E*_{ns}), Pearson correlation coefficient (*R*), mean absolute error (MAE), and relative error (RE) as criteria for evaluating the credibility and accuracy of the CEEMDAN-QRBL model. The value range of *E*_{ns} is (−∞, 1]; when *E*_{ns} is close to 1, the model's credibility is high, and the further *E*_{ns} falls below 1, the lower the model's credibility. The correlation coefficient *R* is a statistical indicator that is used to reflect the closeness of the correlation between variables. RE and MAE were used to evaluate the real-time error and overall error of the model, respectively. The formulas are defined as (Anaraki *et al.* 2021):

$$E_{ns} = 1 - \frac{\sum_{i=1}^{N}\left(y_{i} - \hat{y}_{i}\right)^{2}}{\sum_{i=1}^{N}\left(y_{i} - \bar{y}\right)^{2}}$$

$$R = \frac{\sum_{i=1}^{N}\left(y_{i} - \bar{y}\right)\left(\hat{y}_{i} - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_{i} - \bar{y}\right)^{2}}\sqrt{\sum_{i=1}^{N}\left(\hat{y}_{i} - \bar{\hat{y}}\right)^{2}}}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_{i} - \hat{y}_{i}\right|$$

$$RE_{i} = \frac{\hat{y}_{i} - y_{i}}{y_{i}} \times 100\%$$

where $\bar{y}$: average observed values, $y_{i}$: observed values, $\hat{y}_{i}$: predicted values, $\bar{\hat{y}}$: average predicted values.

- (5)
We used NPKDE to quantitatively analyse the uncertainty of the CEEMDAN-QRBL model on different time-scales. NPKDE calculates the probability density function, and then establishes the confidence interval to obtain the upper and lower limits of the runoff prediction interval. The evaluation criteria for interval prediction include coverage and interval width. The interval width is an indicator of forecasting effectiveness: under the premise of ensuring coverage, the smaller the interval width, the better the forecasting effect. The interval width is computed from the difference between the upper and lower limits of the confidence interval in which the runoff prediction value of the *i*th step is located (He *et al.* 2021).

- (6)
The flood control criteria for reservoirs include the maximum peak-shaving criterion, the minimum flood disaster criterion, and the shortest disaster criterion. In this study, we adopt the maximum peak-shaving criterion as the flood control criterion for the reservoir. The maximum peak-shaving criterion uses the effect of reservoir flood storage to reduce the peak discharge of floods entering the reservoir and to meet the downstream flood control requirements. The peak-shaving effect is typically measured using the peak-shaving rate (Xin *et al.* 2020), whose definition involves the following quantities: *f*_{no-interval}: flood condition without intervals; *f*_{interval}: flood condition with intervals; the interval peak discharge of the reservoir during the period *t*; and *q*_{t}: peak discharge of the reservoir during the period *t*.
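The particle swarm search over the number of neurons described in step (3) can be sketched as below. The population size (30), iteration count (50), and search range [1, 64] follow the text; the acceleration constants, the velocity clamp, and the toy fitness function are assumptions of this sketch. In particular, `toy_fitness` is a hypothetical stand-in for the QRBL training error, with its minimum placed at 20 purely for illustration.

```python
import random


def pso_minimise(fitness, lower, upper, n_particles=30, n_iter=50,
                 c1=2.0, c2=2.0, seed=0):
    """One-dimensional PSO using the basic velocity and position updates."""
    rng = random.Random(seed)
    v_max = 0.2 * (upper - lower)  # velocity clamp: an assumption of this sketch
    x = [rng.uniform(lower, upper) for _ in range(n_particles)]
    v = [0.0] * n_particles
    p_best = x[:]
    p_val = [fitness(xi) for xi in x]
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx], p_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: pull towards personal best and global best.
            v[i] += c1 * r1 * (p_best[i] - x[i]) + c2 * r2 * (g_best - x[i])
            v[i] = max(-v_max, min(v_max, v[i]))
            # Position update, kept inside the search range.
            x[i] = max(lower, min(upper, x[i] + v[i]))
            val = fitness(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i], val
                if val < g_val:
                    g_best, g_val = x[i], val
    return g_best, g_val


def toy_fitness(n_neurons):
    """Hypothetical stand-in for the QRBL training error."""
    return (n_neurons - 20.0) ** 2


best, value = pso_minimise(toy_fitness, 1, 64)
n_neurons = round(best)  # the neuron count must be an integer
```

In practice the fitness call would train a QRBL model with the candidate neuron count and return its training error, which is far more expensive than the toy function used here.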

## RESULTS AND ANALYSIS

*E*_{ns}, MAE, RE, and *R* were used as evaluation indicators to validate the overall performance of the proposed model (CEEMDAN-QRBL). Figure 4 depicts the RE levels of various models at different time-scales. Figures 4(a)–4(f) show the maximum, minimum, median, upper, and lower quartiles of prediction errors for the various prediction models over various forecast periods. CEEMDAN-QRBL had the lowest error level and the best accuracy across all forecast periods.
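The four indicators named above can be computed with a few lines of code. The sketch below uses the standard definitions of the Nash efficiency index, the Pearson correlation coefficient, and the mean absolute error; the signed, per-step form of the relative error is an assumption, as are the function names and the sample values.

```python
import math


def nash_sutcliffe(obs, pred):
    """E_ns = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    s_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    s_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in pred))
    return cov / (s_obs * s_pred)


def mean_absolute_error(obs, pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def relative_error(obs, pred):
    """Per-step relative error in percent (signed form is an assumption)."""
    return [(p - o) / o * 100.0 for o, p in zip(obs, pred)]


# Hypothetical observed and predicted runoff values.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
e_ns = nash_sutcliffe(observed, predicted)
r = pearson_r(observed, predicted)
mae = mean_absolute_error(observed, predicted)
re = relative_error(observed, predicted)
```

A perfect forecast gives *E*_{ns} = 1 and MAE = 0, which is a convenient sanity check when wiring these functions into an evaluation pipeline.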

When the prediction period is 20 days, the *R* value of the CEEMDAN-GRU model is 16.46498 and the *R* value of the CEEMDAN-QRBL model is 15.45331; when the prediction period is 30 days, the *R* value of the EMD-LSTM model is 15.33817, the *R* value of the CEEMDAN-GRU model is 15.52629, and the *R* value of the CEEMDAN-QRBL model is 14.24100. In these two periods, the *R* value of the CEEMDAN-QRBL model does not reach the optimal level. In the other prediction periods, the CEEMDAN-QRBL model has the best evaluation index results when compared with the other models. In comparison with the EMD-LSTM model, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and *E*_{ns} is improved by 8.81%, suggesting that CEEMDAN, by avoiding the mode mixing of EMD, helps the model predict with greater accuracy. The use of the mutual information criterion significantly reduces the endpoint effect of CEEMDAN, lowers the reconstruction error of CEEMDAN, and further improves the prediction accuracy and credibility of the model. Compared with the CEEMDAN-GRU model, the MAE of the improved CEEMDAN-QRBL model is reduced by 16.31%, and *E*_{ns} is improved by 3.96%.

Evaluation index | Model | Next 2 days | Next 5 days | Next 10 days | Next 20 days | Next 25 days | Next 30 days
---|---|---|---|---|---|---|---
*E*_{ns} | CEEMDAN-QRBL | 0.92995 | 0.89613 | 0.83446 | 0.68627 | 0.67293 | 0.64185
*E*_{ns} | CEEMDAN-GRU | 0.93233 | 0.88474 | 0.80308 | 0.67626 | 0.58678 | 0.60085
*E*_{ns} | EMD-LSTM | 0.91792 | 0.81953 | 0.75980 | 0.63578 | 0.59019 | 0.56071
*E*_{ns} | GRU | 0.77714 | 0.58919 | 0.36316 | 0.17778 | 0.17150 | 0.14938
*E*_{ns} | CNN-LSTM | 0.75308 | 0.55925 | 0.29823 | 0.10570 | 0.08489 | 0.04680
*E*_{ns} | ELM | 0.75757 | 0.52179 | 0.28338 | 0.15724 | 0.15368 | 0.13787
MAE | CEEMDAN-QRBL | 148.78073 | 184.99992 | 225.25434 | 288.91577 | 247.09619 | 264.00577
MAE | CEEMDAN-GRU | 152.21632 | 193.29254 | 251.24709 | 311.77681 | 357.34991 | 358.06181
MAE | EMD-LSTM | 149.37752 | 201.61919 | 239.57221 | 302.44978 | 320.76064 | 336.93696
MAE | GRU | 228.35998 | 302.40872 | 386.01204 | 302.27619 | 445.81442 | 460.83995
MAE | CNN-LSTM | 234.54698 | 319.59541 | 411.02853 | 480.96564 | 494.52684 | 516.92310
MAE | ELM | 235.23004 | 311.64285 | 380.87185 | 434.17141 | 443.32723 | 451.64001
*R* | CEEMDAN-QRBL | 15.14280 | 15.49406 | 17.31916 | 15.45331 | 14.39027 | 14.24100
*R* | CEEMDAN-GRU | 14.95629 | 15.38689 | 16.51420 | 16.46498 | 13.57475 | 15.52629
*R* | EMD-LSTM | 12.93059 | 13.18191 | 14.08978 | 13.83061 | 14.16406 | 15.33817
*R* | GRU | 12.85936 | 12.67823 | 12.81807 | 13.09732 | 13.07744 | 12.91972
*R* | CNN-LSTM | 12.60618 | 12.01063 | 11.43171 | 11.97262 | 12.08955 | 11.62218
*R* | ELM | 12.37492 | 11.89630 | 12.42819 | 11.69514 | 13.46951 | 12.84219

Table 2 shows the reconstruction error of IMF in different scenarios and the running time between different models. Based on Table 2, we reduced the IMF reconstruction error by 48.22% by using mutual information to extend the boundary; compared with the BL model, we increased the calculation efficiency of QRBL by 28.92%. These enhancements enable the CEEMDAN-QRBL model to predict with greater accuracy and confidence.

Evaluation index | Scheme | Calculation result
---|---|---
IMF reconstruction error | Endpoint extreme value unknown | 17.238152
IMF reconstruction error | Mutual information extension | 11.630431
Run time (s) | BL | 0.832979
Run time (s) | QRBL | 0.591433
Run time (s) | Serial CEEMDAN-QRBL | 23.026205
Run time (s) | Parallel CEEMDAN-QRBL | 0.075796
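The percentages quoted for Table 2 can be checked with simple arithmetic. The sketch below computes each change relative to both the original value and the improved value, because the text does not state which denominator was used; the helper name is hypothetical and the input numbers are copied from Table 2.

```python
def relative_change(before, after, base):
    """Difference between two values as a percentage of the chosen base."""
    return (before - after) / base * 100.0


# Values copied from Table 2.
recon_before, recon_after = 17.238152, 11.630431
bl_time, qrbl_time = 0.832979, 0.591433

# Reconstruction error: change relative to the original and to the improved value.
recon_vs_original = relative_change(recon_before, recon_after, recon_before)
recon_vs_improved = relative_change(recon_before, recon_after, recon_after)

# Run time: change relative to the BL run time and to the QRBL run time.
time_vs_bl = relative_change(bl_time, qrbl_time, bl_time)
time_vs_qrbl = relative_change(bl_time, qrbl_time, qrbl_time)
```

With these inputs, the reconstruction-error difference is about 32.5% of the original value and about 48.2% of the improved value, and the run-time difference is about 29.0% of the BL run time, so the denominator matters when comparing against the quoted figures.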

## CONCLUSION

- (1)
To enhance the performance of a single model, CEEMDAN divided the original, highly random sequence into a number of smoother component sequences. The BL model was improved using orthogonal triangular decomposition, which speeds up the solution of the BL output layer and increases computational efficiency by 28.92%. These improvements increased the computational efficiency of the CEEMDAN-QRBL model.

- (2)
Existing studies have generally paid little attention to improving EMD-type methods. Suppressing the endpoint effect of CEEMDAN further increases the prediction accuracy of the hybrid model, and the reconstruction error is decreased by 48.22% when using the improved CEEMDAN with the mutual information criterion. Compared with EMD-LSTM and CEEMDAN-GRU, the MAE of the improved CEEMDAN-QRBL model is reduced by 12.36% and 16.31%, and *E*_{ns} is improved by 8.81% and 3.96%, respectively, giving higher prediction accuracy and credibility.

- (3)
The non-parametric kernel density estimation provided a reasonable fluctuation range for the model's predictions, which can be used as a guide for management decisions. It should be noted that the article analyses and forecasts runoff using time series data alone, without taking sediment, rainfall, or weather-related factors into account. The developed model performs well for short-term predictions, but the *R* values for longer predictions need to be improved. Medium- and long-term predictions have limitations as well, so more effective techniques should be investigated in further research.

## AUTHOR CONTRIBUTIONS

**Conceptualisation:** Yang Liu, Shuai Bing Du, Li Hu Wang.

**Data curation:** Yang Liu, Li Hu Wang.

**Formal analysis:** Yang Liu.

**Funding acquisition:** Yang Liu.

**Methodology:** Shuai Bing Du, Li Hu Wang.

**Writing – original draft:** Yang Liu, Shuai Bing Du, Li Hu Wang.

**Writing – review & editing:** Shuai Bing Du, Li Hu Wang.

## ACKNOWLEDGEMENTS

This work was supported by the National Key Research and Development Project (Strategic Research Projects in Key Area 16) and the Water Conservancy Science and Technology Research Project in Henan Province (Grant GG202042).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.