## Abstract

Accurate runoff prediction is vital in efficiently managing water resources. In this paper, a hybrid prediction model combining complete ensemble empirical mode decomposition with adaptive noise, variational mode decomposition, CABES, and long short-term memory network (CEEMDAN-VMD-CABES-LSTM) is proposed. Firstly, CEEMDAN is used to decompose the original data, and the high-frequency component is decomposed using VMD. Then, each component is input into the LSTM optimized by CABES for prediction. Finally, the results of individual component predictions are combined and reconstructed to produce the monthly runoff predictions. The hybrid model is employed to predict the monthly runoff at the Xiajiang hydrological station and the Yingluoxia hydrological station. A comprehensive comparison is conducted with other models including back propagation (BP), LSTM, etc. The assessment of each model's prediction performance uses four evaluation indexes. Results reveal that the CEEMDAN-VMD-CABES-LSTM model showcased the highest forecast accuracy among all the models evaluated. Compared with the single LSTM, the root mean square error (RMSE) and mean absolute percentage error (MAPE) of the Xiajiang hydrological station decreased by 71.09 and 65.26%, respectively, and the RMSE and MAPE of the Yingluoxia hydrological station decreased by 65.13 and 40.42%, respectively. The R and NSEC of both sites are near 1.

## HIGHLIGHTS

A novel model (CEEMDAN-VMD-CABES-LSTM) is proposed for monthly runoff prediction.

CEEMDAN-VMD effectively reduces the complexity of the original monthly runoff series.

CABES enhances the generalization ability and prediction performance of the LSTM.

The four evaluation indicators and seven benchmark models are employed to verify the superiority of the developed model.

## INTRODUCTION

Effective water resource management and prediction are crucial for sustainable social and economic development, given the impact of climate change and human activities (Barati *et al.* 2014). As an important task in hydrology, monthly runoff prediction is widely used in agriculture, urban planning, hydropower station operation, and other fields. However, monthly runoff data usually present complex nonlinear, nonstationary, and multi-time scale characteristics, which pose great challenges to researchers. As a result, hydrological experts have shown considerable interest in building accurate and reliable runoff prediction models.

Currently, there are primarily two types of runoff prediction models: process-driven models and data-driven models (Radfar & Rockaway Thomas 2016; Ruiming 2018; Najafzadeh & Anvari 2023). Based on hydrology, the process-driven model describes hydrological processes in the basin, such as precipitation, evaporation, infiltration, and flow, by establishing physical or mathematical equations, and uses these equations to simulate and predict runoff (Atashi *et al.* 2023). However, the process-driven model needs more input data and parameters, which leads to high modeling complexity and high requirements for model parameter determination and calibration. In contrast, the data-driven model relies on historical observation data and statistical analysis techniques to construct the prediction model instead of relying on physical processes or traditional hydrological equations. By examining historical data from a specific watershed, including variables such as rainfall, runoff, and other relevant factors, and identifying patterns within these data, it is possible to generate reliable hydrological forecasts even in data-scarce regions. Therefore, the data-driven model has been adopted by more and more hydrologists over the years.
Many runoff forecasting studies have used data-driven models, such as support vector machine (SVM) (Huang *et al.* 2016; Mustafa *et al.* 2022; Mo *et al.* 2023; Xu *et al.* 2023), autoregressive integrated moving average (ARIMA) model (Hosseini *et al.* 2016; Yan *et al.* 2022; Wang *et al.* 2023a), extreme learning machine (ELM) (Yuan *et al.* 2020; Yan *et al.* 2023), artificial neural network (ANN) (Ikram *et al.* 2022; Wang *et al.* 2015), and long short-term memory neural network (LSTM) (Fang *et al.* 2021; Peng *et al.* 2022; Yao *et al.* 2023).

Data-driven models are widely used in many fields owing to the rapid development of computer technology (Araghinejad *et al.* 2018; Onalo *et al.* 2018; Kumar *et al.* 2022; Rahbar *et al.* 2022; Xu *et al.* 2022). ANN, SVM, ELM, and other models are widely used in the field of runoff prediction, but they all have shortcomings (Sohail *et al.* 2008; Tongle *et al.* 2016; Alizadeh *et al.* 2017; Zhang *et al.* 2018a; Liu *et al.* 2019; Wang *et al.* 2020; Chen *et al.* 2023). The BP model requires many iterations and calculations and is prone to getting stuck in a local optimum instead of reaching the global optimum. The SVM model's performance relies heavily on kernel function selection and parameter adjustment, and improper selection can easily lead to poor prediction results. The hidden layer's weights and biases in the ELM model are generated randomly, so different random initializations may lead to different model performance, and the stability of the model is relatively low. LSTM handles time series data well, fuses multiple inputs, retains long-term memory, and learns adaptively, and it has been used extensively across research disciplines. Khosravi *et al.* (2023) used LSTM to predict soil erosion susceptibility in the Haraz basin in northern Iran. The results showed that LSTM had good prediction performance, which reflected its ability to capture long-term dependence in the data. Zhang *et al.* (2018b) used LSTM to predict the depth of groundwater level, providing an effective method for areas where it is difficult to obtain hydrogeological data. Qiu *et al.* (2021) used LSTM to predict river water temperature, which provided a powerful tool for ecological management and river water temperature prediction. Chen *et al.* (2020) used LSTM to predict the daily reference evapotranspiration of the Northeast China Plain. The outcomes revealed that LSTM exhibited excellent predictive capabilities, demonstrating strong performance both within the study area and in external locations. Over the past few years, hydrologists have increasingly adopted LSTM models for runoff prediction, leading to numerous significant findings in the field. Consequently, the present study employs the LSTM model to construct a monthly runoff prediction model.

During practical implementation, the random selection of hyperparameters in the LSTM model often makes it difficult to achieve satisfactory prediction accuracy, as the model tends to get trapped in local optimal solutions (Wang *et al.* 2023e). As a result, researchers are now combining optimization algorithms with individual prediction models to enhance overall predictive performance by optimizing model parameters (Wang *et al.* 2023c, 2023f). This approach aims to address the limitations of the LSTM model and improve the accuracy of predictions. Zhang *et al.* (2022a) combined the sparrow search algorithm (SSA) with LSTM to predict dam deformation and found that the optimized LSTM significantly improved the accuracy of predictions. Wang *et al.* (2022) used bald eagle search (BES) to optimize the structural parameters of the Least Squares Support Vector Machine (LSSVM) to predict the survival risk of Esophageal squamous cell carcinoma (ESCC) patients and achieved good prediction results. According to the no free lunch theorem, however, no single algorithm can solve all problems. Given the limitations of BES in practical optimization problems, Wang *et al.* (2023d) proposed an improved bald eagle search algorithm with Cauchy mutation and adaptive weight factor (CABES). Compared with the original BES, CABES integrates Cauchy mutation and adaptive optimization, which improves the optimization ability of BES, and it has obtained good results in practical engineering problems. Consequently, this study utilizes CABES to optimize the hyperparameters of LSTM to improve the predictive ability of the model.

Currently, because the original runoff series is nonstationary and nonlinear, hydrologists are progressively adopting data preprocessing techniques to enhance the predictive performance of models. Zhang *et al.* (2022b) combined empirical mode decomposition (EMD) and LSTM to predict groundwater depth. Compared with a single LSTM, the prediction accuracy was significantly improved. Despite these gains, EMD still suffers from mode mixing in the decomposition process. To address this concern, Torres *et al.* (2011) enhanced EMD by building on the white-noise-assisted idea of ensemble empirical mode decomposition (EEMD), thereby mitigating mode aliasing. The resulting method is known as complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Guo *et al.* (2022) integrated CEEMDAN with LSTM to forecast the annual precipitation in Zhengzhou. The findings indicated that CEEMDAN effectively isolated the various fluctuation patterns in the precipitation sequence. This approach addressed the issues of mode mixing and residual noise in the reconstructed sequence, ultimately reducing reconstruction errors. However, the first Intrinsic Mode Function (IMF) obtained by decomposing the original time series through CEEMDAN displays high frequency and complexity, which can impact the prediction of the overall model trend. Zuo *et al.* (2020) used variational mode decomposition (VMD) combined with LSTM to predict daily runoff. The findings demonstrated that VMD exhibits excellent robustness to noise and effectively mitigates mode mixing. However, one limitation of VMD is that its parameters must be chosen carefully to achieve the desired decomposition outcome.
In order to address the limitations of the decomposition techniques mentioned above, this paper uses CEEMDAN-VMD to preprocess the runoff series. CEEMDAN-VMD builds on CEEMDAN decomposition: VMD is used to decompose the most complex subsequence obtained after CEEMDAN decomposition, which further improves the accuracy of decomposition, extracts the characteristic information in the runoff series more comprehensively, and enhances the overall accuracy of predictions. Yang *et al.* (2023a) applied the combination of CEEMDAN and VMD to practical engineering, and the results showed that the model's prediction accuracy improved after VMD secondary decomposition of CEEMDAN's high-frequency subsequence.

By coupling CEEMDAN, VMD, CABES, and LSTM, this paper presents a novel CEEMDAN-VMD-CABES-LSTM model for monthly runoff prediction and compares it with seven benchmark models.

The study presents several key contributions, which can be summarized as follows:

- (1) By performing secondary decomposition of the high-frequency components of CEEMDAN through VMD, it is possible to more accurately capture the essential features of runoff series such as trends and periods, reducing the difficulty of model prediction and compensating for the limitations of primary decomposition.

- (2) CABES is used for optimization to maximize the model's performance and improve the prediction accuracy of LSTM in response to the difficulties in selecting LSTM hyperparameters and the lack of convergence protection.

- (3) The four evaluation indicators and seven benchmark models are employed to verify the superiority of the developed model at the Xiajiang hydrological station and the Yingluoxia hydrological station.

The rest of this study is structured as follows: Section 2 presents an overview of the methodologies employed in this study, including CEEMDAN, VMD, BES and CABES, LSTM, and the proposed CEEMDAN-VMD-CABES-LSTM model. Additionally, this section presents the evaluation indicators that are utilized to evaluate the performance of the model. Section 3 provides information concerning the study area and the dataset utilized in this research. Section 4 offers comprehensive details regarding the input and decomposition processes employed by the models. Section 5 consists of a comparative analysis and discussion of the prediction results obtained from various models. The conclusions drawn from this study are summarized in Section 6.

## METHODOLOGY AND EVALUATION INDICATORS

### CEEMDAN

CEEMDAN is a noise-aided data analysis method proposed by Torres *et al.* (2011). It not only mitigates the mode mixing of EMD and the residual noise in the reconstructed sequence of EEMD but also improves the computational efficiency. The detailed procedure is outlined as follows:

- (1)
- (2)
The first eigenmode component of CEEMDAN decomposition is obtained by the overall average of the N-generated modal components.

- (3)
- (4) A fresh signal is generated by adding paired Gaussian white noise, both positive and negative, to the original signal. The initial modal component is obtained by applying EMD to the new signal. As a result, the second eigenmodal component of CEEMDAN decomposition can be obtained:
- (5)
- (6)
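For orientation, the CEEMDAN recursion is usually written in the following form, after Torres *et al.* (2011). The notation is an assumption made here for illustration and may differ from the numbered equations of this paper: $x(t)$ is the original signal, $w^{i}(t)$ is the $i$-th of $N$ white-noise realizations, $\varepsilon_k$ is the noise amplitude at stage $k$, and $E_k(\cdot)$ returns the $k$-th mode produced by EMD. With paired noise, as described above, each realization is applied with both signs.

$$
\begin{aligned}
\mathrm{IMF}_1(t) &= \frac{1}{N}\sum_{i=1}^{N} E_1\!\left(x(t) + \varepsilon_0\, w^{i}(t)\right), \\
r_1(t) &= x(t) - \mathrm{IMF}_1(t), \\
\mathrm{IMF}_{k+1}(t) &= \frac{1}{N}\sum_{i=1}^{N} E_1\!\left(r_k(t) + \varepsilon_k\, E_k\!\left(w^{i}(t)\right)\right), \\
r_{k+1}(t) &= r_k(t) - \mathrm{IMF}_{k+1}(t), \\
x(t) &= \sum_{k=1}^{K} \mathrm{IMF}_k(t) + R(t).
\end{aligned}
$$

The recursion stops when the residual $R(t)$ can no longer be decomposed, so the modes and the residual sum back to the original series exactly.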

### VMD

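VMD decomposes the original signal into $K$ eigenmode functions with constrained bandwidth and minimizes the cumulative bandwidth estimates across all modes to obtain the corresponding modal component signals and related parameters. The VMD variational constraint model is as follows:

$$
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)
$$

where $\{u_k\}$ and $\{\omega_k\}$ correspond to the decomposed IMF modes and their corresponding center frequencies, respectively; $\delta(t)$ is the Dirac function; $K$ is the number of modes; $f$ is the original signal; $*$ is the convolution operator; and $\partial_t$ is the gradient operation. The term $\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)$ is the analytic signal of each mode after the Hilbert transform; it is multiplied by the exponential term $e^{-j\omega_k t}$, tuned to the estimated center frequency $\omega_k$, so that the spectrum of each mode is shifted to the baseband. To find the optimal solution to the above problem, the constrained variational problem needs to be transformed into an unconstrained one. Therefore, the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$ are introduced to obtain the following expression of the unconstrained variational problem:

$$
L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
$$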

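VMD decomposition steps are as follows:

(1) Initialize $\{\hat{u}_k^{1}\}$, $\{\omega_k^{1}\}$, and $\hat{\lambda}^{1}$, and set $n = 0$.

(2) Set $n = n + 1$ and update each mode and its center frequency:

$$
\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^{n}\right)^2}, \qquad
\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}
$$

(3) Update the Lagrange multiplier:

$$
\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)
$$

(4) Check the convergence condition for a given tolerance $\varepsilon > 0$:

$$
\sum_{k} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon
$$

If the condition is satisfied, then stop the iteration, otherwise return to (2).

In the above equations, $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms corresponding to $u_k^{n+1}(t)$, $f(t)$, and $\lambda(t)$, respectively; $\alpha$ is the quadratic penalty factor and $\tau$ is the update step of the multiplier.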

### BES

BES is a metaheuristic algorithm introduced by Alsattar *et al.* (2020), drawing inspiration from the search and hunting patterns of eagles. The algorithm comprises three main stages: selecting search space, searching space prey, and diving to capture prey. During the optimization process, each eagle represents a feasible solution to the problem, and the optimal solution is determined by iterating and updating the solution set at each stage until the iteration process concludes.

#### Select search space

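In this stage, each eagle selects its search space according to

$$
P_{i,\text{new}} = P_{\text{best}} + \alpha \, r \left( P_{\text{mean}} - P_i \right) \tag{12}
$$

where $\alpha$ is the parameter that controls the position change, typically taken in [1.5,2]; $P_{\text{best}}$ is the best position found so far; $r$ is a random number between [0,1]; $P_{\text{mean}}$ is the average position of all eagles; and $P_i$ represents the current position of the *i*-th eagle.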

#### Search space prey stage

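In this stage, the eagles fly in a spiral within the selected search space, and the positions are updated as

$$
P_{i,\text{new}} = P_i + y(i)\left(P_i - P_{i+1}\right) + x(i)\left(P_i - P_{\text{mean}}\right) \tag{13}
$$

$$
x(i) = \frac{xr(i)}{\max\left(|xr|\right)}, \qquad y(i) = \frac{yr(i)}{\max\left(|yr|\right)}
$$

$$
xr(i) = r(i)\sin\left(\theta(i)\right), \qquad yr(i) = r(i)\cos\left(\theta(i)\right)
$$

$$
\theta(i) = a \, \pi \, \text{rand}, \qquad r(i) = \theta(i) + R \, \text{rand}
$$

where $a$ is the parameter that controls the spiral flight angle, with a value range of [5,10]; $R$ is the parameter that controls the number of spiral flight turns, with a value range of [0.5,2]; rand is a random number between [0,1]; $\theta(i)$ is the polar angle of the spiral equation; $r(i)$ is the polar diameter of the spiral equation; $x(i)$ and $y(i)$ are the polar coordinates of the eagles, with a value range of [−1,1]; $P_{i+1}$ is the position of the next eagle; $P_{\text{mean}}$ is the average position of all eagles; and $P_{i,\text{new}}$ represents the *i*-th eagle's next update position.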

#### Dive to catch prey
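The three BES stages can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the update rules follow the commonly cited formulation of Alsattar *et al.* (2020), including a hyperbolic (sinh/cosh) spiral and coefficients `c1`, `c2` in [1,2] for the dive stage; the toy `sphere` objective, the greedy acceptance rule, the bound clipping, and all function names are hypothetical choices made for the example.

```python
import math
import random


def sphere(x):
    """Toy objective used only for illustration: sum of squares."""
    return sum(v * v for v in x)


def spiral(n, a, R, rng, hyperbolic=False):
    """Return normalised spiral offsets x(i), y(i) in [-1, 1] for n eagles."""
    xr, yr = [], []
    for _ in range(n):
        theta = a * math.pi * rng.random()
        if hyperbolic:  # dive stage: r = theta with sinh/cosh
            xr.append(theta * math.sinh(theta))
            yr.append(theta * math.cosh(theta))
        else:           # search stage: r = theta + R * rand with sin/cos
            r = theta + R * rng.random()
            xr.append(r * math.sin(theta))
            yr.append(r * math.cos(theta))
    mx = max(abs(v) for v in xr) or 1.0
    my = max(abs(v) for v in yr) or 1.0
    return [v / mx for v in xr], [v / my for v in yr]


def bes(f, dim, lb, ub, n=10, iters=50, alpha=2.0, a=10.0, R=1.5,
        c1=2.0, c2=2.0, seed=0):
    """Minimise f over [lb, ub]^dim with a basic bald eagle search."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    fit = [f(p) for p in pop]
    b = min(range(n), key=fit.__getitem__)
    best, best_fit = pop[b][:], fit[b]

    def clip(v):
        return min(max(v, lb), ub)

    def mean_pos():
        return [sum(p[d] for p in pop) / n for d in range(dim)]

    def accept(i, cand):
        """Greedy rule: keep a candidate only if it improves eagle i."""
        nonlocal best, best_fit
        fc = f(cand)
        if fc < fit[i]:
            pop[i], fit[i] = cand, fc
            if fc < best_fit:
                best, best_fit = cand[:], fc

    for _ in range(iters):
        # Stage 1: select search space
        mean = mean_pos()
        for i in range(n):
            r = rng.random()
            accept(i, [clip(best[d] + alpha * r * (mean[d] - pop[i][d]))
                       for d in range(dim)])
        # Stage 2: search for prey along a spiral
        mean = mean_pos()
        x, y = spiral(n, a, R, rng)
        for i in range(n):
            nxt = pop[(i + 1) % n]
            accept(i, [clip(pop[i][d] + y[i] * (pop[i][d] - nxt[d])
                            + x[i] * (pop[i][d] - mean[d]))
                       for d in range(dim)])
        # Stage 3: dive towards the best position
        mean = mean_pos()
        x1, y1 = spiral(n, a, R, rng, hyperbolic=True)
        for i in range(n):
            accept(i, [clip(rng.random() * best[d]
                            + x1[i] * (pop[i][d] - c1 * mean[d])
                            + y1[i] * (pop[i][d] - c2 * best[d]))
                       for d in range(dim)])
    return best, best_fit
```

Calling `bes(sphere, dim=3, lb=-5.0, ub=5.0)` returns the best position found and its objective value. Because candidates are accepted greedily, the best fitness never gets worse from one iteration to the next.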

### CABES

To enhance the search capability of the BES algorithm and keep it from becoming trapped in local optima, our previous research (Wang *et al.* 2023d) introduced two strategies: a Cauchy mutation strategy in the select search space stage and an adaptive weight factor in the search space prey stage. The improved BES algorithm is called CABES. Together, the two strategies enhance the algorithm's global search ability and local exploitation ability. This balance between exploration and exploitation expands the search space of the population, ultimately improving the convergence accuracy and speed of the BES algorithm.

#### Cauchy mutation strategy

#### Adaptive weight strategy

*it* is the current iteration number.

#### The algorithm process of CABES

In summary, after adopting the above two improvement strategies, the solving steps of CABES are shown below.

Step 1: Algorithm parameter initialization: population *N*, total number of iterations *T*, search interval upper bound (*ub*) and lower bound (*lb*), and dimension *D*. Randomly initialize the population and compute the fitness values for every individual in the population, selecting the optimal fitness value and the optimal individual.

Step 2: In the selection stage, Equations (12) and (22) are used to generate the new individual positions and the positions after Cauchy mutation, respectively. The two are compared, and the individual with the better fitness value is kept.

Step 3: In the search phase, use Equations (13) and (24) to generate new individual positions and positions after the adaptive weight strategy, compare the two, and select the optimal individual fitness value.

Step 4: During the capture phase, generate a new position based on Equation (17).

Step 5: Compare the optimal fitness values generated by each iteration and select the optimal individual for preservation.

Step 6: Determine if the maximum number of iterations has been reached. If yes, output the optimal value. If not, return to Step 2.
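The control flow of these steps (generate a plain candidate, generate a mutated candidate, keep the better one, repeat until the iteration budget is used up) can be sketched as follows. Everything specific here is an assumption for illustration: the mutation form `x + x * Cauchy(0, 1)`, the linearly decreasing `adaptive_weight`, and the one-line stand-in for the stage update are hypothetical and are not the CABES equations; only the compare-and-keep logic mirrors the steps above.

```python
import math
import random


def standard_cauchy(rng):
    """Draw one sample from the standard Cauchy distribution (inverse CDF)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_mutate(position, rng, lb, ub):
    """Assumed mutation form x' = x + x * Cauchy(0, 1), clipped to [lb, ub]."""
    return [min(max(v + v * standard_cauchy(rng), lb), ub) for v in position]


def adaptive_weight(it, total, w_max=0.9, w_min=0.4):
    """Hypothetical linearly decreasing weight; not the formula of the paper."""
    return w_max - (w_max - w_min) * it / total


def greedy(f, a, b):
    """Keep whichever of two candidates has the lower objective value."""
    return a if f(a) <= f(b) else b


def cabes_skeleton(f, start, total, lb, ub, seed=0):
    """Loop structure only: a single individual and a stand-in stage update."""
    rng = random.Random(seed)
    best = list(start)                      # Step 1 (one individual for brevity)
    for it in range(1, total + 1):          # Step 6: stop after `total` iterations
        w = adaptive_weight(it, total)
        # Stand-in for a stage update such as Equation (12) or (13).
        plain = [min(max(w * v, lb), ub) for v in best]
        mutated = cauchy_mutate(plain, rng, lb, ub)         # mutated candidate
        best = greedy(f, best, greedy(f, plain, mutated))   # compare and keep
    return best
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is the usual motivation for using it to escape local optima, while the greedy comparison ensures a jump is only kept when it helps.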

### LSTM
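As a reference point, the textbook LSTM cell updates its gates and states as shown below. The notation is an assumption made for illustration: $x_t$ is the input at time $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $b$ the learned weights and biases of each gate.

$$
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$

The forget gate $f_t$ and input gate $i_t$ decide how much of the previous cell state is retained and how much new information is written, which is what allows the cell to carry information across many time steps.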

### Proposed model

*et al.* 2004). The hyperparameters of LSTM are optimized by CABES to enhance the fitting and generalization capability of LSTM. Then, the optimized LSTM is used to predict each subsequence. Lastly, the predictions of all subsequences are combined and reconstructed to produce the final prediction results. Figure 2 illustrates the flowchart of the hybrid CEEMDAN-VMD-CABES-LSTM prediction model. The specific procedures are outlined as follows:

Step 1: CEEMDAN decomposition. CEEMDAN is utilized to decompose the raw runoff sequence into multiple subsequences, and subsequently, the sample entropy is computed for each of these subsequences.

Step 2: VMD secondary decomposition. The subsequence with the maximum sample entropy is decomposed a second time by VMD to weaken the nonstationarity of the runoff series.

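Step 3: Data normalization. Each subsequence is normalized as follows:

$$
x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}
$$

where $x_i$ and $x_i'$ are the *i*-th sample of the original data sequence and the normalized data sequence, respectively, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the sample, respectively.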

Step 4: Model optimization. The hyperparameters of LSTM are optimized using CABES, which yields the CABES-LSTM model.

Step 5: Model prediction. The normalized subsequences are fed into the CABES-LSTM model for prediction, yielding the predicted values for each subsequence. These predicted outcomes of each subsequence are then combined and reconstructed to obtain the final predicted result.
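The normalization and reconstruction steps can be illustrated with a short sketch. It assumes min-max scaling to [0, 1] and assumes that reconstruction is the element-wise sum of the component predictions, which holds for additive decompositions such as CEEMDAN and VMD; the function names are hypothetical.

```python
def minmax_fit(series):
    """Return the (minimum, maximum) of a training series."""
    return min(series), max(series)


def minmax_transform(series, lo, hi):
    """Scale values to [0, 1] using the fitted minimum and maximum."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in series]
    return [(x - lo) / span for x in series]


def minmax_inverse(scaled, lo, hi):
    """Map normalized predictions back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def reconstruct(component_predictions):
    """Sum the per-component predictions element-wise (assumed additive)."""
    return [sum(values) for values in zip(*component_predictions)]
```

In practice the minimum and maximum would be fitted on the training period only and reused for the testing period, so that no information from the test data leaks into the scaling.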

### Evaluation indexes

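*R*) (Lee 2022; Min *et al.* 2023; Yang & Li 2023; Yang *et al.* 2023b; Zhang & Yan 2023). The calculation equation for each evaluation criterion is as follows:

$$
R = \frac{\sum_{i=1}^{n} \left(Q_{o,i} - \bar{Q}_o\right)\left(Q_{p,i} - \bar{Q}_p\right)}{\sqrt{\sum_{i=1}^{n} \left(Q_{o,i} - \bar{Q}_o\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(Q_{p,i} - \bar{Q}_p\right)^2}}
$$

$$
\mathrm{NSEC} = 1 - \frac{\sum_{i=1}^{n} \left(Q_{o,i} - Q_{p,i}\right)^2}{\sum_{i=1}^{n} \left(Q_{o,i} - \bar{Q}_o\right)^2}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(Q_{o,i} - Q_{p,i}\right)^2}
$$

$$
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left| \frac{Q_{o,i} - Q_{p,i}}{Q_{o,i}} \right|
$$

where $Q_{o,i}$ and $Q_{p,i}$ are the observed and predicted runoff of the *i*-th sample, $\bar{Q}_o$ and $\bar{Q}_p$ are their mean values, and $n$ is the number of samples.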
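The four indices are straightforward to compute. The sketch below uses their standard definitions (Pearson correlation for *R*, the Nash-Sutcliffe form for NSEC); the function names are hypothetical, and it assumes that no observed value is zero, since MAPE is undefined otherwise.

```python
import math


def pearson_r(obs, pred):
    """Correlation coefficient R between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def nsec(obs, pred):
    """Nash-Sutcliffe efficiency coefficient: 1 is perfect, 0 matches the mean."""
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent (observations must be non-zero)."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))
```

*R* measures only linear association, so a prediction with a constant offset can still have *R* close to 1 while NSEC, RMSE, and MAPE all register the error. This is one reason to report the four indices together.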

## STUDY AREAS AND DATASET

The Xiajiang and Yingluoxia hydrological stations are chosen as the study sites. The Xiajiang hydrological station is located in the middle reaches of the Ganjiang River, in Xiashan village, Luo'ao Town, Xiajiang County, Jiangxi Province, with a control basin area of 62,724 km². The Ganjiang River originates in Shicheng County, Ganzhou City, Jiangxi Province. With a total length of 766 km, it drains an area of 83,500 km². The river has a natural drop of 937 m and an average annual flow rate of 2,130 m³/s. Figure 3 shows the watershed overview of the Xiajiang Hydrological Station.

The Yingluoxia hydrological station is located at the mountain pass of the Heihe River main stem, Longqu Township, Ganzhou District, Zhangye City, Gansu Province, with a catchment area of 10,009 km². It marks the boundary between the upper and middle reaches of the Heihe River. The river stretches for 956 km and has an annual runoff volume of 1.68 billion m³. Figure 4 shows the watershed overview of the Yingluoxia Hydrological Station.

## MODEL DEVELOPMENT

### Decomposition results

*m* is 2. The calculation results are listed in Table 1. It can be seen from Table 1 that IMF1 has the largest sample entropy, indicating that this sequence has the highest complexity, so VMD is used to decompose it a second time to reduce its complexity. The value of the decomposition number *K* in VMD has a significant impact on the decomposition effect. When the decomposition number is too large, mode mixing will occur. When it is too small, information in the original signal is easily lost. Different *K* values are tested and their corresponding center frequencies are observed. Finally, *K* is set to 7 and the other parameters keep their default values. The IMFs obtained from the secondary VMD decomposition of IMF1 are shown in Figures 8 and 9.
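Sample entropy, used here to rank the complexity of the subsequences, can be computed with a short routine. The sketch below follows the usual SampEn definition with embedding dimension `m = 2` as in the text; the tolerance of 0.2 times the standard deviation is a common default and an assumption here, as is the function name.

```python
import math


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a series; larger values indicate higher complexity."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std  # tolerance (assumed default: 0.2 * standard deviation)

    def count_matches(length):
        """Count template pairs whose Chebyshev distance is within r."""
        matches = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    matches += 1
        return matches

    b = count_matches(m)      # matches for templates of length m
    a = count_matches(m + 1)  # matches for templates of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)
```

A perfectly periodic series gives a sample entropy of zero, because every match of length `m` is also a match of length `m + 1`, whereas an irregular series gives a clearly positive value. The component with the largest value is the one passed on to VMD.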

| Site | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 | IMF7 | IMF8 | IMF9 | Res |
|---|---|---|---|---|---|---|---|---|---|---|
| Xiajiang | 2.04 | 1.09 | 1.49 | 0.83 | 0.64 | 0.56 | 0.31 | 0.16 | 0.03 | 0.001 |
| Yingluoxia | 1.83 | 1.69 | 0.92 | 0.81 | 0.60 | 0.57 | 0.40 | 0.10 | - | 0.02 |


### The number of input variables

*et al.* 2012) are commonly used statistical tools to analyze the autocorrelation and partial correlation of time series data. They can identify the correlation structure in the sequence and help determine the appropriate input variables. Figures 10 and 11 show the ACF and PACF of the Xiajiang hydrological station and the Yingluoxia hydrological station, respectively. From Figure 10, it can be seen that the estimated ACF of Xiajiang reaches its peak at a lag of 12, while most of its estimated PACF values fall within the 95% confidence interval after a lag of 12. Therefore, the number of input variables is 12; that is, the runoff for the next month is predicted from the runoff data of the previous 12 months. Similarly, the number of input variables for Yingluoxia is 12, according to Figure 11.
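The lag selection and the resulting input layout can be sketched as follows. The code computes the sample autocorrelation function and builds sliding windows in which the previous 12 values predict the next one; the function names and the synthetic seasonal series in the usage note are assumptions for illustration.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def make_windows(x, n_lags=12):
    """Build (inputs, target) pairs: n_lags past values predict the next one."""
    inputs = [x[i:i + n_lags] for i in range(len(x) - n_lags)]
    targets = [x[i + n_lags] for i in range(len(x) - n_lags)]
    return inputs, targets


# Synthetic monthly series with an annual cycle, for demonstration only.
demo = [math.sin(2 * math.pi * t / 12) for t in range(120)]
demo_acf = acf(demo, 18)
peak_lag = max(range(1, 19), key=lambda k: demo_acf[k])
```

For the synthetic series, `peak_lag` is 12, mirroring the annual periodicity that motivates a 12-month input window. `make_windows(series, 12)` then yields the samples fed to each component model.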

### BP and LSTM model

Two individual prediction models, BP and LSTM, are built. For BP, the tansig function is adopted as the activation function for the hidden layer, the trainlm function serves as the training function, and the purelin function is utilized for the output layer. The network undergoes 5,000 iterations with an expected error threshold of 0.01. A learning rate of 0.1 is used, and MSE is the performance function for the network. Choosing appropriate hyperparameters is beneficial for improving the predictive and generalization abilities of LSTM. An appropriate learning rate can ensure that the model converges to a good local optimal solution during the training process, which can effectively improve the training efficiency and performance of the model. The number of neuron nodes determines the capacity and complexity of the network. Choosing an appropriate number of neuron nodes can help balance the fitting and generalization abilities of the model, avoiding overfitting or underfitting problems. The regularization coefficient controls the complexity of the model, and an appropriate value balances model complexity against the degree of fit to the training data, effectively preventing the model from overfitting the training data. Therefore, the LSTM model in this article selects the learning rate, number of neuron nodes, and regularization coefficient as the hyperparameters of the model. The learning rate, number of iterations, and number of hidden layer nodes for LSTM are 0.005, 800, and 100, respectively.

### SSA-LSTM, BES-LSTM, and CABES-LSTM models

Three models are built on the original runoff series without a decomposition algorithm: LSTM optimized by the sparrow search algorithm (SSA-LSTM), LSTM optimized by the original bald eagle search (BES-LSTM), and LSTM optimized by the improved bald eagle search (CABES-LSTM). SSA, BES, and CABES optimize the learning rate, regularization coefficient, and the number of hidden layer nodes of LSTM. The population size of the three optimization algorithms is 10 and the number of iterations is 20.

### CEEMDAN-CABES-LSTM, VMD-CABES-LSTM, and CEEMDAN-VMD-CABES-LSTM models

The two original runoff series are processed by a decomposition algorithm and then predicted. Specifically, the original runoff series are decomposed using CEEMDAN, VMD, and a combination of CEEMDAN and VMD. Subsequently, the CABES-optimized LSTM model is employed for prediction.

## RESULTS AND DISCUSSION

*F*-test) (Saberi-Movahed *et al.* 2020), and the performance of the model was evaluated using four evaluation indicators: *R*, NSEC, RMSE, and MAPE. The results related to the *F*-test are shown in Table 2. The calculation outcomes of the evaluation indices for each model during both the training and testing periods are presented in Table 3. The comparison diagram of the prediction of each model of the Xiajiang hydrological station and the Yingluoxia hydrological station during the training period and the testing period is shown in Figures 12–15 and the comparison diagram of evaluation indexes is shown in Figures 15–22.

| Sites | Model | SSR | SSE | MSR | MSE | *F* | *P* |
|---|---|---|---|---|---|---|---|
| Xiajiang | BP | 4.22 × 10⁶ | 3.55 × 10⁸ | 4.22 × 10⁶ | 1.30 × 10⁶ | 3.24 | 0.07 |
| | LSTM | 5.73 × 10⁴ | 3.10 × 10⁸ | 5.73 × 10⁴ | 1.14 × 10⁶ | 0.05 | 0.82 |
| | SSA-LSTM | 1.67 × 10⁶ | 3.04 × 10⁸ | 1.67 × 10⁶ | 1.12 × 10⁶ | 0.34 | 0.56 |
| | BES-LSTM | 3.16 × 10⁵ | 3.56 × 10⁸ | 3.16 × 10⁵ | 1.31 × 10⁶ | 1.28 | 0.26 |
| | CABES-LSTM | 3.16 × 10⁵ | 3.24 × 10⁸ | 3.16 × 10⁵ | 1.19 × 10⁶ | 0.26 | 0.61 |
| | CEEMDAN-CABES-LSTM | 3.28 × 10⁴ | 3.85 × 10⁸ | 3.28 × 10⁴ | 1.41 × 10⁶ | 0.02 | 0.88 |
| | VMD-CABES-LSTM | 9.82 × 10³ | 3.46 × 10⁸ | 9.82 × 10³ | 1.27 × 10⁶ | 0.01 | 0.93 |
| | CEEMDAN-VMD-CABES-LSTM | 57.92 | 4.04 × 10⁸ | 57.92 | 1.49 × 10⁶ | 0.00 | 1.00 |
| Yingluoxia | BP | 2.39 × 10³ | 4.84 × 10⁵ | 2.39 × 10³ | 1.92 × 10³ | 1.24 | 0.27 |
| | LSTM | 344.31 | 5.38 × 10⁵ | 344.31 | 2.13 × 10³ | 0.16 | 0.69 |
| | SSA-LSTM | 1.14 × 10³ | 4.36 × 10⁵ | 1.14 × 10³ | 1.73 × 10³ | 0.66 | 0.42 |
| | BES-LSTM | 9.76 | 4.87 × 10⁵ | 9.76 | 1.93 × 10³ | 0.01 | 0.94 |
| | CABES-LSTM | 729.29 | 4.43 × 10⁵ | 729.29 | 1.76 × 10³ | 0.41 | 0.52 |
| | CEEMDAN-CABES-LSTM | 0.14 | 4.59 × 10⁵ | 0.14 | 1.82 × 10³ | 0.01 | 0.99 |
| | VMD-CABES-LSTM | 18.18 | 4.73 × 10⁵ | 18.18 | 1.88 × 10³ | 0.01 | 0.92 |
| | CEEMDAN-VMD-CABES-LSTM | 58.80 | 4.89 × 10⁵ | 58.80 | 1.94 × 10³ | 0.03 | 0.86 |


| Sites | Model | Training *R* | Training NSEC | Training RMSE | Training MAPE (%) | Testing *R* | Testing NSEC | Testing RMSE | Testing MAPE (%) |
|---|---|---|---|---|---|---|---|---|---|
| Xiajiang | BP | 0.723 | 0.495 | 1,010.885 | 70.854 | 0.689 | 0.410 | 943.569 | 61.844 |
| | LSTM | 0.737 | 0.543 | 961.780 | 53.221 | 0.718 | 0.515 | 855.616 | 50.778 |
| | SSA-LSTM | 0.753 | 0.573 | 931.568 | 51.715 | 0.758 | 0.566 | 809.044 | 47.210 |
| | BES-LSTM | 0.802 | 0.641 | 851.596 | 46.802 | 0.778 | 0.614 | 740.280 | 45.390 |
| | CABES-LSTM | 0.820 | 0.671 | 815.795 | 43.456 | 0.785 | 0.662 | 701.009 | 42.677 |
| | CEEMDAN-CABES-LSTM | 0.922 | 0.850 | 550.990 | 37.079 | 0.938 | 0.866 | 450.527 | 27.688 |
| | VMD-CABES-LSTM | 0.945 | 0.883 | 486.467 | 32.710 | 0.939 | 0.881 | 424.483 | 25.842 |
| | CEEMDAN-VMD-CABES-LSTM | 0.980 | 0.960 | 284.797 | 22.639 | 0.980 | 0.959 | 247.336 | 17.638 |
| Yingluoxia | BP | 0.884 | 0.772 | 21.024 | 33.991 | 0.898 | 0.802 | 19.763 | 29.665 |
| | LSTM | 0.912 | 0.829 | 18.224 | 20.233 | 0.911 | 0.822 | 18.758 | 19.325 |
| | SSA-LSTM | 0.919 | 0.831 | 18.632 | 25.409 | 0.915 | 0.835 | 18.461 | 19.527 |
| | BES-LSTM | 0.925 | 0.856 | 16.727 | 18.470 | 0.915 | 0.831 | 18.277 | 19.001 |
| | CABES-LSTM | 0.930 | 0.865 | 16.160 | 16.572 | 0.916 | 0.837 | 17.953 | 19.201 |
| | CEEMDAN-CABES-LSTM | 0.967 | 0.935 | 11.242 | 23.785 | 0.969 | 0.935 | 11.310 | 21.979 |
| | VMD-CABES-LSTM | 0.980 | 0.959 | 8.902 | 16.475 | 0.980 | 0.958 | 9.057 | 13.703 |
| | CEEMDAN-VMD-CABES-LSTM | 0.989 | 0.977 | 6.659 | 14.504 | 0.989 | 0.978 | 6.540 | 11.514 |


Because the prediction results during the validation period give a more accurate evaluation of model performance, the forecast outcomes during this period are analyzed in detail. According to the test results in Table 2, the *P* value of every model is greater than 0.05, indicating that there is no significant difference among the models. Figures 13 and 15 show that the accuracy of the individual prediction models is relatively low: both BP and LSTM reproduce only the overall trend of the runoff series, and their predicted values deviate significantly from the measured data. Among SSA-LSTM, BES-LSTM, and CABES-LSTM, which use no data preprocessing, CABES-LSTM predicts best. The prediction accuracy of the CEEMDAN-VMD-CABES-LSTM model is higher than that of CEEMDAN-CABES-LSTM and VMD-CABES-LSTM. It exhibits the most accurate fit, with the predicted sequence closely following both the trend and the values of the measured sequence.

The evaluation results for each model during the validation period are presented in Table 3, from which the following comprehensive analysis can be provided:

- (1) In the case of monthly runoff prediction for the Xiajiang hydrological station and the Yingluoxia hydrological station, both BP and LSTM are single prediction models. The performance of LSTM surpasses that of BP in terms of prediction accuracy. For the Xiajiang hydrological station, the LSTM model shows significant improvements compared to the BP model. Specifically, the *R* and NSEC values increased from 0.689 and 0.410 to 0.718 and 0.515, respectively. Additionally, the RMSE decreased by 9.32% and the MAPE decreased by 17.89%. In the case of Yingluoxia, the *R* and NSEC of the LSTM model increased, and RMSE and MAPE decreased by 0.215 and 8.179, respectively. In short, the evaluation index results indicate that the LSTM model is superior to BP in capturing the characteristics of runoff and is more suitable for monthly runoff prediction. However, it should be noted that a single prediction model may struggle to achieve optimal prediction accuracy.

- (2) SSA-LSTM, BES-LSTM, and CABES-LSTM are intelligent algorithm-optimized LSTM models. In comparison to the single LSTM prediction model, the intelligent algorithm-optimized LSTM models enhance the prediction accuracy. In this study, three optimization algorithms are used to optimize the learning rate, the number of hidden layer nodes, and the regularization coefficient of the LSTM, with the objective of improving the overall effectiveness of the model. The results of the four evaluation indexes at the Xiajiang and Yingluoxia hydrological stations show that CABES-LSTM obtains the highest prediction accuracy of the three; that is, CABES optimizes the LSTM better than SSA and BES do. In the case of the Xiajiang hydrological station, comparing the SSA-LSTM model to the LSTM model, the *R* improved by 5.57% and the NSEC by 9.90%, while the RMSE decreased by 5.44% and the MAPE decreased by 7.03%. When comparing the BES-LSTM model to the LSTM model, the *R* increased by 8.36% and the NSEC increased by 19.22%, while the RMSE decreased by 13.48% and the MAPE decreased by 10.61%. Lastly, the CABES-LSTM model shows an even more substantial improvement over the LSTM model: the *R* increased by 9.33% and the NSEC increased by 28.54%, while the RMSE decreased by 18.07% and the MAPE decreased by 15.95%. For the Yingluoxia hydrological station, when compared to the LSTM model, the SSA-LSTM model displays a slight improvement with a 0.44% increase in *R*, while the RMSE decreased by 1.05%. The CABES-LSTM model exhibits a more noticeable improvement: the *R* increased by 0.55% and the NSEC increased by 1.82%, and the RMSE and MAPE decreased by 4.29 and 0.64%, respectively. Based on the results obtained from the three models, CABES-LSTM performs best in terms of prediction accuracy among the intelligent algorithm-optimized LSTM models. This is because CABES has a stronger spatial search ability, which allows it to explore parameter combinations more comprehensively and obtain the optimal solution. Therefore, in this study, CABES is chosen to optimize the parameters of LSTM. In addition, employing data preprocessing techniques can mitigate the nonlinearity and imbalance present in the original runoff series, thereby enhancing the model's prediction performance. The integration of data preprocessing and an intelligently optimized prediction model can effectively elevate the prediction accuracy.

- (3) CEEMDAN-CABES-LSTM, VMD-CABES-LSTM, and CEEMDAN-VMD-CABES-LSTM are the models that use CABES-LSTM to predict after data preprocessing. Compared with CABES-LSTM without data preprocessing, the prediction accuracy of the models with data preprocessing is significantly improved, and the proposed CEEMDAN-VMD-CABES-LSTM model demonstrates the highest prediction accuracy among all the models considered. For the Xiajiang hydrological station, compared with CABES-LSTM, the *R* and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 24.84 and 44.86%, respectively, and RMSE and MAPE decrease by 64.72 and 58.67%, respectively. Compared with CEEMDAN-CABES-LSTM, the *R* and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 4.48 and 10.74%, respectively, and RMSE and MAPE decrease by 45.10 and 36.30%, respectively. Compared with VMD-CABES-LSTM, the *R* and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 4.37 and 8.85%, respectively, and RMSE and MAPE decrease by 41.73 and 31.75%, respectively. In the case of the Yingluoxia hydrological station, compared with the CABES-LSTM model, the *R* and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 7.97 and 16.85%, respectively, and RMSE and MAPE decrease by 63.57 and 33.84%, respectively. Compared with CEEMDAN-CABES-LSTM, the *R* increased by 2.06% and the NSEC increased by 4.60%, while the RMSE decreased by 42.18% and the MAPE decreased by 42.20%. Similarly, compared with VMD-CABES-LSTM, the *R* increased by 0.92% and the NSEC increased by 2.09%, while the RMSE decreased by 27.79% and the MAPE decreased by 18.12%. This is because VMD performs secondary decomposition on the high-frequency subsequences decomposed by CEEMDAN, further separating the noise in the high-frequency subsequences and enhancing the extraction of signal features, which helps to capture local features and subtle changes of the signal more accurately. These results demonstrate that the CEEMDAN-VMD method is more suitable for preprocessing nonlinear and nonstationary sequences. This approach adapts effectively to the sequence characteristics, and the decomposition result aligns better with the requirements for accurate prediction using the CABES-LSTM model, ultimately leading to a substantial enhancement in prediction accuracy.
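The percentage changes quoted in the list above are relative changes with respect to a baseline model. The following minimal sketch shows that calculation for a few testing-period rows of Table 3 at the Xiajiang station; the function and variable names are illustrative assumptions, not taken from the paper.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline (positive = increase)."""
    return (candidate - baseline) / baseline * 100.0


# Testing-period values for the Xiajiang station, copied from Table 3.
xiajiang_test = {
    "BP": {"R": 0.689, "NSEC": 0.410, "RMSE": 943.569, "MAPE": 61.844},
    "LSTM": {"R": 0.718, "NSEC": 0.515, "RMSE": 855.616, "MAPE": 50.778},
    "CABES-LSTM": {"R": 0.785, "NSEC": 0.662, "RMSE": 701.009, "MAPE": 42.677},
    "CEEMDAN-VMD-CABES-LSTM": {
        "R": 0.980, "NSEC": 0.959, "RMSE": 247.336, "MAPE": 17.638,
    },
}


def compare(baseline: str, candidate: str, table: dict) -> dict:
    """Relative change (%) of every metric, rounded to two decimals."""
    return {
        metric: round(relative_change(table[baseline][metric],
                                      table[candidate][metric]), 2)
        for metric in table[baseline]
    }


lstm_vs_bp = compare("BP", "LSTM", xiajiang_test)
hybrid_vs_cabes = compare("CABES-LSTM", "CEEMDAN-VMD-CABES-LSTM", xiajiang_test)

# Negative values for RMSE and MAPE mean the error went down.
print("LSTM vs BP:", lstm_vs_bp)
print("Hybrid vs CABES-LSTM:", hybrid_vs_cabes)
```

For error metrics (RMSE, MAPE) a negative relative change is an improvement, whereas for *R* and NSEC a positive change is an improvement.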

From Figures 16–19, it can be observed that the *R*, NSEC, and RMSE of the CEEMDAN-VMD-CABES-LSTM model in the four Taylor diagrams are much better than those of the other models, indicating that its predicted values are the most closely aligned with the actual values and that it achieves the highest prediction accuracy.
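As background for reading such diagrams, a Taylor diagram in its commonly used form places each model according to its standard deviation, its correlation with the observations, and its centered root-mean-square difference, which are linked by a law-of-cosines relation. The sketch below checks that relation on synthetic data. It assumes the standard construction, which may differ in detail from the one used for Figures 16–19, and all names and data are illustrative.

```python
import numpy as np


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centered RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    sd_obs = obs.std()    # population standard deviation (ddof=0)
    sd_pred = pred.std()
    corr = np.corrcoef(obs, pred)[0, 1]
    # Centered RMS difference: both series have their own mean removed first.
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_pred, corr, crmsd


# Synthetic monthly series (illustrative only, not the paper's data).
rng = np.random.default_rng(0)
months = np.arange(120)
obs = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 120)
pred = obs + rng.normal(0, 8, 120)

sd_o, sd_p, corr, crmsd = taylor_statistics(obs, pred)

# Law of cosines that underlies the diagram's geometry.
crmsd_from_geometry = np.sqrt(sd_o**2 + sd_p**2 - 2 * sd_o * sd_p * corr)
print(crmsd, crmsd_from_geometry)
```

Because of this relation, a model plotted closer to the observation point simultaneously has a higher correlation and a smaller centered error.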

Based on the four violin diagrams presented in Figures 20–23, the following conclusion can be drawn: the distribution shape of the CEEMDAN-VMD-CABES-LSTM predictions is the most similar to that of the measured data. Additionally, the quartile range of the CEEMDAN-VMD-CABES-LSTM predictions is approximately equal to that of the measured data, indicating a high level of consistency between the predicted and actual values. In comparison to the other models, CEEMDAN-VMD-CABES-LSTM exhibits superior accuracy and stability in the task of runoff prediction.
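The quartile comparison that a violin diagram supports visually can also be made numerically. The following minimal sketch compares the interquartile range (IQR) of predictions with that of observations; the toy values and helper names are assumptions for illustration and are not the paper's data.

```python
import numpy as np


def quartile_summary(values):
    """First quartile, median, third quartile, and interquartile range."""
    q1, median, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


def iqr_ratio(observed, predicted):
    """IQR of the predictions divided by IQR of the observations (1.0 = same spread)."""
    return quartile_summary(predicted)["iqr"] / quartile_summary(observed)["iqr"]


# Toy values: one model tracks the observed spread, the other is too flat.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
close_model = [1.1, 2.0, 3.1, 3.9, 5.2]
flat_model = [2.5, 2.8, 3.0, 3.2, 3.5]

print("close model IQR ratio:", iqr_ratio(observed, close_model))
print("flat model IQR ratio:", iqr_ratio(observed, flat_model))
```

A ratio near 1 corresponds to violins of similar width around the quartiles, while a ratio well below 1 indicates a model that underestimates the variability of the runoff.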

In addition, to assess the proposed model's predictive performance rationally and objectively, results reported in previous literature for a hydrological station in the same study area are selected for comparison. The Yingluoxia hydrological station is used, and the results are shown in Table 4. The *R* and MAPE of the proposed model at the Yingluoxia hydrological station are 0.989 and 11.514, respectively, which are markedly better than the 0.932 and 44.963 of the CEEMDAN-SSA-ELM model and the 0.940 and 23.840 of the RLMD-SSA-ELMAN model. The proposed model therefore shows better predictive performance and more accurate results, supporting its superiority.

Author | Research area | Model | Evaluation indicators |
---|---|---|---|
Wang et al. (2023b) | Yingluoxia | CEEMDAN-SSA-ELM | R = 0.932; MAPE = 44.963 |
Xu et al. (2024) | Yingluoxia | RLMD-SSA-ELMAN | R = 0.940; MAPE = 23.840 |
Proposed model | Yingluoxia | CEEMDAN-VMD-CABES-LSTM | R = 0.989; MAPE = 11.514 |

To summarize, the proposed hybrid model, CEEMDAN-VMD-CABES-LSTM, demonstrates the most effective prediction performance. When compared to the other seven benchmark models, CEEMDAN-VMD-CABES-LSTM exhibits the highest values of *R* and NSEC at both the Xiajiang and Yingluoxia hydrological stations. The *R* and NSEC values are close to 1, indicating a high level of reliability in the model's predictions. Additionally, the model achieves the smallest values of RMSE and MAPE among all the models, indicating its superior prediction accuracy. Overall, the results obtained from the four evaluation indexes demonstrate that this model is capable of accurately predicting the monthly runoff series for both study areas.
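For reference, the four evaluation indexes can be computed as in the sketch below. It assumes the standard definitions (Pearson correlation for *R*, the Nash-Sutcliffe efficiency for NSEC, and MAPE expressed in percent), which may differ in detail from the formulas given earlier in the paper; the function name and sample values are illustrative.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return R, NSEC, RMSE, and MAPE (%) under their standard definitions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    errors = obs - pred

    r = np.corrcoef(obs, pred)[0, 1]
    # Nash-Sutcliffe: 1 minus squared error relative to the variance of the observations.
    nsec = 1.0 - np.sum(errors**2) / np.sum((obs - obs.mean()) ** 2)
    rmse = np.sqrt(np.mean(errors**2))
    # MAPE is undefined when an observation is zero; runoff is assumed positive here.
    mape = 100.0 * np.mean(np.abs(errors) / np.abs(obs))
    return {"R": r, "NSEC": nsec, "RMSE": rmse, "MAPE": mape}


# Small illustrative sample (not the paper's data).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
scores = evaluate(obs, pred)
print(scores)
```

Under these definitions, *R* and NSEC approach 1 and RMSE and MAPE approach 0 as the predictions approach the observations, which is the pattern reported for the proposed model.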

## CONCLUSIONS

The precise and dependable prediction of medium and long-term runoff directly influences the effective utilization and comprehensive allocation of water resources. Due to the nonlinear and nonstationary nature of runoff series, achieving satisfactory prediction results using a single model can often be challenging. To enhance the precision of monthly runoff prediction, this paper introduces a hybrid model called CEEMDAN-VMD-CABES-LSTM, which combines data preprocessing technology and an optimization algorithm. Firstly, the original runoff series is decomposed using CEEMDAN to isolate the high-frequency component IMF1. Secondly, IMF1 is decomposed a second time by VMD to obtain subsequences that are as stable as possible. Subsequently, the LSTM model is optimized using CABES to create the CABES-LSTM model, and each component serves as an input for this combined prediction model. Finally, the component predictions are superimposed and reconstructed. The Xiajiang hydrological station and the Yingluoxia hydrological station were chosen as the sites for implementing the proposed model, which was compared with BP, LSTM, SSA-LSTM, BES-LSTM, CABES-LSTM, CEEMDAN-CABES-LSTM, and VMD-CABES-LSTM. The key findings can be summarized as follows:

- (1) The use of CABES enables CEEMDAN-VMD-CABES-LSTM to automatically adjust the parameters of LSTM to obtain the best model configuration. This optimization technique significantly enhances the model's fitting capacity and generalization performance, playing a crucial role in the prediction task.

- (2) CEEMDAN and VMD provide significant improvements to the performance of the CEEMDAN-VMD-CABES-LSTM model. Secondary data decomposition makes it possible to effectively extract and separate the distinct frequency components and patterns present in the monthly runoff time series. Consequently, the decomposed subsequences exhibit higher purity and more pronounced periodic characteristics, which contribute to enhancing the predictive accuracy of the model.

- (3) The CEEMDAN-VMD-CABES-LSTM model can model data at multiple scales and capture important periodic and dynamic characteristics through decomposition, prediction, and reconstruction. By employing a multi-scale feature representation, the model's predictive capability is enhanced, facilitating better adaptation to runoff variations characterized by diverse scales and frequencies.
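The decomposition, prediction, and reconstruction workflow summarized above can be sketched structurally as follows. This is not the authors' implementation: a moving-average split stands in for both CEEMDAN and VMD, a persistence forecast stands in for the CABES-optimized LSTM, and every function name and the synthetic series are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def split_high_low(series, window):
    """Stand-in decomposition: moving-average part (low) plus residual (high)."""
    kernel = np.ones(window) / window
    low = np.convolve(series, kernel, mode="same")
    high = series - low
    return high, low


def two_stage_decompose(series):
    """First split the series, then split its high-frequency part again."""
    high, low = split_high_low(series, window=12)              # stand-in for CEEMDAN
    high_fast, high_slow = split_high_low(high, window=3)      # stand-in for VMD on IMF1
    return [high_fast, high_slow, low]


def persistence_forecast(history):
    """Stand-in predictor: the next value equals the last observed value."""
    return history[-1]


def predict_and_reconstruct(series, n_test, forecaster=persistence_forecast):
    """One-step-ahead forecast of each component, summed to rebuild the runoff."""
    components = two_stage_decompose(series)
    split = len(series) - n_test
    forecast = np.zeros(n_test)
    for component in components:
        for i in range(n_test):
            forecast[i] += forecaster(component[: split + i])
    return components, forecast


# Synthetic monthly runoff with an annual cycle (illustrative only).
months = np.arange(120)
noise = np.random.default_rng(1).normal(0, 5, 120)
runoff = 100 + 40 * np.sin(2 * np.pi * months / 12) + noise

components, forecast = predict_and_reconstruct(runoff, n_test=24)
print(len(components), forecast.shape)
```

The point of the sketch is the structure: the components sum back to the original series, each component is forecast separately, and the component forecasts are added to reconstruct the runoff prediction. Replacing `forecaster` with a trained model per component would follow the same pattern.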

In conclusion, the CEEMDAN-VMD-CABES-LSTM model proposed in this study significantly improves the accuracy and robustness of monthly runoff forecasting and serves as a reliable prediction tool for water resources management and decision-making. The model has shown potential in signal decomposition and prediction, but it also faces some challenges and limitations. It involves many parameters, and selecting appropriate parameter values typically requires experience and experimental adjustment, which can be time-consuming. Because the model uses multiple decomposition and prediction stages, errors may propagate from one stage to the next, leading to unstable or inaccurate prediction results. Future research can focus on improving the adaptability, computational efficiency, interpretability, and generalization performance of such models to make them more suitable for various application fields.

## ACKNOWLEDGEMENTS

The authors are grateful for the support of the special project for collaborative innovation of science and technology in 2021 (No: 202121206) and the Henan Province University Scientific and Technological Innovation Team (No: 18IRTSTHN009).

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.