Accurate runoff prediction is vital in efficiently managing water resources. In this paper, a hybrid prediction model combining complete ensemble empirical mode decomposition with adaptive noise, variational mode decomposition, CABES, and long short-term memory network (CEEMDAN-VMD-CABES-LSTM) is proposed. Firstly, CEEMDAN is used to decompose the original data, and the high-frequency component is decomposed using VMD. Then, each component is input into the LSTM optimized by CABES for prediction. Finally, the results of individual component predictions are combined and reconstructed to produce the monthly runoff predictions. The hybrid model is employed to predict the monthly runoff at the Xiajiang hydrological station and the Yingluoxia hydrological station. A comprehensive comparison is conducted with other models including back propagation (BP), LSTM, etc. The assessment of each model's prediction performance uses four evaluation indexes. Results reveal that the CEEMDAN-VMD-CABES-LSTM model showcased the highest forecast accuracy among all the models evaluated. Compared with the single LSTM, the root mean square error (RMSE) and mean absolute percentage error (MAPE) of the Xiajiang hydrological station decreased by 71.09 and 65.26%, respectively, and the RMSE and MAPE of the Yingluoxia hydrological station decreased by 65.13 and 40.42%, respectively. The R and NSEC of both sites are near 1.

  • A novel model (CEEMDAN-VMD-CABES-LSTM) is proposed for monthly runoff prediction.

  • CEEMDAN-VMD effectively reduces the complexity of the original monthly runoff series.

  • CABES enhances the generalization ability and prediction performance of the LSTM.

  • Four evaluation indicators and seven benchmark models are employed to verify the superiority of the developed model.

Effective water resource management and prediction are crucial for sustainable social and economic development, given the impact of climate change and human activities (Barati et al. 2014). As an important task in hydrology, monthly runoff prediction is widely used in agriculture, urban planning, hydropower station operation, and other fields. However, monthly runoff data usually exhibit complex nonlinear, nonstationary, and multi-time-scale characteristics, which pose great challenges to researchers. As a result, hydrologists have shown considerable interest in building accurate and reliable runoff prediction models.

Currently, there are primarily two types of runoff prediction models: process-driven models and data-driven models (Radfar & Rockaway Thomas 2016; Ruiming 2018; Najafzadeh & Anvari 2023). Process-driven models are grounded in hydrology: they describe the hydrological processes in a basin, such as precipitation, evaporation, infiltration, and flow, through physical or mathematical equations and use these equations to simulate and predict runoff (Atashi et al. 2023). However, process-driven models need more input data and parameters, which leads to high modeling complexity and demanding parameter determination and calibration. In contrast, data-driven models rely on historical observations and statistical analysis techniques rather than on physical processes or traditional hydrological equations. By examining historical data from a specific watershed, including rainfall, runoff, and other relevant variables, and identifying the relationships among them, these models can generate reliable hydrological forecasts even in regions with limited data availability. Data-driven models have therefore been adopted by a growing number of hydrologists. Much research on runoff forecasting has used data-driven models, such as the support vector machine (SVM) (Huang et al. 2016; Mustafa et al. 2022; Mo et al. 2023; Xu et al. 2023), autoregressive integrated moving average (ARIMA) model (Hosseini et al. 2016; Yan et al. 2022; Wang et al. 2023a), extreme learning machine (ELM) (Yuan et al. 2020; Yan et al. 2023), artificial neural network (ANN) (Ikram et al. 2022; Wang et al. 2015), and long short-term memory neural network (LSTM) (Fang et al. 2021; Peng et al. 2022; Yao et al. 2023).

Data-driven models are widely used in many fields thanks to the rapid development of computer technology (Araghinejad et al. 2018; Onalo et al. 2018; Kumar et al. 2022; Rahbar et al. 2022; Xu et al. 2022). ANN, SVM, ELM, and other models are widely used in runoff prediction, but each has its shortcomings (Sohail et al. 2008; Tongle et al. 2016; Alizadeh et al. 2017; Zhang et al. 2018a; Liu et al. 2019; Wang et al. 2020; Chen et al. 2023). The BP model requires many iterations and calculations and is prone to getting stuck in a local optimum instead of reaching the global optimum. The SVM model's performance relies heavily on kernel function selection and parameter tuning, and improper choices can easily lead to poor predictions. The hidden layer's weights and biases in the ELM model are generated randomly, so different random initializations may lead to different model performance, and the stability of the model is relatively low. LSTM offers strong time-series processing, multi-input data fusion, long-term memory, and adaptive learning capabilities, and has been used extensively across research disciplines. Khosravi et al. (2023) used LSTM to predict the sensitivity of soil erosion in the Haraz basin in northern Iran. The results showed good prediction performance, reflecting the ability of LSTM to capture long-term dependencies in data. Zhang et al. (2018b) used LSTM to predict the depth of groundwater level, providing an effective method for areas where it is difficult to obtain hydrogeological data. Qiu et al. (2021) used LSTM to predict river water temperature, which provided a powerful tool for ecological management and river water temperature prediction. Chen et al. (2020) used LSTM to predict the daily reference evapotranspiration of the Northeast China Plain. The results showed that LSTM had excellent predictive capability both within the study area and at external locations. Over the past few years, hydrologists have increasingly adopted LSTM models for runoff prediction, leading to numerous significant findings in the field. Consequently, the present study employs the LSTM model to construct a monthly runoff prediction model.

During practical implementation, randomly selected hyperparameters often prevent the LSTM model from achieving satisfactory prediction accuracy, as the model tends to get trapped in local optima (Wang et al. 2023e). As a result, researchers now combine optimization algorithms with individual prediction models, tuning model parameters to enhance overall predictive performance (Wang et al. 2023c, 2023f). Zhang et al. (2022a) combined SSA with LSTM to predict dam deformation and found that the optimized LSTM significantly improved prediction accuracy. Wang et al. (2022) used bald eagle search (BES) to optimize the structural parameters of the least squares support vector machine (LSSVM) to predict the survival risk of esophageal squamous cell carcinoma (ESCC) patients and achieved good prediction results. However, according to the no free lunch theorem, no single algorithm can solve all problems. Given the limitations of BES in practical optimization problems, Wang et al. (2023d) proposed an improved bald eagle search algorithm with Cauchy mutation and adaptive weight factor (CABES). Compared with the original BES, CABES integrates Cauchy mutation and an adaptive weight factor, which improves its optimization ability, and it has obtained good results in practical engineering problems. Consequently, this study utilizes CABES to optimize the hyperparameters of LSTM to improve the predictive ability of the model.

Because the original runoff series is nonstationary and nonlinear, hydrologists are progressively adopting data preprocessing techniques to enhance the predictive performance of models. Zhang et al. (2022b) combined empirical mode decomposition (EMD) and LSTM to predict groundwater depth. Compared with a single LSTM, the prediction accuracy was significantly improved. However, EMD suffers from mode mixing in the decomposition process. To address this concern, Torres et al. (2011) built on the white-noise-assisted ensemble empirical mode decomposition (EEMD) technique to mitigate mode aliasing, proposing the method known as complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Guo et al. (2022) integrated CEEMDAN with LSTM to forecast the annual precipitation in Zhengzhou. The findings indicated that CEEMDAN effectively isolated the various fluctuation patterns in the precipitation sequence. This approach addressed the modal ambiguity and residual noise in the reconstructed sequence, ultimately reducing reconstruction errors. However, the first intrinsic mode function (IMF) obtained by decomposing the original time series through CEEMDAN displays high frequency and complexity, which can impair the prediction of the overall trend. Zuo et al. (2020) used variational mode decomposition (VMD) combined with LSTM to predict daily runoff. The findings demonstrated that VMD is robust to noise and effectively mitigates mode mixing. However, one limitation of VMD is that its parameters must be chosen carefully to achieve a satisfactory decomposition. To address the limitations of the decomposition techniques mentioned above, this paper uses CEEMDAN-VMD to preprocess the runoff series. CEEMDAN-VMD builds on CEEMDAN decomposition: VMD is used to decompose the most complex subsequence obtained from CEEMDAN, which further refines the decomposition, extracts the characteristic information in the runoff series more comprehensively, and enhances the overall accuracy of predictions. Yang et al. (2023a) applied the combination of CEEMDAN and VMD to practical engineering, and the results showed that the model's prediction accuracy improved after VMD secondary decomposition of CEEMDAN's high-frequency subsequence.

By coupling CEEMDAN, VMD, CABES, and LSTM, this paper presents a novel CEEMDAN-VMD-CABES-LSTM model, offering a new approach to monthly runoff prediction, and compares it with seven benchmark models.

The study presents several key contributions, which can be summarized as follows:

  • (1) By performing secondary decomposition of the high-frequency components of CEEMDAN through VMD, it is possible to more accurately capture the essential features of runoff series such as trends and periods, reducing the difficulty of model prediction and compensating for the limitations of primary decomposition.

  • (2) CABES is used for optimization to maximize the model's performance and improve the prediction accuracy of LSTM in response to the difficulties in selecting LSTM hyperparameters and the lack of convergence protection.

  • (3) Four evaluation indicators and seven benchmark models are employed to verify the superiority of the developed model at the Xiajiang hydrological station and the Yingluoxia hydrological station.

The rest of this study is structured as follows: Section 2 presents an overview of the methodologies employed in this study, including CEEMDAN, VMD, BES and CABES, LSTM, and the proposed CEEMDAN-VMD-CABES-LSTM model. Additionally, this section presents the evaluation indicators that are utilized to evaluate the performance of the model. Section 3 provides information concerning the study area and the dataset utilized in this research. Section 4 offers comprehensive details regarding the input and decomposition processes employed by the models. Section 5 consists of a comparative analysis and discussion of the prediction results obtained from various models. The conclusions drawn from this study are summarized in Section 6.

2.1. CEEMDAN

CEEMDAN is a new noise-aided data analysis method proposed by Torres et al. (2011). It not only makes up for the modal confusion and residual noise in the reconstruction sequence of EMD and EEMD but also improves the computational efficiency. The detailed procedure is outlined as follows:

  • (1)
    Gaussian white noise is added to the signal to be decomposed, $x(t)$, to obtain a new signal $x(t) + (-1)^q \varepsilon v^j(t)$, where $q = 1, 2$, $\varepsilon$ is the noise amplitude, and $v^j(t)$ is the j-th noise realization ($j = 1, \ldots, N$). The first-order eigenmode component $C_1^j(t)$ is obtained by the EMD decomposition of the new signal, with $E(\cdot)$ denoting the EMD operator and $r^j(t)$ the remainder:
    $$E\left(x(t) + (-1)^q \varepsilon v^j(t)\right) = C_1^j(t) + r^j(t)$$
    (1)
  • (2)

    The first eigenmode component of CEEMDAN decomposition is obtained by the overall average of the N generated modal components: $\overline{C_1}(t) = \frac{1}{N}\sum_{j=1}^{N} C_1^j(t)$.

  • (3)
    Compute the residual by subtracting the first modal component from the original data:
    $$r_1(t) = x(t) - \overline{C_1}(t)$$
    (2)
  • (4)
    A fresh signal is generated by incorporating paired Gaussian white noise, both positive and negative, into the signal $r_1(t)$. The first-order modal component $D_1^j(t)$ is obtained through EMD using the new signal as the carrier. As a result, the second eigenmodal component of CEEMDAN decomposition can be obtained:
    $$\overline{C_2}(t) = \frac{1}{N}\sum_{j=1}^{N} D_1^j(t)$$
    (3)
  • (5)
    The residual after removing the second modal component is as follows:
    $$r_2(t) = r_1(t) - \overline{C_2}(t)$$
    (4)
  • (6)
    Repeat the above steps until the resulting residual signal becomes a monotonic function that cannot be further decomposed, marking the end of the algorithm. At this stage, the total number of eigenmode components obtained is M, and the original signal is decomposed into:
    $$x(t) = \sum_{m=1}^{M} \overline{C_m}(t) + r_M(t)$$
    (5)

2.2. VMD

Dragomiretskiy & Zosso (2014) introduced the VMD as a time-frequency analysis technique. VMD can determine the appropriate number of modal decompositions for the selected signal sequence based on the specific circumstances, divide the original signal into k eigenmode state functions with constrained bandwidth, and subsequently minimize the cumulative bandwidth estimates across all modes, to obtain the corresponding modal component signal and related parameters. The VMD variational constraint model is as follows:
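$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} u_k(t) = f(t)$$
(6)
where $\{u_k\}$ and $\{\omega_k\}$ correspond to the decomposed IMF modes and their corresponding center frequencies, respectively. $\delta(t)$ is the Dirac function, k is the number of modes, f is the original signal, $*$ is the convolution operator, and $\partial_t$ is the gradient operation. $\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)$ is the spectrum after the Hilbert transform, which is multiplied by the exponential term $e^{-j\omega_k t}$ to adjust the estimated values of $\omega_k$, and then the spectrum of each mode is shifted into the basic frequency band. To find the optimal solution, the constrained variational problem is transformed into an unconstrained one. Therefore, the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$ are introduced to obtain the following expression of the unconstrained variational problem:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle$$
(7)
The optimal solution of the unconstrained variational problem in Equation (7) can be obtained by iterative search using the alternating direction method of multipliers, taking the saddle point of the Lagrange function. The IMFs $u_k$ and center frequencies $\omega_k$ are expressed as follows:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}$$
(8)
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega}$$
(9)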

VMD decomposition steps are as follows:

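  • (1)

    Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and n;

  • (2)

    Update $\hat{u}_k$ and $\omega_k$ by Equations (8) and (9);

  • (3)
    Update the value of $\hat{\lambda}$, where $\tau$ is the update step of the multiplier:
    $$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right)$$
    (10)
  • (4)
    Given the determination accuracy $\varepsilon > 0$, if the following condition is met:
    $$\sum_{k} \frac{\left\| \hat{u}_k^{n+1} - \hat{u}_k^{n} \right\|_2^2}{\left\| \hat{u}_k^{n} \right\|_2^2} < \varepsilon$$
    (11)

then stop the iteration; otherwise return to (2).

In the above equations, $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms corresponding to $u_k^{n+1}(t)$, $f(t)$, and $\lambda(t)$, respectively.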
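As a small illustration of the center-frequency update in Equation (9), the sketch below assumes the standard VMD form of Dragomiretskiy & Zosso (2014), in which each center frequency is the power-weighted mean of the mode's spectrum over nonnegative frequencies. The function name and the discrete sum that stands in for the integral are illustrative assumptions, not the authors' implementation.

```python
def center_frequency(freqs, spectrum):
    """Power-weighted mean frequency of one mode (discrete analogue of the
    usual VMD center-frequency update; hypothetical helper).

    freqs    -- nonnegative frequency bins
    spectrum -- complex (or real) spectrum values of the mode at those bins
    """
    power = [abs(u) ** 2 for u in spectrum]
    total = sum(power)
    if total == 0:
        raise ValueError("mode has no energy on the given frequency bins")
    return sum(w * p for w, p in zip(freqs, power)) / total


# A mode whose energy sits in a single bin is centered on that bin.
print(center_frequency([0.1, 0.2, 0.3], [0.0, 2.0, 0.0]))
```

A mode with energy spread symmetrically around a bin is centered on that bin as well, which is why the update pulls each mode toward the middle of its own band.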

2.3. BES

BES is a metaheuristic algorithm introduced by Alsattar et al. (2020), drawing inspiration from the search and hunting patterns of eagles. The algorithm comprises three main stages: selecting search space, searching space prey, and diving to capture prey. During the optimization process, each eagle represents a feasible solution to the problem, and the optimal solution is determined by iterating and updating the solution set at each stage until the iteration process concludes.

Select search space

The eagle chooses the most suitable search area by considering the optimal position and average position of the population, along with the current position of the individual as empirical information. Its mathematical model is as follows:
$$P_{i,new} = P_{best} + \alpha \cdot r \cdot (P_{mean} - P_i)$$
(12)
where $P_{best}$ represents the optimal position determined by the eagle population; $\alpha$ is the control factor, with a value range from 1.5 to 2; r is a random number in [0,1]; $P_{mean}$ is the average position of all eagles; and $P_i$ represents the current position of the i-th eagle.
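A minimal sketch of the select-space update, assuming the standard BES form of Alsattar et al. (2020), in which the new position is the best position plus a random step along the difference between the mean and current positions. The function name, the list-based position vectors, and the use of a single random number per update are assumptions made for illustration.

```python
import random


def select_stage(position, best, mean, alpha=2.0, rng=random):
    """One select-space move for a single eagle (hypothetical helper).

    position, best, mean -- lists of equal length (current, best, mean position)
    alpha                -- control factor, taken from the range 1.5 to 2
    """
    r = rng.random()  # one random number in [0, 1) shared by all dimensions
    return [b + alpha * r * (m - p) for p, b, m in zip(position, best, mean)]


# An eagle sitting exactly at the population mean is moved onto the best position.
print(select_stage([1.0, 2.0], [0.5, 0.5], [1.0, 2.0]))
```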

Search space prey stage

The eagle employs a spiral flight pattern within the search space to identify the optimal position for capturing its prey. The mathematical model representing this behavior is as follows:
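$$P_{i,new} = P_i + y(i) \cdot (P_i - P_{i+1}) + x(i) \cdot (P_i - P_{mean})$$
(13)
$$\theta(i) = a \cdot \pi \cdot rand$$
(14)
$$r(i) = \theta(i) + R \cdot rand$$
(15)
$$xr(i) = r(i) \sin(\theta(i)), \quad yr(i) = r(i) \cos(\theta(i)), \quad x(i) = \frac{xr(i)}{\max(|xr|)}, \quad y(i) = \frac{yr(i)}{\max(|yr|)}$$
(16)
where a is the parameter controlling the spiral flight angle, with a value range of [5,10]; R is the parameter controlling the number of spiral flight turns, with a value range of [0.5,2]; $\theta(i)$ is the polar angle of the spiral equation; $r(i)$ is the polar diameter of the spiral equation; $x(i)$ and $y(i)$ are the polar coordinates of the eagles, with a value range of [−1,1]; and $P_{i,new}$ represents the i-th eagle's next update position.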
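The spiral coordinates used in the search stage can be sketched as follows, assuming the standard BES construction: a random polar angle, a polar radius equal to the angle plus a random offset, and coordinates normalized by their largest absolute value so that they fall in [−1,1]. The function name and default parameter values are illustrative choices within the ranges quoted above.

```python
import math
import random


def spiral_coordinates(n, a=10.0, R=1.5, rng=random):
    """Normalized spiral coordinates for n eagles (hypothetical helper)."""
    theta = [a * math.pi * rng.random() for _ in range(n)]   # polar angles
    radius = [t + R * rng.random() for t in theta]           # polar radii
    xr = [r * math.sin(t) for r, t in zip(radius, theta)]
    yr = [r * math.cos(t) for r, t in zip(radius, theta)]
    # Divide by the largest magnitude so every coordinate lies in [-1, 1].
    max_x = max(abs(v) for v in xr) or 1.0
    max_y = max(abs(v) for v in yr) or 1.0
    return [v / max_x for v in xr], [v / max_y for v in yr]


x, y = spiral_coordinates(5, rng=random.Random(0))
print(all(-1.0 <= v <= 1.0 for v in x + y))
```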

Dive to catch prey

The eagle swoops down from the determined optimal position to capture its prey. The mathematical model is:
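$$P_{i,new} = rand \cdot P_{best} + x_1(i) \cdot (P_i - c_1 \cdot P_{mean}) + y_1(i) \cdot (P_i - c_2 \cdot P_{best})$$
(17)
$$\theta(i) = a \cdot \pi \cdot rand$$
(18)
$$r(i) = \theta(i)$$
(19)
$$xr(i) = r(i) \sinh(\theta(i)), \quad yr(i) = r(i) \cosh(\theta(i)), \quad x_1(i) = \frac{xr(i)}{\max(|xr|)}, \quad y_1(i) = \frac{yr(i)}{\max(|yr|)}$$
(20)
where $c_1$ and $c_2$ are the movement intensity control parameters of the diving eagle, with a value range of [1,2].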

CABES

To enhance the search capability of BES and keep it from becoming trapped in local optima, our previous research (Wang et al. 2023d) introduced two strategies: a Cauchy mutation strategy in the selection stage and an adaptive weight factor in the prey search stage. The improved BES algorithm is called CABES. The Cauchy mutation strategy enhances the algorithm's global search (exploration) ability, and the adaptive weight factor enhances its local development (exploitation) ability. This balance between exploration and exploitation expands the search space of the population, ultimately improving the convergence accuracy and speed of the BES algorithm.

Cauchy mutation strategy

To address the issue of BES easily falling into local optima, a Cauchy mutation strategy is used, which utilizes the extended distribution at both ends of the Cauchy distribution and the small peak at the origin to perform mutation perturbation on the optimal individual. The longer distribution at both ends helps the algorithm break out of the local optimal dilemma. The mathematical model of the standard Cauchy distribution is shown in Equation (21):
$$f(x) = \frac{1}{\pi (1 + x^2)}, \quad -\infty < x < +\infty$$
(21)
Introducing the Cauchy mutation strategy into the BES selection stage, the position update Equation (12) of the algorithm population is changed, and the updated position equation is shown in Equation (22).
formula
(22)
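To illustrate why the Cauchy distribution is suited to mutation, the sketch below evaluates the standard Cauchy density and compares its tail with that of the standard normal distribution; it also draws Cauchy samples by inverse-transform sampling. It does not reproduce the CABES position update of Equation (22), and the function names are assumptions for illustration.

```python
import math
import random


def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))


def normal_pdf(x):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cauchy_sample(rng=random):
    """Inverse-transform sample: tan(pi * (u - 1/2)) with u uniform on [0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))


# Far from the origin the Cauchy density is orders of magnitude larger,
# so large mutation steps remain reasonably likely.
print(cauchy_pdf(5.0) / normal_pdf(5.0) > 1000.0)
```

The heavier tails mean that a mutated individual occasionally jumps far from the current optimum, which is the mechanism the text relies on to escape local optima.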

Adaptive weight strategy

Whether an algorithm achieves a balance between exploration and development (exploitation) is a criterion for evaluating its performance. Introducing the Cauchy mutation strategy in the selection stage enhances the global search and exploration ability of BES. To improve the development capability of the algorithm in the later iterations and balance the exploration and development stages of BES, Wang et al. (2023d) introduced an adaptive weight factor in the search stage. As the iteration progresses, the factor gradually decreases, and its mathematical model is shown in Equation (23).
formula
(23)
where it is the number of iterations in progress.
The adaptive weight factor is relatively large in the early stage of iteration, which helps to enhance the algorithm's search ability. In the later stage of iteration, the factor is small, which helps to improve the convergence speed of the algorithm. The adaptive weight factor strategy is introduced into the search phase to improve the convergence accuracy and speed of the algorithm. The updated position equation is shown in Equation (24).
formula
(24)

The algorithm process of CABES

In summary, after adopting the above two improvement strategies, the solving steps of CABES are shown below.

Step 1: Algorithm parameter initialization: population N, total number of iterations T, search interval upper bound (ub) and lower bound (lb), and dimension D. Randomly initialize the population and compute the fitness values for every individual in the population, selecting the optimal fitness value and the optimal individual.

Step 2: In the selection stage, Equations (12) and (22) are used to generate new individual positions and positions after the Cauchy variation, respectively. The two are compared to select the optimal individual fitness value.

Step 3: In the search phase, use Equations (13) and (24) to generate new individual positions and positions after the adaptive weight strategy, compare the two, and select the optimal individual fitness value.

Step 4: During the capture phase, generate a new position based on Equation (17).

Step 5: Compare the optimal fitness values generated by each iteration and select the optimal individual for preservation.

Step 6: Determine whether the maximum number of iterations has been reached. If yes, output the optimal value. If not, return to Step 2.

LSTM

The LSTM network introduces the concepts of cell state and gates (Hochreiter & Schmidhuber 1997). This architecture protects and controls the information passing through the cell, which effectively addresses the vanishing gradient problem in recurrent neural networks and enhances the capability to capture long-term dependencies in time series data. Figure 1 depicts the structure of the LSTM unit.
Figure 1

LSTM unit structure diagram.

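As can be seen in Figure 1, the LSTM cell structure is mainly composed of a forget gate $f_t$, input gate $i_t$, and output gate $o_t$. Among them, $f_t$ determines whether to store the historical information of the cell, $i_t$ determines whether to allow the current input to enter the cell, and $o_t$ determines whether to allow the output of the cell to enter the next cell. The calculation equations are as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
(25)
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
(26)
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
(27)
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
(28)
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
(29)
$$h_t = o_t \odot \tanh(C_t)$$
(30)
where $W_f$, $W_i$, $W_c$, and $W_o$ are the weight matrices; $b_f$, $b_i$, $b_c$, and $b_o$ are the bias vectors; $x_t$ is the input at time t; $h_{t-1}$ and $h_t$ are the output of the previous cell and the output of this cell, respectively; $C_{t-1}$ and $C_t$ are the information states of two adjacent units, respectively; $\tilde{C}_t$ is the candidate state; $\sigma$ is the sigmoid function; and $\odot$ is the point-by-point multiplication.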
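The gate equations can be traced with a single-unit, scalar LSTM step. This is a sketch under the standard LSTM formulation (Hochreiter & Schmidhuber 1997); the dictionary layout for weights and biases and the function names are assumptions chosen for readability, not the configuration used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, w, b):
    """One time step of a scalar LSTM cell (hypothetical helper).

    w[g] = (weight on h_prev, weight on x_t) and b[g] = bias,
    for g in 'f' (forget), 'i' (input), 'c' (candidate), 'o' (output).
    """
    def pre(g):
        return w[g][0] * h_prev + w[g][1] * x_t + b[g]

    f_t = sigmoid(pre("f"))              # forget gate
    i_t = sigmoid(pre("i"))              # input gate
    c_tilde = math.tanh(pre("c"))        # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde   # new cell state
    o_t = sigmoid(pre("o"))              # output gate
    h_t = o_t * math.tanh(c_t)           # new hidden state / output
    return h_t, c_t


# With all weights and biases zero, every gate equals 0.5 and the candidate is 0,
# so the cell state is simply halved.
zeros_w = {g: (0.0, 0.0) for g in "fico"}
zeros_b = {g: 0.0 for g in "fico"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, w=zeros_w, b=zeros_b)
print(c)
```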

Proposed model

To improve the precision of monthly runoff series forecasts, a CEEMDAN-VMD-CABES-LSTM hybrid prediction model is proposed in this paper. First, CEEMDAN decomposes the original runoff series into several subsequences, and the sample entropy (Richman et al. 2004) of each component is calculated. Second, VMD is used to decompose the subsequence with the largest sample entropy. Third, the hyperparameters of LSTM are optimized by CABES to enhance its fitting and generalization capability, and the optimized LSTM is used to predict each subsequence. Lastly, the predictions of the subsequences are combined and reconstructed to produce the final prediction results. Figure 2 illustrates the flowchart of the hybrid CEEMDAN-VMD-CABES-LSTM prediction model. The specific procedures are outlined as follows:
Figure 2

Flow chart of the hybrid CEEMDAN-VMD-CABES-LSTM model.

Figure 3

Overview of the Ganjiang River Basin.

Figure 4

Overview of the Heihe River Basin.


Step 1: CEEMDAN decomposition. CEEMDAN is utilized to decompose the raw runoff sequence into multiple subsequences, and subsequently, the sample entropy is computed for each of these subsequences.

Step 2: VMD secondary decomposition. The subsequence with the maximum sample entropy is decomposed a second time using VMD to weaken the nonstationarity of the runoff series.

Step 3: Data preprocessing. Preprocessing of data is conducted by dividing the entire set of monthly runoff data into two subsets: the training set and the testing set. Subsequently, normalization is applied to both sets using the following equation, constraining the values within the range of [0,1]. This normalization process aims to expedite the convergence speed of the model.
$$x_i' = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$
(31)
where $x_i$ and $x_i'$ are the i-th sample of the original data sequence and the normalized data sequence, respectively. $x_{max}$ and $x_{min}$ are the maximum and minimum values in the sample, respectively.

Step 4: Model optimization. The hyperparameters of LSTM are optimized using CABES, yielding the CABES-LSTM model.

Step 5: Model prediction. The normalized subsequences are fed into the CABES-LSTM model for prediction, yielding the predicted values for each subsequence. These predicted outcomes of each subsequence are then combined and reconstructed to obtain the final predicted result.
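The min-max normalization in Step 3, and the inverse mapping needed before predictions can be summed back into runoff units in Step 5, can be sketched as follows. Taking the minimum and maximum from the training subset only is an assumption made here to avoid leaking test information; the source does not state which subset supplies them. The function names are illustrative.

```python
def fit_min_max(train):
    """Return (minimum, maximum) of the training values."""
    return min(train), max(train)


def normalize(values, lo, hi):
    """Map values to [0, 1] relative to the fitted range."""
    span = hi - lo
    if span == 0:
        raise ValueError("constant series cannot be min-max normalized")
    return [(v - lo) / span for v in values]


def denormalize(values, lo, hi):
    """Inverse of normalize, used to bring predictions back to original units."""
    return [v * (hi - lo) + lo for v in values]


lo, hi = fit_min_max([10.0, 20.0, 30.0])
scaled = normalize([10.0, 20.0, 30.0], lo, hi)
print(scaled)                        # [0.0, 0.5, 1.0]
print(denormalize(scaled, lo, hi))   # [10.0, 20.0, 30.0]
```

With training-set bounds, test values outside the training range map slightly outside [0,1], which is usually acceptable for a bounded-input network but worth checking.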

Evaluation indexes

To validate the predictive superiority of the CEEMDAN-VMD-CABES-LSTM model, it was compared against other models such as LSTM, BES-LSTM, CABES-LSTM, VMD-CABES-LSTM, and CEEMDAN-CABES-LSTM models. The evaluation of the prediction accuracy of the CEEMDAN-VMD-CABES-LSTM hybrid model is conducted using several metrics, including the Nash efficiency coefficient (NSEC), mean absolute percentage error (MAPE), root mean square error (RMSE), and correlation coefficient (R) (Lee 2022; Min et al. 2023; Yang & Li 2023; Yang et al. 2023b; Zhang & Yan 2023). The calculation equation for each evaluation criterion is as follows:
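$$NSEC = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
(32)
$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
(33)
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
(34)
$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}$$
(35)
where $\hat{y}_i$ and $y_i$ are the predicted value and measured value of the i-th sample, $\bar{\hat{y}}$ and $\bar{y}$ are the means of all predicted values and measured values, and n is the number of samples.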
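The four indexes can be computed as in the sketch below, which assumes their standard definitions (Nash-Sutcliffe efficiency, mean absolute percentage error, root mean square error, and the Pearson correlation coefficient). The function names are illustrative, and measured values are assumed to be nonzero for MAPE.

```python
import math


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent; obs must not contain zeros."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sum((o - mo) ** 2 for o in obs)
    sp = sum((p - mp) ** 2 for p in pred)
    return cov / math.sqrt(so * sp)


def nse(obs, pred):
    mo = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


measured = [1.0, 2.0, 3.0, 4.0]
biased = [2.0, 3.0, 4.0, 5.0]   # every prediction is 1 unit too high
print(rmse(measured, biased), round(nse(measured, biased), 3))
```

A constant bias leaves R at 1 while degrading RMSE, MAPE, and NSEC, which is one reason the indexes are reported together rather than singly.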

The Xiajiang and Yingluoxia hydrological stations are chosen as the study sites. The Xiajiang hydrological station is located in the middle reaches of the Ganjiang River, in Xiashan village, Luo'ao Town, Xiajiang County, Jiangxi Province, with a control basin area of 62,724 km². The Ganjiang River rises in Shicheng County, Ganzhou City, Jiangxi Province. It is 766 km long and drains an area of 83,500 km². The river has a natural descent of 937 m and an average annual flow rate of 2,130 m³/s. Figure 3 shows the watershed overview of the Xiajiang hydrological station.

The Yingluoxia hydrological station is located at the mountain pass of the Heihe River main stream, Longqu Township, Ganzhou District, Zhangye City, Gansu Province, with a catchment area of 10,009 km². It marks the dividing section between the upper and middle reaches of the Heihe River. The river stretches for 956 km and has an annual runoff volume of 1.68 billion m³. Figure 4 shows the watershed overview of the Yingluoxia hydrological station.

In this study, the monthly runoff data of the Xiajiang hydrological station from 1959 to 2016 and the Yingluoxia hydrological station from 1956 to 2009 were selected as the objects of study, with the first 80% as the training period and the rest as the test period. Figure 5 showcases the raw monthly runoff series for both the Xiajiang hydrological station and the Yingluoxia hydrological station.
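A chronological 80/20 split can be sketched as below. Two assumptions are made that the source does not state: the records cover complete calendar years (so 1959 to 2016 gives 58 × 12 = 696 months at Xiajiang), and the cut point is rounded down.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling (hypothetical helper)."""
    cut = int(len(series) * train_fraction)  # rounding down is an assumption
    return series[:cut], series[cut:]


months = list(range(58 * 12))  # placeholder indices standing in for runoff values
train, test = chronological_split(months)
print(len(train), len(test))   # 556 140
```

Keeping the order intact matters here because the test period must follow the training period in time.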
Figure 5

Original monthly runoff series.


Decomposition results

The raw data are decomposed using the CEEMDAN technique, with the standard deviation of the noise set to 0.2 and the maximum number of iterations set to 5,000. The decomposition results of CEEMDAN at the Xiajiang hydrological station and the Yingluoxia hydrological station are shown in Figures 6 and 7. The sample entropy of each subsequence is then computed, with the similarity threshold set to 0.2 times the standard deviation of the sequence and the embedding dimension m set to 2. The calculation results are listed in Table 1. Table 1 shows that IMF1 has the largest sample entropy at both stations, indicating that it is the most complex subsequence, so VMD is used to decompose it a second time to reduce its complexity. The decomposition number K in VMD has a significant impact on the decomposition: when K is too large, mode mixing occurs; when K is too small, information in the original signal is easily lost. Different K values were tried and their corresponding center frequencies observed. Finally, K = 7 was selected, with the other parameters left at their default values. The IMFs obtained from the secondary VMD decomposition of IMF1 are shown in Figures 8 and 9.
Table 1

Sample entropy of CEEMDAN components

Station      IMF1   IMF2   IMF3   IMF4   IMF5   IMF6   IMF7   IMF8   IMF9   Res
Xiajiang     2.04   1.09   1.49   0.83   0.64   0.56   0.31   0.16   0.03   0.001
Yingluoxia   1.83   1.69   0.92   0.81   0.60   0.57   0.40   0.10   -      0.02
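Sample entropy with m = 2 and a tolerance of 0.2 times the standard deviation can be sketched as follows. This is a plain O(N²) version of the usual definition (Chebyshev distance, self-matches excluded, N − m templates for both lengths); using the population standard deviation is an assumption, and the function name is illustrative. The last lines select the component with the largest entropy from the values reported in Table 1.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a sequence (hypothetical helper, O(N^2))."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    tol = r_factor * std

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= tol:
                    count += 1
        return count

    b = count_matches(m)       # matching pairs of length m
    a = count_matches(m + 1)   # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


# Sample entropy values copied from Table 1 (residual omitted).
table1 = {
    "Xiajiang": {"IMF1": 2.04, "IMF2": 1.09, "IMF3": 1.49, "IMF4": 0.83, "IMF5": 0.64,
                 "IMF6": 0.56, "IMF7": 0.31, "IMF8": 0.16, "IMF9": 0.03},
    "Yingluoxia": {"IMF1": 1.83, "IMF2": 1.69, "IMF3": 0.92, "IMF4": 0.81, "IMF5": 0.60,
                   "IMF6": 0.57, "IMF7": 0.40, "IMF8": 0.10},
}
most_complex = {station: max(values, key=values.get) for station, values in table1.items()}
print(most_complex)   # {'Xiajiang': 'IMF1', 'Yingluoxia': 'IMF1'}
```

A perfectly periodic sequence yields a sample entropy of zero, while an irregular one yields a larger value, which matches the use of the index as a complexity measure for choosing the component passed to VMD.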
Figure 6

CEEMDAN decomposition results of original runoff series of the Xiajiang hydrological station.

Figure 7

CEEMDAN decomposition results of original runoff series of the Yingluoxia hydrological station.

Figure 8

VMD secondary decomposition results of the Xiajiang hydrological station.

Figure 9

VMD secondary decomposition results of the Yingluoxia hydrological station.


The number of input variables

Selecting a suitable number of input variables for each model helps capture the nonlinear characteristics inherent in runoff series. The autocorrelation function (ACF) and partial autocorrelation function (PACF) (Flores et al. 2012) are commonly used statistical tools for analyzing the autocorrelation and partial autocorrelation of time series data. They identify the correlation structure in the sequence and thus guide the choice of input variables. Figures 10 and 11 show the ACF and PACF of the Xiajiang hydrological station and the Yingluoxia hydrological station, respectively. Figure 10 shows that the estimated ACF of Xiajiang peaks at a lag of 12, while most of its estimated PACF values fall within the 95% confidence interval after a lag of 12. Therefore, the number of input variables is set to 12; that is, the runoff for the next month is predicted from the runoff of the previous 12 months. Similarly, according to Figure 11, the number of input variables for Yingluoxia is also 12.
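Using the previous 12 months to predict the next month corresponds to a sliding-window construction of input and target pairs, sketched below. The function name and the list-of-lists layout are illustrative assumptions.

```python
def make_windows(series, n_lags=12):
    """Build (inputs, targets): each input holds n_lags consecutive values and
    the target is the value that immediately follows them."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets


# A 15-month toy series yields 3 samples with 12 inputs each.
X, y = make_windows(list(range(15)))
print(len(X), X[0], y[0])
```

Each decomposed subsequence would be windowed in the same way before being passed to its own predictor.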
Figure 10

ACF and PACF plots of original runoff series at the Xiajiang hydrological station.

Figure 11

ACF and PACF plots of original runoff series at the Yingluoxia hydrological station.


BP and LSTM models

Two individual prediction models, BP and LSTM, are established. For BP, the tansig function is adopted as the activation function of the hidden layer, the purelin function is used for the output layer, and the trainlm function serves as the training function. The number of iterations is set to 5,000 with a target error of 0.01 and a learning rate of 0.1, and MSE is used as the performance function of the network.

Choosing appropriate hyperparameters improves the predictive and generalization abilities of LSTM. An appropriate learning rate helps the model converge to a good local optimum during training, which improves training efficiency and model performance. The number of neuron nodes determines the capacity and complexity of the network; an appropriate number balances the fitting and generalization abilities of the model and avoids overfitting or underfitting. The regularization coefficient controls the complexity of the model; an appropriate value balances model complexity against the fit to the training data and prevents the model from overfitting. Therefore, the learning rate, the number of neuron nodes, and the regularization coefficient are selected as the hyperparameters of the LSTM model in this article. For the single LSTM, the number of iterations, learning rate, and number of hidden layer nodes are 800, 0.005, and 100, respectively.

SSA-LSTM, BES-LSTM, and CABES-LSTM models

Three models are established on the original runoff series without a decomposition algorithm: LSTM optimized by the sparrow search algorithm (SSA-LSTM), LSTM optimized by the original bald eagle search (BES-LSTM), and LSTM optimized by the improved bald eagle search (CABES-LSTM). SSA, BES, and CABES optimize the learning rate, the regularization coefficient, and the number of hidden layer nodes of LSTM. For all three optimization algorithms, the population size is 10 and the number of iterations is 20.
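The structure of such a hyperparameter search can be sketched with a generic population-based loop. This is not SSA, BES, or CABES: the update rule is a simple move-toward-best step with random jitter, the search ranges in `BOUNDS` are assumed, and `surrogate_loss` is a hypothetical stand-in for the validation error of a trained LSTM. Only the population size (10), the number of iterations (20), and the three tuned hyperparameters come from the text.

```python
import random

# Search ranges are illustrative assumptions, not values from the paper.
BOUNDS = {
    "learning_rate": (1e-4, 1e-1),
    "l2_coefficient": (1e-6, 1e-2),
    "hidden_nodes": (10, 200),
}


def surrogate_loss(params):
    """Hypothetical stand-in for the validation error of a trained LSTM."""
    return (
        (params["learning_rate"] - 0.005) ** 2
        + (params["l2_coefficient"] - 0.001) ** 2
        + ((params["hidden_nodes"] - 100) / 100) ** 2
    )


def clip(name, value):
    """Keep a value inside its range; the node count must be an integer."""
    low, high = BOUNDS[name]
    value = min(max(value, low), high)
    return round(value) if name == "hidden_nodes" else value


def random_candidate(rng):
    return {name: clip(name, rng.uniform(*BOUNDS[name])) for name in BOUNDS}


def search(loss, pop_size=10, n_iter=20, seed=0):
    """Move each candidate toward the best one found so far, with jitter."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    best = min(population, key=loss)
    history = [loss(best)]
    for _ in range(n_iter):
        for i, cand in enumerate(population):
            trial = {}
            for name, (low, high) in BOUNDS.items():
                step = rng.random() * (best[name] - cand[name])
                jitter = rng.gauss(0, 0.05) * (high - low)
                trial[name] = clip(name, cand[name] + step + jitter)
            if loss(trial) < loss(cand):  # greedy replacement
                population[i] = trial
        best = min(population + [best], key=loss)
        history.append(loss(best))
    return best, history


best_params, history = search(surrogate_loss)
print(best_params, history[-1])
```

In practice each loss evaluation would train an LSTM and return its validation error, which is why a small population and few iterations are attractive.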

CEEMDAN-CABES-LSTM, VMD-CABES-LSTM, and CEEMDAN-VMD-CABES-LSTM models

The two original runoff series are first decomposed and then predicted. Specifically, each series is decomposed using CEEMDAN, VMD, or the combination of CEEMDAN and VMD, and the resulting components are then predicted by the LSTM model optimized by CABES (the improved bald eagle search).
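The decompose, predict, and reconstruct workflow can be sketched in a few lines. The sketch below is only a structural illustration under loud assumptions: a two-stage moving-average split stands in for CEEMDAN followed by VMD, and a seasonal-naive rule stands in for CABES-LSTM. The function names and the synthetic series are hypothetical.

```python
import math


def moving_average(series, window=12):
    """Trailing moving average; shorter windows are used at the start."""
    out = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series):
    """Toy two-stage additive decomposition (stand-in for CEEMDAN + VMD)."""
    low = moving_average(series, window=12)        # slow component
    high = [x - lo for x, lo in zip(series, low)]  # fast component
    # Second stage: split the fast component again, as VMD does for IMF1.
    high_smooth = moving_average(high, window=3)
    high_rest = [h - s for h, s in zip(high, high_smooth)]
    return [high_rest, high_smooth, low]


def seasonal_naive(component, season=12):
    """Placeholder predictor: repeat the value observed one season earlier."""
    return component[-season]


# Synthetic monthly series with trend and annual cycle (illustrative only).
runoff = [100 + 0.5 * m + 40 * math.sin(2 * math.pi * m / 12) for m in range(60)]
components = decompose(runoff)

# The components add back up to the original series.
rebuilt = [sum(values) for values in zip(*components)]

# One-step-ahead forecast: predict each component, then sum the predictions.
forecast = sum(seasonal_naive(c) for c in components)
print(round(forecast, 3))
```

The key property is additivity: because the components sum to the original series, the sum of the component forecasts is a forecast of the series itself.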

The proposed hybrid model and the seven comparative models were used to predict the runoff series of the Xiajiang and Yingluoxia hydrological stations. The overall effectiveness of each model was tested with the Fisher test (F-test) (Saberi-Movahed et al. 2020), and model performance was evaluated using four indicators: R, NSEC, RMSE, and MAPE. The F-test results are shown in Table 2, and the evaluation indices of each model for the training and testing periods are presented in Table 3. The predictions of each model for the two stations during the training and testing periods are compared in Figures 12–15, and the evaluation indices are compared in Figures 16–23.
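The four evaluation indicators can be computed as in the sketch below. It assumes the standard definitions (Pearson correlation coefficient for R, Nash-Sutcliffe efficiency for NSEC, root mean square error, and mean absolute percentage error); the function name `evaluate` and the sample values are made up for illustration.

```python
import math


def evaluate(observed, predicted):
    """Return R, NSEC, RMSE, and MAPE (%) under the standard definitions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        "R": cov / math.sqrt(ss_o * ss_p),
        "NSEC": 1 - ss_err / ss_o,
        "RMSE": math.sqrt(ss_err / n),
        "MAPE": 100 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted)),
    }


observed = [120.0, 340.0, 560.0, 410.0, 230.0]   # made-up values
predicted = [130.0, 320.0, 540.0, 430.0, 220.0]  # made-up values
scores = evaluate(observed, predicted)
perfect = evaluate(observed, observed)
print(scores)
```

R and NSEC approach 1 and RMSE and MAPE approach 0 as the predictions approach the observations, which is the sense in which the indicators are compared in Table 3. MAPE is undefined when an observed value is zero.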
Table 2

Comparison of the F-test results

| Site | Model | SSR | SSE | MSR | MSE | F | P |
|---|---|---|---|---|---|---|---|
| Xiajiang | BP | 4.22 × 10^6 | 3.55 × 10^8 | 4.22 × 10^6 | 1.30 × 10^6 | 3.24 | 0.07 |
| | LSTM | 5.73 × 10^4 | 3.10 × 10^8 | 5.73 × 10^4 | 1.14 × 10^6 | 0.05 | 0.82 |
| | SSA-LSTM | 1.67 × 10^6 | 3.04 × 10^8 | 1.67 × 10^6 | 1.12 × 10^6 | 0.34 | 0.56 |
| | BES-LSTM | 3.16 × 10^5 | 3.56 × 10^8 | 3.16 × 10^5 | 1.31 × 10^6 | 1.28 | 0.26 |
| | CABES-LSTM | 3.16 × 10^5 | 3.24 × 10^8 | 3.16 × 10^5 | 1.19 × 10^6 | 0.26 | 0.61 |
| | CEEMDAN-CABES-LSTM | 3.28 × 10^4 | 3.85 × 10^8 | 3.28 × 10^4 | 1.41 × 10^6 | 0.02 | 0.88 |
| | VMD-CABES-LSTM | 9.82 × 10^3 | 3.46 × 10^8 | 9.82 × 10^3 | 1.27 × 10^6 | 0.01 | 0.93 |
| | CEEMDAN-VMD-CABES-LSTM | 57.92 | 4.04 × 10^8 | 57.92 | 1.49 × 10^6 | 0.00 | 1.00 |
| Yingluoxia | BP | 2.39 × 10^3 | 4.84 × 10^5 | 2.39 × 10^3 | 1.92 × 10^3 | 1.24 | 0.27 |
| | LSTM | 344.31 | 5.38 × 10^5 | 344.31 | 2.13 × 10^3 | 0.16 | 0.69 |
| | SSA-LSTM | 1.14 × 10^3 | 4.36 × 10^5 | 1.14 × 10^3 | 1.73 × 10^3 | 0.66 | 0.42 |
| | BES-LSTM | 9.76 | 4.87 × 10^5 | 9.76 | 1.93 × 10^3 | 0.01 | 0.94 |
| | CABES-LSTM | 729.29 | 4.43 × 10^5 | 729.29 | 1.76 × 10^3 | 0.41 | 0.52 |
| | CEEMDAN-CABES-LSTM | 0.14 | 4.59 × 10^5 | 0.14 | 1.82 × 10^3 | 0.01 | 0.99 |
| | VMD-CABES-LSTM | 18.18 | 4.73 × 10^5 | 18.18 | 1.88 × 10^3 | 0.01 | 0.92 |
| | CEEMDAN-VMD-CABES-LSTM | 58.80 | 4.89 × 10^5 | 58.80 | 1.94 × 10^3 | 0.03 | 0.86 |

Table 3

Results of each model evaluation index of the two sites

| Site | Model | R (training) | NSEC (training) | RMSE (training) | MAPE (%) (training) | R (testing) | NSEC (testing) | RMSE (testing) | MAPE (%) (testing) |
|---|---|---|---|---|---|---|---|---|---|
| Xiajiang | BP | 0.723 | 0.495 | 1,010.885 | 70.854 | 0.689 | 0.410 | 943.569 | 61.844 |
| | LSTM | 0.737 | 0.543 | 961.780 | 53.221 | 0.718 | 0.515 | 855.616 | 50.778 |
| | SSA-LSTM | 0.753 | 0.573 | 931.568 | 51.715 | 0.758 | 0.566 | 809.044 | 47.210 |
| | BES-LSTM | 0.802 | 0.641 | 851.596 | 46.802 | 0.778 | 0.614 | 740.280 | 45.390 |
| | CABES-LSTM | 0.820 | 0.671 | 815.795 | 43.456 | 0.785 | 0.662 | 701.009 | 42.677 |
| | CEEMDAN-CABES-LSTM | 0.922 | 0.850 | 550.990 | 37.079 | 0.938 | 0.866 | 450.527 | 27.688 |
| | VMD-CABES-LSTM | 0.945 | 0.883 | 486.467 | 32.710 | 0.939 | 0.881 | 424.483 | 25.842 |
| | CEEMDAN-VMD-CABES-LSTM | 0.980 | 0.960 | 284.797 | 22.639 | 0.980 | 0.959 | 247.336 | 17.638 |
| Yingluoxia | BP | 0.884 | 0.772 | 21.024 | 33.991 | 0.898 | 0.802 | 19.763 | 29.665 |
| | LSTM | 0.912 | 0.829 | 18.224 | 20.233 | 0.911 | 0.822 | 18.758 | 19.325 |
| | SSA-LSTM | 0.919 | 0.831 | 18.632 | 25.409 | 0.915 | 0.835 | 18.461 | 19.527 |
| | BES-LSTM | 0.925 | 0.856 | 16.727 | 18.470 | 0.915 | 0.831 | 18.277 | 19.001 |
| | CABES-LSTM | 0.930 | 0.865 | 16.160 | 16.572 | 0.916 | 0.837 | 17.953 | 19.201 |
| | CEEMDAN-CABES-LSTM | 0.967 | 0.935 | 11.242 | 23.785 | 0.969 | 0.935 | 11.310 | 21.979 |
| | VMD-CABES-LSTM | 0.980 | 0.959 | 8.902 | 16.475 | 0.980 | 0.958 | 9.057 | 13.703 |
| | CEEMDAN-VMD-CABES-LSTM | 0.989 | 0.977 | 6.659 | 14.504 | 0.989 | 0.978 | 6.540 | 11.514 |

Figure 12

Prediction results of each model during the training period of the Xiajiang hydrological station.

Figure 13

Prediction results of each model during the testing period of the Xiajiang hydrological station.

Figure 14

Prediction results of each model during the training period of the Yingluoxia hydrological station.

Figure 15

Prediction results of each model during the testing period of the Yingluoxia hydrological station.

Figure 16

Taylor diagram of eight models in the training period of the Xiajiang hydrological station.

Figure 17

Taylor diagram of eight models in the testing period of the Xiajiang hydrological station.

Figure 18

Taylor diagram of eight models in the training period of the Yingluoxia hydrological station.

Figure 19

Taylor diagram of eight models in the testing period of the Yingluoxia hydrological station.

Figure 20

Violin diagram of eight models of the Xiajiang hydrological station during the training period.

Figure 21

Violin diagram of eight models of the Xiajiang hydrological station during the testing period.

Figure 22

Violin diagram of eight models of the Yingluoxia hydrological station during the training period.

Figure 23

Violin diagram of eight models of the Yingluoxia hydrological station during the testing period.


Because the prediction results for the testing period give a more reliable evaluation of model performance, the forecast outcomes for this period are analyzed in detail. According to the F-test results in Table 2, the P-values of all models are greater than 0.05, indicating that there is no significant difference among the models. Figures 13 and 15 show that the accuracy of the individual prediction models is relatively low: both BP and LSTM reproduce only the overall trend of the runoff series, and their predicted values deviate significantly from the measured values. Among SSA-LSTM, BES-LSTM, and CABES-LSTM, which use no data preprocessing, CABES-LSTM predicts best. The prediction accuracy of CEEMDAN-VMD-CABES-LSTM is higher than that of CEEMDAN-CABES-LSTM and VMD-CABES-LSTM. CEEMDAN-VMD-CABES-LSTM gives the best fit: its predicted sequence follows the trend of the raw sequence closely, and its predictions closely match the measured data.

The evaluation results for each model during the testing period are presented in Table 3, from which the following observations can be made:

  • (1)

    For both the Xiajiang and Yingluoxia hydrological stations, BP and LSTM are single prediction models, and LSTM is more accurate than BP. For the Xiajiang hydrological station, the R and NSEC values increased from 0.689 and 0.410 for BP to 0.718 and 0.515 for LSTM, respectively, while the RMSE decreased by 9.32% and the MAPE decreased by 17.89%. For the Yingluoxia hydrological station, the R and NSEC of the LSTM model increased, and the RMSE and MAPE decreased by 0.215 and 8.179, respectively. In short, the evaluation indices indicate that LSTM captures the characteristics of runoff better than BP and is more suitable for monthly runoff prediction. However, a single prediction model may still struggle to achieve optimal prediction accuracy.

  • (2)

    SSA-LSTM, BES-LSTM, and CABES-LSTM are LSTM models optimized by intelligent algorithms, and they are more accurate than the single LSTM model. In this study, the three optimization algorithms tune the learning rate, the number of hidden layer nodes, and the regularization coefficient of the LSTM. The four evaluation indexes for both stations show that CABES-LSTM achieves the highest prediction accuracy; that is, CABES optimizes LSTM better than SSA and BES. For the Xiajiang hydrological station, compared with LSTM, SSA-LSTM increased the R by 5.57% and the NSEC by 9.90% and decreased the RMSE by 5.44% and the MAPE by 7.03%. BES-LSTM increased the R by 8.36% and the NSEC by 19.22% and decreased the RMSE by 13.48% and the MAPE by 10.61%. CABES-LSTM gave the largest improvement: the R increased by 9.33% and the NSEC by 28.54%, while the RMSE decreased by 18.07% and the MAPE by 15.95%. For the Yingluoxia hydrological station, compared with LSTM, SSA-LSTM showed a slight improvement, with a 0.44% increase in R and a 1.05% decrease in RMSE. CABES-LSTM showed a more noticeable improvement: the R increased by 0.55% and the NSEC by 1.82%, while the RMSE and MAPE decreased by 4.29 and 0.64%, respectively.
    These results show that CABES-LSTM has the highest prediction accuracy among the intelligent algorithm-optimized LSTM models. This is because CABES has a stronger spatial search ability, which allows it to explore parameter combinations more comprehensively and obtain the optimal solution. Therefore, CABES is chosen to optimize the parameters of LSTM in this study. In addition, data preprocessing can mitigate the nonlinearity and imbalance present in the original runoff series and thereby enhance prediction performance, so integrating data preprocessing with an intelligently optimized prediction model can further raise the prediction accuracy.

  • (3)

    CEEMDAN-CABES-LSTM, VMD-CABES-LSTM, and CEEMDAN-VMD-CABES-LSTM apply CABES-LSTM after data preprocessing. Compared with CABES-LSTM without data preprocessing, their prediction accuracy is significantly improved, and the proposed CEEMDAN-VMD-CABES-LSTM model has the highest prediction accuracy of all the models considered. For the Xiajiang hydrological station, compared with CABES-LSTM, the R and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 24.84 and 44.86%, respectively, and the RMSE and MAPE decrease by 64.72 and 58.67%, respectively. Compared with CEEMDAN-CABES-LSTM, the R and NSEC increase by 4.48 and 10.74%, respectively, and the RMSE and MAPE decrease by 45.10 and 36.30%, respectively. Compared with VMD-CABES-LSTM, the R and NSEC increase by 4.37 and 8.85%, respectively, and the RMSE and MAPE decrease by 41.73 and 31.75%, respectively. For the Yingluoxia hydrological station, compared with CABES-LSTM, the R and NSEC of CEEMDAN-VMD-CABES-LSTM increase by 7.97 and 16.85%, respectively, and the RMSE and MAPE decrease by 63.57 and 33.84%, respectively. Compared with CEEMDAN-CABES-LSTM, the R increases by 2.06% and the NSEC by 4.60%, while the RMSE decreases by 42.18% and the MAPE by 42.20%. Compared with VMD-CABES-LSTM, the R increases by 0.92% and the NSEC by 2.09%, while the RMSE decreases by 27.79% and the MAPE by 18.12%. These gains arise because VMD performs a secondary decomposition of the high-frequency subsequences produced by CEEMDAN, which further separates the noise in those subsequences and enhances the extraction of signal features, helping the model capture local features and subtle changes of the signal more accurately.
    These results demonstrate that the CEEMDAN-VMD method is better suited to preprocessing nonlinear and nonstationary sequences. It adapts to the characteristics of the sequence, and its decomposition results better meet the requirements of accurate prediction with the CABES-LSTM model, leading to a substantial improvement in prediction accuracy.
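The percentage changes quoted in the comparisons above are relative changes with respect to the baseline model. A minimal sketch of that arithmetic, using testing-period values for the Xiajiang station from Table 3, is given below; the helper name `pct_change` and the dictionary layout are assumptions made for the example.

```python
def pct_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Testing-period values for the Xiajiang station, taken from Table 3.
bp = {"R": 0.689, "NSEC": 0.410, "RMSE": 943.569, "MAPE": 61.844}
lstm = {"R": 0.718, "NSEC": 0.515, "RMSE": 855.616, "MAPE": 50.778}
hybrid = {"R": 0.980, "NSEC": 0.959, "RMSE": 247.336, "MAPE": 17.638}

lstm_vs_bp = {k: round(pct_change(bp[k], lstm[k]), 2) for k in bp}
hybrid_vs_lstm = {k: round(pct_change(lstm[k], hybrid[k]), 2) for k in lstm}

print(lstm_vs_bp["RMSE"], lstm_vs_bp["MAPE"])          # -9.32 -17.89
print(hybrid_vs_lstm["RMSE"], hybrid_vs_lstm["MAPE"])  # -71.09 -65.26
```

Negative values denote a decrease, so the output corresponds to the 9.32% and 17.89% reductions of LSTM relative to BP and the 71.09% and 65.26% reductions of the hybrid model relative to the single LSTM.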

Figures 16–19 show that the R, NSEC, and RMSE of the CEEMDAN-VMD-CABES-LSTM model in the four Taylor diagrams are much better than those of the other models, indicating that its predicted values are the closest to the actual values and that it has the highest prediction accuracy.

The four violin diagrams in Figures 20–23 show that the distribution of the CEEMDAN-VMD-CABES-LSTM predictions is the most similar to the distribution of the measured data. In addition, its interquartile range is approximately equal to that of the measured data, indicating a high level of consistency between the predicted and actual values. Compared with the other models, CEEMDAN-VMD-CABES-LSTM exhibits superior accuracy and stability in runoff prediction.

In addition, results reported in previous literature for hydrological stations in the same study area are used to compare the predictive performance of the proposed model objectively. The Yingluoxia hydrological station is selected for this comparison, and the results are shown in Table 4. The R and MAPE of the proposed model at the Yingluoxia hydrological station are 0.989 and 11.514, respectively, which are considerably better than the 0.932 and 44.963 of the CEEMDAN-SSA-ELM model and the 0.940 and 23.840 of the RLMD-SSA-ELMAN model. The proposed model therefore shows better predictive performance and more accurate prediction results, supporting the superiority of the proposed method.

Table 4

Comparison of results with previous studies

| Author | Study site | Model | Evaluation indicators |
|---|---|---|---|
| Wang et al. (2023b) | Yingluoxia | CEEMDAN-SSA-ELM | R = 0.932; MAPE = 44.963 |
| Xu et al. (2024) | Yingluoxia | RLMD-SSA-ELMAN | R = 0.940; MAPE = 23.840 |
| Proposed model | Yingluoxia | CEEMDAN-VMD-CABES-LSTM | R = 0.989; MAPE = 11.514 |

In addition, Figures 24 and 25 show bar charts of the extreme values during the testing period for the Xiajiang and Yingluoxia hydrological stations. The figures show that the extreme values predicted by CEEMDAN-VMD-CABES-LSTM are closer to the measured values than those of the other models, indicating that the model not only has excellent overall prediction performance but also predicts extreme values well.
Figure 24

Forecast period extreme value bar chart of the Xiajiang hydrological station.

Figure 25

Forecast period extreme value bar chart of the Yingluoxia hydrological station.


To summarize, the proposed hybrid model, CEEMDAN-VMD-CABES-LSTM, demonstrates the most effective prediction performance. When compared to the other seven benchmark models, CEEMDAN-VMD-CABES-LSTM exhibits the highest values of R and NSEC at both the Xiajiang and Yingluoxia hydrological stations. The R and NSEC values are close to 1, indicating a high level of reliability in the model's predictions. Additionally, the model achieves the smallest values of RMSE and MAPE among all the models, indicating its superior prediction accuracy. Overall, the results obtained from the four evaluation indexes demonstrate that this model is capable of accurately predicting the monthly runoff series for both study areas.

The precise and dependable prediction of medium- and long-term runoff directly influences the effective utilization and comprehensive allocation of water resources. Because runoff series are nonlinear and nonstationary, a single model often struggles to achieve satisfactory prediction results. To improve the precision of monthly runoff prediction, this paper introduces a hybrid model, CEEMDAN-VMD-CABES-LSTM, which combines data preprocessing technology with an optimization algorithm. First, the original runoff series is decomposed using CEEMDAN to isolate the high-frequency component IMF1. Second, IMF1 is further decomposed by VMD (secondary decomposition) to obtain subsequences that are as stable as possible. Next, the LSTM model is optimized using CABES to create the CABES-LSTM model, and each component serves as an input to it. Finally, the component predictions are combined and reconstructed. The proposed model was applied to the Xiajiang and Yingluoxia hydrological stations and compared with BP, LSTM, SSA-LSTM, BES-LSTM, CABES-LSTM, CEEMDAN-CABES-LSTM, and VMD-CABES-LSTM. The key findings can be summarized as follows:

  • (1)

    The use of the CABES enables CEEMDAN-VMD-CABES-LSTM to automatically adjust the parameters of LSTM to obtain the best model configuration. This optimization technique significantly enhances the model's fitting capacity and generalization performance, playing a crucial role in the prediction task.

  • (2)

    CEEMDAN and VMD significantly improve the performance of the CEEMDAN-VMD-CABES-LSTM model. The secondary decomposition effectively extracts and separates the distinct frequency components and patterns present in the monthly runoff time series. Consequently, the decomposed subsequences exhibit higher purity and more pronounced periodic characteristics, which enhances the predictive accuracy of the model.

  • (3)

    The CEEMDAN-VMD-CABES-LSTM model can model data at multiple scales and capture important periodic and dynamic characteristics through decomposition, prediction, and reconstruction. By employing a multi-scale feature representation, the model's predictive capability is enhanced, facilitating better adaptation to runoff variations characterized by diverse scales and frequencies.

In conclusion, the proposed CEEMDAN-VMD-CABES-LSTM model significantly improves the accuracy and robustness of monthly runoff forecasting and serves as a reliable prediction tool for water resources management and decision-making. The model has shown potential in signal decomposition and prediction, but it also has limitations. It involves many parameters, and selecting appropriate values typically requires experience and experimental adjustment, which can be time-consuming. Because the model uses multiple decomposition and prediction stages, errors may propagate between stages, leading to unstable or inaccurate predictions. Future research can focus on improving the adaptability, computational efficiency, interpretability, and generalization performance of such models to make them suitable for a wider range of applications.

The authors are grateful for the support of the special project for collaborative innovation of science and technology in 2021 (No: 202121206) and the Henan Province University Scientific and Technological Innovation Team (No: 18IRTSTHN009).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alizadeh M. J., Shahheydari H., Kavianpour M. R., Shamloo H. & Barati R. 2017 Prediction of longitudinal dispersion coefficient in natural rivers using a cluster-based Bayesian network. Environmental Earth Sciences 76 (2), 86. doi:10.1007/s12665-016-6379-6.
Alsattar H. A., Zaidan A. A. & Zaidan B. B. 2020 Novel meta-heuristic bald eagle search optimisation algorithm. Artificial Intelligence Review 53 (3), 2237–2264. doi:10.1007/s10462-019-09732-5.
Araghinejad S., Fayaz N. & Hosseini-Moghari S.-M. 2018 Development of a hybrid data driven model for hydrological estimation. Water Resources Management 32 (11), 3737–3750. doi:10.1007/s11269-018-2016-3.
Atashi V., Barati R. & Lim Y. H. 2023 Distributed Muskingum model with a whale optimization algorithm for river flood routing. Journal of Hydroinformatics. doi:10.2166/hydro.2023.029.
Barati R., Neyshabouri S. A. A. S. & Ahmadi G. 2014 Development of empirical models with high accuracy for estimation of drag coefficient of flow around a smooth sphere: An evolutionary approach. Powder Technology 257, 11–19. doi:10.1016/j.powtec.2014.02.045.
Chen X. P., Li Y. P., Gao P. P., Liu J. & Zhang H. 2023 A multi-scenario BP-neural-network ecologically-extended input-output model for synergetic management of water-electricity nexus system – A case study of Fujian province. Journal of Cleaner Production 399, 136581. doi:10.1016/j.jclepro.2023.136581.
Dragomiretskiy K. & Zosso D. 2014 Variational mode decomposition. IEEE Transactions on Signal Processing 62 (3), 531–544. doi:10.1109/TSP.2013.2288675.
Fang Z., Wang Y., Peng L. & Hong H. 2021 Predicting flood susceptibility using LSTM neural networks. Journal of Hydrology 594, 125734. doi:10.1016/j.jhydrol.2020.125734.
Flores J. H. F., Engel P. M. & Pinto R. C. 2012 Autocorrelation and partial autocorrelation functions to improve neural networks models on univariate time series forecasting. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. doi:10.1109/IJCNN.2012.6252470.
Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780. doi:10.1162/neco.1997.9.8.1735.
Hosseini K., Nodoushan E. J., Barati R. & Shahheydari H. 2016 Optimal design of labyrinth spillways using meta-heuristic algorithms. KSCE Journal of Civil Engineering 20 (1), 468–477. doi:10.1007/s12205-015-0462-5.
Huang S., Chang J., Huang Q., Chen Y. & Leng G. 2016 Quantifying the relative contribution of climate and human impacts on runoff change based on the Budyko hypothesis and SVM model. Water Resources Management 30 (7), 2377–2390. doi:10.1007/s11269-016-1286-x.
Ikram R. M. A., Ewees A. A., Parmar K. S., Yaseen Z. M., Shahid S. & Kisi O. 2022 The viability of extended marine predators algorithm-based artificial neural networks for streamflow prediction. Applied Soft Computing 131, 109739. doi:10.1016/j.asoc.2022.109739.
Khosravi K., Rezaie F., Cooper J. R., Kalantari Z., Abolfathi S. & Hatamiafkoueieh J. 2023 Soil water erosion susceptibility assessment using deep learning algorithms. Journal of Hydrology 618, 129229. doi:10.1016/j.jhydrol.2023.129229.
Kumar M., Elbeltagi A., Pande C. B., Ahmed A. N., Chow M. F., Pham Q. B., Kumari A. & Kumar D. 2022 Applications of Data-driven Models for Daily Discharge Estimation Based on Different Input Combinations. Water Resources Management 36 (7), 2201–2221. doi:10.1007/s11269-022-03136-x.
Liu H., Zhu Y., Pei S., Savić D., Fu G., Zhang C., Yuan Y. & Zhang J. 2019 Flow regime identification for air valves failure evaluation in water pipelines using pressure data. Water Research 165, 115002. doi:10.1016/j.watres.2019.115002.
Min X., Hao B., Sheng Y., Huang Y. & Qin J. 2023 Transfer performance of gated recurrent unit model for runoff prediction based on the comprehensive spatiotemporal similarity of catchments. Journal of Environmental Management 330, 117182. doi:10.1016/j.jenvman.2022.117182.
Mo C., Yan Z., Ma R., Lei X., Deng Y., Lai S., Huang K. & Mo X. 2023 Investigation of the EWT-PSO-SVM Model for Runoff Forecasting in the Karst Area. Applied Sciences. doi:10.3390/app13095693.
Mustafa M. D., Mansoor T. & Muzzammil M. 2022 Support vector machine (SVM) approach to develop the discharge prediction model for triangular labyrinth weir. Water Supply 22 (12), 8942–8956. doi:10.2166/ws.2022.393.
Najafzadeh M. & Anvari S. 2023 Long-lead streamflow forecasting using computational intelligence methods while considering uncertainty issue. Environmental Science and Pollution Research 30 (35), 84474–84490. doi:10.1007/s11356-023-28236-y.
Onalo D., Adedigba S., Khan F., James L. A. & Butt S. 2018 Data driven model for sonic well log prediction. Journal of Petroleum Science and Engineering 170, 1022–1037. doi:10.1016/j.petrol.2018.06.072.
Peng A., Zhang X., Xu W. & Tian Y. 2022 Effects of training data on the learning performance of LSTM network for runoff simulation. Water Resources Management 36 (7), 2381–2394. doi:10.1007/s11269-022-03148-7.
Qiu R., Wang Y., Rhoads B., Wang D., Qiu W., Tao Y. & Wu J. 2021 River water temperature forecasting using a deep learning method. Journal of Hydrology 595, 126016. doi:10.1016/j.jhydrol.2021.126016.
Radfar A. & Rockaway Thomas D. 2016 Captured runoff prediction model by permeable pavements using artificial neural networks. Journal of Infrastructure Systems 22 (3), 04016007. doi:10.1061/(ASCE)IS.1943-555X.0000284.
Rahbar A., Mirarabi A., Nakhaei M., Talkhabi M. & Jamali M. 2022 A comparative analysis of data-driven models (SVR, ANFIS, and ANNs) for daily karst spring discharge prediction. Water Resources Management 36 (2), 589–609. doi:10.1007/s11269-021-03041-9.
Richman J. S., Lake D. E. & Moorman J. R. 2004 Sample Entropy, Methods in Enzymology. Academic Press, pp. 172–184. doi:10.1016/S0076-6879(04)84011-4.
Ruiming F. 2018 Wavelet based relevance vector machine model for monthly runoff prediction. Water Quality Research Journal 54 (2), 134–141. doi:10.2166/wcc.2018.196.
Sohail A., Watanabe K. & Takeuchi S. 2008 Runoff analysis for a small watershed of Tono area Japan by back propagation artificial neural network with seasonal data. Water Resources Management 22 (1), 1–22. doi:10.1007/s11269-006-9141-0.
Tongle X., Yingbo W. & Kang C. 2016 Tailings saturation line prediction based on genetic algorithm and BP neural network. Journal of Intelligent & Fuzzy Systems 30, 1947–1955. doi:10.3233/IFS-151905.
Torres M. E., Colominas M. A., Schlotthauer G. & Flandrin P. 2011 A complete ensemble empirical mode decomposition with adaptive noise. In: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4144–4147. doi:10.1109/ICASSP.2011.5947265.
Wang W.-c., Chau K.-w., Qiu L. & Chen Y.-b. 2015 Improving forecasting accuracy of medium and long-term runoff using artificial neural network based on EEMD decomposition. Environmental Research 139, 46–54. doi:10.1016/j.envres.2015.02.002.
Wang Y., Fang Z., Hong H. & Peng L. 2020 Flood susceptibility mapping using convolutional neural network frameworks. Journal of Hydrology 582, 124482. doi:10.1016/j.jhydrol.2019.124482.
Wang Y., Zhang W., Sun J., Wang L., Song X. & Zhao X. 2022 Survival Risk Prediction of Esophageal Squamous Cell Carcinoma Based on BES-LSSVM. Computational Intelligence and Neuroscience 2022, 3895590. doi:10.1155/2022/3895590.
Wang W.-c., Cheng Q., Chau K.-w., Hu H., Zang H.-f. & Xu D.-m. 2023b An enhanced monthly runoff time series prediction using extreme learning machine optimized by salp swarm algorithm based on time varying filtering based empirical mode decomposition. Journal of Hydrology 620, 129460. doi:10.1016/j.jhydrol.2023.129460.
Wang W.-c., Tian W.-c., Xu D.-m., Chau K.-w., Ma Q. & Liu C.-j. 2023c Muskingum Models’ Development and their Parameter Estimation: A State-of-the-art Review. Water Resources Management
37
(
8
),
3129
3150
.
doi:10.1007/s11269-023-03493-1
.
Wang
W.
,
Tian
W.
,
Chau
K.-w.
,
Xue
Y.
,
Xu
L.
&
Zang
H.
2023d
An Improved Bald Eagle Search Algorithm with Cauchy Mutation and Adaptive Weight Factor for Engineering Optimization
.
Computer Modeling in Engineering & Sciences
136
(
2
),
1603
1642
.
doi:10.32604/cmes.2023.026231
.
Wang
W.
,
Tian
W.
,
Chau
K.
,
Zang
H.
,
Ma
M.
,
Feng
Z.
&
Xu
D.
2023e
Multi-Reservoir Flood Control Operation Using Improved Bald Eagle Search Algorithm with ε Constraint Method
.
Water
.
doi:10.3390/w15040692
.
Wang
W.
,
Tian
W.
,
Xu
L.
,
Liu
C.
&
Xu
D.
2023f
Mε-OIDE algorithm for solving constrained optimization problems and its application in flood control operation of reservoir group
.
Shuili Xuebao/Journal of Hydraulic Engineering
54
(
2
),
148
158
.
doi:10.13243/j.cnki.slxb.20220396
.
Xu
Y.
,
Hu
Y.
,
Rao
X.
,
Zhao
H.
,
Zhong
X.
,
Peng
X.
,
Zhan
W.
,
Sheng
G.
&
Liu
D.
2022
A fractal physics-based data-driven model for water-flooding reservoir (FlowNet-fractal)
.
Journal of Petroleum Science and Engineering
210
,
109960
.
doi:10.1016/j.petrol.2021.109960
.
Xu
D.-m.
,
Wang
X.
,
Wang
W.-c.
,
Chau
K.-w.
&
Zang
H.-f.
2023
Improved monthly runoff time series prediction using the SOA–SVM model based on ICEEMDAN–WD decomposition
.
Journal of Hydroinformatics
25
(
3
),
943
970
.
doi:10.2166/hydro.2023.172
.
Xu
D.-m.
,
Hu
X.-x.
,
Wang
W.-c.
,
Chau
K.-w.
,
Zang
H.-f.
&
Wang
J.
2024
A new hybrid model for monthly runoff prediction using ELMAN neural network based on decomposition-integration structure with local error correction method
.
Expert Systems with Applications
238
,
121719
.
doi:10.1016/j.eswa.2023.121719
.
Yan
B.
,
Mu
R.
,
Guo
J.
,
Liu
Y.
,
Tang
J.
&
Wang
H.
2022
Flood risk analysis of reservoirs based on full-series ARIMA model under climate change
.
Journal of Hydrology
610
,
127979
.
doi:10.1016/j.jhydrol.2022.127979
.
Yan
P.
,
Zhang
Z.
,
Hou
Q.
,
Lei
X.
,
Liu
Y.
&
Wang
H.
2023
A novel IBAS-ELM model for prediction of water levels in front of pumping stations
.
Journal of Hydrology
616
,
128810
.
doi:10.1016/j.jhydrol.2022.128810
.
Yang
H.
&
Li
W.
2023
Data Decomposition, Seasonal Adjustment Method and Machine Learning Combined for Runoff Prediction: A Case Study
.
Water Resources Management
37
(
1
),
557
581
.
doi:10.1007/s11269-022-03389-6
.
Yang
K.
,
Wang
Y.
,
Li
M.
,
Li
X.
,
Wang
H.
&
Xiao
Q.
2023a
Modeling topological nature of gas–liquid mixing process inside rectangular channel using RBF-NN combined with CEEMDAN-VMD
.
Chemical Engineering Science
267
,
118353
.
doi:10.1016/j.ces.2022.118353
.
Yang
S.
,
Zhang
Y.
&
Zhang
Z.
2023b
Runoff prediction based on dynamic spatiotemporal graph neural network
.
Water
.
doi:10.3390/w15132463
.
Zhang
J.
&
Yan
H.
2023
A long short-term components neural network model with data augmentation for daily runoff forecasting
.
Journal of Hydrology
617
,
128853
.
doi:10.1016/j.jhydrol.2022.128853
.
Zhang
D.
,
Lin
J.
,
Peng
Q.
,
Wang
D.
,
Yang
T.
,
Sorooshian
S.
,
Liu
X.
&
Zhuang
J.
2018a
Modeling and simulating of reservoir operation using the artificial neural network, support vector regression, deep learning algorithm
.
Journal of Hydrology
565
,
720
736
.
doi:10.1016/j.jhydrol.2018.08.050
.
Zhang
J.
,
Zhu
Y.
,
Zhang
X.
,
Ye
M.
&
Yang
J.
2018b
Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas
.
Journal of Hydrology
561
,
918
929
.
doi:10.1016/j.jhydrol.2018.04.065
.
Zhang
X.
,
Chen
H.
,
Zhu
G.
,
Zhao
D.
&
Duan
B.
2022b
A new groundwater depth prediction model based on EMD-LSTM
.
Water Supply
22
(
6
),
5974
5988
.
doi:10.2166/ws.2022.230
.
Zuo
G.
,
Luo
J.
,
Wang
N.
,
Lian
Y.
&
He
X.
2020
Decomposition ensemble model based on variational mode decomposition and long short-term memory for streamflow forecasting
.
Journal of Hydrology
585
,
124776
.
doi:10.1016/j.jhydrol.2020.124776
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).