Runoff forecasting is crucial for water resources management and demands precise models. This study proposes a hybrid runoff forecasting model that combines variational mode decomposition (VMD), a convolutional neural network (CNN), and a long short-term memory network (LSTM) with an attention mechanism (AM) to enhance the accuracy and stability of runoff forecasting. VMD significantly reduced the volatility of the runoff sequence, and the AM focused on extracting the most critical information from the features. The VMD–CNN–AM–LSTM model, built on the two-stage decomposition forecasting framework, was used to predict daily runoff at the Jianli (JL) hydrological station in the section from Yichang to JL from 1 October 2006 to 30 October 2022. This model outperformed the models without VMD and/or AM in forecasting runoff, with a root mean square error of 646.160, a mean absolute error of 424.124, a mean absolute percentage error of 2.54%, and an R² of 0.9933. Model stability was assessed using the bias-variance, which showed that the proposed model was significantly more stable than the models without VMD and AM. Together, VMD and AM optimise runoff forecasting at the target station by utilising the runoff of upstream stations. This improves the accuracy and stability of the model, providing technical support for water resources planning and management.

  • The developed VMD–CNN–attention mechanism (AM)–long short-term memory network (LSTM) hybrid model boosts the precision and stability of short-term river runoff forecasts.

  • The model captures river runoff's spatio-temporal dynamics through VMD and AM integration.

  • The model's enhanced predictive stability is verified by bias-variance analysis.

Runoff is a crucial factor influencing water supply for domestic, industrial, and agricultural use and plays a pivotal role in the socio-economic development of a region. The ability to regulate natural runoff is closely linked to the development of industrial and agricultural production, shaping human life and economic progress (Huang et al. 2015). Effective short-term runoff forecasting is essential for informed decision-making in water resource management, providing valuable insights for flood and drought mitigation (Xiao et al. 2022). However, short-term runoff forecasting requires estimation based on past runoff values, taking into account time constraints and various influencing factors. This process has the characteristics of uncertainty, conditionality, and multi-programme (Zhang & Yan 2023). Therefore, the construction of a short-term runoff model that considers spatial and temporal characteristics is crucial for accurate model forecasting.

While time series models such as the autoregressive (AR), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models have traditionally been applied in runoff forecasting (Mohammadi et al. 2006; Li et al. 2015; Zurey et al. 2020), their reliance on linear assumptions limits their effectiveness for non-stationary, nonlinear time series data (Fathian 2021). To address this limitation, data-driven models have gained prominence in runoff forecasting. These models include convolutional neural networks (CNNs), long short-term memory networks (LSTMs), decision trees, and support vector machines (Erdal & Karakurt 2013; Huang et al. 2014; Zakizadeh et al. 2020; Li et al. 2021). Each data-driven model has its own characteristics. For instance, CNNs are mainly intended to extract local features from continuous data (Guo et al. 2018), whereas LSTM models are specifically designed to capture temporal dependencies (Yu et al. 2019). The feature extraction capability of any single data-driven model for time series is therefore limited. Additionally, the forecasting accuracy of these machine learning methods is constrained by data quality, particularly in the presence of irregular gaps and noise in real-world data (Deng et al. 2020; Zhou & Kang 2023).

Hybrid methods combining different algorithms, such as the attention mechanism (AM) with LSTM, have been proposed to improve forecasting accuracy (Ding et al. 2020). The AM assigns different weights to different implicit states, thereby amplifying the influence of crucial information and improving the accuracy of load forecasts (Ding et al. 2020). Another way to improve forecasting accuracy is to decompose the sequence prior to prediction. Common decomposition methods include wavelet transform, empirical mode decomposition (EMD), and variational mode decomposition (VMD). Xiao & Wang (2021) used EMD for load sequence decomposition and then fed the resulting intrinsic mode components into a hybrid neural network for prediction. In comparison, Li et al. (2021) and Zuo et al. (2020a) employed VMD to decompose signals for runoff forecasting, effectively avoiding mode aliasing, simplifying the model, and enhancing prediction accuracy. However, after performing sequence decomposition, these methods tend to use only the decomposed sequence itself to make predictions and do not take into account the influence of external factors (e.g., runoff from upstream stations) on the forecasted runoff.

Accurately predicting runoff involves capturing both spatial and temporal variations. Traditional models often focus on temporal aspects, neglecting spatial dependencies, which are crucial in regions with complex hydrological dynamics. This study addresses this gap by incorporating upstream runoff stations as key spatio-temporal inputs into the VMD–CNN–AM–LSTM hybrid model. The inclusion of these upstream stations allows the model to better account for spatial interactions in runoff processes, thereby improving the accuracy and stability of short-term forecasts. This study focuses specifically on the middle reaches of the Yangtze River, spanning from Yichang (YC) to Jianli (JL). The application of this hybrid model to runoff forecasting in the JL hydrological station demonstrates its potential in providing valuable technical support for water resources management planning in the region.

VMD–CNN–AM–LSTM runoff forecasting model framework

This study presents a runoff forecasting model using VMD–CNN–AM–LSTM, established within the foresight period at hydrological stations upstream based on the decomposition outcomes of historical runoff time series. VMD decomposition may be subject to the following issues: decomposing the entire runoff time series (including training and validation sets) simultaneously may result in the incorporation of information from the validation period (which is, in fact, future information and unavailable) into the model training stage. Furthermore, decomposing the training set and validation set separately may be influenced by the boundary effect, leading to inferior prediction. The boundary effect refers to the phenomenon whereby the decomposition values in proximity to the boundary deviate from the actual values. In order to circumvent the utilisation of future information in the model training stage and to diminish the impact of boundary effects on the prediction accuracy, this study employs the two-stage decomposition forecasting framework, whose efficacy has been validated in runoff prediction (Zuo et al. 2020a, 2020b). The VMD–CNN–AM–LSTM framework is depicted in Figure 1. The TensorFlow framework has been employed to construct and train our deep learning model. TensorFlow offers efficient computational performance and flexible network structure definition capabilities, providing robust support. Additionally, Python has been utilized to write all relevant code implementations. The ease of use and abundant library support of Python have enabled the rapid setup of a comprehensive experimental process.
Figure 1

VMD–CNN–AM–LSTM framework.


The runoff forecasting model is broken down into four steps:

Step 1: Dataset division. The original time series is divided into a training set and a validation set. A higher ratio helps capture complex temporal dependencies, especially when dealing with long-term data sequences (Joseph 2022). The 9:1 split was chosen to maximize the data available for training, as runoff forecasting can be highly sensitive to the amount of training data.

Step 2: Decomposition. The training set of the target station is decomposed first, and the parameters of the signal decomposition algorithm are optimised on it. The validation data Vi are then appended sequentially to the training set, and each resulting appended set Ai is decomposed using VMD. Appended sample sets are generated from the decomposition results of each appended set, and the last sample in each appended sample set is taken as the validation sample.

Step 3: Forecasting using the CNN–AM–LSTM model. The influencing factor of upstream runoff is introduced. The decomposed modes are separately forecasted, and their results are linearly combined to obtain the final prediction of runoff at the target station. The input data were normalised using z-score normalization.

Step 4: Evaluating the forecast result of the model. Several metrics have been chosen to thoroughly evaluate the performance, stability, and accuracy of the VMD–CNN–AM–LSTM model and other models.
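Steps 1 to 3 can be illustrated with a minimal sketch, offered under stated assumptions rather than as the authors' implementation. It shows a chronological 9:1 split, z-score normalisation, and the generation of appended sets from which only the last window is kept. The function names, the choice to take the normalisation statistics from the training part only, the window length, and `placeholder_decompose` (a running-mean stand-in that is not VMD) are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Iterator, List, Sequence, Tuple

Series = List[float]


def chronological_split(series: Sequence[float], train_ratio: float = 0.9) -> Tuple[Series, Series]:
    """Split in time order (no shuffling): earliest part trains, latest part validates."""
    cut = int(len(series) * train_ratio)
    return list(series[:cut]), list(series[cut:])


def zscore_params(train: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the training part (assumed choice)."""
    return mean(train), pstdev(train)


def zscore(values: Sequence[float], mu: float, sigma: float) -> Series:
    """Apply z-score normalisation with previously computed statistics."""
    return [(v - mu) / sigma for v in values]


def appended_sets(train: Sequence[float], validation: Sequence[float]) -> Iterator[Series]:
    """Yield A_i: the training set followed by the first i validation values."""
    for i in range(1, len(validation) + 1):
        yield list(train) + list(validation[:i])


def last_windows(train: Sequence[float], validation: Sequence[float],
                 decompose: Callable[[Series], List[Series]], window: int) -> List[List[Series]]:
    """Decompose every appended set and keep only the final window of each mode."""
    samples = []
    for a_i in appended_sets(train, validation):
        modes = decompose(a_i)  # one series per mode, each as long as a_i
        samples.append([mode[-window:] for mode in modes])
    return samples


def placeholder_decompose(x: Series) -> List[Series]:
    """Stand-in for VMD (hypothetical): running mean plus the remainder."""
    trend, total = [], 0.0
    for count, value in enumerate(x, start=1):
        total += value
        trend.append(total / count)
    remainder = [v - t for v, t in zip(x, trend)]
    return [trend, remainder]


# Synthetic series used only to exercise the functions.
series = [float(v) for v in range(100)]
train, validation = chronological_split(series)
mu, sigma = zscore_params(train)
train_z = zscore(train, mu, sigma)
validation_z = zscore(validation, mu, sigma)
samples = last_windows(train_z, validation_z, placeholder_decompose, window=4)
print(len(train), len(validation), len(samples))
```

Because every appended set is decomposed afresh, no value later than the one being predicted enters the decomposition, which is the point of the two-stage framework. The cost is one decomposition per validation sample.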

Variational mode decomposition

Variational mode decomposition (VMD) is an adaptive and non-recursive signal decomposition method (Dragomiretskiy & Zosso 2014). It is based on the concepts of Wiener filtering, Hilbert transform, and frequency mixing. The method decomposes the signal into K sub-signals of small relative amplitude in different frequency bands. The algorithm's core idea is to construct and solve the optimal solution of the variational problem. This effectively decomposes low- and high-frequency signals, reducing the high complexity, non-linearity, and non-smoothness of the time series. The specific steps are as follows:

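  • (1) Establish the constrained variational problem. To ensure minimal estimated bandwidths of each mode and satisfy that the sum of all modes is equal to the original signal, the constrained variational expression is established assuming the original signal f(t) is decomposed into K mode components with different frequency characteristics.
    $$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \qquad (1)$$
    where $u_k$ is the kth mode component of the signal decomposition; $\omega_k$ is the frequency centre of the kth mode component; $\delta(t)$ indicates the Dirac distribution; and $*$ is the convolution operator.
  • (2) The optimal solution to the constrained variational problem is obtained by performing a Lagrange transform, which converts the problem into an unconstrained variational problem that can be solved:
    $$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \qquad (2)$$
    where $\alpha$ is the quadratic penalty term factor and $\lambda$ is the Lagrange multiplier operator.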

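To find the optimal solution to the problem, the alternating direction method of multipliers is used to update each mode component, its centre frequency, and the Lagrange multiplier as follows:
$$\begin{aligned}
\hat{u}_k^{n+1}(\omega) &= \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha(\omega - \omega_k^{n})^2} \\
\omega_k^{n+1} &= \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 \mathrm{d}\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 \mathrm{d}\omega} \\
\hat{\lambda}^{n+1}(\omega) &= \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right)
\end{aligned} \qquad (3)$$
where the hat denotes the Fourier transform, n is the iteration number, and τ is the update step of the Lagrange multiplier. The iteration stops once the relative change of the modes between two iterations falls below the tolerance ε.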

The decomposition effect of VMD can be influenced by K, α, τ, and ε. If K is too small, the extraction of IMF from the original signal may be ineffective, while larger values of K may result in redundancy of IMF information. Smaller values of α may lead to a larger bandwidth, redundancy of information expression, and increased additional noise. Conversely, larger values of α may lead to a smaller bandwidth and loss of effective information. Equation (3) shows that λ ensures optimal convergence of VMD with appropriate values of τ > 0 for low-noise signals, while λ prevents VMD from converging with values of τ > 0 for high-noise signals. To avoid this shortcoming, τ can be set to zero, but this results in some error when reconstructing the decomposed signal by summation. The value of ε affects the reconstruction error of VMD.
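The effect of α on bandwidth can be made concrete with a short sketch. In the standard VMD formulation (Dragomiretskiy & Zosso 2014), each mode update acts as a Wiener-type filter with gain 1/(1 + 2α(ω − ω_k)²) around the centre frequency ω_k. The gain therefore falls to one half at an offset of 1/√(2α). The function names and the example centre frequency below are illustrative assumptions, and frequencies are in whatever normalised units the VMD implementation uses.

```python
import math


def mode_filter_gain(omega: float, omega_k: float, alpha: float) -> float:
    """Gain of the Wiener-type filter applied in each VMD mode update."""
    return 1.0 / (1.0 + 2.0 * alpha * (omega - omega_k) ** 2)


def half_gain_offset(alpha: float) -> float:
    """Offset |omega - omega_k| at which the filter gain falls to 0.5."""
    return 1.0 / math.sqrt(2.0 * alpha)


omega_k = 0.01  # hypothetical centre frequency
for alpha in (200, 2000, 20000):
    offset = half_gain_offset(alpha)
    gain = mode_filter_gain(omega_k + offset, omega_k, alpha)
    print(f"alpha={alpha:>6}  half-gain offset={offset:.5f}  gain there={gain:.3f}")
```

A tenfold increase in α narrows the pass band by a factor of √10, which matches the qualitative statement above: small α gives wide, overlapping modes, and large α gives narrow modes that may discard useful information.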

Attention mechanism

In predicting runoff, input features vary in their degree of influence on the results. The AM calculates the probability distributions of different input features, focusing on the important ones and disregarding irrelevant information, thereby enhancing the model's ability to utilize key features (Bahdanau et al. 2014). This study employs a soft AM to distribute weight values of attention between 0 and 1. A two-dimensional matrix with m rows and n columns is formed for m input features and n time nodes.

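Set the input sequence at time t to be $x_t = [x_{1,t}, x_{2,t}, \ldots, x_{m,t}]$. The weights are calculated as follows:
$$\alpha_{i,t} = \frac{\exp(e_{i,t})}{\sum_{j=1}^{m} \exp(e_{j,t})} \qquad (4)$$
where $\alpha_{i,t}$ denotes the weight of the ith feature of the tth time input sequence, $e_{i,t}$ is the corresponding attention score, and $\sum_{i=1}^{m} \alpha_{i,t} = 1$.
The feature weights at each time are combined to obtain the feature attention weight matrix:
$$A = [\alpha_{i,t}]_{m \times n} \qquad (5)$$
Set the time series of an input feature as $x^{i} = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]$. The attention weights of the time series of the input feature are computed as follows:
$$\beta_{i,t} = \frac{\exp(d_{i,t})}{\sum_{s=1}^{n} \exp(d_{i,s})} \qquad (6)$$
where $\beta_{i,t}$ denotes the weight of an individual time, $d_{i,t}$ is the corresponding attention score, and $\sum_{t=1}^{n} \beta_{i,t} = 1$.
The temporal weights of each input feature are combined to obtain the time attention weight matrix:
$$B = [\beta_{i,t}]_{m \times n} \qquad (7)$$
The feature attention weight matrix A and the time attention weight matrix B are combined by the Hadamard product to obtain the time and feature attention weight matrix C:
$$C = A \odot B \qquad (8)$$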
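The combination of feature and time attention can be sketched in a few lines of NumPy. This is a sketch under assumptions: the weights are obtained with a softmax so that they lie between 0 and 1, and the scores are random placeholders rather than learned values. The shapes follow the text, with m input features as rows and n time nodes as columns.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = z - z.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
m, n = 4, 3                                # m input features, n time nodes
x = rng.normal(size=(m, n))                # placeholder input matrix
feature_scores = rng.normal(size=(m, n))   # hypothetical scores (learned in practice)
time_scores = rng.normal(size=(m, n))      # hypothetical scores (learned in practice)

A = softmax(feature_scores, axis=0)  # feature attention: each column sums to 1
B = softmax(time_scores, axis=1)     # time attention: each row sums to 1
C = A * B                            # Hadamard (element-wise) product
weighted_input = C * x               # attention-weighted features

print(C.shape, float(C.min()) >= 0.0, float(C.max()) <= 1.0)
```

Because every entry of A and B lies in [0, 1], every entry of C does as well, so C rescales each feature at each time node without changing its sign.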

CNN–AM–LSTM model

The model used in this study is the CNN–AM–LSTM model, a CNN–LSTM network with an AM. It includes an input layer, a CNN layer, an AM layer, LSTM layers, a flattened layer, a fully connected layer, and an output layer. The input layer receives the labels and related factor feature sets. Two LSTM layers, each with four neurons, are added to learn and remember regularities in the runoff data. Subsequently, a one-dimensional CNN (Conv1d) layer is added to extract the internal features of the runoff data. The number of convolutional kernels in the convolutional layer is three, and the step size is set to 1. An AM layer is introduced to apply attention weights to the output of the convolutional layer, generating attention-weighted feature representations. Following this, two LSTM layers, each with 32 neurons, are added. Next, the model adds a flattened layer (Flatten), which reduces the features to one dimension, and a fully connected layer with 25 neurons and a ReLU activation function. Finally, the output layer produces the prediction results with an output dimension of 1, corresponding to the dimension of the prediction target. The CNN–AM–LSTM structure is illustrated in Figure 2.
Figure 2

CNN–AM–LSTM structure.


The VMD–CNN–AM–LSTM model not only decomposes runoff time series data using VMD to handle non-stationary and nonlinear characteristics but also integrates upstream runoff data to capture spatial dependencies. The CNN component further enhances this by extracting localized features from the input data. The AM then assigns weights to critical features, ensuring that both spatial (runoff from upstream stations) and temporal (lagged runoff values) dependencies are effectively captured in the prediction process.

The Changjiang (Yangtze) River is China's largest river, with a basin area of approximately 1.8 million km². It is a vital hydroelectric and energy hub due to major water conservancy projects such as Xiluodu and the Three Gorges (Figure 3). This study focuses on the YC–JL section within the Yangtze River Basin, which is located downstream of the Three Gorges Water Conservancy project. This section, which includes major cities such as YC, Jingzhou, and Yueyang, is of great importance for social and economic development.
Figure 3

Study area.

The hydrological dynamics in this segment are complex, with numerous inflow and outflow processes in the mainstream. This complexity poses challenges for traditional runoff forecasting methods.

There are four hydrological stations along the YC–JL section: YC, Zhicheng (ZC), Shashi (SS), and JL. The analysis will cover the period from 1 October 2006 to 30 October 2022, which coincides with the completion of the Three Gorges Project. The total number of data points is 5,145. To ensure accuracy, we compared flow series data and found similar hydrological conditions from YC to JL. Therefore, there is no need to adjust the measured runoff at each station for processing.

Our previous studies have indicated that the travel time for runoff from YC to JL is approximately 3 days (Zhou & Kang 2023). To enhance the accuracy of our runoff forecasting model, we therefore consider the runoff of the previous 3 days at the three upstream stations (YC, ZC, and SS) when predicting runoff at the JL station. This is because upstream runoff affects the magnitude and timing of the daily runoff curve at the target station, which makes the runoff data more uncertain.
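The use of the previous 3 days of upstream runoff can be sketched as a sliding-window construction. The sketch assumes the upstream series are stacked as columns of one array and that the target is the JL value on the day after each window. The function name, array layout, and synthetic data are illustrative and not taken from the study.

```python
import numpy as np


def build_lagged_samples(upstream: np.ndarray, target: np.ndarray, n_lags: int = 3):
    """Pair each target value with the n_lags preceding days of upstream runoff.

    upstream: shape (T, S), one column per upstream station (e.g. YC, ZC, SS)
    target:   shape (T,), runoff at the target station (e.g. JL)
    Returns X of shape (T - n_lags, n_lags, S) and y of shape (T - n_lags,).
    """
    upstream = np.asarray(upstream, dtype=float)
    target = np.asarray(target, dtype=float)
    n_days = len(target)
    # Window for day t covers days t - n_lags, ..., t - 1 (strictly before t).
    X = np.stack([upstream[t - n_lags:t] for t in range(n_lags, n_days)])
    y = target[n_lags:]
    return X, y


# Synthetic data: 10 days, 3 upstream stations.
upstream = np.arange(30, dtype=float).reshape(10, 3)
target = np.arange(10, dtype=float) * 10.0
X, y = build_lagged_samples(upstream, target, n_lags=3)
print(X.shape, y.shape)
```

Each window ends the day before the target day, so no same-day or future upstream value is used as an input.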

VMD decomposition

The VMD is parameterised using the training set. In general, selecting α = 2,000, τ = 0, and ε = 1 × 10⁻⁷ can provide optimal denoising and effective separation of intrinsic mode functions (IMFs). The only parameter that requires adjustment is the number of mode components, denoted as K. The primary distinction between different modes lies in their centre frequencies. Thus, the appropriate K-value is selected by analysing the centre frequency distribution across various mode numbers. Table 1 displays the centre frequencies at different mode numbers. Notably, the first three centre frequencies for K = 4 (0.000230, 0.014072, and 0.039768 Hz) closely resemble those for K = 3 (0.000243, 0.016477, and 0.049812 Hz).

Table 1

The centre frequency of different mode numbers

Mode number K   Centre frequency (Hz)
                Mode 1      Mode 2        Mode 3        Mode 4        Mode 5        Mode 6
2               0.000276    0.022084
3               0.000243    0.016477      0.049812
4               0.000230    0.014072      0.039768      0.072638
5               0.000132    0.00561       0.021272      0.0479908     0.087900
6               0.0000710   0.004108625   0.019258536   0.043177531   0.072183613   0.12828097

To further determine the value of K, the correlation between adjacent mode components was analysed for different numbers of modes. Table 2 displays the correlation coefficients between adjacent modes. When K does not exceed 3, the correlation coefficients between adjacent modes decrease sequentially, which indicates that the mode decomposition is normal. Once K reaches 4, the correlation coefficient between adjacent modes first decreases and then increases, which indicates that the mode components begin to overlap (Table 2). Therefore, for the VMD decomposition in this study, the value of K was set to 3.

Table 2

Correlation coefficients of adjacent modes

Mode number K   Correlation coefficient
                C12        C23        C34        C45        C56
2               0.13216
3               0.175878   0.111295
4               0.231503   0.138274   0.151127
5               0.278946   0.052338   0.105346   0.129991
6               0.242677   0.051622   0.116159   0.14685    0.113728
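The selection rule described for Table 2 can be written as a small check. The sketch assumes that the five rows of Table 2 correspond to K = 2 to 6, as implied by the number of coefficients in each row, and it uses one possible reading of the rule: keep increasing K while the adjacent-mode correlations decrease strictly from C12 onwards. The function names are illustrative.

```python
from typing import Dict, List, Optional

# Adjacent-mode correlation coefficients from Table 2, keyed by K (assumed row labels).
adjacent_corr: Dict[int, List[float]] = {
    2: [0.13216],
    3: [0.175878, 0.111295],
    4: [0.231503, 0.138274, 0.151127],
    5: [0.278946, 0.052338, 0.105346, 0.129991],
    6: [0.242677, 0.051622, 0.116159, 0.14685, 0.113728],
}


def strictly_decreasing(values: List[float]) -> bool:
    """True if each coefficient is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def select_k(corr_by_k: Dict[int, List[float]]) -> Optional[int]:
    """Largest K reached before the correlations first stop decreasing."""
    chosen = None
    for k in sorted(corr_by_k):
        if not strictly_decreasing(corr_by_k[k]):
            break
        chosen = k
    return chosen


print(select_k(adjacent_corr))  # 3
```

With the Table 2 values the rule stops at K = 4, where C34 (0.151127) exceeds C23 (0.138274), and returns K = 3.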

The decomposition results are presented in Figure 4. The raw runoff data exhibit evident annual, decadal, and daily periodicity. Of the three mode components obtained from decomposition, IMF1 has the largest amplitude. Even though its change is smooth, the amplitude difference is significant, posing difficulties in training. IMF2 and IMF3 have comparatively smaller amplitudes and exhibit significant and systematic changes, facilitating the prediction of future patterns. Furthermore, the residual component has a high amplitude and should not be disregarded.
Figure 4

IMFs and residuals by VMD based on the original runoff.


Effectiveness testing of AM

To confirm the efficacy of the AM, we compare the prediction results of various hybrids of CNN–AM–LSTM and CNN–LSTM, LSTM, and CNN. The results of the comparison between the simulated and measured values are shown in Figure 5. We assess and detail the overall predictive performance of each model in Table 3.
Table 3

Forecasting performance of different models

Models             RMSE       MAE        R²       MAPE (%)
CNN–AM–LSTM        895.6738   518.2583   0.9884   2.87
CNN–LSTM           922.8554   571.1692   0.9886   3.13
LSTM               852.9706   565.6301   0.9876   3.60
CNN                842.4297   557.9231   0.9890   3.56
VMD–CNN–AM–LSTM    646.1602   424.1244   0.9933   2.54
VMD–CNN–LSTM       790.7234   490.6196   0.9896   2.93
VMD–LSTM           744.045    520.9984   0.9912   3.29
VMD–CNN            808.1079   483.4139   0.9913   2.68
Figure 5

CNN–AM–LSTM, CNN–LSTM, LSTM, and CNN prediction results.

The root mean square error (RMSE) quantifies the discrepancy between predicted and actual values and is particularly sensitive to outliers, that is, to instances where the predicted value differs significantly from the true value. Table 3 shows that, among the models without VMD, the CNN model exhibits the smallest RMSE, followed by LSTM, CNN–AM–LSTM, and finally CNN–LSTM. The RMSE of the CNN–AM–LSTM model is not the smallest, indicating that some of its predictions deviate considerably from the true values. The mean absolute error (MAE) measures the average absolute error between the predicted and observed values. The CNN–AM–LSTM model has the smallest MAE of these four models, indicating a lower mean error between the overall predicted and true values. Furthermore, the CNN–AM–LSTM model exhibits the smallest mean absolute percentage error (MAPE), at 2.87%. Figure 5 illustrates that the incorporation of the AM significantly enhances runoff prediction in the valley region. However, the improvement is less discernible in the peak region, where the LSTM and CNN models demonstrate superior simulation results. This is because the AM assigns weights to each upstream station through learning. Because the hydrological runoff process in the study area is intricate, the volume of runoff in the flood season is more uncertain than in the non-flood season, and the time series noise is more significant. Consequently, a single weight setting is inadequate for accurately simulating the peak value.
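The four evaluation metrics can be sketched as follows. The definitions are the usual textbook ones and are an assumption here, since the study does not spell them out. In particular, R² is taken as the coefficient of determination, 1 − SS_res/SS_tot. The synthetic values illustrate why RMSE reacts more strongly than MAE to a single large error.

```python
import numpy as np


def rmse(obs, pred) -> float:
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mae(obs, pred) -> float:
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))


def mape(obs, pred) -> float:
    """Mean absolute percentage error in percent; assumes no zero observations."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs((pred - obs) / obs)) * 100.0)


def r2(obs, pred) -> float:
    """Coefficient of determination (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


observed = [100.0, 200.0, 300.0, 400.0]
even_errors = [110.0, 190.0, 310.0, 390.0]   # every prediction is off by 10
one_outlier = [100.0, 200.0, 300.0, 440.0]   # one prediction is off by 40

for name, pred in (("even errors", even_errors), ("one outlier", one_outlier)):
    print(name, rmse(observed, pred), mae(observed, pred),
          round(mape(observed, pred), 3), round(r2(observed, pred), 4))
```

Both prediction sets have an MAE of 10, but the set with a single large miss has an RMSE of 20 instead of 10. This is the behaviour that lets a model have the lowest MAE without having the lowest RMSE.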
Figure 6

VMD–CNN–AM–LSTM, VMD–CNN–LSTM, VMD–LSTM, and VMD–CNN prediction results.


The forecasting results of the CNN and LSTM models are slightly superior to those of the CNN–LSTM model. This could be attributable to the fact that the CNN is primarily employed to extract the local features of the sequence data, while the LSTM is used to capture the time dependence. There may be information loss when employing the output of the CNN as input to the LSTM. Since the CNN downsamples and extracts features from the input sequence, the loss of detailed information may occur if there is excessive noise. This may be crucial for the temporal modelling of the LSTM. Therefore, reducing the impact of noise can improve the model's simulation accuracy.

Effectiveness testing of VMD

To validate the efficacy of VMD decomposition in enhancing the accuracy of simulation, four models were constructed: VMD–CNN–AM–LSTM, VMD–CNN–LSTM, VMD–LSTM, and VMD–CNN. Figure 6 illustrates the forecasting results. It can be seen that all four models predicted the runoff trend more accurately and that the data accuracy was enhanced by decomposition in comparison to Figure 5.

Table 3 lists the overall predictive performance of each model. The assessment criteria indicate that the VMD–CNN–AM–LSTM model exhibits the highest accuracy, with all indices improving considerably and a MAPE of 2.54%, which is 0.33 percentage points lower than the value obtained without VMD decomposition (2.87%). The results indicate that a hybrid of VMD and AM effectively enhances the prediction accuracy in the peak and valley areas of the daily runoff curve in comparison to a hybrid without VMD. This is attributed to VMD, which reduces the complexity of the runoff sequence and enhances the prediction model's ability to capture stochastic variations. Furthermore, the effective utilisation of the decomposed sequence's characteristics by the AM highlights the significance of the hybrid of VMD and AM in predicting the complexity of the factors.
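The size of the improvement from adding VMD can be recomputed directly from the Table 3 values. The short sketch below does this for the CNN–AM–LSTM pair. The helper name is illustrative, and the relative reductions are derived here for illustration rather than quoted from the study.

```python
# Values copied from Table 3.
without_vmd = {"RMSE": 895.6738, "MAE": 518.2583, "MAPE": 2.87}   # CNN-AM-LSTM
with_vmd = {"RMSE": 646.1602, "MAE": 424.1244, "MAPE": 2.54}      # VMD-CNN-AM-LSTM


def relative_reduction(before: float, after: float) -> float:
    """Reduction of an error metric, as a percentage of its original value."""
    return (before - after) / before * 100.0


rmse_reduction = relative_reduction(without_vmd["RMSE"], with_vmd["RMSE"])
mae_reduction = relative_reduction(without_vmd["MAE"], with_vmd["MAE"])
mape_difference = without_vmd["MAPE"] - with_vmd["MAPE"]  # in percentage points

print(f"RMSE reduced by {rmse_reduction:.2f}%")
print(f"MAE reduced by {mae_reduction:.2f}%")
print(f"MAPE lower by {mape_difference:.2f} percentage points")
```

Expressing the MAPE change in percentage points avoids confusing it with a relative change, since MAPE is itself a percentage.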

Forecasting stability

The predictive performance of models is measured by both the accuracy and stability of predictions. In this study, the predictive stability of all models was considered. The bias-variance (BV) is a crucial metric used to evaluate predictive stability and was calculated according to the method described by Xiao et al. (2015) and listed in the Supplementary Material. It considers the average difference between the observed value and the predictive value over all the observed data and predictive data, which represents the performance variability. The smaller the BV value, the more stable the model is. Figure 7 displays the BV value of the prediction stability of the proposed model (VMD–CNN–AM–LSTM) in comparison to the other models. The smaller value in the VMD–CNN–AM–LSTM implies a more stable performance when compared to alternative methods.
Figure 7

Prediction error results in terms of BV.


Overall performance

Short-term runoff forecasting models play an important role in flood control and water supply planning. This study introduces VMD and AM to the field of short-term runoff forecasting and demonstrates the potential of the VMD–CNN–AM–LSTM model, especially in accurately predicting peak runoff areas. Our proposed VMD–CNN–AM–LSTM model has shown promising results in runoff forecasting, outperforming other models in terms of RMSE, MAE, R2, and MAPE. This suggests that the integration of VMD and AM has contributed to the improved accuracy and stability of the model.

Comparison with previous studies

Previous studies on runoff forecasting have employed various hybrid models, including VMD–LSTM–gradient boosting decision tree (GBDT) (Sun et al. 2022) and ensemble empirical mode decomposition (EEMD)–LSTM (Zuo et al. 2020b). For instance, Sun et al. applied the VMD–LSTM–GBDT model, achieving an R² of 0.989, while Zuo et al. reported a performance of 0.9366 using the EEMD–LSTM–GBDT model. In comparison, our VMD–CNN–AM–LSTM model achieved an R² of 0.9933, demonstrating superior predictive accuracy. This improvement is attributed to the integration of the AM, which enhances feature extraction and the model's ability to focus on crucial temporal dependencies in the runoff data.

The hybrid models used in previous studies, such as EMD–LSTM or wavelet-based models (Li et al. 2021; Xiao & Wang 2021), focused on sequence decomposition but did not incorporate spatial dependencies or attention mechanisms. In contrast, our VMD–CNN–AM–LSTM model benefits from the CNN's ability to capture local spatial features, while the AM dynamically assigns weights to significant inputs, enhancing temporal feature selection. This hybrid integration allows our model to reduce noise more effectively and capture complex patterns, outperforming models that rely solely on temporal decomposition (e.g., VMD or EMD).

While most previous studies have focused primarily on temporal aspects of runoff forecasting, our model integrates both spatial and temporal dimensions by incorporating runoff data from upstream stations. This approach allows the VMD–CNN–AM–LSTM model to better capture the spatio-temporal complexities of runoff, as evidenced by the model's superior performance across all evaluation metrics. For example, the BV decomposition analysis shows that our model offers greater stability compared to models like VMD–LSTM–GBDT (Sun et al. 2022), making it particularly suitable for regions with complex hydrological dynamics.

Our study advances the field of runoff forecasting by introducing a hybrid model that combines VMD, CNN, AM, and LSTM. This comprehensive approach addresses both temporal and spatial complexities, offering improved accuracy and stability. The integration of VMD and AM, in particular, has proven to be a significant advancement, enabling more reliable forecasting, especially in peak runoff periods, as compared to earlier models that focus solely on temporal decomposition or lack attention mechanisms.

Influence of sub-models

It is important to note that increasing the complexity of a model does not always lead to improved performance. The selection of model complexity should be guided by the characteristics of the given dataset (Ng et al. 2023). The analysis of the effectiveness of the AM showed that the CNN–AM–LSTM model was not uniformly more accurate than the simpler models: its RMSE (895.67) exceeded those of the standalone CNN (842.43) and LSTM (852.97) models, and its R² (0.9884) was marginally below that of the CNN–LSTM model (0.9886), even though its MAE and MAPE were the lowest among the models without VMD (Table 3). Incorporating the AM therefore does not guarantee improved accuracy in runoff prediction. The weight assignment of the AM may introduce additional errors, particularly in regions with complex hydrological processes, increased uncertainty during flood seasons, and significant time series noise, where it may struggle to accurately simulate peak conditions.

When comparing the CNN–LSTM model to the CNN and LSTM models, it was found to be more stable, although its simulation accuracy was slightly lower than that of the individual CNN and LSTM models. This discrepancy can be attributed to the fact that CNN is primarily designed to extract localized features from sequential data (Guo et al. 2018), while LSTM is specifically designed to capture temporal dependencies (Yu et al. 2019). The use of the output of the CNN as the input for the LSTM may result in the loss of critical information. The CNN downsamples the input sequences and extracts features, which may omit detailed information necessary for the LSTM's temporal modelling. Consequently, reducing noise can effectively improve the accuracy of our model's simulations.

In this study, we integrated upstream hydrological data in addition to AM and VMD, thereby enhancing the physical reliability of runoff predictions for the target stations. The incorporation of historical station data and upstream hydrological data has facilitated a deeper understanding of the correlations between runoff and temporal rate changes across different time frequencies, as noted by Song et al. (2017). Furthermore, the inclusion of upstream hydrological data in our predictions enhances operational feasibility, as highlighted by Chen et al. (2020). Ahmed et al. (2022) conducted research on the use of meteorological data, such as rainfall and sunlight, to predict runoff decomposition data in rivers, which yielded positive results. However, short-term river forecasting can be challenging due to the collection and processing of meteorological data. The correlations between meteorological elements and runoff may vary across different basins, which can significantly impact model accuracy (Nguyen-Huy et al. 2017; Ahmed et al. 2021). Therefore, our study focuses on the use of upstream runoff data for predictions, which offers significant advantages in terms of data processing and collection.

Although this study does not include a full quantitative sensitivity analysis, it is clear from the model's design that certain input factors significantly influence its performance. The upstream runoff data, particularly from the YC, ZC, and SS stations, is crucial for the model's accuracy, as these stations capture spatial dependencies that affect the runoff at the downstream JL station. The timing and magnitude of upstream runoff during flood periods play a decisive role in determining prediction accuracy.

Additionally, the VMD-decomposed components help mitigate noise in the runoff data, improving the model's stability. Variations in the quality of these components could potentially affect the model's predictive ability, particularly during periods of low flow when noise is more prevalent. The AM incorporated in the model further enhances performance by assigning appropriate weights to the most relevant features, ensuring that the model remains resilient to fluctuations in input quality.

Limitations and prospects

While our model shows promising results, it is important to acknowledge its limitations. The accuracy of predictions may be influenced by uncertainties in input data, and the model's performance may vary under different hydrological conditions. Additionally, the selection of K in VMD introduces some subjectivity, and further research could explore robust methods for determining this parameter. During the analysis of the runoff data, we observed significant autocorrelation, especially during high-flow periods. This autocorrelation suggests that past runoff values have a strong influence on future values, which the VMD–CNN–AM–LSTM model captures effectively in short-term predictions. However, our focus remains on immediate, short-term forecasting for operational water resource management. For future studies, exploring longer lead times and the impact of autocorrelation on extended predictions could provide additional insights into the temporal dynamics of runoff. To further improve runoff forecasting models, future research could focus on incorporating additional environmental factors, exploring the impact of climate change, and refining the AM. Furthermore, an investigation into the transferability of the model to different regions and the integration of real-time data could contribute to its practical application.
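The autocorrelation mentioned above can be quantified with the sample autocorrelation function. The following is a minimal sketch using the standard estimator (lagged products of the mean-removed series divided by the total sum of squares); the function name and the choice of `max_lag` are illustrative assumptions, and no values from the JL record are reproduced here.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    if max_lag >= x.size:
        raise ValueError("max_lag must be smaller than the series length")
    x = x - x.mean()                 # remove the mean before correlating
    denom = np.dot(x, x)             # lag-0 sum of squares
    if denom == 0.0:
        raise ValueError("series is constant")
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])
```

Values that stay close to 1 over several lags indicate strong persistence, which favours short lead times. Inspecting how quickly the function decays is one simple way to judge how far the lead time could reasonably be extended.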

This study introduces a hybrid VMD–CNN–AM–LSTM model that incorporates runoff sequences from upstream hydrological stations for short-term runoff prediction. The model's simulation accuracy and stability are assessed. The key findings are as follows:

  • (1) VMD preprocessing reduces data randomness and non-stationarity, enhancing predictive accuracy. Using the CNN for feature extraction, coupled with the AM to emphasise key features, captures vital information efficiently, improves training efficiency, and reduces training time.

  • (2) The incorporation of upstream station runoff features enables the model to accommodate runoff fluctuations due to temporal changes, thereby enhancing prediction accuracy across various runoff periods.

  • (3) The evaluation of the model using RMSE, MAE, R2, MAPE, and BV demonstrates that it outperforms conventional methods and hybrid models in terms of prediction accuracy and stability.
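The accuracy metrics named in finding (3) can be computed as in the following sketch. It assumes the common definitions: RMSE and MAE in the units of runoff, MAPE as a percentage, and R2 as the coefficient of determination (one minus the ratio of residual to total sum of squares). The exact R2 definition used in the study is not restated here, so that choice is an assumption, and the bias-variance (BV) measure is omitted because its formula is not given in this section.

```python
import numpy as np

def evaluate(observed, predicted):
    """Return RMSE, MAE, MAPE (%) and R2 for paired runoff series."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # MAPE is undefined when any observed value is zero.
    mape = 100.0 * np.mean(np.abs(err / obs))
    # Coefficient of determination (assumed definition of R2).
    r2 = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```

For example, observations of 100, 200, 300, and 400 with predictions of 110, 190, 310, and 390 give an RMSE and MAE of 10, a MAPE of about 5.21%, and an R2 of 0.992.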

This model, which incorporates VMD, CNN, AM, and LSTM, is designed to capture the spatial and temporal characteristics of runoff, thereby enhancing the accuracy and stability of forecasting. The application of this hybrid model to runoff forecasting at the JL hydrological station provides evidence of its potential to provide valuable technical support for water resources management planning in the region.

No human participant is involved in this study.

All authors have read and agreed to publish the manuscript in this version.

H.C. and L.K. conceptualized the study; L.K. acquired the funding; H.C. and W.Z. developed the methodology; Y.W., J.Y., and R.Q. prepared the visualizations; H.C. wrote the original draft; L.K. and L.Z. reviewed and edited the manuscript.

This work was supported by the National Key Research and Development Program of China (Grant No. 2022YFC3002704) and China Yangzi Power Co., Ltd (Z242302051).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Ahmed, A. A. M., Deo, R. C., Raj, N., Ghahramani, A., Feng, Q., Yin, Z. & Yang, L. (2021) Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS, observations and synoptic-scale climate index data, Remote Sensing, 13 (4), 554.
Ahmed, A. A. M., Deo, R. C., Ghahramani, A., Feng, Q., Raj, N., Yin, Z. & Yang, L. (2022) New double decomposition deep learning methods for river water level forecasting, Science of The Total Environment, 831, 154722.
Bahdanau, D., Cho, K. & Bengio, Y. (2014) Neural Machine Translation by Jointly Learning to Align and Translate. San Diego, CA: CoRR, abs/1409.0473.
Chen, X., Huang, J., Han, Z., Gao, H., Liu, M., Li, Z., Liu, X., Li, Q., Qi, H. & Huang, Y. (2020) The importance of short lag-time in the runoff forecasting model based on long short-term memory, Journal of Hydrology, 589, 125359.
Deng, D., Li, J., Zhang, Z., Teng, Y. & Huang, Q. (2020) Short-term electric load forecasting based on EEMD-GRU-MLR (in Chinese), Power System Technology, 44 (02), 593–602.
Ding, Y., Zhu, Y., Feng, J., Zhang, P. & Cheng, Z. (2020) Interpretable spatio-temporal attention LSTM model for flood forecasting, Neurocomputing, 403, 348–359.
Dragomiretskiy, K. & Zosso, D. (2014) Variational mode decomposition, IEEE Transactions on Signal Processing, 62 (3), 531–544.
Fathian, F. (2021) Introduction of multiple/multivariate linear and nonlinear time series models in forecasting streamflow process. In: Sharma, P. & Machiwal, D. (Eds.) Advances in Streamflow Forecasting, Amsterdam: Elsevier, pp. 87–113.
Guo, K., Sui, L., Qiu, J., Yu, J., Wang, J. & Yao, S. (2018) Angel-eye: A complete design flow for mapping CNN onto embedded FPGA, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37 (1), 35–47.
Huang, S., Chang, J., Huang, Q. & Chen, Y. (2014) Monthly streamflow prediction using modified EMD-based support vector machine, Journal of Hydrology, 511, 764–775.
Joseph, V. R. (2022) Optimal ratio for data splitting, Statistical Analysis and Data Mining: The ASA Data Science Journal, 15, 531–538.
Li, M., Wang, Q. J., Bennett, J. C. & Robertson, D. E. (2015) A strategy to overcome adverse effects of autoregressive updating of streamflow forecasts, Hydrology and Earth System Sciences, 19 (1), 1–15.
Mohammadi, K., Eslami, H. R. & Kahawita, R. (2006) Parameter estimation of an ARMA model for river flow forecasting using goal programming, Journal of Hydrology, 331 (1), 293–299.
Ng, K. W., Huang, C. H., Koo, C. H., Chong, K. L., El-Shafie, A. & Ahmed, A. N. (2023) A review of hybrid deep learning applications for streamflow forecasting, Journal of Hydrology, 625, 130141.
Nguyen-Huy, T., Deo, R. C., An-Vo, D., Mushtaq, S. & Khan, S. (2017) Copula-statistical precipitation forecasting model in Australia's agro-ecological zones, Agricultural Water Management, 191, 153–172.
Song, H., Rajan, D., Thiagarajan, J. J. & Spanias, A. (2017) Attend and Diagnose: Clinical Time Series Analysis using Attention Models.
Sun, X., Zhang, H., Wang, J., Shi, C., Hua, D. & Li, J. (2022) Ensemble streamflow forecasting based on variational mode decomposition and long short term memory, Scientific Reports, 12 (1), 518.
Xiao, Y. & Wang, K. (2021) Research on streamflow forecast based on EEMD and long short-term memory, Proceedings – 2021 International Conference on Artificial Intelligence and Electromechanical Automation, AIEA 2021, pp. 328–333.
Yu, Y., Si, X., Hu, C. & Zhang, J. (2019) A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation, 31 (7), 1235–1270.
Zakizadeh, H., Ahmadi, H., Zehtabian, G., Moeini, A. & Moghaddamnia, A. (2020) A novel study of SWAT and ANN models for runoff simulation with application on dataset of metrological stations, Physics and Chemistry of The Earth, Parts A/B/C, 120, 102899.
Zuo, G., Luo, J., Wang, N., Lian, Y. & He, X. (2020b) Two-stage variational mode decomposition and support vector regression for streamflow forecasting, Hydrology and Earth System Sciences, 24, 5491–5518.
Zurey, W. N. H. F., Ismail, S. & Mustapha, A. (2020) Forecasting accuracy: A comparative study between artificial neural network and autoregressive model for streamflow, IAES International Journal of Artificial Intelligence, 9 (3), 464–472.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
