Abstract
Accurate runoff prediction is of great significance for flood prevention and mitigation, agricultural irrigation, and reservoir scheduling in watersheds. To address the strong non-linear and non-stationary characteristics of runoff series, a hybrid model for monthly runoff prediction, variational mode decomposition (VMD)–long short-term memory (LSTM)–Transformer, is proposed. First, VMD is used to decompose the runoff series into multiple modal components; the sample entropy of each component is then calculated and used to divide the components into high-frequency and low-frequency groups. The LSTM model is used to predict the high-frequency components and the Transformer to predict the low-frequency components. Finally, the component predictions are summed to obtain the final prediction. The Mann–Kendall trend test is used to analyze the runoff characteristics of the Miyun Reservoir, and the constructed VMD–LSTM–Transformer model is used to forecast its runoff. The prediction results are compared with those of the VMD–LSTM, VMD–Transformer, empirical mode decomposition (EMD)–LSTM–Transformer, and EMD–LSTM models. The results show that the proposed model achieves a Nash–Sutcliffe efficiency coefficient (NSE) of 0.976, a mean absolute error (MAE) of 0.206 × 10⁷ m³, a mean absolute percentage error (MAPE) of 0.381%, and a root mean squared error (RMSE) of 0.411 × 10⁷ m³, all better than those of the other models, indicating that the VMD–LSTM–Transformer model has higher prediction accuracy and can be applied to runoff prediction in the study area.
HIGHLIGHTS
The VMD–LSTM–Transformer model proposed in this paper achieves higher accuracy in monthly runoff prediction compared to other models.
The VMD decomposition method used in the model improves the completeness and adequacy of time series decomposition.
By using LSTM and Transformer models for different frequency components, the proposed model achieves better prediction accuracy.
INTRODUCTION
Water resources have become a significant global issue due to rapid economic and social development (Tang et al. 2022). Water is a safe, efficient, and renewable natural resource, and its rational use can effectively alleviate energy shortages (Qin et al. 2022). Runoff forecasting, an essential technology for the rational use of water resources, is currently both a research focus and a difficult problem in related disciplines (Xu et al. 2022). Medium- and long-term runoff forecasting predicts runoff processes with a lead time of more than 3 days, building on an understanding of hydrological phenomena and using causality analysis and mathematical modeling (Mohammadi 2021). As a non-engineering measure, timely and accurate forecast data provide essential reference information for flood control decision-making by water conservancy departments (Guo et al. 2004). Compared with short-term forecasts, medium- and long-term runoff forecasts provide more decision-making time for water resources allocation, flood control and benefit promotion, and watershed ecological protection, and therefore play an important role in these areas (Emerton et al. 2016).
Traditional runoff prediction models fall into two categories: causality analysis and mathematical statistics (Liu et al. 2022a, 2022b). The causality analysis method collects relevant meteorological characteristics and builds a model after determining the relationship between meteorological and hydrological information. However, because it places high demands on the completeness and accuracy of hydrological data, the causality analysis method is often impractical for operational runoff prediction, especially in hydrological forecasting departments (Zhang et al. 2021). The mathematical statistics method mines the internal patterns of historical runoff and uses statistical techniques to explore the laws governing the influencing factors within the runoff sequence. Building on an understanding of how runoff patterns form, it then constructs sequence prediction models (Wilby et al. 2003). Mathematical statistics methods mainly comprise two categories of runoff statistical models based on historical data: multivariate linear analysis and time series analysis. Their overall prediction accuracy depends on the completeness of the data and is limited when data are missing (Machiwal & Jha 2006).
In recent years, deep learning methods have been increasingly applied to forecasting, leading to significant research breakthroughs. Among deep learning algorithms, recurrent neural networks (RNNs) have shown advantages in processing time series data because they maintain a state between successive inputs (Schmidt et al. 2019). The RNN framework includes feedback connections, so each output depends on both the current input and information from previous time steps. Hochreiter & Schmidhuber (1997) proposed the long short-term memory (LSTM) neural network, a special type of RNN that mitigates the vanishing gradient problem and can therefore learn long-range dependencies. The LSTM architecture uses gates to manage the flow of information: input and output gates regulate what enters and leaves the memory cell, preventing disturbances caused by irrelevant data, and a forget gate discards unneeded information. Because its recurrent layer contains feedback loops, LSTM can store information in memory over time. Consequently, compared with other neural networks, LSTM can better exploit the temporal characteristics of time series data (Wan et al. 2020). The Transformer is another neural sequence model, with an encoder–decoder structure. It has a strong ability to model long-term dependencies, and its multi-head attention mechanism can effectively capture the intrinsic correlations in sequence data. As a result, the Transformer has received considerable attention in time series analysis (Liu et al. 2022a, 2022b).
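The gating described above can be made concrete with a single LSTM time step. The NumPy sketch below uses the common modern formulation (input, forget, and output gates plus a candidate cell state); the gate ordering inside the stacked weight matrices and the function names `lstm_step` and `lstm_forward` are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases.
    Rows are stacked in an assumed order: input, forget, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g     # updated long-term memory
    h_t = o * np.tanh(c_t)       # new hidden state (short-term output)
    return h_t, c_t


def lstm_forward(xs, W, U, b, H):
    """Run the cell over a sequence xs of shape (T, D); returns hidden states (T, H)."""
    h, c = np.zeros(H), np.zeros(H)
    hs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        hs.append(h)
    return np.stack(hs)
```

With all weights set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state simply halves at each step, which isolates the effect of the forget gate.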
Hydrological processes are shaped by specific natural climate conditions. The challenge in processing runoff data lies in its non-stationarity, which is characterized by fluctuating amplitudes and frequencies as well as changing trends (Khaliq et al. 2006). Although machine learning models adapt well to data, relying on a single prediction model may overlook hidden patterns in the training data and thus reduce accuracy. To address this challenge, researchers have adopted a divide-and-conquer approach, combining time–frequency analysis with data-driven models to extract hidden features such as high-frequency disturbances, medium-frequency fluctuations, and long-term trends from runoff sequences (Rolim & de Souza Filho 2020). This approach significantly improves runoff prediction accuracy, has been widely used in fields such as energy, finance, transportation, and hydrology, and forms a stable decomposition–prediction–reconstruction framework. The resulting decomposition-based combination model is a hybrid method that couples signal decomposition techniques with prediction models (Altan et al. 2021). Because hydrological time series vary in complex ways over time, decomposing them before modeling can greatly enhance prediction accuracy; as a result, such models have become widely used for hydrological prediction problems. The decomposition methods used in these hybrid models fall into three categories: wavelet transform (WT), methods based on empirical mode decomposition (EMD), and variational mode decomposition (VMD) (Liu et al. 2021). Wavelet decomposition requires the mother wavelet function and the number of decomposition levels to be chosen manually in advance.
EMD and ensemble empirical mode decomposition (EEMD), on the other hand, adaptively decompose a time series into intrinsic mode components based on its local characteristics, which makes them more universal than WT. However, these methods may produce spurious components and mode mixing, which limits prediction accuracy. VMD is a non-recursive, variational decomposition method that reduces these problems and is effective for decomposing complex non-linear and non-stationary time series (Niu et al. 2020). Researchers are currently combining algorithms that divide the decomposed components into high-frequency and medium-to-low-frequency groups, predict each group with a different method, and add the predicted values to obtain the final result. For example, Zhao et al. (2022) used a coupled EEMD–LSTM–autoregressive integrated moving average (ARIMA) model to predict monthly precipitation in Luoyang City. They used the permutation entropy (PE) algorithm to divide the mode components obtained by EEMD into high-frequency and low-frequency components, which they predicted separately with an LSTM neural network and ARIMA, respectively. The results showed that this model was more accurate than EMD–LSTM, EEMD–LSTM, and EEMD–ARIMA.
In summary, for runoff sequences with strong randomness and volatility, this paper proposes a novel hybrid model, VMD–LSTM–Transformer, for monthly runoff prediction. The model combines a neural network (LSTM) and a deep learning model (Transformer) to predict different frequency bands separately, and it is applied to monthly runoff prediction for the Miyun Reservoir in Beijing. The runoff sequence is first decomposed using VMD, and the intrinsic mode components (IMF1, IMF2, …, IMFn) are divided into high-frequency and low-frequency components according to their sample entropy (SE). The LSTM neural network, with its strong feature extraction ability and capacity to handle non-linear data structures, is used to predict the high-frequency components of the runoff sequence. The Transformer is used to predict the low-frequency components, which are less non-linear. The predicted high- and low-frequency values are added to obtain the final prediction of the runoff sequence.
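The decompose, split, predict, and sum workflow described above can be sketched as follows. The SE threshold separating high- from low-frequency components is not stated in this paragraph, so `se_threshold` is a hypothetical parameter, and `predict_high` / `predict_low` are placeholder callables standing in for the trained LSTM and Transformer models.

```python
import numpy as np


def hybrid_forecast(imfs, se_values, se_threshold, predict_high, predict_low):
    """Sketch of the decompose -> split by SE -> predict -> sum pipeline.

    imfs: array of shape (K, T) holding the decomposed components.
    se_values: sample entropy of each component (length K).
    se_threshold: hypothetical cut-off; components with SE >= threshold are
        treated as high frequency, the rest as low frequency.
    predict_high / predict_low: hypothetical callables mapping a 1-D component
        to its forecast array (same forecast length for every component).
    """
    imfs = np.asarray(imfs, dtype=float)
    se_values = np.asarray(se_values, dtype=float)
    high = imfs[se_values >= se_threshold]  # more complex components -> LSTM
    low = imfs[se_values < se_threshold]    # smoother components -> Transformer
    preds = [predict_high(c) for c in high] + [predict_low(c) for c in low]
    # Reconstruction step: the runoff forecast is the sum of the component forecasts.
    return np.sum(preds, axis=0)
```

Because VMD components add back up to the original series, summing the component forecasts yields a forecast of the runoff itself; the choice of threshold only decides which model handles each component.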
MATERIALS AND METHODS
Decomposition-prediction model
A single prediction model may not be able to identify the underlying patterns in the original signal. The decomposition–prediction approach therefore uses a decomposition algorithm to break a non-stationary runoff sequence into several simpler sub-sequences that retain hidden features of the original sequence, such as its structure and periodicity. Each sub-sequence is then modeled separately, which simplifies model building. This approach improves the prediction accuracy of each sub-sequence and hence of the original runoff sequence, resulting in a stable combination model framework (Song & Chen 2021).
Variational mode decomposition
LSTM
Transformer

Here, pos is the position index of the input word vector and d_model is the dimension of the word vector. The (2i)-th and (2i + 1)-th dimensions of the position information are given by Equations (12) and (13), respectively. Finally, the position information matrix is added to the input matrix. The feed-forward part consists of two linear layers and an activation function, which enhance the expressive power of the word representations through non-linear transformations. The Add&Norm part consists of a residual connection and a LayerNorm module, which alleviate the vanishing gradient problem. The Transformer model structure is shown in Figure 2.
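Assuming Equations (12) and (13) are the standard sinusoidal encodings of the original Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i + 1) = cos(pos / 10000^(2i/d_model)), the position matrix and the feed-forward and Add&Norm steps can be sketched in NumPy as below. The ReLU activation and the omission of LayerNorm's learnable gain and bias are simplifying assumptions.

```python
import numpy as np


def positional_encoding(max_len, d_model):
    """Sinusoidal position matrix: sin on even dimensions (2i), cos on odd (2i + 1)."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    pos = np.arange(max_len)[:, None]                 # (max_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]         # the even indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)  # (max_len, d_model / 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe


def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis (learnable gain and bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def feed_forward(x, W1, b1, W2, b2):
    """Two linear layers with a ReLU in between (the activation is assumed)."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2


def add_and_norm(x, sublayer_out):
    """Residual connection followed by LayerNorm: LayerNorm(x + Sublayer(x))."""
    return layer_norm(x + sublayer_out)
```

In use, the encoder input is `X + positional_encoding(T, d_model)` for a sequence of length T, and each sub-layer output is passed through `add_and_norm` together with its input.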
VMD–LSTM–Transformer coupled model
SE is determined by the probabilities that two sequences match for m and m + 1 points within a tolerance r, denoted B^m(r) and B^{m+1}(r), respectively, with SE = −ln[B^{m+1}(r)/B^m(r)]. Previous research has shown that SE estimates are statistically meaningful only when m is 1 or 2 and r lies between 0.1 and 0.25 times the standard deviation of the test sequence. Accordingly, this study sets m = 2 and r = 0.2 times the standard deviation of the test sequence.
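Under the settings above (m = 2, r = 0.2 times the standard deviation), sample entropy can be estimated as −ln(A/B), where B and A count template pairs that match for m and m + 1 points. The sketch below follows the usual Richman–Moorman convention (Chebyshev distance, self-matches excluded, the same N − m templates used for both lengths); `sample_entropy` is an illustrative name, not the authors' code.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std(x) and Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def count_matches(length):
        # Same N - m starting points for lengths m and m + 1, so A and B are comparable.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Distance to every later template only, so self-matches are never counted.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # pairs matching for m points
    a = count_matches(m + 1)  # pairs matching for m + 1 points
    if a == 0 or b == 0:
        return np.inf         # undefined: no matching templates
    return -np.log(a / b)
```

A regular signal such as a sine wave yields a low SE, while white noise yields a much higher one, which is why SE can rank VMD components from complex (high frequency) to smooth (low frequency).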
Parameter setting
The accuracy of hydrological prediction models relies on three essential factors: high-quality input data, appropriate model selection, and a reasonable model structure. Once the model type is chosen, the model parameters have the greatest influence on the final prediction performance. Model training adjusts the values of these parameters so that the model output is as close as possible to the observed values (Magnusson et al. 2015). By function and by how they are determined, the variables of a machine learning model fall into two groups: model parameters and hyperparameters. Model parameters are internal variables that are learned automatically from the training data. Hyperparameters, or tuning parameters, are external configuration variables that are set in advance when the model is built and cannot be learned from the data; they are the primary knobs controlling the model's structure, function, and efficiency. This paper uses grid search: candidate parameter values are enumerated at fixed intervals, a model is trained on the training data for each combination, each model is evaluated on a validation set, and the best-performing parameter values are selected.
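The grid-search procedure can be sketched in a few lines of Python. `train_and_score` is a hypothetical callable that trains a model with the given hyperparameters and returns its validation error (for example RMSE); the candidate grid in the usage lines is illustrative, not the grid searched in this study.

```python
from itertools import product


def grid_search(param_grid, train_and_score):
    """Exhaustively evaluate every combination in param_grid.

    param_grid: dict mapping a hyperparameter name to a list of candidate values.
    train_and_score: callable(params) -> validation error (lower is better).
    Returns (best_params, best_error).
    """
    names = list(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = train_and_score(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params):
    # Toy error surface standing in for "train, then score on the validation set".
    return abs(params["learning_rate"] - 0.01) + abs(params["hidden_units"] - 100) / 100


grid = {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [50, 100, 150]}
best, err = grid_search(grid, toy_error)
```

The cost grows multiplicatively with the number of candidates per hyperparameter (here 3 × 3 = 9 training runs), which is why the grid is usually kept coarse.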
The parameters that govern the VMD decomposition are the penalty factor α, the mode number K, the noise tolerance parameter τ, and the convergence error ε. K mainly affects the bandwidth of each IMF: if K is set too large, modes are duplicated, and if K is set too small, the series is under-decomposed. After repeated experiments, α = … and K = 6 were chosen; τ and ε are usually left at their default values, with τ set to 0.3 and ε set to 10⁻⁷. The hyperparameters of the LSTM model are as follows: maximum number of iterations 200, learning rate 0.01, 2 hidden layers, 100 output units per hidden layer, and L2 regularization parameter 10⁻⁶. The hyperparameters of the Transformer model are as follows: 3 Transformer layers, 8 self-attention heads, hidden dimension 512, dropout probability 0.1, learning rate 0.01, and maximum sequence length 1,000.
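To show how the penalty factor α, mode number K, noise tolerance τ, and convergence error ε enter the decomposition, the NumPy sketch below implements a simplified VMD: it alternates Wiener-filter-like mode updates, power-weighted center-frequency updates, and a dual-ascent step on the reconstruction constraint, following the original VMD formulation. It works on the one-sided `rfft` spectrum and omits the mirror extension used in reference implementations, so it is an illustrative assumption rather than the decomposition code used in this study.

```python
import numpy as np


def vmd(signal, alpha, tau, K, tol=1e-7, max_iter=500):
    """Simplified VMD sketch (one-sided spectrum, no mirror extension).

    Returns (modes, omega): modes has shape (K, N) and omega holds the center
    frequencies in cycles per sample, both sorted from low to high frequency.
    """
    f = np.asarray(signal, dtype=float)
    N = f.size
    f_hat = np.fft.rfft(f)                       # one-sided spectrum of the real signal
    freqs = np.fft.rfftfreq(N)                   # 0 ... 0.5 cycles per sample
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K               # uniform initial center frequencies
    lam = np.zeros(freqs.size, dtype=complex)    # Lagrange multiplier

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Residual left for mode k, using the latest estimates of the other modes.
            others = u_hat.sum(axis=0) - u_hat[k]
            # Narrow-band filter centered on omega[k]; alpha controls the bandwidth.
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            power = np.abs(u_hat[k]) ** 2
            # Center frequency = power-weighted mean frequency of the mode.
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Dual ascent on the constraint sum_k u_k = f (tau = 0 disables it).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Relative change of each mode spectrum, summed over modes, against tol.
        change = np.sum(np.abs(u_hat - u_prev) ** 2, axis=1) / (
            np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-30)
        if change.sum() < tol:
            break

    modes = np.fft.irfft(u_hat, n=N, axis=1)     # back to real-valued time series
    order = np.argsort(omega)
    return modes[order], omega[order]
```

Setting `tau=0.0` turns off the dual-ascent step, so exact reconstruction is only encouraged rather than enforced; the study itself reports τ = 0.3. A larger α narrows each mode's band, and K fixes how many bands are extracted.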
Evaluation indicators
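A minimal sketch of the four indicators used to compare the models (NSE, MAE, MAPE, and RMSE), assuming their standard definitions; the function name `evaluate` is illustrative.

```python
import numpy as np


def evaluate(observed, simulated):
    """Return NSE, MAE, MAPE (%), and RMSE for paired observed/simulated series."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    err = obs - sim
    # NSE: 1 minus error variance relative to the variance of the observations.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err) / np.abs(obs))  # undefined if any observation is 0
    rmse = np.sqrt(np.mean(err ** 2))
    return {"NSE": nse, "MAE": mae, "MAPE": mape, "RMSE": rmse}
```

MAE and RMSE carry the units of the input series (10⁷ m³ in the model comparison table), MAPE is a percentage, and NSE is dimensionless, equal to 1 for a perfect fit.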
CASE STUDY
Study area
Data source
Interannual variation curves of runoff, precipitation, and temperature in the Miyun Reservoir.
Runoff characteristics analysis
M–K mutation test for the inlet runoff sequence of the Miyun Reservoir.
RESULTS AND DISCUSSION
Results
Results of VMD decomposition of monthly runoff from the Miyun Reservoir.
Sample entropy values of decomposed subseries of monthly runoff series of the Miyun Reservoir
| Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 |
|---|---|---|---|---|---|---|
| SE | 1.06 | 0.96 | 0.85 | 0.76 | 0.68 | 0.54 |
Comparison of predicted and true monthly runoff values of the Miyun Reservoir.
Discussion
Comparison of evaluation index values of different models for runoff prediction results
| No. | Model | NSE | MAE (10⁷ m³) | MAPE (%) | RMSE (10⁷ m³) |
|---|---|---|---|---|---|
| 1 | VMD–LSTM–Transformer | 0.976 | 0.206 | 0.381 | 0.411 |
| 2 | VMD–LSTM | 0.916 | 1.112 | 2.059 | 1.779 |
| 3 | VMD–Transformer | 0.932 | 1.177 | 2.179 | 2.236 |
| 4 | EMD–LSTM–Transformer | 0.901 | 1.291 | 2.391 | 2.105 |
| 5 | EMD–LSTM | 0.878 | 1.335 | 2.473 | 2.096 |
Taylor diagram comparing the prediction results of different models.
The runoff prediction model proposed in this paper is of great significance for optimizing water resources allocation and reservoir operation. Knowing how much water is expected to enter a reservoir allows better planning of releases, balancing water demand, flood control, and reservoir filling. Optimized reservoir scheduling enables more sustainable water management and improves water availability for all uses throughout the year.
CONCLUSION
This paper proposes a coupled VMD–LSTM–Transformer model for predicting the incoming runoff volume of the Miyun Reservoir in Beijing. The accuracy of the runoff series predictions from various models is compared and analyzed, leading to the following conclusions:
- (1) The VMD decomposition method improves the completeness and adequacy of time series decomposition and reduces the interference of random components with deterministic components, thereby improving the model's prediction ability.
- (2) The LSTM model is trained to predict the higher-frequency components and the Transformer model the lower-frequency components, which further improves prediction accuracy.
- (3) The predictions of the VMD–LSTM–Transformer model outperform those of VMD–LSTM, VMD–Transformer, EMD–LSTM–Transformer, and EMD–LSTM. This model therefore provides a reliable method for monthly runoff time series prediction.
- (4) The proposed combined method integrates sample entropy, prediction models, and error analysis into a runoff forecast model that achieves higher prediction accuracy. It is efficient and practical and can provide valuable support for watershed water resources management decisions.
- (5) Future research could add precipitation, evaporation, and temperature as inputs to improve prediction accuracy. In addition, given the presence of negative water balance factors, an in-depth understanding of future water balance projections for the catchment will be the focus of our next study.
Overall, the proposed model provides a new approach to monthly runoff prediction research. By applying VMD decomposition and using LSTM and Transformer models for different frequency components, the model achieves higher accuracy in predicting the incoming runoff volume. In future research, adding more factors can further improve the model's prediction accuracy.
AUTHOR CONTRIBUTION
All authors contributed to the study's conception and design. Writing and editing: S.G. and Y.W.; preliminary data collection: X.Z. and H.C. All authors read and approved the final manuscript.
FUNDING
This work was supported by the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004]. This work was also funded by the North China University of Water Resources and Electric Power Innovation Ability Improvement Project for Postgraduates [grant number NCWUYC-2023006].
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.