Accurate runoff prediction is of great significance for flood prevention and mitigation, agricultural irrigation, and reservoir scheduling in watersheds. To address the strong non-linear and non-stationary characteristics of runoff series, a hybrid monthly runoff prediction model, variational mode decomposition (VMD)–long short-term memory (LSTM)–Transformer, is proposed. Firstly, VMD is used to decompose the runoff series into multiple modal components, and the sample entropy of each modal component is calculated to divide the components into high-frequency and low-frequency groups. The LSTM model is then used to predict the high-frequency components and the Transformer to predict the low-frequency components. Finally, the component predictions are summed to obtain the final prediction. The Mann–Kendall trend test is used to analyze the runoff characteristics of the Miyun Reservoir, and the constructed VMD–LSTM–Transformer model is used to forecast the reservoir's runoff. The prediction results are compared with those of the VMD–LSTM, VMD–Transformer, empirical mode decomposition (EMD)–LSTM–Transformer, and EMD–LSTM models. The results show that the model's Nash–Sutcliffe efficiency coefficient (NSE) is 0.976, mean absolute error (MAE) is 0.206 × 10⁷ m³, mean absolute percentage error (MAPE) is 0.381%, and root mean squared error (RMSE) is 0.411 × 10⁷ m³, all better than those of the other models, indicating that the VMD–LSTM–Transformer model has higher prediction accuracy and can be applied to runoff prediction in the study area.

  • The VMD–LSTM–Transformer model proposed in this paper achieves higher accuracy in monthly runoff prediction compared to other models.

  • The VMD decomposition method used in the model improves the completeness and adequacy of time series decomposition.

  • By using LSTM and Transformer models for different frequency components, the proposed model achieves better prediction accuracy.

Water resources have become a significant global issue due to rapid economic and societal development (Tang et al. 2022). Water, a safe, efficient, and renewable natural resource, can effectively alleviate energy shortages when used rationally (Qin et al. 2022). Runoff forecasting, an essential technology for the rational use of water resources, is currently a research focus and difficulty in related disciplines (Xu et al. 2022). Medium and long-term runoff forecasting predicts runoff processes with a lead time of more than 3 days, based on hydrological phenomena, using causality analysis and mathematical modeling as input conditions (Mohammadi 2021). Timely and accurate forecast data, as a non-engineering measure, provide essential reference information for flood control decision-making in the water conservancy department (Guo et al. 2004). Compared with short-term runoff forecasting, medium and long-term forecasting results provide more decision-making time for water resources allocation, flood control and benefit promotion, and watershed ecological protection, and thus play an important role in these areas (Emerton et al. 2016).

Traditional runoff prediction models mainly include two methods: causality analysis and mathematical statistics (Liu et al. 2022a, 2022b). The causality analysis method collects relevant meteorological characteristics and establishes a model after determining the relationship between meteorological and hydrological information. However, due to the high requirements for the completeness and accuracy of hydrological data, the causality analysis method lacks implementation feasibility in practical runoff prediction, especially in hydrological forecasting departments (Zhang et al. 2021). The mathematical statistics method mines the internal patterns of historical runoff and explores the statistical laws of the internal influencing factors of the runoff sequence through mathematical statistics methods. This method is based on an understanding of the formation of the runoff patterns and further constructs sequence prediction models (Wilby et al. 2003). The mathematical statistics method mainly includes two categories: multivariate linear analysis and time series analysis runoff statistical models, which are based on historical data. The overall prediction accuracy depends on the completeness of the data, and the accuracy of runoff prediction will be limited when the data are missing (Machiwal & Jha 2006).

In recent years, deep learning methods have been increasingly applied to the field of forecasting, resulting in significant research breakthroughs. Among the deep learning algorithms available in the 21st century, recurrent neural networks (RNNs) have shown advantages in processing time series data by maintaining a state between different inputs (Schmidt et al. 2019). The RNN framework includes feedback connections that consider both current and adjacent information in the data. Hochreiter & Schmidhuber (1997) proposed the first long short-term memory (LSTM) neural network, which is a special type of RNN that solved the problem of long sequence dependence and gradient explosion. The LSTM framework includes ‘control gates’ to manage the flow of information and prevent model disturbances caused by useless data, as well as ‘forget gates’ to discard unneeded information. LSTM has feedback loops in the recurrent layer, allowing it to store information in memory over time. Consequently, compared with other neural networks, LSTM can better utilize the temporal characteristics of time series data (Wan et al. 2020). The Transformer is another neural sequence model that has an encoder–decoder structure. It has a strong long-term dependency modeling ability, and its multi-head attention mechanism can effectively explore the intrinsic correlations of sequence data. As a result, the Transformer has received considerable attention in the field of time series analysis (Liu et al. 2022a, 2022b).

Hydrological processes are the result of specific natural climate conditions. The challenge in processing runoff data lies in its non-stationary nature, characterized by fluctuating amplitude and frequency as well as trend changes (Khaliq et al. 2006). While machine learning has strong adaptive capabilities, relying solely on a single prediction model may overlook hidden patterns in the training data, diminishing the model's accuracy. To address this challenge, researchers have employed a divide-and-conquer modeling approach, combining time–frequency analysis with data-driven models to extract hidden features such as high-frequency disturbances, medium-frequency fluctuations, and long-term trends from runoff sequences (Rolim & de Souza Filho 2020). This approach, which forms a stable decomposition–prediction–reconstruction framework, significantly improves runoff prediction accuracy and has been widely used in fields such as energy, finance, transportation, and hydrology. The decomposed prediction combination model is a hybrid method that combines signal decomposition techniques with prediction models (Altan et al. 2021). Because hydrological time series vary in complex ways over time, decomposing the data before modeling can greatly enhance prediction accuracy, and the decomposed prediction combination model has therefore become widely used in hydrological prediction. The decomposition methods utilized in such hybrid models can be classified into three categories: wavelet transform (WT), methods based on empirical mode decomposition (EMD), and variational mode decomposition (VMD) (Liu et al. 2021). Wavelet decomposition requires human input in the form of a pre-set mother wavelet function and number of decomposition layers.
On the other hand, EMD and ensemble empirical mode decomposition (EEMD) can adaptively decompose time series into intrinsic mode components based on local characteristics, making them more universal than WT. However, these methods may suffer from spurious components and mode mixing, which limits prediction accuracy. VMD is a non-recursive variational mode decomposition that can reduce these issues and is an effective method for decomposing complex non-linear and non-stationary time series (Niu et al. 2020). Currently, scholars are combining algorithms to divide decomposed components into high-frequency and medium-low frequency components. These components are predicted separately using different methods, and the predicted values are added to obtain the final result. Zhao et al. (2022) used an EEMD–LSTM–autoregressive integrated moving average (ARIMA) coupled model to predict monthly precipitation in Luoyang City. They used the permutation entropy (PE) algorithm to divide the different frequency mode components obtained by EEMD into high-frequency and low-frequency components. They separately predicted these components using the LSTM neural network and ARIMA. The results showed that the prediction accuracy of the model was higher than that of EMD–LSTM, EEMD–LSTM, and EEMD–ARIMA.

In summary, this paper proposes a novel hybrid model, VMD–LSTM–Transformer, for monthly prediction of runoff sequences with strong randomness and volatility. The model combines a neural network and a deep learning model for frequency-separated prediction and is applied to the monthly runoff of the Miyun Reservoir in Beijing. The runoff sequence is first decomposed using VMD, and the intrinsic mode components (IMF1, IMF2, …, IMFn) are divided into high-frequency and low-frequency components based on their sample entropy (SE). The LSTM neural network, with its powerful feature extraction ability and strong capacity for non-linear data structures, is used to predict the high-frequency components. The Transformer is used to predict the low-frequency components, which have weaker non-linearity. The predicted values of the high- and low-frequency components are added to obtain the final prediction of the runoff sequence.

Decomposition-prediction model

A single prediction model may not be sufficient to identify the underlying patterns in the original signal. The decomposition–prediction modeling approach commonly employs a decomposition algorithm to break down non-stationary runoff sequences into several simpler sub-sequences. These sub-sequences contain hidden features such as the original sequence structure and periodicity. Each sub-sequence is then modeled to simplify the process of building the model. This modeling method enhances the prediction accuracy of each sub-sequence and ensures the accuracy of the original runoff sequence prediction, resulting in a stable combination model framework (Song & Chen 2021).

Variational mode decomposition

The signal can be decomposed into its different modes using various methods, such as EMD, WT, singular spectrum analysis (SSA), and VMD. Among these, VMD is a signal processing algorithm introduced by Dragomiretskiy & Zosso (2013). Unlike traditional time-domain recursive filtering methods, VMD determines the central frequency and bandwidth of each component by iteratively searching for the optimal solution of a variational model, effectively suppressing endpoint effects and mode aliasing. VMD is a data-driven method that separates a complex time series into several intrinsic mode functions (IMFs) with varying timescales and a residue component, with each IMF representing a specific oscillation or trend. Because it adapts to the inherent characteristics of the data, VMD is suitable for analyzing complex and non-stationary time series like monthly runoff. The input signal, denoted by $f(t)$, can be expressed as the variational model shown in the following equation:
$$\min_{\{u_k\},\{\omega_k\}}\left\{\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2\right\},\quad \text{s.t.}\ \sum_{k=1}^{K}u_k(t)=f(t)$$
(1)
where K is the number of variational mode IMFs, $\{u_k\}$ and $\{\omega_k\}$ are the set of IMFs and their corresponding central frequencies, $\delta(t)$ is the impulse function, $\partial_t$ is the partial derivative with respect to time t, and $*$ is the convolution operator.
To solve the constrained problem in Equation (1), a quadratic penalty term with penalty factor $\alpha$ and a Lagrange multiplier $\lambda(t)$ are introduced, giving the augmented Lagrangian:
$$L(\{u_k\},\{\omega_k\},\lambda)=\alpha\sum_{k=1}^{K}\left\|\partial_t\left[\left(\delta(t)+\frac{j}{\pi t}\right)*u_k(t)\right]e^{-j\omega_k t}\right\|_2^2+\left\|f(t)-\sum_{k=1}^{K}u_k(t)\right\|_2^2+\left\langle\lambda(t),\,f(t)-\sum_{k=1}^{K}u_k(t)\right\rangle$$
(2)
The alternating direction method of multipliers is used to solve Equation (2), updating the modes $\hat{u}_k^{n+1}$, central frequencies $\omega_k^{n+1}$, and Lagrange multiplier $\hat{\lambda}^{n+1}$ in the frequency domain according to Equation (3):
$$\hat{u}_k^{n+1}(\omega)=\frac{\hat{f}(\omega)-\sum_{i\neq k}\hat{u}_i(\omega)+\hat{\lambda}^{n}(\omega)/2}{1+2\alpha(\omega-\omega_k^{n})^2},\qquad \omega_k^{n+1}=\frac{\int_0^\infty\omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}{\int_0^\infty|\hat{u}_k^{n+1}(\omega)|^2\,d\omega},\qquad \hat{\lambda}^{n+1}(\omega)=\hat{\lambda}^{n}(\omega)+\tau\left[\hat{f}(\omega)-\sum_{k=1}^{K}\hat{u}_k^{n+1}(\omega)\right]$$
(3)
where n is the current iteration, $\tau$ is the noise tolerance parameter, and $\hat{f}(\omega)$, $\hat{u}_i(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $f(t)$, $u_i(t)$, and $\lambda(t)$, respectively. The modes $\hat{u}_k$, central frequencies $\omega_k$, and multiplier $\hat{\lambda}$ are updated until the relative change falls below the convergence error $\varepsilon$:
$$\sum_{k=1}^{K}\frac{\|\hat{u}_k^{n+1}-\hat{u}_k^{n}\|_2^2}{\|\hat{u}_k^{n}\|_2^2}<\varepsilon$$
(4)
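As a rough illustration of the update loop in Equations (3) and (4), the following NumPy sketch runs the frequency-domain iteration on a real signal. It is a simplified approximation (no mirror extension, symmetric filters over the full spectrum, and a hypothetical uniform initialization of the center frequencies), not the full algorithm of Dragomiretskiy & Zosso:

```python
import numpy as np

def vmd_sketch(f, K=3, alpha=2000.0, tau=0.0, eps=1e-7, max_iter=500):
    """Simplified VMD: iterate the frequency-domain updates of Eq. (3)
    until the convergence test of Eq. (4) is met."""
    T = len(f)
    f_hat = np.fft.fft(f)
    nu = np.fft.fftfreq(T)                       # normalized frequency axis
    u_hat = np.zeros((K, T), dtype=complex)      # mode spectra
    omega = np.arange(K) / (4.0 * K)             # initial center frequencies
    lam = np.zeros(T, dtype=complex)             # Lagrange multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            residual = f_hat - (u_hat.sum(axis=0) - u_hat[k])
            # Wiener-filter update of mode k (first part of Eq. (3))
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (np.abs(nu) - omega[k]) ** 2)
            # new center frequency = power centroid of the positive half-spectrum
            power = np.abs(u_hat[k][: T // 2]) ** 2
            omega[k] = np.sum(nu[: T // 2] * power) / (np.sum(power) + 1e-30)
        # dual ascent on the reconstruction constraint (tau = 0 disables it)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < eps:                         # convergence test, Eq. (4)
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, np.sort(omega)
```

On a two-tone test signal, the recovered center frequencies should settle near the two tones, illustrating how each mode locks onto one narrow frequency band.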

LSTM

LSTM is a variant of RNN. The LSTM network improves the basic network unit module of the recurrent neural network by adding an input gate i, an output gate o, a forget gate f, and a memory controller c (Chen et al. 2019). During training, LSTM can selectively remember effective information and discard irrelevant information, solving the problems of gradient vanishing and explosion in the later time steps of the recurrent neural network. By establishing dependencies between data at different time periods, LSTM can learn long-term dependent information and thus has certain advantages in time series prediction. LSTM has the ability to learn and retain information over long periods, which is essential for capturing long-term dependencies in the time series data, such as seasonal patterns or trends. LSTM can capture complex non-linear relationships between past and present observations, enabling it to model intricate patterns in the monthly runoff series. The basic structure unit of LSTM is shown in Figure 1.
Figure 1

LSTM model structure.

The exact calculation process is given as follows:
$$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)$$
(5)
$$i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)$$
(6)
$$\tilde{c}_t=\tanh(W_c\cdot[h_{t-1},x_t]+b_c)$$
(7)
$$c_t=f_t\odot c_{t-1}+i_t\odot\tilde{c}_t$$
(8)
$$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)$$
(9)
$$h_t=o_t\odot\tanh(c_t)$$
(10)
where $x_t$ denotes the current input; $h_t$ and $h_{t-1}$ denote the output of the hidden layer at times t and t − 1, respectively; $c_t$ and $c_{t-1}$ denote the cell state at times t and t − 1; $f_t$, $i_t$, and $o_t$ denote the output of the forget gate, input gate, and output gate at time t, respectively; $W_f$, $W_i$, $W_c$, and $W_o$ denote the weight vectors; $b_f$, $b_i$, $b_c$, and $b_o$ denote the bias vectors; and $\sigma$ and $\tanh$ are the sigmoid and hyperbolic tangent functions.
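A single time step of Equations (5)–(10) can be sketched in NumPy as follows. This is a minimal, unbatched cell with the four gate weight matrices stacked into one array; it illustrates the gating arithmetic only, not the full trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Eqs. (5)-(10).
    W: (4H, H+D) stacked weights for [f, i, c~, o]; b: (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four gates at once
    f = sigmoid(z[:H])            # forget gate, Eq. (5)
    i = sigmoid(z[H:2 * H])       # input gate, Eq. (6)
    g = np.tanh(z[2 * H:3 * H])   # candidate state, Eq. (7)
    c = f * c_prev + i * g        # cell state update, Eq. (8)
    o = sigmoid(z[3 * H:])        # output gate, Eq. (9)
    h = o * np.tanh(c)            # hidden state, Eq. (10)
    return h, c
```

In practice, the cell is applied over the whole input sequence, carrying `h` and `c` from step to step; framework implementations (e.g., a deep-learning library's LSTM layer) add batching and trained weights.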

Transformer

Transformer is a neural sequence model with an encoder–decoder architecture, which has strong long-term dependency modeling ability. The model uses a multi-head attention mechanism to effectively explore the inherent correlation of sequence data, thus it has also been widely used in the field of time series (Mo et al. 2021). Transformer can capture long-range dependencies in the time series, allowing it to consider all past observations simultaneously, rather than being limited by the sequential nature of LSTM. Transformer's self-attention mechanism allows for parallel computation, making it more efficient for longer time series. Traditional Seq2Seq models based on RNN units cannot be parallelized due to their recurrent structure, resulting in slow training speed on large-scale data. The Transformer model uses the encoder–decoder structure of Seq2Seq models but abandons traditional RNN units, and instead uses multi-head attention mechanisms to learn the dependency relationship between word vectors. This model has significantly improved both training speed and computational accuracy. The Transformer structure is shown in Figure 2, with an internal encoder–decoder structure using a multi-head attention model to learn the relationship between word vectors. The multi-head attention model divides the input into three parts: query vectors (Q), key vectors (K), and value vectors (V). The query and key vectors are used to calculate attention weights, and the result is multiplied by the value vector. The formula for calculating the scaled dot-product attention is:
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V$$
(11)
where $d_k$ is the dimensionality of K. Since the attention mechanism is insensitive to the positional information of word embedding, the Transformer model introduces positional encodings, which are defined as follows:
$$PE_{(pos,\,2i)}=\sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
(12)
$$PE_{(pos,\,2i+1)}=\cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
(13)
Figure 2

Transformer model architecture.


Here, pos represents the position index of the input word vector and $d_{model}$ is the dimension of the word vector; the 2i-th and (2i + 1)-th dimensions of the position information correspond to Equations (12) and (13), respectively. The position information matrix is added to the input matrix. The feed-forward part consists of two linear layers and an activation function, which enhances the expressive power of the representations through non-linear transformations. The Add&Norm part includes a residual connection and a LayerNorm module, which addresses the problem of gradient vanishing.
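Equations (11)–(13) translate directly into NumPy. The sketch below is single-head and unbatched, for illustration only; real Transformer layers add learned projections, multiple heads, and masking:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Eq. (11): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # rows sum to 1
    return weights @ V, weights

def positional_encoding(seq_len, d_model):
    """Eqs. (12) and (13): sinusoidal positional encodings (d_model even)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)   # even dimensions, Eq. (12)
    pe[:, 1::2] = np.cos(angle)   # odd dimensions, Eq. (13)
    return pe
```

Because each row of the attention-weight matrix is a softmax, every output position is a convex combination of all value vectors, which is how the model attends to all past observations at once.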

VMD–LSTM–Transformer coupled model

Entropy is a measure of disorder within a system, commonly used in thermodynamics, and can be estimated through methods such as approximate entropy and sample entropy. SE is an improved version of approximate entropy that reflects the self-similarity of a time series in certain patterns (Arunkumar et al. 2018). A smaller SE value indicates higher self-similarity and regularity within the time series, while a larger value indicates a more complex series containing more random noise. SE has been increasingly utilized in fields such as mechanical signal analysis, fault diagnosis, and medical signal processing. SE can be represented by SE(m, r, N), where m denotes the embedding dimension, r the matching tolerance, and N the length of the runoff sequence. For a given time series, SE is defined as follows:
$$SE(m,r,N)=-\ln\!\left[\frac{B^{m+1}(r)}{B^{m}(r)}\right]$$
(14)

SE is determined by $B^{m}(r)$ and $B^{m+1}(r)$, the probabilities that two subsequences match for m and m + 1 points, respectively, under the tolerance r. Previous research has shown that SE is statistically meaningful only when m is set to 1 or 2 and r falls between 0.1 and 0.25 times the standard deviation of the test sequence. Accordingly, we set m = 2 and r to 0.2 times the standard deviation of the test sequence in our study.
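Equation (14) with the m = 2, r = 0.2σ setting above can be sketched as a straightforward O(N²) implementation (for illustration; optimized variants exist):

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SE(m, r, N) per Eq. (14): -ln(B^{m+1}(r) / B^m(r)),
    using Chebyshev distance and excluding self-matches."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)          # the paper's tolerance setting
    N = len(x)

    def match_count(mm):
        # all length-mm templates; count ordered pairs i != j within r
        t = np.array([x[i:i + mm] for i in range(N - mm)])
        count = 0
        for i in range(len(t)):
            d = np.max(np.abs(t - t[i]), axis=1)   # Chebyshev distance
            count += int(np.sum(d <= r)) - 1       # drop the self-match
        return count

    B, A = match_count(m), match_count(m + 1)
    return np.inf if A == 0 or B == 0 else -np.log(A / B)
```

A regular (e.g., sinusoidal) series produces a much smaller SE than random noise, which is exactly the property used below to split the IMFs into low- and high-frequency groups.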

This study predicts monthly runoff sequences using the decomposition–prediction–reconstruction approach. Firstly, we decompose the runoff sequence with the VMD method to obtain several mode functions IMF1, IMF2, …, IMFn. Then, based on their sample entropy values, we separate them into high-frequency and low-frequency components. The LSTM neural network has a powerful feature extraction ability and handles non-linear data structures well, so we use it to forecast the high-frequency components of the runoff sequence. Since the low-frequency component has weaker non-linearity, we predict it using the Transformer model. We then combine the predicted high-frequency and low-frequency results to obtain the final runoff prediction. By combining VMD with LSTM and Transformer, the VMD–LSTM–Transformer model takes advantage of the strengths of each component, providing a more robust and accurate prediction of monthly runoff. Finally, we evaluate the model's applicability using evaluation metrics. The flowchart of the combined VMD–LSTM–Transformer model is illustrated in Figure 3.
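The routing-and-reconstruction step described above can be sketched as follows. Here `predict_high` and `predict_low` are hypothetical stand-ins for the trained LSTM and Transformer models, and the 0.6 threshold is the SE split used in this paper:

```python
import numpy as np

def hybrid_forecast(imfs, se_values, predict_high, predict_low, threshold=0.6):
    """Route each IMF to a predictor by its sample entropy, then sum
    the component forecasts to reconstruct the runoff prediction."""
    parts = []
    for imf, se in zip(imfs, se_values):
        model = predict_high if se > threshold else predict_low
        parts.append(model(imf))         # per-component forecast
    return np.sum(parts, axis=0)         # final reconstructed prediction
```

Because VMD is (approximately) additive, summing the per-component forecasts reconstructs a forecast of the original series.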
Figure 3

Flowchart of the combined VMD–LSTM–Transformer model.


Parameter setting

The accuracy of hydrological prediction models relies on three essential factors: high-quality input data, appropriate model selection, and a reasonable model structure. Once the model type is chosen, the model parameters have the most significant influence on the final prediction performance. Model training involves adjusting the values of each parameter to make the model output as close as possible to the actual value (Magnusson et al. 2015). Machine learning model parameters are categorized into model parameters and hyperparameters based on their function and determination method. Model parameters are internal system variables configured within the model that are automatically obtained through training data. Hyperparameters, or tuning parameters, are external configuration variables set in advance during model establishment that cannot be obtained through model learning. Hyperparameters are the primary adjustment knobs that control the model's structure, function, efficiency, and other aspects. This paper uses grid search to list the parameter values at certain intervals, trains models under different parameter scenarios using training data, evaluates the performance of these models on a validation set, and selects the best parameter value based on the evaluation results.
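The grid search procedure described above can be sketched with the standard library; `evaluate` is a hypothetical callback that trains a model under the given hyperparameters and returns its validation-set error:

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Exhaustive grid search: score every combination of hyperparameter
    values and keep the one with the lowest validation error."""
    best_params, best_score = None, float("inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = evaluate(params)          # train + score on the validation set
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with the number of values per hyperparameter, which is why the grids in this paper list values at coarse intervals.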

The specific parameters governing the VMD decomposition effect are the penalty factor α, mode number K, noise tolerance parameter τ, and convergence error ε. If K is set too large, modes will be duplicated; if K is set too small, the series will be under-decomposed, which mainly affects the bandwidth of the IMFs. After repeated experiments, the mode number is set to K = 6; τ and ε are set to their usual default values of 0.3 and 10⁻⁷, respectively. The hyperparameters of the LSTM model are set as follows: the maximum number of iterations is 200, the learning rate is 0.01, the number of hidden layers is 2, the number of hidden layer output units is 100, and the L2 regularization parameter is 10⁻⁶. The hyperparameters of the Transformer model are set as follows: the number of layers is 3, the number of self-attention heads is 8, the dimension of the hidden layer is 512, the dropout probability is 0.1, the learning rate is 0.01, and the maximum sequence length is 1,000.

Evaluation indicators

In this paper, the Nash–Sutcliffe efficiency coefficient (NSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) are used as quantitative evaluation criteria, calculated as follows:
$$NSE=1-\frac{\sum_{i=1}^{n}(Q_{o,i}-Q_{p,i})^2}{\sum_{i=1}^{n}(Q_{o,i}-\overline{Q}_o)^2}$$
(15)
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|Q_{o,i}-Q_{p,i}\right|$$
(16)
$$MAPE=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{Q_{o,i}-Q_{p,i}}{Q_{o,i}}\right|$$
(17)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(Q_{o,i}-Q_{p,i})^2}$$
(18)
where $Q_{p,i}$ and $Q_{o,i}$ represent the predicted runoff series and the corresponding original runoff series, respectively, $\overline{Q}_o$ is the monthly average runoff volume of the original runoff series, and n is the number of runoff series values.
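Equations (15)–(18) translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (15): 1 is a perfect fit."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def mae(obs, pred):
    """Mean absolute error, Eq. (16)."""
    return np.mean(np.abs(np.asarray(obs, float) - np.asarray(pred, float)))

def mape(obs, pred):
    """Mean absolute percentage error in %, Eq. (17)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs((obs - pred) / obs))

def rmse(obs, pred):
    """Root mean squared error, Eq. (18)."""
    return np.sqrt(np.mean((np.asarray(obs, float) - np.asarray(pred, float)) ** 2))
```

NSE compares the model against the mean-flow baseline (NSE = 0 means no better than predicting the mean), while MAE, MAPE, and RMSE measure absolute, relative, and squared-error magnitudes, respectively.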

Study area

The Miyun Reservoir, Beijing's largest drinking water source, is situated in the northern part of the city at 40°29.0′–40°30.5′N and 116°50.0′–117°3.5′E, as depicted in Figure 4. It lies at the confluence of the Chaohe River and the Baihe River, two tributaries in the upper reaches of the Chaobai River. With a controlled watershed area of 15,788 km², the Miyun Reservoir is the largest reservoir in North China, with a maximum storage capacity of 4.375 × 10⁹ m³, of which 9.27 × 10⁸ m³ is reserved for flood control. The normal storage level is 157.5 m, and the flood-limit level is 152.0 m. The average annual inflow runoff is 9.05 × 10⁸ m³; the 100-year, 1,000-year, and maximum possible flood peak discharges are 15,800, 22,600, and 23,300 m³/s, respectively; and with an average water depth of 30 m, the reservoir is a crucial water source and ecological barrier for the city. The annual precipitation at the Miyun Reservoir hydrological station is 632.5 mm, with an average annual water surface evaporation of 1,037 mm and an average annual temperature of 10.9 °C (Qin et al. 2020). Due to the impact of climate change and human activities, the inflow to the reservoir had been decreasing annually since 2000; however, the completion of the South-to-North Water Diversion Project in 2014 has resulted in an increase in reservoir storage. To optimize reservoir regulation planning and water resource management, predicting the inflow runoff to the Miyun Reservoir is necessary.
Figure 4

Location of the Miyun Reservoir.


Data source

To verify the efficacy of the proposed monthly runoff prediction model, we utilized a sample of 720 months of runoff data, from January 1960 to December 2019, collected at the Miyun hydrological station. Figure 5 displays the interannual variation curves of runoff, precipitation, and air temperature in the Miyun Reservoir. To fulfill the simulation requirements, we divided the data into training and validation parts using the commonly recommended 75%/25% split (Chen et al. 2020), a division also used in prior studies. The monthly runoff series of the Miyun Reservoir is shown in Figure 6.
Figure 5

Interannual variation curves of runoff, precipitation, and temperature in the Miyun Reservoir.

Figure 6

Monthly runoff series from the Miyun Reservoir.


Runoff characteristics analysis

To assess the intra-annual distribution of multi-year average monthly runoff from the Miyun Reservoir, we present the results in Figure 7. The figure shows an obvious summer flood process: the intra-annual distribution is uneven, with clear seasonality and a sharp boundary between wet and dry periods, and runoff recharge is mainly concentrated in summer. Our analysis shows that the annual runoff exhibits a single-peak distribution, high in the middle of the year and low at both ends. The box spans from September to June of the following year are small, indicating low interannual variation of runoff during this period. From July to August, however, more outliers are observed and the spans are larger, indicating more drastic interannual variation of monthly runoff during the flood season. Hence, we conclude that predicting the July–August runoff is more challenging.
Figure 7

Monthly runoff box diagram of the Miyun Reservoir.

To test for sudden changes and trends in hydrological and meteorological time series, the Mann–Kendall (M–K) trend test is a widely used distribution-free test based on rank relationships (Fathian et al. 2016). The test does not require the data to follow a specific distribution, is highly quantifiable, and is minimally influenced by sample outliers; it is recommended by the World Meteorological Organization. In this study, we applied the M–K statistical test, at a given significance level with its corresponding critical value, to detect sudden changes in the runoff series of the Miyun Reservoir. The results are shown in Figure 8. The UF and UB curves intersect in 1983, and the intersection point is within the confidence interval, indicating that the annual runoff underwent a sudden decrease in 1983.
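A minimal sketch of the M–K trend statistic is given below (the no-ties form of the variance; the UF/UB curves of the mutation test are obtained by applying this statistic to progressively longer sub-series, which is omitted here):

```python
import numpy as np

def mann_kendall_z(x):
    """Mann-Kendall trend test: returns the S statistic and the
    standardized Z score (no-ties variance formula)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S = sum of signs over all ordered pairs (i < j)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0   # variance of S without ties
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)           # continuity correction
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

|Z| > 1.96 rejects the no-trend hypothesis at the 5% significance level: a strictly increasing series gives a large positive Z, a decreasing one a large negative Z.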
Figure 8

M–K mutation test for the inlet runoff sequence of the Miyun Reservoir.


Results

To accurately predict the runoff series, it is important to consider the complex multiscale and non-smooth nature of the data. Therefore, we decomposed the monthly runoff series using VMD to obtain several relatively smooth components with significant frequency changes. This approach enables us to fully capture the runoff information and reduce prediction errors. The decomposition using VMD yielded six IMF components, as illustrated in Figure 9.
Figure 9

Results of VMD decomposition of monthly runoff from the Miyun Reservoir.

To validate our proposed combined monthly runoff prediction model, we first decomposed the complex and non-smooth monthly runoff time series into six relatively smooth components using VMD. Table 1 summarizes the statistical characteristics of each IMF component, with SE values greater than 0.6 identified as high-frequency components (IMF1–IMF5) and those less than 0.6 as low-frequency components (IMF6). We predict the high-frequency components using an LSTM model and the low-frequency component using a Transformer model. To evaluate the performance of our model, we compared the predicted and actual values of the inlet runoff of the Miyun Reservoir, as shown in Figure 10.
Table 1

Sample entropy values of decomposed subseries of monthly runoff series of the Miyun Reservoir

Component    IMF1    IMF2    IMF3    IMF4    IMF5    IMF6
SE           1.06    0.96    0.85    0.76    0.68    0.54
Figure 10

Comparison of predicted and true monthly runoff values of the Miyun Reservoir.


Discussion

To assess the effectiveness of the VMD–LSTM–Transformer model, we compared its performance with that of four other models: VMD–LSTM, VMD–Transformer, EMD–LSTM–Transformer, and EMD–LSTM. Several evaluation indexes commonly used in hydrological modeling, namely NSE, MAE, MAPE, and RMSE, were applied. The prediction results of each model were evaluated and compared on the validation set, as shown in Figures 11 and 12, and the corresponding evaluation indices are summarized in Table 2. The NSE of the VMD–LSTM–Transformer model is higher than that of the other models, while its MAE, MAPE, and RMSE are smaller. Our comparative analysis revealed that the VMD-based decomposition scheme significantly improved prediction accuracy: model 1 outperformed model 4, and model 2 outperformed model 5. The improvement lies in VMD's ability to separate the monthly runoff series into high- and low-frequency components, which allows the LSTM–Transformer combination to capture peak and valley changes more effectively, particularly for individual larger peaks. We also observed that the prediction accuracy of models 2 and 3 was lower than that of model 1, and that of model 5 was lower than that of model 4. This indicates that, beyond the decomposition scheme, assigning the LSTM to the high-frequency components and the Transformer to the low-frequency components also contributes substantially to the overall accuracy.
Table 2

Comparison of evaluation index values of different models for runoff prediction results

No.   Model                    NSE      MAE (10⁷ m³)   MAPE (%)   RMSE (10⁷ m³)
1     VMD–LSTM–Transformer     0.976    0.206          0.381      0.411
2     VMD–LSTM                 0.916    1.112          2.059      1.779
3     VMD–Transformer          0.932    1.177          2.179      2.236
4     EMD–LSTM–Transformer     0.901    1.291          2.391      2.105
5     EMD–LSTM                 0.878    1.335          2.473      2.096
Figure 11

Violin plot comparing the prediction results of different models.

Figure 12

Taylor diagram comparing the prediction results of different models.


The runoff prediction model proposed in this paper is of great significance for optimizing water resources allocation and reservoir operation. Knowing how much water is expected to enter a reservoir allows for better planning of releases, ensuring a balance among meeting water demand, flood control, and reservoir filling. Optimized reservoir scheduling enables more sustainable water management and improves water availability for all uses throughout the year.

This paper proposes a coupled VMD–LSTM–Transformer model for predicting the incoming runoff volume of the Miyun Reservoir in Beijing. The accuracy of the runoff series predictions from various models is compared and analyzed, leading to the following conclusions:

  • (1) The VMD decomposition method improves the completeness and adequacy of time series decomposition and reduces the interference of random components with deterministic components, thereby improving the model's prediction ability.

  • (2) The LSTM model is used to predict the higher-frequency components and the Transformer model the lower-frequency components, which further improves prediction accuracy.

  • (3) The prediction results of the VMD–LSTM–Transformer model outperform those of the VMD–LSTM, VMD–Transformer, EMD–LSTM–Transformer, and EMD–LSTM models. The proposed model therefore provides a reliable method for monthly runoff time series prediction.

  • (4) The combined prediction method proposed in this study integrates sample entropy, the prediction models, and error analysis into a runoff forecasting framework. It achieves higher prediction accuracy, is efficient and practical, and can provide valuable support for watershed water resources management decisions.

  • (5) Future research could incorporate precipitation, evaporation, and temperature as additional input factors to improve prediction accuracy. In addition, given the presence of negative water balance factors in the catchment, projections of the future water balance will be the focus of our next study.

Overall, the proposed model provides a new approach to monthly runoff prediction research. By applying VMD decomposition and using LSTM and Transformer models for different frequency components, the model achieves higher accuracy in predicting the incoming runoff volume. In future research, adding more factors can further improve the model's prediction accuracy.
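The frequency-splitting step described above can be sketched as follows. This minimal example assumes the VMD components are already available (e.g., from a third-party VMD implementation, not shown here); the `split_components` helper, its 0.5 threshold, and the entropy parameters are illustrative assumptions, not the paper's code:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r): higher values indicate a more irregular series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # common tolerance choice: 20% of the series std

    def matches(length):
        # All overlapping templates of the given length.
        t = np.array([x[i:i + length] for i in range(n - length + 1)])
        total = 0
        for i in range(len(t) - 1):
            # Chebyshev distance from template i to every later template.
            d = np.max(np.abs(t[i + 1:] - t[i]), axis=1)
            total += int(np.sum(d <= r))
        return total

    b, a = matches(m), matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

def split_components(components, threshold=0.5):
    """Route decomposed components: irregular (high-entropy) ones to the LSTM
    branch, smooth (low-entropy) ones to the Transformer branch."""
    high, low = [], []
    for c in components:
        (high if sample_entropy(c) > threshold else low).append(c)
    return high, low
```

Each component in `high` would then be predicted with the LSTM model and each in `low` with the Transformer, and the component forecasts summed to give the final runoff prediction.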

All authors contributed to the study's conception and design. Writing and editing: S.G. and Y.W.; preliminary data collection: X.Z. and H.C. All authors read and approved the final manuscript.

This work was supported by the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004]. This work was also funded by the North China University of Water Resources and Electric Power Innovation Ability Improvement Project for Postgraduates [grant number NCWUYC-2023006].

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).