## Abstract

Accurate runoff prediction is of great significance for flood prevention and mitigation, agricultural irrigation, and reservoir scheduling in watersheds. To address the strong non-linear and non-stationary characteristics of runoff series, a hybrid model for monthly runoff prediction, variational mode decomposition (VMD)–long short-term memory (LSTM)–Transformer, is proposed. First, VMD is used to decompose the runoff series into multiple modal components, and the sample entropy of each modal component is calculated to divide the components into high-frequency and low-frequency groups. The LSTM model is then used to predict the high-frequency components and the Transformer to predict the low-frequency components. Finally, the component predictions are summed to obtain the final prediction. The Mann–Kendall trend test is used to analyze the runoff characteristics of the Miyun Reservoir, and the constructed VMD–LSTM–Transformer model is used to forecast the runoff of the Miyun Reservoir. The prediction results are compared with those of the VMD–LSTM, VMD–Transformer, empirical mode decomposition (EMD)–LSTM–Transformer, and EMD–LSTM models. The results show that the proposed model achieves a Nash–Sutcliffe efficiency coefficient (NSE) of 0.976, a mean absolute error (MAE) of 0.206 × 10^{7} m^{3}, a mean absolute percentage error (MAPE) of 0.381%, and a root mean squared error (RMSE) of 0.411 × 10^{7} m^{3}, all of which are better than those of the other models, indicating that the VMD–LSTM–Transformer model has higher prediction accuracy and can be applied to runoff prediction in the study area.

## HIGHLIGHTS

The VMD–LSTM–Transformer model proposed in this paper achieves higher accuracy in monthly runoff prediction compared to other models.

The VMD decomposition method used in the model improves the completeness and adequacy of time series decomposition.

By using LSTM and Transformer models for different frequency components, the proposed model achieves better prediction accuracy.

## INTRODUCTION

Water resources have become a significant global issue due to rapid economic and societal development (Tang *et al.* 2022). Rational use of water, a safe, efficient, and renewable natural resource, can effectively alleviate energy shortages (Qin *et al.* 2022). Runoff forecasting, an essential technology for the rational use of water resources, is currently both a research focus and a difficult problem in related disciplines (Xu *et al.* 2022). Medium- and long-term runoff forecasting predicts runoff processes with a lead time of more than 3 days based on hydrological phenomena, using causality analysis and mathematical modeling (Mohammadi 2021). Timely and accurate forecast data, as a non-engineering measure, provide essential reference information for flood control decision-making by water conservancy departments (Guo *et al.* 2004). Compared with short-term runoff forecasting, medium- and long-term runoff forecasting allows more decision-making time for water resources allocation, flood control and benefit promotion, and watershed ecological protection (Emerton *et al.* 2016).

Traditional runoff prediction models mainly include two methods: causality analysis and mathematical statistics (Liu *et al.* 2022a, 2022b). The causality analysis method collects relevant meteorological characteristics and establishes a model after determining the relationship between meteorological and hydrological information. However, due to the high requirements for the completeness and accuracy of hydrological data, the causality analysis method lacks implementation feasibility in practical runoff prediction, especially in hydrological forecasting departments (Zhang *et al.* 2021). The mathematical statistics method mines the internal patterns of historical runoff and explores the statistical laws of the internal influencing factors of the runoff sequence through mathematical statistics methods. This method is based on an understanding of the formation of the runoff patterns and further constructs sequence prediction models (Wilby *et al.* 2003). The mathematical statistics method mainly includes two categories: multivariate linear analysis and time series analysis runoff statistical models, which are based on historical data. The overall prediction accuracy depends on the completeness of the data, and the accuracy of runoff prediction will be limited when the data are missing (Machiwal & Jha 2006).

In recent years, deep learning methods have been increasingly applied to forecasting, resulting in significant research breakthroughs. Among deep learning algorithms, recurrent neural networks (RNNs) have shown advantages in processing time series data by maintaining a state between different inputs (Schmidt *et al.* 2019). The RNN framework includes feedback connections that consider both current and adjacent information in the data. Hochreiter & Schmidhuber (1997) proposed the long short-term memory (LSTM) neural network, a special type of RNN that mitigates the problems of long-sequence dependence and of gradient vanishing and explosion. The LSTM framework includes ‘control gates’ to manage the flow of information and prevent model disturbances caused by useless data, as well as ‘forget gates’ to discard unneeded information. LSTM has feedback loops in the recurrent layer, allowing it to store information in memory over time. Consequently, compared with other neural networks, LSTM can better utilize the temporal characteristics of time series data (Wan *et al.* 2020). The Transformer is another neural sequence model, with an encoder–decoder structure. It has a strong ability to model long-term dependencies, and its multi-head attention mechanism can effectively explore the intrinsic correlations of sequence data. As a result, the Transformer has received considerable attention in the field of time series analysis (Liu *et al.* 2022a, 2022b).

Hydrological processes are the result of specific natural climate conditions. The challenge in processing runoff data lies in its non-stationary nature, which is characterized by fluctuating amplitude and frequency as well as trend changes (Khaliq *et al.* 2006). While machine learning has strong adaptive capabilities, relying solely on a single prediction model may overlook hidden patterns in the training data, thus diminishing the model's accuracy. To address this challenge, researchers have employed a divide-and-conquer modeling approach, combining time–frequency analysis with data-driven models to extract hidden features such as high-frequency disturbances, medium-frequency fluctuations, and long-term trends from runoff sequences (Rolim & de Souza Filho 2020). This approach significantly improves runoff prediction accuracy, has been widely used in fields such as energy, finance, transportation, and hydrology, and forms a stable decomposition–prediction–reconstruction framework. The decomposition–prediction combination model is a hybrid method that combines signal decomposition techniques with prediction models (Altan *et al.* 2021). Because hydrological time series change in complex ways over time, decomposing them before prediction can greatly enhance the accuracy of prediction models. As a result, the decomposition–prediction combination model has become widely used in solving hydrological prediction problems. The decomposition methods used in these hybrid models can be classified into three categories: wavelet transform (WT), methods based on empirical mode decomposition (EMD), and variational mode decomposition (VMD) (Liu *et al.* 2021). Wavelet decomposition requires human input in terms of pre-setting the mother wavelet function and decomposition layers.
On the other hand, EMD and ensemble empirical mode decomposition (EEMD) can adaptively decompose time series into intrinsic mode components based on local characteristics, making them more universal than WT. However, these methods may suffer from spurious components and mode mixing, which limits prediction accuracy. VMD is a non-recursive variational mode decomposition that can reduce these issues and is an effective method for decomposing complex non-linear and non-stationary time series (Niu *et al.* 2020). Currently, scholars are combining algorithms to divide decomposed components into high-frequency and medium-low frequency components. These components are predicted separately using different methods, and the predicted values are added to obtain the final result. Zhao *et al.* (2022) used an EEMD–LSTM–autoregressive integrated moving average (ARIMA) coupled model to predict monthly precipitation in Luoyang City. They used the permutation entropy (PE) algorithm to divide the different frequency mode components obtained by EEMD into high-frequency and low-frequency components. They separately predicted these components using the LSTM neural network and ARIMA. The results showed that the prediction accuracy of the model was higher than that of EMD–LSTM, EEMD–LSTM, and EEMD–ARIMA.

In summary, this paper proposes a novel hybrid model, VMD–LSTM–Transformer, for monthly runoff prediction of runoff sequences with strong randomness and volatility. The model uses a combination of LSTM and Transformer models for frequency-separated prediction and is applied to the monthly runoff prediction of the Miyun Reservoir in Beijing. The runoff sequence is first decomposed using VMD, and the intrinsic mode components (IMF1, IMF2, …, IMF*n*) are divided into high-frequency and low-frequency components based on the sample entropy (SE). The LSTM neural network, with its powerful feature extraction ability and strong processing capability for non-linear data structures, is used to predict the high-frequency components of the runoff sequence. The Transformer is used to predict the low-frequency components, which have weaker non-linearity. The predicted values of the high-frequency and low-frequency components are added to obtain the final prediction result of the runoff sequence.

## MATERIALS AND METHODS

### Decomposition-prediction model

A single prediction model may not be sufficient to identify the underlying patterns in the original signal. The decomposition–prediction modeling approach commonly employs a decomposition algorithm to break down non-stationary runoff sequences into several simpler sub-sequences. These sub-sequences contain hidden features such as the original sequence structure and periodicity. Each sub-sequence is then modeled to simplify the process of building the model. This modeling method enhances the prediction accuracy of each sub-sequence and ensures the accuracy of the original runoff sequence prediction, resulting in a stable combination model framework (Song & Chen 2021).

#### Variational mode decomposition

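VMD decomposes the input signal $f(t)$ into *K* modes by solving the constrained variational problem

$$
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)
$$

where *K* is the number of variational mode IMFs, $\{u_k\}$ and $\{\omega_k\}$ are the IMF set and their corresponding central frequencies, $\delta(t)$ is the impulse function, $\partial_t$ is the partial derivative with respect to time *t*, and $*$ is the convolution operator.

The modes, central frequencies, and Lagrange multiplier are updated iteratively in the frequency domain:

$$
\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}
$$

$$
\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 \, d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 \, d\omega}
$$

$$
\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left[ \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right]
$$

where *n* is the current iteration, $\alpha$ is the penalty factor, $\tau$ is the noise tolerance parameter, $\hat{u}_k^{n+1}(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_k^{n+1}(t)$, $f(t)$, and $\lambda(t)$, respectively, and $\hat{u}_k$, $\omega_k$, and $\hat{\lambda}$ are continuously updated until the change between successive iterations is less than the convergence error $\varepsilon$.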
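As a reference sketch, the VMD problem (minimizing the total bandwidth of the modes subject to the modes summing to the signal $f(t)$) is usually converted to an unconstrained problem through an augmented Lagrangian, which is where the penalty factor and the multiplier in the iterative updates come from. The form below follows the standard VMD formulation rather than anything stated explicitly in this paper; the symbols $\alpha$ (penalty factor) and $\lambda(t)$ (Lagrange multiplier) are assumed to match the paper's notation.

$$
\mathcal{L}\left(\{u_k\}, \{\omega_k\}, \lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\; f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle
$$

Minimizing $\mathcal{L}$ alternately over each $u_k$ and $\omega_k$, and then taking an ascent step in $\lambda$ with step size $\tau$, yields the frequency-domain update rules used by VMD.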

#### LSTM

An LSTM unit consists of an input gate *i*, an output gate *o*, a forget gate *f*, and a memory controller *c* (Chen *et al.* 2019). During training, LSTM can selectively remember effective information and discard irrelevant information, solving the problems of gradient vanishing and explosion in the later time steps of the recurrent neural network. By establishing dependencies between data at different time periods, LSTM can learn long-term dependent information and thus has certain advantages in time series prediction. Its ability to learn and retain information over long periods is essential for capturing long-term dependencies in time series data, such as seasonal patterns or trends. LSTM can also capture complex non-linear relationships between past and present observations, enabling it to model intricate patterns in the monthly runoff series. The basic structure unit of LSTM is shown in Figure 1.

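The forward computation of the LSTM unit is:

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $x_t$ is the input at time *t*; $h_t$ and $h_{t-1}$ denote the outputs at time *t* and $t-1$, respectively; $c_t$ and $c_{t-1}$ denote the cell state at time *t* and $t-1$; $f_t$, $i_t$, and $o_t$ denote the output of the forgetting gate, input gate, and output gate at time *t*, respectively; $W_f$, $W_i$, $W_c$, and $W_o$ denote the weight vectors, and $b_f$, $b_i$, $b_c$, and $b_o$ denote the bias vectors; $\sigma$ and $\tanh$ are the sigmoid function and the hyperbolic tangent function.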
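To make the gate computations concrete, the following is a minimal sketch of one forward step of a standard LSTM cell in plain Python. It is not the authors' implementation: the function name `lstm_step`, the `params` layout, and the use of nested lists instead of a tensor library are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid used by the forget, input, and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell.

    params maps the hypothetical keys 'f', 'i', 'c', 'o' to (W, b) pairs,
    where W acts on the concatenation [h_{t-1}, x_t].
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a + bias) for a, bias in zip(matvec(W, z), b)]

    f_t = gate("f", sigmoid)        # forget gate
    i_t = gate("i", sigmoid)        # input gate
    c_tilde = gate("c", math.tanh)  # candidate cell state
    o_t = gate("o", sigmoid)        # output gate

    # New cell state: keep part of the old state, add part of the candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # New hidden state (the cell output)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved at each step; this is a convenient sanity check before plugging in trained weights.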

#### Transformer

*et al.* 2021). The Transformer can capture long-range dependencies in a time series, considering all past observations simultaneously rather than being limited by the sequential processing of LSTM. Its self-attention mechanism allows parallel computation, making it more efficient for longer time series. Traditional Seq2Seq models based on RNN units cannot be parallelized because of their recurrent structure, resulting in slow training on large-scale data. The Transformer keeps the encoder–decoder structure of Seq2Seq models but abandons RNN units and instead uses multi-head attention mechanisms to learn the dependency relationships between word vectors, which significantly improves both training speed and computational accuracy. The Transformer structure is shown in Figure 2. The multi-head attention model divides the input into three parts: query vectors (*Q*), key vectors (*K*), and value vectors (*V*). The query and key vectors are used to calculate attention weights, and the result is multiplied by the value vector. The formula for calculating the scaled dot-product attention is:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V
$$

where $d_k$ is the dimensionality of *K*. Since the attention mechanism is insensitive to the positional information of word embeddings, the Transformer model introduces positional encodings, which are defined as follows:

$$
PE_{(pos,\,2i)} = \sin\left( \frac{pos}{10000^{2i/d_{model}}} \right) \tag{12}
$$

$$
PE_{(pos,\,2i+1)} = \cos\left( \frac{pos}{10000^{2i/d_{model}}} \right) \tag{13}
$$

Here *pos* is the position index of the input word vector and *d*_{model} is the dimension of the word vector. The 2*i*-th and (2*i* + 1)-th dimensions of the position information correspond to Equations (12) and (13), respectively. Finally, the position information matrix is added to the input matrix. The feed-forward part consists of two linear layers and an activation function, which enhances the expressive power of word representations through non-linear transformations. The Add&Norm part includes a residual connection and a LayerNorm module, which addresses the problem of gradient vanishing.
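The two building blocks described here, scaled dot-product attention and sinusoidal positional encoding, can be sketched in plain Python as follows. This is an illustrative single-head version under stated assumptions, not the model used in the paper: the function names and the nested-list representation of *Q*, *K*, and *V* are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head; inputs are lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


def positional_encoding(max_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for even_dim in range(0, d_model, 2):  # even_dim plays the role of 2i
            angle = pos / (10000 ** (even_dim / d_model))
            pe[pos][even_dim] = math.sin(angle)
            if even_dim + 1 < d_model:
                pe[pos][even_dim + 1] = math.cos(angle)
    return pe
```

When two keys are identical, the attention weights are equal and the output is the plain average of the corresponding value vectors, which is a quick way to check the implementation.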

### VMD–LSTM–Transformer coupled model

*et al.* 2018). A smaller SE value indicates higher self-similarity and regularity within the corresponding time series, while a larger SE value indicates a more complex series that contains more random noise components. SE has been increasingly utilized in various fields such as mechanical signal analysis, fault diagnosis, and medical signal processing. SE can be represented by SE(*m*, *r*, *N*), where *m* denotes the embedding dimension, *r* denotes the tolerance, and *N* denotes the length of the runoff sequence. For a given time series, SE is defined as follows:

$$
\mathrm{SE}(m, r, N) = -\ln\left[ \frac{B^{m+1}(r)}{B^{m}(r)} \right]
$$

SE is determined by the probabilities of two sets of sequences matching *m* and *m* + 1 points under tolerance *r*, represented by $B^{m}(r)$ and $B^{m+1}(r)$, respectively. Previous research has shown that SE is statistically significant only when *m* is set to 1 or 2 and *r* falls between 0.1 and 0.25 times the standard deviation of the test sequence. To ensure statistical significance, we set *m* = 2 and *r* to 0.2 times the standard deviation of the test sequence in our study.
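A minimal sketch of the sample entropy calculation with *m* = 2 and *r* = 0.2 times the standard deviation is given below. It assumes the common conventions of Chebyshev (maximum) distance between templates, exclusion of self-matches, and the population standard deviation; the paper does not state which variants were used, and the function name `sample_entropy` is hypothetical.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SE(m, r, N) with r = r_factor * standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same N - m starting points for both template lengths
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for a in range(len(templates)):
            for b in range(a + 1, len(templates)):  # no self-matches
                distance = max(abs(p - q)
                               for p, q in zip(templates[a], templates[b]))
                if distance <= r:
                    count += 1
        return count

    matches_m = count_matches(m)        # proportional to B^m(r)
    matches_m1 = count_matches(m + 1)   # proportional to B^{m+1}(r)
    if matches_m == 0 or matches_m1 == 0:
        return float("inf")  # entropy is undefined when no matches are found
    return -math.log(matches_m1 / matches_m)
```

A perfectly periodic series gives an entropy of zero because every match of length *m* also extends to length *m* + 1, whereas an irregular series gives a positive value. This is the property used to rank the IMFs by complexity.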

*n*. Then, based on the SE values, we separate the components into high-frequency and low-frequency groups. The LSTM neural network has a powerful feature extraction ability and can handle non-linear data structures well. Therefore, we use the LSTM model to forecast the high-frequency components of the runoff sequence. Since the low-frequency components have weaker non-linearity, we predict them using the Transformer model. We then sum the predicted results of the high-frequency and low-frequency components to obtain the final runoff prediction. By combining VMD with LSTM and Transformer, the VMD–LSTM–Transformer model takes advantage of the strengths of each component, providing a more robust and accurate prediction of monthly runoff. Finally, we evaluate the model's applicability using evaluation metrics. The flowchart of the VMD–LSTM–Transformer combined model is shown in Figure 3.
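The grouping and recombination steps can be sketched as follows. The SE values are those reported for the six IMFs in this paper, but the cut-off of 0.8 is purely a hypothetical threshold chosen for illustration, since the text does not state where the high/low boundary was drawn. The function names are likewise assumptions.

```python
def split_by_entropy(se_by_imf, threshold):
    """Split IMF names into (high-frequency, low-frequency) groups by SE value."""
    high = [name for name, se in se_by_imf.items() if se >= threshold]
    low = [name for name, se in se_by_imf.items() if se < threshold]
    return high, low


def reconstruct(component_forecasts):
    """Sum the per-component forecasts time step by time step."""
    return [sum(values) for values in zip(*component_forecasts.values())]


# SE values reported for the six IMFs in this paper
se_values = {"IMF1": 1.06, "IMF2": 0.96, "IMF3": 0.85,
             "IMF4": 0.76, "IMF5": 0.68, "IMF6": 0.54}

# 0.8 is a hypothetical cut-off, not a value taken from the paper
high_freq, low_freq = split_by_entropy(se_values, threshold=0.8)
```

In the full pipeline, the components in `high_freq` would be forecast by the LSTM and those in `low_freq` by the Transformer, and `reconstruct` would then add the component forecasts to give the final runoff prediction.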

### Parameter setting

The accuracy of hydrological prediction models relies on three essential factors: high-quality input data, appropriate model selection, and a reasonable model structure. Once the model type is chosen, the model parameters have the most significant influence on the final prediction performance. Model training involves adjusting the values of each parameter to make the model output as close as possible to the actual value (Magnusson *et al.* 2015). Machine learning model parameters are categorized into model parameters and hyperparameters based on their function and determination method. Model parameters are internal system variables configured within the model that are automatically obtained through training data. Hyperparameters, or tuning parameters, are external configuration variables set in advance during model establishment that cannot be obtained through model learning. Hyperparameters are the primary adjustment knobs that control the model's structure, function, efficiency, and other aspects. This paper uses grid search to list the parameter values at certain intervals, trains models under different parameter scenarios using training data, evaluates the performance of these models on a validation set, and selects the best parameter value based on the evaluation results.
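The grid search procedure described above can be sketched as an exhaustive loop over candidate hyperparameter combinations. In this sketch, `train_and_validate` is a hypothetical callback that would train a model with the given hyperparameters and return its validation error; a toy error function stands in for real training, and the candidate values in `grid` are illustrative only.

```python
from itertools import product


def grid_search(param_grid, train_and_validate):
    """Try every combination in param_grid and keep the lowest validation error."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_validate(params)  # validation error, lower is better
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(params):
    """Stand-in for real training: error is smallest at lr=0.01, 100 units."""
    return ((params["learning_rate"] - 0.01) ** 2
            + (params["hidden_units"] - 100) ** 2 * 1e-6)


# Illustrative candidate values, not the grid used in the paper
grid = {"learning_rate": [0.001, 0.01, 0.1], "hidden_units": [50, 100, 200]}
best, err = grid_search(grid, toy_validation_error)
```

The cost grows multiplicatively with the number of candidate values per hyperparameter, which is why the values are listed at coarse intervals.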

The specific parameters related to the VMD decomposition effect are the penalty factor $\alpha$, the mode number *K*, the noise tolerance parameter $\tau$, and the convergence error $\varepsilon$. If *K* is set too large, mode repetition occurs, and if *K* is set too small, the series is under-decomposed, which mainly affects the bandwidth of the IMFs. After repeated experiments, $\alpha$ and *K* were selected, with *K* = 6; $\tau$ and $\varepsilon$ are usually set to default values, with $\tau$ set to 0.3 and $\varepsilon$ set to 10^{−7}. The hyperparameters of the LSTM model are set as follows: the maximum number of iterations is 200, the learning rate is 0.01, the number of hidden layers is 2, the number of hidden layer output units is 100, and the L2 regularization parameter is 10^{−6}. The hyperparameters of the Transformer model are set as follows: the number of layers is 3, the number of self-attention heads is 8, the dimension of the hidden layer is 512, the dropout probability is 0.1, the learning rate is 0.01, and the maximum sequence length is 1,000.

### Evaluation indicators

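The prediction performance is evaluated with four indicators, NSE, MAE, MAPE, and RMSE:

$$
\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

$$
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

$$
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

$$
\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 }
$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted runoff values, $\bar{y}$ is the mean of the observed values, and *n* is the number of runoff series values.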
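The four indicators reported in this paper (NSE, MAE, MAPE, and RMSE) can be computed with the standard definitions sketched below. The function names and the argument order (observed first, predicted second) are assumptions for illustration; MAPE is returned in percent and is undefined if any observed value is zero.

```python
import math


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def mae(obs, pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mape(obs, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
```

A forecast that always returns the observed mean gives NSE = 0, so an NSE close to 1 indicates that the model explains most of the variance in the observed runoff.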

## CASE STUDY

### Study area

^{2}, the Miyun Reservoir is the largest reservoir in North China, with a maximum storage capacity of 4.375 × 10^{9} m^{3}, of which 9.27 × 10^{8} m^{3} is reserved for flood control. The normal storage level is 157.5 m, and the flood limit level is 152.0 m. The average annual inflow runoff is 9.05 × 10^{8} m^{3}. The 100-year, 1,000-year, and maximum possible flood peak discharges are 15,800, 22,600, and 23,300 m^{3}/s, respectively. With an average water depth of 30 m, the reservoir is a crucial water source and ecological barrier for the city. The annual precipitation at the Miyun Reservoir hydrological station is 632.5 mm, with an average annual water surface evaporation of 1,037 mm and an average annual temperature of 10.9 °C (Qin *et al.* 2020). Due to the impact of climate change and human activities, the inflow to the reservoir has been decreasing annually since 2000. However, the completion of the South-to-North Water Diversion Project in 2014 has resulted in an increase in reservoir storage. To optimize reservoir regulation planning and water resource management, predicting the inflow runoff to the Miyun Reservoir is necessary.

### Data source

*et al.* 2020). This division has also been used in prior research. The monthly runoff series of the Miyun Reservoir is shown in Figure 6.

### Runoff characteristics analysis

*et al.* 2016). This test has several advantages, such as not requiring the data to follow a specific distribution, being highly quantifiable, and being minimally influenced by sample outliers and regular distribution disturbances. It is recommended by the World Meteorological Organization. In this study, we used the M–K statistical test, at the chosen significance level and its corresponding critical value, to test for abrupt changes in the runoff series of the Miyun Reservoir. The results are shown in Figure 8. The UF and UB curves intersect in 1983, and the intersection point lies within the confidence interval. This indicates that the annual runoff underwent an abrupt decrease in 1983.
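The UF and UB curves of the sequential M–K test can be computed as sketched below. This follows the commonly used rank-based formulation, in which the forward statistic is standardized with mean *k*(*k* − 1)/4 and variance *k*(*k* − 1)(2*k* + 5)/72, and the backward curve is obtained by applying the same procedure to the reversed series and negating the result. The paper does not give its exact formulation, so the details and the function names here are assumptions.

```python
import math


def sequential_mk(series):
    """Forward sequential Mann-Kendall statistic UF_k for k = 1..n."""
    n = len(series)
    uf = [0.0]  # UF_1 is defined as 0
    s = 0
    for k in range(2, n + 1):
        x_k = series[k - 1]
        # Add the number of earlier values smaller than the k-th value
        s += sum(1 for j in range(k - 1) if x_k > series[j])
        mean = k * (k - 1) / 4.0
        var = k * (k - 1) * (2 * k + 5) / 72.0
        uf.append((s - mean) / math.sqrt(var))
    return uf


def mk_uf_ub(series):
    """Return the UF and UB curves; UB uses the reversed series, negated."""
    uf = sequential_mk(series)
    ub = [-v for v in reversed(sequential_mk(series[::-1]))]
    return uf, ub


def crossing_indices(uf, ub):
    """Indices k where the UF and UB curves cross between k - 1 and k."""
    diff = [a - b for a, b in zip(uf, ub)]
    return [k for k in range(1, len(diff)) if diff[k - 1] * diff[k] < 0]
```

A crossing of the two curves that falls between the critical-value lines is interpreted as a candidate change point; the indices returned by `crossing_indices` would be mapped back to calendar years of the annual runoff series.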

## RESULTS AND DISCUSSION

### Results

| Component | IMF1 | IMF2 | IMF3 | IMF4 | IMF5 | IMF6 |
|---|---|---|---|---|---|---|
| SE | 1.06 | 0.96 | 0.85 | 0.76 | 0.68 | 0.54 |


### Discussion

| No. | Model | NSE | MAE (10^{7} m^{3}) | MAPE (%) | RMSE (10^{7} m^{3}) |
|---|---|---|---|---|---|
| 1 | VMD–LSTM–Transformer | 0.976 | 0.206 | 0.381 | 0.411 |
| 2 | VMD–LSTM | 0.916 | 1.112 | 2.059 | 1.779 |
| 3 | VMD–Transformer | 0.932 | 1.177 | 2.179 | 2.236 |
| 4 | EMD–LSTM–Transformer | 0.901 | 1.291 | 2.391 | 2.105 |
| 5 | EMD–LSTM | 0.878 | 1.335 | 2.473 | 2.096 |


The runoff prediction model proposed in this paper is of great significance for optimizing water resources allocation and reservoir operation. Knowing how much water is expected to enter a reservoir allows for better planning of release, ensuring a balance between meeting water demand, flood control, and reservoir filling. Optimizing reservoir scheduling can enable more sustainable water management and improve water availability for all uses throughout the year.

## CONCLUSION

This paper proposes a coupled VMD–LSTM–Transformer model for predicting the incoming runoff volume of the Miyun Reservoir in Beijing. The accuracy of the runoff series predictions from various models is compared and analyzed, leading to the following conclusions:

- (1) The VMD decomposition method improves the completeness and adequacy of time series decomposition, reduces the interference of random components with deterministic components, and thus improves the model's prediction ability.

- (2) The LSTM model is used to predict the higher-frequency components, while the Transformer model is used for the lower-frequency components, further improving the model's prediction accuracy.

- (3) The VMD–LSTM–Transformer model's prediction results outperform those of VMD–LSTM, VMD–Transformer, EMD–LSTM–Transformer, and EMD–LSTM. Therefore, this model provides a reliable method for monthly runoff time series prediction.

- (4) The combined prediction method proposed in this study integrates sample entropy, prediction models, and error analysis to establish a runoff forecast model. It achieves higher prediction accuracy and is an efficient and practical method that can provide valuable support for watershed water resources management decisions.

- (5) Future research can consider adding precipitation, evaporation, and temperature factors to improve the prediction accuracy. In addition, given the presence of negative water balance factors, an in-depth understanding of projections of future water balance in the catchment will be the focus of our next study.

Overall, the proposed model provides a new approach to monthly runoff prediction research. By applying VMD decomposition and using LSTM and Transformer models for different frequency components, the model achieves higher accuracy in predicting the incoming runoff volume. In future research, adding more factors can further improve the model's prediction accuracy.

## AUTHOR CONTRIBUTION

All authors contributed to the study's conception and design. Writing and editing: S.G. and Y.W.; preliminary data collection: X.Z. and H.C. All authors read and approved the final manuscript.

## FUNDING

This work was supported by the Key Scientific Research Project of Colleges and Universities in Henan Province (CN) [grant number 17A570004]. This work was also funded by the North China University of Water Resources and Electric Power Innovation Ability Improvement Project for Postgraduates [grant number NCWUYC-2023006].

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.