ABSTRACT
Precise long-term runoff prediction holds crucial significance in water resource management. Although the long short-term memory (LSTM) model is widely adopted for long-term runoff prediction, it encounters challenges such as error accumulation and low computational efficiency. To address these challenges, we developed a novel runoff prediction method based on a Transformer and the base flow separation approach (BS-Former) and applied it in the Ningxia section of the Yellow River Basin. To evaluate the effectiveness of the Transformer model and its responsiveness to the base flow separation technique, we constructed LSTM and artificial neural network (ANN) models as benchmarks for comparison. The results show that Transformer outperforms the other models in terms of predictive performance and that base flow separation significantly improves the performance of the Transformer model. Specifically, the performance of BS-Former in predicting runoff 7 days in advance is comparable to that of the BS-LSTM and BS-ANN models with lead times of 4 and 2 days, respectively. In general, the BS-Former model is a promising tool for long-term runoff prediction.
HIGHLIGHTS
The effectiveness of a Transformer model for simulating and predicting long-term daily runoff is explored.
The response of the Transformer model to prior hydrological knowledge in base flow separation is analyzed.
The potential of base flow separation techniques to improve the ability to predict the runoff lead time in deep learning models is explored.
INTRODUCTION
Runoff simulation has long been recognized as a critical research endeavor in the field of hydrological science. However, the complex interplay of basin geomorphology, climate change, and anthropogenic activities leads to significant spatiotemporal variability in the distribution of river runoff, thereby posing a formidable challenge in accurately predicting hydrological processes (Senthil Kumar et al. 2005; Xu et al. 2022). Over the past few years, the development of runoff prediction models has attracted increasing attention. These models can be broadly categorized into two main types: mechanism-based (i.e., process-driven) and nonmechanism-based (i.e., data-driven) models (Bittelli et al. 2010; Kratzert et al. 2018; He et al. 2023). Traditional hydrological prediction models are based on complex mathematical formulations aimed at simulating and predicting the physical mechanisms that govern runoff formation. These models require a thorough understanding of the hydrological processes underlying runoff generation, including the influences of terrain, geomorphology, geological structure, meteorology, hydrology, and other relevant parameters, all of which serve as critical inputs to such a model (Qin et al. 2019). Thus, establishing traditional mechanism-based models is a time-consuming process that involves the input of numerous parameters related to the runoff formation process. Moreover, the accuracy of these models relies heavily on the physical interpretation of the model parameters. However, the complex hydraulic connectivity of some basins presents a significant challenge in terms of obtaining input data and parameters for the runoff formation process, thereby impeding the effective simulation of the process and leading to inaccurate prediction results (Yoon et al. 2011). As a result, traditional mechanism-based models may be inadequate for effectively addressing the problem of runoff prediction in such contexts. 
Alternatively, data-driven models have been described as ‘black box’ models because they do not directly consider the internal physical mechanisms underlying the system under investigation. Instead, they rely on statistical analysis of the input‒output relationships between various factors in the system and transfer functions to describe the system's response (Zhang et al. 2021; Yin et al. 2022a; Liu et al. 2022b). Data-driven models operate independently of prior knowledge and can be applied in information-limited scenarios. The currently available data-driven models primarily consist of statistical models, machine learning models, and deep learning models (Liu et al. 2022b).
Statistical models assume a linear input‒output relationship within a system and utilize relevant statistical methods to predict future runoff based on historical observation data. While statistical models can effectively capture linear relationships between historical and future data, the runoff process often exhibits complex nonlinear relationships. Furthermore, the runoff process is influenced by both natural environmental factors and human activities, and traditional statistical models often fall short in comprehensively integrating these multifaceted influences, resulting in inadequate performance (Liu et al. 2022b). As artificial intelligence technology has continued to advance, machine learning models have emerged as effective tools for addressing complex runoff prediction challenges. This is primarily due to their capacity to capture complex, nonstationary, dynamic, and nonlinear relationships between multiple input and output variables (Kisi 2011; Liu et al. 2022b). Nevertheless, although traditional machine learning methods have shown promise in runoff prediction, their simple structure and limited memory capacity limit their ability to extract deep information. As the dimensionality of the input features increases, these limitations further constrain the predictive performance of such a model (Xie et al. 2019).
Deep learning is a promising area of machine learning that utilizes neural network architectures composed of multiple layers, enabling the extraction of more intricate and abstract features from data. Therefore, deep learning addresses the limitations of previous models (Kratzert et al. 2018). In the field of hydrology, deep learning has garnered considerable attention, achieving superior performance in runoff prediction compared to traditional neural network methods (Xu et al. 2023). This advancement has greatly facilitated the development of runoff prediction models. In particular, recurrent neural networks (RNNs) stand out due to their remarkable capacity for data processing and long-term learning of runoff series data (Hu et al. 2018). However, in the directed loop structure of a traditional RNN, the output of the previous node is used as the input for the current step. Therefore, RNNs have difficulty learning long-term correlations in time series data, which can lead to gradient vanishing, in turn affecting the prediction performance of such a model (Shen et al. 2019). A long short-term memory (LSTM) network is an improved recursive neural network that solves this problem of vanishing gradients. Despite the impressive performance of LSTM models in streamflow prediction, their recursive nature means that they cannot account for direct connections between arbitrary positions, thereby hindering the effective extraction of meaningful information related to nonadjacent position connections. Consequently, the LSTM architecture is not well suited for multistep-ahead streamflow prediction. Furthermore, the limited ability of LSTM models to fully capture time series features may result in error accumulation and consequent degradation in the prediction performance (Yin et al. 2022a).
Subsequently, Bahdanau et al. (2015) introduced an attention mechanism that can establish direct connections between any two positions in a time series and capture the long-term correlation of the series by computing the ‘attention’ between any pair of points (Liu et al. 2022a). This mechanism facilitates multistep-ahead prediction and utilizes a nonautoregressive method, which effectively mitigates the problem of error accumulation. On the basis of extensive investigations of attention mechanisms, Vaswani et al. (2017) introduced a novel model called a Transformer, which relies solely on attention mechanisms. This model initially exhibited promising performance in natural language processing applications. Specifically, the Transformer architecture depends exclusively on attention mechanisms to establish global correlations between the input and output and has strong temporal coupling and global correlation feature extraction capabilities (Fang et al. 2024; Jin et al. 2024). Compared to an LSTM model, a Transformer model with a self-attention mechanism offers the advantage of parallelized computation, thereby enhancing the efficiency of processing long time series data (Masafu & Williams 2024). In addition, the self-attention mechanism in a Transformer model is capable of capturing intricate relationships between various time points in a time series, thus rendering it better suited for long-term multistep-ahead streamflow prediction (Yin et al. 2022b). However, the applications of Transformer models in hydrology are still limited. Therefore, more research is needed to validate the effectiveness of such models in hydrological prediction.
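The scaled dot-product self-attention at the core of the Transformer can be sketched as follows. This is a minimal NumPy illustration of the general mechanism, not the configuration used in this study; note how every output position is a weighted combination of all input positions, which is what lets the model relate arbitrary, nonadjacent time steps in a single step:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V: arrays of shape (seq_len, d_k). Each output position is a
    weighted sum over *all* input positions, so correlations between
    arbitrary (nonadjacent) time steps are captured directly.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise 'attention' scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

# toy runoff-like sequence: 5 time steps, 4 features
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (5, 4)
```

Because each row of the score matrix is computed independently, all positions can be processed in parallel, which is the source of the efficiency advantage over the step-by-step recursion of an LSTM noted above.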
Data-driven models such as deep learning models do not involve the explicit modeling of any physical processes. These models instead focus on learning the interrelationships between flow characteristics and various variables. Consequently, in the presence of substantial noise interference, machine learning models may tend to excessively emphasize abnormal fluctuations within the data while ignoring genuine trends, thus weakening the stability and reliability of these relationships (Jiahai & Aiyun 2019). Therefore, it is imperative to explore methods of enhancing the capacity of deep learning models in streamflow prediction and methods of reducing errors. Numerous studies have demonstrated that the method of decomposing runoff time series and individually modeling its components exhibits superior performance compared to the traditional approach of directly modeling the entire series (Wang et al. 2021; Zhang et al. 2022a). Accordingly, in this study, we incorporate relevant prior hydrological information into a model to guide its learning and comprehension of specific flow patterns, thereby revealing deeper underlying patterns within the flow data more accurately and consequently enhancing the model performance. In particular, two major components of runoff are base flow and surface flow. Studies indicate that a combination model created by building machine learning models to separately predict these two components and subsequently overlaying them produces runoff prediction results that are superior to those obtained by a single machine learning model (Corzo & Solomatine 2007; Tongal & Booij 2018; Huang et al. 2020). However, the current research primarily focuses on short-term runoff prediction at the monthly scale, with limited exploration into long-term daily-scale runoff prediction incorporating base flow and surface flow as prediction variables. 
Most existing studies have predominantly emphasized enhancing the predictive accuracy of machine learning models, while the impact of base flow separation techniques on the lead-time forecasting capabilities of those models has received little attention. Furthermore, recently proposed advanced deep learning models have not yet been validated in the context of runoff prediction based on base flow separation. In particular, no studies have yet explored whether base flow separation techniques can enhance the predictive performance of Transformer models. Therefore, the specific objectives of this study are (1) to study the effectiveness of Transformer models for simulating and predicting long-term daily runoff, (2) to analyze the Transformer model's response to prior hydrological knowledge such as base flow, and (3) to explore the potential of base flow separation to enhance the lead-time runoff prediction capabilities of machine learning models.
METHOD
Transformer model
Each layer of the encoder is composed of two sublayers: a multihead self-attention mechanism and a feedforward network that applies nonlinear transformations to the output of the attention mechanism (Zhang et al. 2019). Each sublayer includes residual connections and layer normalization. The decoder, on the other hand, has three sublayers per layer and includes an additional encoder–decoder attention sublayer that acts on the multihead attention output. The first attention module in the decoder includes a mask. Finally, the output of the decoder is linearly transformed to generate the prediction. The attention module combinations used in the Transformer model are shown in Table 1.
| Parts | Self- or cross-attention | Single- or multihead | Masked | Attention name |
|---|---|---|---|---|
| Encoder | Self-attention | Multihead | No | Multihead attention |
| Decoder | Self-attention | Multihead | Yes | Masked multihead attention |
| Decoder | Cross-attention | Multihead | No | Encoder–decoder attention |
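The mask in the decoder's first attention module (Table 1) prevents each position from attending to future time steps, which is what makes autoregressive decoding possible. A minimal sketch of building and applying such a causal mask (illustrative only, not the study's implementation):

```python
import numpy as np

def causal_mask(seq_len):
    """Lower-triangular boolean mask: position i may attend only to j <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_softmax(scores, mask):
    """Set disallowed (future) positions to -inf before the softmax,
    so they receive exactly zero attention weight."""
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))                  # uniform raw scores for illustration
w = masked_softmax(scores, causal_mask(4))
print(np.round(w, 2))
# row i spreads its weight uniformly over positions 0..i only;
# e.g. the first row is [1, 0, 0, 0]
```

The encoder–decoder attention sublayer uses the same computation without a mask, with queries from the decoder and keys/values from the encoder output.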
BS-Former model
Performance evaluation and benchmarks
DATASET AND EXPERIMENTS
Study area and data
The study area has a typical continental climate, with scarce precipitation, and the rainy season is mostly concentrated from June to September, with an annual rainfall of approximately 225 mm. During periods of little or no rainfall, the water flow in the rivers primarily depends on the release of stored groundwater. Therefore, the base flow, as a crucial component of streamflow, plays an extremely important role in maintaining the sustainability of the Yellow River ecosystem (Hu et al. 2021). The Ningxia section of the Yellow River is flat and has an average elevation of approximately 1,000 m. There are three large hydrological stations on the mainstream of the Ningxia section of the Yellow River from southwest to northeast, namely, Xiaheyan, Qingtongxia, and Shizuishan, with annual runoffs of 27.24, 20.89, and 24.61 billion m3, respectively (Wang et al. 2019). The annual runoff in the study area is unevenly distributed in time and space, with more runoff in the upstream (southern) part than in the northern part, and the runoff in the flood season accounts for 70–80% of the annual runoff. Since the 1990s, the runoff of the Yellow River has been continuously decreasing. In addition, the study area is facing various challenges, such as supply and demand imbalance, water resource shortages, serious soil erosion, and a fragile natural ecology, which seriously restrict local economic development and ecological environment construction (Miao et al. 2022).
The study focuses on the Ningxia segment of the Yellow River Basin, where daily flow metrics were collected from the Xiaheyan, Qingtongxia, and Shizuishan hydrological stations spanning from 1982 to 2018. Daily precipitation and temperature readings were also gathered from 18 meteorological stations within the study area. The precise locations of these stations are illustrated in Figure 1. Through the use of ArcGIS 10.6, monthly normalized difference vegetation index (NDVI) remote sensing imagery from the corresponding period was meticulously processed and interpreted to extract NDVI values. Spatial analyses were performed on land-use remote sensing images for the years 1982, 1990, 1995, 2000, 2005, 2010, 2015, and 2020, documenting the changes in land use over time. Linear interpolation was employed to format the NDVI and land use data into daily-scale time series, facilitating their integration into the model and ensuring temporal consistency between the model's input parameters and output results. Comprehensive details of the data are provided in Table 2.
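The interpolation step for the monthly NDVI series can be sketched with pandas as follows. The values and dates here are hypothetical placeholders; the actual series was extracted from the processed remote sensing imagery:

```python
import pandas as pd

# hypothetical monthly NDVI values (the real series comes from the
# ArcGIS-processed remote sensing imagery)
monthly = pd.Series(
    [0.21, 0.24, 0.30],
    index=pd.to_datetime(["1982-01-01", "1982-02-01", "1982-03-01"]),
)

# upsample to a daily index (NaN between monthly points), then fill the
# gaps by linear interpolation, giving a daily-scale series temporally
# consistent with the daily hydrometeorological inputs
daily = monthly.resample("D").asfreq().interpolate(method="linear")

print(len(daily))                       # number of days in the daily series
print(daily.loc["1982-01-16"])          # an interpolated mid-month value
```

The annual-scale land-use series can be expanded to a daily index in the same way.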
| Basic data information | Details |
|---|---|
| Time series | 1982.01.01–2018.12.31 |
| Hydrological stations | Xiaheyan, Qingtongxia, Shizuishan |
| Meteorological forcings | |
| Average daily precipitation | 18 meteorological stations (1982–2018) |
| Average daily temperature | 18 meteorological stations (1982–2018) |
| Catchment properties | |
| NDVI (https://www.resdc.cn/) | 1 km spatial resolution (1982–2018, monthly scale) |
| Land use area (http://www.resdc.cn/Default.aspx) | 1 km spatial resolution (8 time periods, annual scale) |
Experimental setup
In this study, we designed three experiments to demonstrate the feasibility of using a Transformer for daily-scale streamflow modeling. We constructed a Transformer model utilizing observed hydrometeorological data collected from three hydrological stations (Xiaheyan, Qingtongxia, and Shizuishan) to predict streamflow with a lead time of 1 day. In addition, we constructed LSTM and ANN models under the same conditions for comparison. We evaluated the prediction performance of the Transformer using three evaluation metrics: the Nash–Sutcliffe efficiency (NSE), the root mean square error (RMSE), and the mean absolute error (MAE).
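The three metrics follow their standard definitions and can be computed as in this short sketch:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the model
    is no better than the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root mean square error (penalizes large errors more heavily)."""
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(sim)) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(sim))))

# toy observed and simulated flows
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.1, 1.9, 3.2, 3.8])
print(nse(obs, sim), rmse(obs, sim), mae(obs, sim))
```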
To investigate the response of the Transformer model to base flow separation techniques, we divided the runoff data from all hydrological stations into base flow and surface flow data. Subsequently, we developed a novel model named BS-Former and established BS-LSTM and BS-ANN models as corresponding benchmark models. All experiments were conducted with observation data from all three hydrological stations.
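One widely used separation approach is the Lyne–Hollick recursive digital filter, sketched below as an illustration of the general idea; the specific separation method and parameter used in this study may differ:

```python
import numpy as np

def lyne_hollick_filter(q, alpha=0.925):
    """Single-pass Lyne-Hollick recursive digital filter (illustrative).

    q: total streamflow series; returns (base_flow, surface_flow).
    alpha is the filter parameter (values around 0.9-0.95 are typical
    in the literature).
    """
    q = np.asarray(q, float)
    quick = np.zeros_like(q)           # filtered quick (surface) flow
    for t in range(1, len(q)):
        quick[t] = alpha * quick[t - 1] + 0.5 * (1 + alpha) * (q[t] - q[t - 1])
        quick[t] = max(quick[t], 0.0)  # quick flow cannot be negative
    base = np.clip(q - quick, 0.0, q)  # base flow bounded by total flow
    return base, q - base

# toy daily hydrograph with a small flood peak
q = np.array([5.0, 5.2, 9.0, 14.0, 10.0, 7.0, 6.0, 5.5])
base, surface = lyne_hollick_filter(q)
assert np.allclose(base + surface, q)  # the two components sum to the total flow
```

The separated components then serve as two prediction targets, and the final runoff forecast is the sum of the two component forecasts.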
The effectiveness of multistep-ahead runoff prediction reflects the comprehensive performance of a model. We conducted otherwise identical model comparison experiments for lead times of 1–7 days. During the verification period, we compared the evaluation indicators and runoff prediction results of all models (with and without base flow separation) at various lead times. These experiments allowed us to identify the strengths and limitations of each model in multiday ahead prediction and to assess whether the influence of base flow separation on the accuracy of different models varies with the extension of the lead time.
Model setting and parameterization
Following data normalization and serialization, the experimental data were partitioned into a calibration set and a validation set. Specifically, the first 70% of the sequence (1982.1.1–2005.2.28) served as the calibration set for model training, and the remaining 30% of the sequence (2005.3.1–2018.12.31) was designated as the validation set for evaluating the models' simulation accuracy. The NSE of each model's simulation results on the validation set was used as the basis for optimizing the network architecture and hyperparameter values.
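The partitioning step can be sketched as follows. The split is chronological (no shuffling), and, to avoid information leakage, the normalization statistics are assumed to come from the calibration set only; min–max normalization is shown here as one common choice, since the exact scheme is not specified above:

```python
import numpy as np

def chronological_split(series, train_frac=0.7):
    """Split a time series chronologically, then normalize both parts with
    statistics from the calibration set only, so no information leaks from
    the validation period into training."""
    series = np.asarray(series, float)
    n_train = int(len(series) * train_frac)
    train, valid = series[:n_train], series[n_train:]
    lo, hi = train.min(), train.max()          # calibration-set statistics
    scale = lambda x: (x - lo) / (hi - lo)     # min-max normalization (assumed)
    return scale(train), scale(valid)

# toy series standing in for a daily runoff record
x = np.linspace(0.0, 10.0, 100)
train, valid = chronological_split(x)
print(len(train), len(valid))  # 70 30
```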
To ensure impartiality, preliminary experiments were conducted on data from the hydrological stations at Xiaheyan, Qingtongxia, and Shizuishan to establish the initial hyperparameter settings of the Transformer, LSTM, and ANN models. These initial hyperparameter settings are summarized in Table 3. Subsequently, manual parameter adjustments were made to satisfy specific experimental needs and ensure the suitability of the models.
| Models | Index | Values |
|---|---|---|
| Transformer | Batch size | 512 |
| | Number of heads in multihead attention | 4 |
| | Head size | 64 |
| | Model dimension | 64 |
| | Number of encoder and decoder layers | 4 |
| LSTM | Time steps | 7 |
| | Batch size | 64 |
| | Number of cells | 68 |
| ANN | Number of hidden layers | 2 |
| | Number of cells | 64 |
RESULTS
Prediction results of the runoff model without base flow separation
We utilized a Transformer model along with two benchmark models (LSTM and ANN) to perform simulated predictions with a lead time of 1 day for the Xiaheyan, Qingtongxia, and Shizuishan hydrological stations. The performance metrics during the model calibration and validation periods are presented in Table 4. The Transformer model achieves the best performance, with an NSE greater than 0.9 and RMSE and MAE values in the ranges of 0.09–0.11 and 0.06–0.08, respectively, during both the calibration and validation periods at all sites. This may be due to the attention mechanism of the Transformer model, which can comprehensively capture features and selectively focus on various aspects of the input through its multihead attention, enhancing the extraction of nonlinear features and achieving excellent robustness and stability. In contrast, the LSTM model's performance is slightly inferior, particularly in predicting the runoff at the Qingtongxia hydrological station, for which this model achieves a validation NSE of approximately 0.83, in contrast to its validation NSE of 0.87 for the other two hydrological stations.
| Period | Hydrologic stations | Models | NSE | RMSE | MAE |
|---|---|---|---|---|---|
| Calibration | Xiaheyan | Transformer | 0.926 | 0.114 | 0.069 |
| | | LSTM | 0.919 | 0.114 | 0.076 |
| | | ANN | 0.842 | 0.128 | 0.082 |
| | Qingtongxia | Transformer | 0.903 | 0.116 | 0.076 |
| | | LSTM | 0.889 | 0.118 | 0.082 |
| | | ANN | 0.806 | 0.155 | 0.128 |
| | Shizuishan | Transformer | 0.932 | 0.105 | 0.065 |
| | | LSTM | 0.922 | 0.108 | 0.069 |
| | | ANN | 0.836 | 0.157 | 0.107 |
| Validation | Xiaheyan | Transformer | 0.916 | 0.107 | 0.081 |
| | | LSTM | 0.887 | 0.134 | 0.106 |
| | | ANN | 0.802 | 0.178 | 0.112 |
| | Qingtongxia | Transformer | 0.905 | 0.098 | 0.077 |
| | | LSTM | 0.831 | 0.158 | 0.102 |
| | | ANN | 0.783 | 0.160 | 0.103 |
| | Shizuishan | Transformer | 0.914 | 0.106 | 0.075 |
| | | LSTM | 0.868 | 0.142 | 0.113 |
| | | ANN | 0.796 | 0.177 | 0.127 |
The ANN model exhibits inferior performance compared to the other two models, with calibration and validation NSE values of approximately 0.80–0.84 and 0.78–0.80, respectively. This observation is consistent with the previous research by Tongal & Booij (2018). As a simple neural network model, an ANN lacks memory units to retain historical temporal information, resulting in poor performance in simulating long time series. Furthermore, ANN models typically employ fully connected layers for input data processing, which may result in overfitting. Consequently, such a model may overemphasize noise and anomalous values in the calibration dataset, leading to poor generalizability to new data (Hu et al. 2018).
Prediction results of runoff model with base flow separation
Forecasting results with various simulation lead times
DISCUSSION
In this study, we developed a Transformer-based runoff simulation framework incorporating base flow separation. The objective is to investigate the response of deep learning models to prior hydrological knowledge, aiming to enhance the accuracy of runoff prediction. The experimental outcomes indicate that integrating base flow separation techniques into different models can enhance the accuracy of runoff prediction to varying degrees, and this approach is particularly effective when using a Transformer-based model.
We found that introducing a base flow separation technique enhanced the performance of the LSTM model. This result agrees with Chen et al. (2021) and Lee et al. (2023), who reported that an LSTM model based on base flow separation exhibited the best predictive performance. In addition, for the more advanced Transformer model tested in the current study, the addition of base flow separation significantly improved its performance. This enhancement could be due to the self-attention mechanism of this model, which is capable of capturing the relationships between any two time steps, enabling the model to adaptively focus on relevant historical data across different timescales. At the same time, past information can also be flexibly incorporated into predictions, thereby facilitating the learning of long-term dependencies within a data sequence (Yin et al. 2022b). Tongal & Booij (2018) introduced the division of streamflow into base flow and surface flow into ANN-based modeling for short-term monthly-scale runoff prediction and found that the resulting predictions were better than those obtained by modeling runoff as a single variable. In contrast, our findings suggest that base flow separation does not significantly enhance the ANN model's performance. We hypothesize that this difference may stem from the ANN's inherent structure.
Base flow, supplied by groundwater, is a stable and gradually varying stream, while surface runoff, which forms when rainfall exceeds the ground's infiltration capacity, is characterized by rapid changes and a brief response time. By separating base flow from surface runoff, targeted predictions can be made for each component, utilizing the Transformer model's self-attention mechanism to improve prediction accuracy. This approach fully leverages the Transformer model's potential, yielding more precise hydrological forecasts. In contrast, although an LSTM model is well suited for modeling and predicting short-term dependencies in time series data, its effectiveness in predicting runoff in basins with a high proportion of base flow is limited. This limitation stems from the fact that, in cases where the soil moisture deficit in the catchment cannot be replenished within a brief period, a portion of the rainfall is absorbed by the soil and infiltrates into the groundwater, resulting in a delay in surface runoff generation (Serinaldi et al. 2014). An LSTM model may not be able to effectively utilize historical rainfall data to anticipate future surface runoff in such scenarios. Furthermore, its recursive structure constrains an LSTM model's ability to extract global information and makes it more susceptible to external disturbances, resulting in a gradual accumulation of errors (Yin et al. 2022b). For instance, in this study, the construction of the Qingtongxia water project might have impacted the spatiotemporal distribution characteristics of local runoff. However, such human activities were not considered during model construction, which may explain the inferior predictive performance of the LSTM model at the Qingtongxia hydrological station compared to the other two stations. Thus, it can be inferred that incorporating river network connectivity will be essential in future model construction.
We also observed that all models exhibited a significant increase in forecasting error during the dry season. This aligns with the findings of Huang et al. (2020), who employed a Bayesian joint probability model for seasonal flow forecasting and found that both base flow and surface flow play crucial roles in seasonal runoff prediction and that the forecasting error increases during the dry season. During the transition period from the rainy season to the dry season, the main source of the total flow shifts from the surface flow to the base flow. The predictive ability typically decreases during such transitional periods, a phenomenon documented by Robertson et al. (2013) and Zhao et al. (2016).
Furthermore, existing research has explored the capabilities of traditional machine learning techniques for predictions at various lead times (Cheng et al. 2020; Han & Morrison 2022). This study focuses more on demonstrating the potential of base flow separation techniques to enhance the lead time prediction performance of models. As such, our study contributes a measure of innovation within this domain, with the aim of introducing novel avenues for refining hydrological prediction models. Future studies should expand BS-Former's capabilities by incorporating broader hydrological knowledge, enhancing interpretability, and exploring hybrid models that merge machine learning's adaptability with the robustness of physically based approaches, aiming for superior accuracy and understanding in runoff predictions.
CONCLUSION
In this study, we attempted to incorporate prior knowledge of hydrological processes into data-driven modeling. To this end, we proposed a novel runoff prediction model named BS-Former, which is based on the base flow separation approach and the Transformer network architecture. The runoff simulation and prediction capabilities of BS-Former were verified at three hydrological stations in the Ningxia River section of the Yellow River in China, namely, Xiaheyan, Qingtongxia, and Shizuishan. The following conclusions can be drawn from this study:
First, compared to LSTM and ANN models, a Transformer model demonstrates exceptional flexibility in capturing long-term correlations in sequential data due to its unique self-attention mechanism. Moreover, it exhibits significant generalizability and stability for hydrological modeling, with both calibration and validation NSE values exceeding 0.9.
Second, incorporating prior hydrological knowledge of base flow separation significantly enhances the feature extraction capabilities of a Transformer model for runoff sequences. The BS-Former model displays the most robust performance and exceptional prediction accuracy, with an improvement of approximately 0.04 in the NSE index and a reduction of approximately 10% in the peak error compared to the Transformer model without base flow separation.
Third, base flow separation can alleviate the tendency of Transformer models to overestimate runoff with increasing lead times, thereby reducing peak errors. The BS-Former model showed the strongest predictive capability across all lead times. This demonstrates that Transformers can deeply exploit prior hydrological knowledge through their self-attention mechanisms, thereby precisely simulating the rainfall–runoff process.
FUNDING
This work was supported by the Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering (Grant numbers: 2021490511) and Ningxia Ecological Geological Survey Demonstration Project (NXCZ20220201).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.