Precise long-term runoff prediction holds crucial significance in water resource management. Although the long short-term memory (LSTM) model is widely adopted for long-term runoff prediction, it encounters challenges such as error accumulation and low computational efficiency. To address these challenges, we utilized a novel method to predict runoff based on a Transformer and the base flow separation approach (BS-Former) in the Ningxia section of the Yellow River Basin. To evaluate the effectiveness of the Transformer model and its responsiveness to the base flow separation technique, we constructed LSTM and artificial neural network (ANN) models as benchmarks for comparison. The results show that Transformer outperforms the other models in terms of predictive performance and that base flow separation significantly improves the performance of the Transformer model. Specifically, the performance of BS-Former in predicting runoff 7 days in advance is comparable to that of the BS-LSTM and BS-ANN models with lead times of 4 and 2 days, respectively. In general, the BS-Former model is a promising tool for long-term runoff prediction.

  • The effectiveness of a Transformer model for simulating and predicting long-term daily runoff is explored.

  • The response of the Transformer model to prior hydrological knowledge in base flow separation is analyzed.

  • The potential of base flow separation techniques to improve the lead-time runoff prediction ability of deep learning models is explored.

Runoff simulation has long been recognized as a critical research endeavor in the field of hydrological science. However, the complex interplay of basin geomorphology, climate change, and anthropogenic activities leads to significant spatiotemporal variability in the distribution of river runoff, thereby posing a formidable challenge in accurately predicting hydrological processes (Senthil Kumar et al. 2005; Xu et al. 2022). Over the past few years, the development of runoff prediction models has attracted increasing attention. These models can be broadly categorized into two main types: mechanism-based (i.e., process-driven) and nonmechanism-based (i.e., data-driven) models (Bittelli et al. 2010; Kratzert et al. 2018; He et al. 2023). Traditional hydrological prediction models are based on complex mathematical formulations aimed at simulating and predicting the physical mechanisms that govern runoff formation. These models require a thorough understanding of the hydrological processes underlying runoff generation, including the influences of terrain, geomorphology, geological structure, meteorology, hydrology, and other relevant parameters, all of which serve as critical inputs to such a model (Qin et al. 2019). Thus, establishing traditional mechanism-based models is a time-consuming process that involves the input of numerous parameters related to the runoff formation process. Moreover, the accuracy of these models relies heavily on the physical interpretation of the model parameters. However, the complex hydraulic connectivity of some basins presents a significant challenge in terms of obtaining input data and parameters for the runoff formation process, thereby impeding the effective simulation of the process and leading to inaccurate prediction results (Yoon et al. 2011). As a result, traditional mechanism-based models may be inadequate for effectively addressing the problem of runoff prediction in such contexts. Alternatively, data-driven models have been described as ‘black box’ models because they do not directly consider the internal physical mechanisms underlying the system under investigation. Instead, they rely on statistical analysis of the input‒output relationships between various factors in the system and transfer functions to describe the system's response (Zhang et al. 2021; Yin et al. 2022a; Liu et al. 2022b). Data-driven models operate independently of prior knowledge and can be applied in information-limited scenarios. The currently available data-driven models primarily consist of statistical models, machine learning models, and deep learning models (Liu et al. 2022b).

Statistical models assume a linear input‒output relationship within a system and utilize relevant statistical methods to predict future runoff based on historical observation data. While statistical models can effectively capture linear relationships between historical and future data, the runoff process often exhibits complex nonlinear relationships. Furthermore, the runoff process is influenced by both natural environmental factors and human activities, and traditional statistical models often fall short in comprehensively integrating these multifaceted influences, resulting in inadequate performance (Liu et al. 2022b). As artificial intelligence technology has continued to advance, machine learning models have emerged as effective tools for addressing complex runoff prediction challenges. This is primarily due to their capacity to capture complex, nonstationary, dynamic, and nonlinear relationships between multiple input and output variables (Kisi 2011; Liu et al. 2022b). Nevertheless, although traditional machine learning methods have shown promise in runoff prediction, their simple structure and limited memory capacity limit their ability to extract deep information. As the dimensionality of the input features increases, these limitations further constrain the predictive performance of such a model (Xie et al. 2019).

Deep learning is a promising area of machine learning that utilizes neural network architectures composed of multiple layers, enabling the extraction of more intricate and abstract features from data. Therefore, deep learning addresses the limitations of previous models (Kratzert et al. 2018). In the field of hydrology, deep learning has garnered considerable attention, achieving superior performance in runoff prediction compared to traditional neural network methods (Xu et al. 2023). This advancement has greatly facilitated the development of runoff prediction models. In particular, recurrent neural networks (RNNs) stand out due to their remarkable capacity for data processing and long-term learning of runoff series data (Hu et al. 2018). However, in the directed loop structure of a traditional RNN, the output of the previous node is used as the input for the current step. Therefore, RNNs have difficulty learning long-term correlations in time series data, which can lead to gradient vanishing, in turn affecting the prediction performance of such a model (Shen et al. 2019). A long short-term memory (LSTM) network is an improved recurrent neural network that solves this problem of vanishing gradients. Despite the impressive performance of LSTM models in streamflow prediction, their recurrent nature means that they cannot account for direct connections between arbitrary positions, thereby hindering the effective extraction of meaningful information related to nonadjacent position connections. Consequently, the LSTM architecture is not well suited for multistep-ahead streamflow prediction. Furthermore, the limited ability of LSTM models to fully capture time series features may result in error accumulation and consequent degradation in the prediction performance (Yin et al. 2022a).

Subsequently, Bahdanau et al. (2015) introduced an attention mechanism that can establish direct connections between any two positions in a time series and capture the long-term correlation of the series by computing the ‘attention’ between any pair of points (Liu et al. 2022a). This mechanism facilitates multistep-ahead prediction and utilizes a nonautoregressive method, which effectively mitigates the problem of error accumulation. On the basis of extensive investigations of attention mechanisms, Vaswani et al. (2017) introduced a novel model called a Transformer, which relies solely on attention mechanisms. This model initially exhibited promising performance in natural language processing applications. Specifically, the Transformer architecture depends exclusively on attention mechanisms to establish global correlations between the input and output and has strong temporal coupling and global correlation feature extraction capabilities (Fang et al. 2024; Jin et al. 2024). Compared to an LSTM model, a Transformer model with a self-attention mechanism offers the advantage of parallelized computation, thereby enhancing the efficiency of processing long time series data (Masafu & Williams 2024). In addition, the self-attention mechanism in a Transformer model is capable of capturing intricate relationships between various time points in a time series, thus rendering it better suited for long-term multistep-ahead streamflow prediction (Yin et al. 2022b). However, the applications of Transformer models in hydrology are still limited. Therefore, more research is needed to validate the effectiveness of such models in hydrological prediction.

Data-driven models such as deep learning models do not involve the explicit modeling of any physical processes. These models instead focus on learning the interrelationships between flow characteristics and various variables. Consequently, in the presence of substantial noise interference, machine learning models may tend to excessively emphasize abnormal fluctuations within the data while ignoring genuine trends, thus weakening the stability and reliability of these relationships (Jiahai & Aiyun 2019). Therefore, it is imperative to explore methods of enhancing the capacity of deep learning models in streamflow prediction and methods of reducing errors. Numerous studies have demonstrated that the method of decomposing runoff time series and individually modeling its components exhibits superior performance compared to the traditional approach of directly modeling the entire series (Wang et al. 2021; Zhang et al. 2022a). Accordingly, in this study, we incorporate relevant prior hydrological information into a model to guide its learning and comprehension of specific flow patterns, thereby revealing deeper underlying patterns within the flow data more accurately and consequently enhancing the model performance. In particular, two major components of runoff are base flow and surface flow. Studies indicate that a combination model created by building machine learning models to separately predict these two components and subsequently overlaying them produces runoff prediction results that are superior to those obtained by a single machine learning model (Corzo & Solomatine 2007; Tongal & Booij 2018; Huang et al. 2020). However, the current research primarily focuses on short-term runoff prediction at the monthly scale, with limited exploration of long-term daily-scale runoff prediction incorporating base flow and surface flow as prediction variables. Most existing studies have predominantly emphasized enhancing the predictive accuracy of machine learning models, while the impact of base flow separation techniques on the lead-time forecasting capabilities of those models has received little attention. Furthermore, there is a lack of validation for recently proposed advanced deep learning models in the context of runoff prediction based on base flow separation. In particular, no studies have yet explored whether base flow separation techniques can enhance the predictive performance of Transformer models. Therefore, the specific objectives of this study are (1) to study the effectiveness of Transformer models for simulating and predicting long-term daily runoff, (2) to analyze the response of the Transformer model to prior hydrological knowledge such as base flow, and (3) to explore the potential of base flow separation techniques to enhance the lead-time prediction capability of machine learning models for runoff.

Transformer model

The Transformer algorithm, comprising an encoder and a decoder, is utilized to construct the model. Notably, this model incorporates three attention mechanisms, namely, multihead self-attention, masked attention, and encoder–decoder attention, which operate across the entire system (Yin et al. 2022a). The primary component of both the encoder and decoder modules is the multihead attention mechanism. A visual depiction of the specific architecture of the Transformer runoff prediction model is presented in Figure 1.
Figure 1. Transformer-based architecture for runoff prediction.
Due to the dimensional mismatch between the input and output, an input transformation layer is introduced at the start of both the encoder and decoder. The input sequence is first passed through an embedding layer composed of a single-layer fully connected neural network, and then position encoding information is added, which is calculated according to Equation (1). The encoder processes meteorological forcing and catchment properties from various locations, while the decoder handles observed runoff values from those same locations (Yin et al. 2022b).
$$
\mathrm{PE}_{(t,\,2i)}=\sin\!\left(\frac{t}{10000^{2i/d_{\mathrm{model}}}}\right),\qquad
\mathrm{PE}_{(t,\,2i+1)}=\cos\!\left(\frac{t}{10000^{2i/d_{\mathrm{model}}}}\right)
\tag{1}
$$
where $\mathrm{PE}_{(t,i)}$ represents the position encoding of the input vector mapped to the $i$th dimension at time $t$ and $d_{\mathrm{model}}$ represents the dimensionality of the input vector.
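To make the encoding concrete, the following NumPy sketch implements Equation (1); the function name, the 30-step window, and the 64-dimensional embedding are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding of Equation (1): sine on even dimensions, cosine on odd dimensions."""
    positions = np.arange(seq_len)[:, np.newaxis]             # t = 0, 1, ..., seq_len - 1
    dims = np.arange(d_model)[np.newaxis, :]                  # i = 0, 1, ..., d_model - 1
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                     # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                     # odd dimensions
    return pe

embedded = np.random.randn(30, 64)                            # e.g., a 30-day window embedded to d_model = 64
encoder_input = embedded + positional_encoding(30, 64)        # added before the first attention layer
```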
The features of the input data are encoded as query (Q), key (K), and value (V) vectors. Subsequently, the multihead self-attention mechanism is applied to calculate the attention scores between different sequences, introducing diverse weighting information into the model in accordance with Equations (2) and (3). By utilizing keys, values, and queries, the encoder can capture similarities between meteorological forcing and catchment properties at diverse locations, whereas the decoder can recognize similarities between observed runoff values at different locations.
$$
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V
\tag{2}
$$
$$
\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\,W^{O},\qquad
\mathrm{head}_i=\mathrm{Attention}\!\left(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}\right)
\tag{3}
$$
where $W$ with upper and lower indices represents a learnable weight matrix, $W_i^{Q}\in\mathbb{R}^{d_{\mathrm{model}}\times d_k}$, $W_i^{K}\in\mathbb{R}^{d_{\mathrm{model}}\times d_k}$, and $W_i^{V}\in\mathbb{R}^{d_{\mathrm{model}}\times d_v}$; $d_k$ is the dimensionality of the Q and K vectors, and $d_v$ is the dimensionality of the V vectors, satisfying $d_k=d_v=d_{\mathrm{model}}/h$, where $d_{\mathrm{model}}$ represents the hidden layer size of the Transformer structure and $h$ represents the number of attention heads. Moreover, the superscript T denotes transposition of a matrix.
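A compact NumPy sketch of Equations (2) and (3) is given below; the projection matrices are drawn randomly purely for illustration (in the trained model they are learned parameters), and the head count of four is an assumption, since the value used in this study is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, Equation (2)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multihead_attention(X_q, X_kv, h=4, d_model=64, seed=0):
    """Multihead attention, Equation (3), with d_k = d_v = d_model / h (h = 4 assumed here)."""
    rng = np.random.default_rng(seed)
    d_k = d_model // h
    heads = []
    for _ in range(h):
        W_q = rng.standard_normal((d_model, d_k))             # learned in the actual model
        W_k = rng.standard_normal((d_model, d_k))
        W_v = rng.standard_normal((d_model, d_k))
        heads.append(scaled_dot_product_attention(X_q @ W_q, X_kv @ W_k, X_kv @ W_v))
    W_o = rng.standard_normal((h * d_k, d_model))             # output projection W^O
    return np.concatenate(heads, axis=-1) @ W_o

x = np.random.randn(30, 64)                                   # a 30-step sequence with d_model = 64
y = multihead_attention(x, x)                                 # self-attention when queries and keys/values coincide
```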
The final step involves a nonlinear transformation mapping to obtain distinct encoding vectors $c$, which are then interpreted by the decoder to produce the output sequence $y=(y_1,\ldots,y_n)$. Specifically, each output $y_t$ at time $t$ is generated based on the prior output sequence $\{y_1,\ldots,y_{t-1}\}$ and the encoding vector $c_{t-1}$ generated at time $t-1$, as described by Equation (4):
$$
y_t=f\!\left(\{y_1,\ldots,y_{t-1}\},\,c_{t-1}\right)
\tag{4}
$$

Each layer of the encoder is composed of two sublayers: a multihead self-attention mechanism and a feedforward network that applies nonlinear transformations to the output of the attention mechanism (Zhang et al. 2019). Each sublayer includes residual connections and layer normalization. The decoder, on the other hand, has three sublayers per layer and includes an additional encoder–decoder attention sublayer that acts on the multihead attention output. The first attention module in the decoder includes a mask. Finally, the output of the decoder is linearly transformed to generate the prediction. The attention module combinations used in the Transformer model are shown in Table 1.
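As a rough illustration of this sublayer arrangement (and before the attention combinations listed in Table 1), a single encoder layer might be sketched in Keras as follows; the model dimension of 64 matches the hyperparameters reported later in Table 3, while the feedforward width, head count, and dropout rate are assumed values rather than the authors' settings.

```python
from tensorflow import keras
from tensorflow.keras import layers

def encoder_layer(d_model=64, num_heads=4, d_ff=128, dropout=0.1):
    """One encoder layer: multihead self-attention and a feedforward sublayer,
    each wrapped in a residual connection followed by layer normalization."""
    inputs = keras.Input(shape=(None, d_model))
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model // num_heads)(inputs, inputs)
    x = layers.LayerNormalization(epsilon=1e-6)(inputs + layers.Dropout(dropout)(attn))
    ff = layers.Dense(d_ff, activation="relu")(x)             # feedforward sublayer (nonlinear transformation)
    ff = layers.Dense(d_model)(ff)
    outputs = layers.LayerNormalization(epsilon=1e-6)(x + layers.Dropout(dropout)(ff))
    return keras.Model(inputs, outputs)

layer = encoder_layer()   # a decoder layer additionally includes masked and encoder-decoder attention
```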

Table 1. Attention used in the Transformer model

| Part    | Self- or cross-attention | Single- or multihead | Mask | Attention name             |
| Encoder | Self-attention           | Multihead            | No   | Multihead attention        |
| Decoder | Self-attention           | Multihead            | Yes  | Masked multihead attention |
| Decoder | Cross-attention          | Multihead            | No   | Encoder–decoder attention  |

BS-Former model

Separating the base flow from the daily streamflow hydrograph helps to capture the nonlinear relationships between different components and influencing factors during the formation of runoff, thereby improving the accuracy of runoff prediction. In previous research, the applicability of commonly used base flow separation methods in the Yellow River Basin has been compared. The results indicate that digital filtering demonstrates strong objectivity and reproducibility, with the Eckhardt digital recursive filtering method particularly excelling in terms of its applicability (Hu et al. 2021; Zhang et al. 2022b). Digital filtering methods involve utilizing daily runoff data and employing filtering equations to separate the base flow, which is regarded as a low-frequency signal, from the runoff hydrograph. In this work, the Eckhardt recursive digital filtering method is used for base flow separation, and the base flow separation equation is given in Equation (5) (Eckhardt 2008):
$$
b_t=\frac{(1-\mathrm{BFI}_{\max})\,\alpha\,b_{t-1}+(1-\alpha)\,\mathrm{BFI}_{\max}\,Q_t}{1-\alpha\,\mathrm{BFI}_{\max}}
\tag{5}
$$
where $\alpha$ is the water recession constant, $\alpha=0.925$; $\mathrm{BFI}_{\max}$ is the maximum base flow index; $Q_t$ is the flow at time $t$, in m³/s; and $b_{t-1}$ and $b_t$ are the base flow at time $t-1$ and time $t$, respectively.
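A minimal NumPy sketch of this recursive filter is shown below; the bfi_max argument is a placeholder because the value used in this study is not reproduced here, and the base flow is capped at the total flow, as is conventional for the Eckhardt filter.

```python
import numpy as np

def eckhardt_baseflow(q, alpha=0.925, bfi_max=0.8):
    """Eckhardt recursive digital filter, Equation (5).
    q: daily streamflow (m^3/s); bfi_max = 0.8 is a placeholder, not necessarily the study's value."""
    q = np.asarray(q, dtype=float)
    b = np.zeros_like(q)
    b[0] = bfi_max * q[0]                      # simple initialization of the base flow
    for t in range(1, len(q)):
        b[t] = ((1 - bfi_max) * alpha * b[t - 1] + (1 - alpha) * bfi_max * q[t]) / (1 - alpha * bfi_max)
        b[t] = min(b[t], q[t])                 # base flow cannot exceed the total flow
    return b

q = [120.0, 150.0, 300.0, 260.0, 180.0, 140.0]
baseflow = eckhardt_baseflow(q)
surface_flow = np.asarray(q) - baseflow        # the two components used as separate prediction targets
```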
The BS-Former model that is presented in this study combines base flow separation with Transformer-based techniques, as depicted in Figure 2. The Eckhardt method is employed to partition the runoff at time t into its constituent base flow and surface flow elements, each of which serves as an input sequence to the Transformer model's decoder. Meteorological data and catchment properties are utilized as input sequences for the encoder of the Transformer model. The resulting predictions for both components are subsequently combined to estimate the total flow at time t + 1 (Zhang et al. 2022b).
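The workflow in Figure 2 can be summarized schematically as follows; base_model and surface_model stand for the two trained Transformer predictors and are assumptions for illustration, reusing the eckhardt_baseflow helper sketched above.

```python
import numpy as np

def bs_former_predict(runoff_t, met_catchment_t, base_model, surface_model):
    """Schematic BS-Former step: separate the runoff sequence up to time t into its
    components, predict each with its own Transformer, and sum the two predictions
    to estimate the total flow at time t + 1."""
    baseflow_t = eckhardt_baseflow(runoff_t)               # Equation (5), sketched earlier
    surface_t = np.asarray(runoff_t) - baseflow_t
    # Meteorological forcing and catchment properties feed the encoder,
    # while each runoff component feeds the decoder of its Transformer.
    next_baseflow = base_model.predict([met_catchment_t, baseflow_t])
    next_surface = surface_model.predict([met_catchment_t, surface_t])
    return next_baseflow + next_surface                    # total runoff estimate at time t + 1
```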
Figure 2. Architecture of the BS-Former model.

Performance evaluation and benchmarks

The model evaluation indices selected in this article include the Nash–Sutcliffe efficiency (NSE) coefficient, root-mean-square error (RMSE), and mean absolute error (MAE). Their specific mathematical expressions are as follows:
$$
\mathrm{NSE}=1-\frac{\sum_{i=1}^{n}\left(Q_{o,i}-Q_{s,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{o,i}-\overline{Q_{o}}\right)^{2}}
\tag{6}
$$
$$
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{o,i}-Q_{s,i}\right)^{2}}
\tag{7}
$$
$$
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|Q_{o,i}-Q_{s,i}\right|
\tag{8}
$$
where $Q_{o,i}$ and $Q_{s,i}$ represent the real observation data and the simulated data, respectively; $\overline{Q_{o}}$ represents the mean value of the real observation data; $\overline{Q_{s}}$ represents the average value of the simulated data; $i$ represents the $i$th time period; and $n$ represents the total number of periods. The range of the NSE is (−∞, 1), and an NSE value closer to 1 indicates a better prediction effect. Both the RMSE and MAE take values in the range of (0, +∞), and the closer these values are to 0, the better the prediction effect.
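For reference, these three metrics can be computed with a few lines of NumPy; the function names are illustrative.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Equation (6)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root-mean-square error, Equation (7)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def mae(obs, sim):
    """Mean absolute error, Equation (8)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.mean(np.abs(obs - sim))
```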
We employ two benchmarks, namely, an LSTM model (the most commonly used type of deep learning model) and an artificial neural network (ANN; an early machine learning model), for comparison to assess the performance of our Transformer model for streamflow prediction. The ANN model is a typical feedforward neural network comprising an input layer, a hidden layer, and an output layer, where each layer consists of fully connected nodes (Goyal 2014; Hu et al. 2018). Numerous studies have demonstrated the effectiveness of ANNs in streamflow prediction (Dalir et al. 2022; Samantaray et al. 2022). An LSTM network is a variant of an RNN that has been introduced as a solution to the vanishing gradient problem observed in previous RNN models (Xu et al. 2022). An LSTM cell consists of a forget gate, an input gate, and an output gate; its specific structure is shown in Figure 3.
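Minimal Keras sketches of the two benchmark architectures are given below; the cell counts (68 for the LSTM, 64 for the ANN) match the hyperparameters reported later in Table 3, while the window length, number of input features, and loss function are placeholders rather than the authors' settings.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(window=30, n_features=5, n_cells=68):
    """Benchmark LSTM: one LSTM layer (forget, input, and output gates) followed by a dense output."""
    return keras.Sequential([
        keras.Input(shape=(window, n_features)),
        layers.LSTM(n_cells),
        layers.Dense(1),                     # runoff (or a runoff component) at time t + 1
    ])

def build_ann(n_inputs=5, n_cells=64):
    """Benchmark ANN: a single fully connected hidden layer between the input and output layers."""
    return keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        layers.Dense(n_cells, activation="relu"),
        layers.Dense(1),
    ])

lstm = build_lstm()
lstm.compile(optimizer="adam", loss="mse")   # mean squared error is a common choice; the actual loss is not reproduced here
```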
Figure 3. LSTM network architecture.

Study area and data

The Yellow River, spanning 5,464 km, with a basin area of 752,442 km2, is the sixth longest river globally and the second longest in China. In this study, we selected the Ningxia section of the Yellow River Basin as our study area. This area is located in the middle and upper reaches of the Yellow River, with a total length of 397 km and a basin area of 51,400 km2. The geographical coordinates of this area range between 36°0′ and 39°23′ N in latitude and 104°17′–107°39′ E in longitude. A regional overview is presented in Figure 4.
Figure 4. Location of the Ningxia section of the Yellow River Basin.

The study area has a typical continental climate, with scarce precipitation, and the rainy season is mostly concentrated from June to September, with an annual rainfall of approximately 225 mm. During periods of little or no rainfall, the water flow in the rivers primarily depends on the release of stored groundwater. Therefore, the base flow, as a crucial component of streamflow, plays an extremely important role in maintaining the sustainability of the Yellow River ecosystem (Hu et al. 2021). The Ningxia section of the Yellow River is flat and has an average elevation of approximately 1,000 m. There are three large hydrological stations on the mainstream of the Ningxia section of the Yellow River from southwest to northeast, namely, Xiaheyan, Qingtongxia, and Shizuishan, with annual runoffs of 27.24, 20.89, and 24.61 billion m3, respectively (Wang et al. 2019). The annual runoff in the study area is unevenly distributed in time and space, with more runoff in the upstream (southern) section than in the northern section, and the runoff in the flood season accounts for 70–80% of the annual runoff. Since the 1990s, the runoff of the Yellow River has been continuously decreasing. In addition, the study area is facing various challenges, such as supply and demand imbalance, water resource shortages, serious soil erosion, and a fragile natural ecology, which seriously restrict local economic development and ecological environment construction (Miao et al. 2022).

The study focuses on the Ningxia segment of the Yellow River Basin, where daily flow metrics were collected from the Xiaheyan, Qingtongxia, and Shizuishan hydrological stations spanning from 1982 to 2018. Daily precipitation and temperature readings were also gathered from 18 meteorological stations within the study area. The precise locations of these stations are illustrated in Figure 4. Through the use of ArcGIS 10.6, monthly normalized difference vegetation index (NDVI) remote sensing imagery from the corresponding period was meticulously processed and interpreted to extract NDVI values. Spatial analyses were performed on land-use remote sensing images for the years 1982, 1990, 1995, 2000, 2005, 2010, 2015, and 2020, documenting the changes in land use over time. Linear interpolation was employed to format the NDVI and land use data into daily-scale time series, facilitating their integration into the model and ensuring temporal consistency between the model's input parameters and output results. Comprehensive details of the data are provided in Table 2.
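Before summarizing the data in Table 2, the interpolation step can be illustrated with a small pandas sketch; the NDVI values and date range shown are invented for demonstration.

```python
import pandas as pd

# Monthly NDVI values indexed by date (illustrative values).
ndvi_monthly = pd.Series(
    [0.21, 0.24, 0.31],
    index=pd.to_datetime(["1982-01-01", "1982-02-01", "1982-03-01"]),
    name="ndvi",
)

# Reindex to a daily calendar and fill the gaps by linear interpolation,
# so that NDVI (and, analogously, land use area) aligns with the daily runoff series.
daily_index = pd.date_range("1982-01-01", "1982-03-01", freq="D")
ndvi_daily = ndvi_monthly.reindex(daily_index).interpolate(method="linear")
```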

Table 2. Basic data of meteorological forcing and catchment properties

| Basic data information        | Details                                                                                   |
| Time series                   | 1982.01.01–2018.12.31                                                                     |
| Hydrological stations         | Xiaheyan, Qingtongxia, Shizuishan                                                         |
| Meteorological forcings       |                                                                                           |
|   Average daily precipitation | 18 meteorological stations (1982–2018)                                                    |
|   Average daily temperature   | 18 meteorological stations (1982–2018)                                                    |
| Catchment properties          |                                                                                           |
|   NDVI                        | 1 km spatial resolution, https://www.resdc.cn/ (1982–2018, monthly scale)                 |
|   Land use area               | 1 km spatial resolution, http://www.resdc.cn/Default.aspx (8 time periods, annual scale)  |

Experimental setup

In this study, we designed three experiments to demonstrate the feasibility of using Transformer for daily-scale streamflow modeling. We constructed a Transformer model utilizing observed hydrometeorological data collected from three hydrological stations (Xiaheyan, Qingtongxia, and Shizuishan) to predict streamflow with a lead time of 1 day. In addition, we constructed LSTM and ANN models under the same conditions for comparison. We evaluated the prediction performance of the Transformer using three evaluation metrics, namely, the NSE, RMSE, and MAE.

To investigate the response of the Transformer model to base flow separation techniques, we divided the runoff data from all hydrological stations into base flow and surface flow data. Subsequently, we developed a novel model named BS-Former and established BS-LSTM and BS-ANN models as corresponding benchmark models. All experiments were conducted with observation data from all three hydrological stations.

The effectiveness of multistep-ahead runoff prediction reflects the comprehensive performance of a model. We conducted otherwise identical model comparison experiments for lead times of 1–7 days. During the verification period, we compared the evaluation indicators and runoff prediction results of all models (with and without base flow separation) at various lead times. These experiments allowed us to identify the strengths and limitations of each model in multiday ahead prediction and to assess whether the influence of base flow separation on the accuracy of different models varies with the extension of the lead time.

Model setting and parameterization

The framework of the BS-Former model was built using the Python 3.9 programming language and the TensorFlow platform. The inputs for all models included daily meteorological data (temperature and precipitation) and daily underlying surface data (land use and NDVI data). For the models without base flow separation, the inputs additionally included the runoff at time t, and the output data corresponded to the runoff at time t + 1. In contrast, the inputs to the models with base flow separation additionally included the base flow and surface flow at time t, and the outputs were the base flow and surface flow at time t + 1. The entire dataset covered the time span from 1982 to 2018. Data preprocessing was conducted with the aid of the NumPy, Pandas, and PySwarms libraries. First, we processed the necessary data into a form that could be recognized by the models. This step entailed aligning the climate and surface data with the corresponding runoff (and runoff components) for each time period into a temporally structured dataset. We considered the influencing factor data at time t–n to correspond to the runoff (and runoff components) data at time t, where n represents the length of the forecast period. During the simulation process, we employed a sliding window strategy for data selection. As time progressed, the sliding window was moved forward one step at a time until the end of the dataset was reached, as presented in Figure 5. Furthermore, it should be noted that the value ranges of the runoff data differed considerably from those of the meteorological, NDVI, and land use data. Therefore, the data were standardized to remove these scale differences and facilitate effective model learning. The expression for this operation is given as follows:
$$
x^{*}=\frac{x-\bar{x}}{\sigma}
\tag{9}
$$
where $x^{*}$ and $x$ represent the standardized result and the original sample data, respectively, whereas $\bar{x}$ and $\sigma$ are the average and standard deviation, respectively, of the sample data.
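A sketch of the serialization and standardization steps is given below, under assumed settings (a 30-day window, a 1-day lead time, and five predictor columns); it is an illustration of the procedure, not the authors' preprocessing code.

```python
import numpy as np

def standardize(x):
    """Equation (9): remove the mean and scale by the standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def make_windows(features, target, window=30, lead=1):
    """Sliding-window serialization: predictors up to time t - lead are paired
    with the target at time t, and the window advances one step at a time."""
    X, y = [], []
    for t in range(window + lead - 1, len(target)):
        X.append(features[t - lead - window + 1 : t - lead + 1])
        y.append(target[t])
    return np.array(X), np.array(y)

features = standardize(np.random.rand(1000, 5))   # meteorological and catchment predictors
target = np.random.rand(1000)                     # runoff (or a runoff component)
X, y = make_windows(features, target, window=30, lead=1)
```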
Figure 5. An illustration of data serialization.

Following data normalization and serialization, the experimental data were partitioned into a calibration set and a validation set. Specifically, the first 70% of the sequence (1982.1.1–2005.2.28) served as the calibration set for model training, and the remaining 30% of the sequence (2005.3.1–2018.12.31) was designated as the validation set for evaluating the models' simulation accuracy. The NSE of each model's simulation results on the validation set was used as the basis for optimizing the network architecture and hyperparameter values.
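For illustration, the chronological split can be written as follows; the date boundaries are those stated above, while the DataFrame construction is a stand-in for the actual preprocessed dataset.

```python
import numpy as np
import pandas as pd

# 'data' stands in for the preprocessed daily dataset; only the index matters for the split.
idx = pd.date_range("1982-01-01", "2018-12-31", freq="D")
data = pd.DataFrame({"runoff": np.arange(len(idx), dtype=float)}, index=idx)

calibration = data.loc["1982-01-01":"2005-02-28"]   # first ~70% of the record, used for model training
validation = data.loc["2005-03-01":"2018-12-31"]    # remaining ~30%, used to evaluate simulation accuracy
```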

To ensure impartiality, preliminary experiments were conducted on data from the hydrological stations at Xiaheyan, Qingtongxia, and Shizuishan to establish the initial hyperparameter settings of the Transformer, LSTM, and ANN models. These initial hyperparameter settings are summarized in Table 3. Subsequently, manual parameter adjustments were made to satisfy specific experimental needs and ensure the suitability of the models.

Table 3. The values of the hyperparameters

| Model       | Index                                  | Value |
| Transformer | Batch size                             | 512   |
|             | Number of heads in multihead attention |       |
|             | Head size                              | 64    |
|             | Model dimension                        | 64    |
|             | Number of encoder and decoder layers   |       |
| LSTM        | Time steps                             |       |
|             | Batch size                             | 64    |
|             | Number of cells                        | 68    |
| ANN         | Number of hidden layers                |       |
|             | Number of cells                        | 64    |

Prediction results of the runoff model without base flow separation

We utilized a Transformer model along with two benchmark models (LSTM and ANN) to perform simulated predictions with a lead time of 1 day for the Xiaheyan, Qingtongxia, and Shizuishan hydrological stations. The performance metrics during the model calibration and validation periods are presented in Table 4. The Transformer model achieves the best performance, with an NSE greater than 0.9 and RMSE and MAE values in the ranges of 0.09–0.11 and 0.06–0.08, respectively, during both the calibration and validation periods at all sites. This may be due to the attention mechanism of the Transformer model, which can comprehensively capture features and selectively focus on various aspects of the input through its multihead attention, enhancing the extraction of nonlinear features and achieving excellent robustness and stability. In contrast, the LSTM model's performance is slightly inferior, particularly in predicting the runoff at the Qingtongxia hydrological station, for which this model achieves a validation NSE of approximately 0.83, in contrast to its validation NSE of 0.87 for the other two hydrological stations.

Table 4. Performance indices obtained from the simulation models without base flow separation for all stations during the calibration and validation periods

| Period      | Hydrologic station | Model       | NSE   | RMSE  | MAE   |
| Calibration | Xiaheyan           | Transformer | 0.926 | 0.114 | 0.069 |
|             |                    | LSTM        | 0.919 | 0.114 | 0.076 |
|             |                    | ANN         | 0.842 | 0.128 | 0.082 |
|             | Qingtongxia        | Transformer | 0.903 | 0.116 | 0.076 |
|             |                    | LSTM        | 0.889 | 0.118 | 0.082 |
|             |                    | ANN         | 0.806 | 0.155 | 0.128 |
|             | Shizuishan         | Transformer | 0.932 | 0.105 | 0.065 |
|             |                    | LSTM        | 0.922 | 0.108 | 0.069 |
|             |                    | ANN         | 0.836 | 0.157 | 0.107 |
| Validation  | Xiaheyan           | Transformer | 0.916 | 0.107 | 0.081 |
|             |                    | LSTM        | 0.887 | 0.134 | 0.106 |
|             |                    | ANN         | 0.802 | 0.178 | 0.112 |
|             | Qingtongxia        | Transformer | 0.905 | 0.098 | 0.077 |
|             |                    | LSTM        | 0.831 | 0.158 | 0.102 |
|             |                    | ANN         | 0.783 | 0.160 | 0.103 |
|             | Shizuishan         | Transformer | 0.914 | 0.106 | 0.075 |
|             |                    | LSTM        | 0.868 | 0.142 | 0.113 |
|             |                    | ANN         | 0.796 | 0.177 | 0.127 |

The ANN model exhibits inferior performance compared to the other two models, with calibration and validation NSE values of approximately 0.80–0.84 and 0.78–0.80, respectively. This observation is consistent with the previous research by Tongal & Booij (2018). As a simple neural network model, an ANN lacks memory units to retain historical temporal information, resulting in poor performance in simulating long time series. Furthermore, ANN models typically employ fully connected layers for input data processing, which may result in overfitting. Consequently, such a model may overemphasize noise and anomalous values in the calibration dataset, leading to poor generalizability to new data (Hu et al. 2018).

In addition, the fitting process of the observed runoff values and simulated values from different models during the validation period is given in Figure 6, which intuitively illustrates the simulation performance of each model. The results show that the Transformer model exhibits good consistency in simulating runoff at each of the three hydrological stations and achieves a better fitting effect than the other models, particularly for the simulation of the peak values. The LSTM model captures the overall trend of sequential changes, but it struggles with the control of flow at certain locations, particularly at the Qingtongxia hydrological station, where the simulated values tend to be overestimated. The ANN model displays fluctuations and nonstationary phenomena during the fitting process and produces simulated values that generally underestimate the runoff at the Xiaheyan and Qingtongxia sites while overestimating the runoff at the Shizuishan site. Notably, during the dry seasons from 2016 to 2018, the simulated results of all models exhibit varying degrees of error, with particularly pronounced errors at the Xiaheyan and Qingtongxia stations. This could be due to the interaction and transformation of seasonal water flow components. The water discharge was lower than the average flow value during the entire duration of the dry season during these years, thereby increasing the errors. In addition, it is important to note that the error at the Shizuishan station is relatively small, indicating a gradual reduction in error along the flow direction of the river.
Figure 6. Fitted hydrographs of all models without base flow separation at different hydrological stations ((a) Xiaheyan, (b) Qingtongxia, and (c) Shizuishan) during the validation period, where the shaded regions represent the 95% confidence intervals for the fitted values of the three models.

Prediction results of runoff model with base flow separation

In the previous section, performance differences among the various models were discussed. Here, we examine the sensitivity of the different models to base flow separation techniques. We have selected the Eckhardt recursive digital filtering method to ensure the reliability and accuracy of base flow separation. By using this method, we developed corresponding simulation models, namely BS-Former, BS-LSTM, and BS-ANN. In these models, the original daily runoff sequence was separated into base flow and surface flow, each of which was input into the model to capture the nonlinear relationships between these runoff components and other variables. The experiment was otherwise designed in the same way as that described in Section 4.1. The base flow separation results for the three hydrological monitoring stations are depicted in Figure 7. It is apparent that the base flow patterns and runoff variations at each site exhibit high consistency over time. The ratios of base flow to runoff for the three stations are each approximately 50%. The outcomes are closely aligned with the findings of Hu et al. (2021) in their study of base flow separation within the Yellow River Basin. Their research revealed that the Eckhardt recursive digital filtering method performed optimally in terms of base flow separation, yielding a base flow index (BFI) of approximately 0.5. Moreover, Zhao et al. (2022) found a BFI of approximately 0.6 when exploring the trend of variation in the base flow in the same research area as ours. While their result was slightly higher than our base flow ratio, this divergence may stem from their utilization of the Lyne-Hollick digital filtering technique. Hu et al. (2021) also noted in their study that the Lyne-Hollick method, when employed for base flow separation, tends to overestimate base flow and exhibits greater fluctuation and a greater degree of discretization in the base flow compared to the Eckhardt method.
Figure 7. Runoff and base flow hydrographs at different hydrological stations ((a) Xiaheyan, (b) Qingtongxia, and (c) Shizuishan) during the calibration and validation periods.
We present an analysis of the scatter density distributions of the simulated and observed values for all models (with and without base flow separation) across the three hydrological stations. The results obtained during the validation period are shown in Figure 8. For all models, the simulated values exhibit a significant correlation with the observed values, particularly within the 500–1,000 m3/s runoff range. Base flow separation improved the performance of the Transformer, LSTM, and ANN models to varying degrees. This may be because the base flow reflects the long-term response of groundwater to rainfall, while the surface flow corresponds to a short-term response (Serinaldi et al. 2014). Introducing base flow and surface flow as independent input variables into data-driven models can reduce the time lag error in runoff prediction, thereby increasing the accuracy of long-term runoff forecasts. The scatter plot of the BS-Former model shows that the simulated and observed values at all sites are closely distributed near the 45-degree line with minimal deviation, demonstrating its optimal performance. In comparison to the Transformer model without base flow separation, the BS-Former model exhibits an approximate improvement of 0.03 in the NSE index, a decrease of approximately 0.02 in the RMSE, and a reduction of approximately 0.01 in the MAE. The scatter plot distribution of the LSTM model lies generally below the 45-degree line, but the introduction of the base flow separation technique moves the results of the BS-LSTM model closer to the 45-degree line, with an average increase of approximately 0.05 in the NSE index. Conversely, the ANN model demonstrates a significant bias, as its scatter plot distribution indicates overestimation of low values and underestimation of high values.
Figure 8. Comparison of the forecasting results of all models (with and without base flow separation) at different hydrological stations ((a) Xiaheyan, (b) Qingtongxia, and (c) Shizuishan) during the validation period.

Forecasting results with various simulation lead times

The previous section investigated the response of various models to base flow separation. Further, to explore the potential improvement in the forecasting ability of different models due to base flow separation, we analyzed the simulation performance of all models (both with and without base flow separation) for lead times of 1–7 days, focusing on three hydrological stations during the validation period. The performance indicators are shown in Figure 9. The results suggest that there is noticeable consistency in the performance of the different models across the three hydrological stations. As the lead time increases, the simulation performance of all models tends to exhibit a generally decreasing trend. Notably, the accuracy of all models decreases significantly when the lead time surpasses 2 days, in agreement with the research conducted on runoff prediction lead time by Deng et al. (2022). The Transformer model demonstrates superior forecasting capabilities at longer lead times. Specifically, the performance of the Transformer model for runoff prediction with a 7-day lead time is equivalent to that of the LSTM and ANN models for 4-day and 3-day lead times, respectively. Moreover, the BS-Former model shows the most significant improvement in long-term forecasting with the assistance of base flow separation, as demonstrated by its average increase of 0.04 in the NSE for each lead time when compared with the results of the Transformer model.
Figure 9. Runoff prediction accuracy of all models (with and without base flow separation) at different hydrological stations ((a) Xiaheyan, (b) Qingtongxia, and (c) Shizuishan) at each lead time during the validation period.
The simulated flow distribution for all models with different lead times is shown in Figure 10. The results indicate that compared to the actual observed values, the overall distribution of the results of the ANN model is too concentrated and unstable at different lead times. The LSTM model tends to underestimate the flow value when the lead time is greater than 2 days. In contrast, the fluctuations of the Transformer model are relatively stable, and the distribution characteristics of the simulated values are consistent with the observed data at different lead times. However, the Transformer model somewhat overestimates the flow values. The introduction of base flow separation can improve the accuracy of the LSTM model's prediction results when the lead time is less than 2 days. As the lead time increases, base flow separation also enables a certain degree of optimization in the Transformer model's estimation of flow values. Compared with the Transformer model, the BS-Former model yields estimated values that show closer alignment with the observed values in terms of the box size, mean, and median, further enhancing the performance robustness of this model.
Figure 10. Runoff data distribution characteristics of all models (with and without base flow separation) at different hydrological stations ((a) Xiaheyan, (b) Qingtongxia, and (c) Shizuishan) at each lead time during the validation period.

In this study, we developed a Transformer-based runoff simulation framework incorporating base flow separation. The objective is to investigate the response of deep learning models to prior hydrological knowledge, aiming to enhance the accuracy of runoff prediction. The experimental outcomes indicate that integrating base flow separation techniques into different models can enhance the accuracy of runoff prediction to varying degrees, and this approach is particularly effective when using a Transformer-based model.

We discovered that the introduction of a base flow separation technique enhanced the performance of the LSTM model. The result is in agreement with Chen et al. (2021) and Lee et al. (2023), who reported that an LSTM model based on base flow separation exhibited the best predictive performance. In addition, for the more advanced Transformer model tested in the current study, the addition of base flow separation significantly improved its performance. This enhancement could be due to the self-attention mechanism of this model, which is capable of capturing the relationships between any two time steps, enabling the model to adaptively focus on relevant historical data across different timescales. At the same time, past information can also be flexibly incorporated into predictions, thereby facilitating the learning of long-term dependencies within a data sequence (Yin et al. 2022b). Tongal & Booij (2018) introduced the division of streamflow into base flow and surface flow into ANN-based modeling for short-term monthly scale runoff prediction and found that the prediction results were better than those obtained with a single runoff variable for runoff prediction. In contrast, our findings suggest that base flow separation does not significantly enhance the ANN model's performance. We hypothesize that this difference may stem from the ANN's inherent structure.

Base flow, supplied by groundwater, is a stable and gradual stream, whereas surface runoff, which forms when rainfall exceeds the ground's absorption capacity, is characterized by rapid changes and a brief response time. By separating base flow from surface runoff, targeted predictions can be made for each, utilizing the Transformer model's inherent self-attention mechanism to improve prediction accuracy. This approach fully leverages the Transformer model's potential, yielding more precise hydrological forecasts. In contrast, although an LSTM model is very suitable for modeling and predicting short-term dependencies in time series data, its effectiveness in predicting runoff in basins with a high proportion of base flow is limited. This limitation stems from the fact that, in cases where the soil moisture deficit in the catchment cannot be replenished within a brief period, a portion of the rainfall is absorbed by the soil and infiltrates into the groundwater, resulting in a delay in surface runoff generation (Serinaldi et al. 2014). An LSTM model may not be able to effectively utilize historical rainfall data to anticipate future surface runoff in such scenarios. Furthermore, its recurrent structure constrains an LSTM model's ability to extract global information and makes it more susceptible to external disturbances, resulting in a gradual accumulation of errors (Yin et al. 2022b). For instance, in this study, the construction of the Qingtongxia water project might have impacted the spatiotemporal distribution characteristics of local runoff. However, the consideration of such human activities was omitted during the model construction process, leading to the inferior predictive performance of the LSTM model at the Qingtongxia hydrological station compared to that at the other two stations. Thus, it can be inferred that incorporating river network connectivity will be essential in future model construction processes.

We also observed that all models exhibited a significant increase in forecasting error during the dry season. This aligns with the findings of Huang et al. (2020), who employed a Bayesian joint probability model for seasonal flow forecasting and found that both base flow and surface flow play crucial roles in seasonal runoff prediction and that the forecasting error increases during the dry season. During the transition period from the rainy season to the dry season, the main source of the total flow shifts from the surface flow to the base flow. The predictive ability typically decreases during such transitional periods, a phenomenon documented by Robertson et al. (2013) and Zhao et al. (2016).

Furthermore, existing research has explored the capabilities of traditional machine learning techniques for predictions at various lead times (Cheng et al. 2020; Han & Morrison 2022). This study focuses more on demonstrating the potential of base flow separation techniques to enhance the lead time prediction performance of models. As such, our study contributes a measure of innovation within this domain, with the aim of introducing novel avenues for refining hydrological prediction models. Future studies should expand BS-Former's capabilities by incorporating broader hydrological knowledge, enhancing interpretability, and exploring hybrid models that merge machine learning's adaptability with the robustness of physically based approaches, aiming for superior accuracy and understanding in runoff predictions.

In this study, we attempted to incorporate prior knowledge of hydrological processes into data-driven modeling. To this end, we proposed a novel runoff prediction model named BS-Former, which is based on the base flow separation approach and the Transformer network architecture. The runoff simulation and prediction capabilities of BS-Former were verified at three hydrological stations in the Ningxia section of the Yellow River in China, namely, Xiaheyan, Qingtongxia, and Shizuishan. The following conclusions can be drawn from this study:

First, compared to LSTM and ANN models, a Transformer model demonstrates exceptional flexibility in capturing long-term correlations in sequential data due to its unique self-attention mechanism. Moreover, it exhibits significant generalizability and stability for hydrological modeling, with both calibration and validation NSE values exceeding 0.9.

Second, incorporating prior hydrological knowledge of base flow separation significantly enhances the feature extraction capabilities of a Transformer model for runoff sequences. The BS-Former model displays the most robust performance and exceptional prediction accuracy, with an improvement of approximately 0.04 in the NSE index and a reduction of approximately 10% in the peak error compared to the Transformer model without base flow separation.

Third, base flow separation can alleviate the tendency of Transformer models to overestimate runoff with increasing lead times, thereby reducing peak errors. The BS-Former model shows the strongest predictive capability across the different lead times. This demonstrates that Transformers can deeply exploit prior hydrological knowledge through their self-attention mechanisms, thus precisely simulating the rainfall–runoff process.

This work was supported by the Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering (Grant numbers: 2021490511) and Ningxia Ecological Geological Survey Demonstration Project (NXCZ20220201).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Bahdanau, D., Cho, K. & Bengio, Y. 2015 Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473. https://doi.org/10.48550/arXiv.1409.0473.

Bittelli, M., Tomei, F., Pistocchi, A., Flury, M., Boll, J., Brooks, E. S. & Antolini, G. 2010 Development and testing of a physically based, three-dimensional model of surface and subsurface hydrology. Advances in Water Resources 33 (1), 106–122. https://doi.org/10.1016/j.advwatres.2009.10.013.

Chen, H., Xu, Y. P., Teegavarapu, R. S. V., Guo, Y. X. & Xie, J. K. 2021 Assessing different roles of base flow and surface runoff for long-term streamflow forecasting in southeastern China. Hydrological Sciences Journal 66 (16), 2312–2329. https://doi.org/10.1080/02626667.2021.1988612.

Cheng, M., Fang, F., Kinouchi, T., Navon, I. M. & Pain, C. C. 2020 Long lead-time daily and monthly streamflow forecasting using machine learning methods. Journal of Hydrology 590. https://doi.org/10.1016/j.jhydrol.2020.125376.

Corzo, G. & Solomatine, D. 2007 Baseflow separation techniques for modular artificial neural network modelling in flow forecasting. Hydrological Sciences Journal 52 (3), 491–507. https://doi.org/10.1623/hysj.52.3.491.

Dalir, P., Naghdi, R., Gholami, V., Tavankar, F., Latterini, F., Venanzi, R. & Picchio, R. 2022 Risk assessment of runoff generation using an artificial neural network and field plots in road and forest land areas. Natural Hazards 113 (3), 1451–1469. https://doi.org/10.1007/s11069-022-05352-5.

Deng, H. Q., Chen, W. J. & Huang, G. R. 2022 Deep insight into daily runoff forecasting based on a CNN-LSTM model. Natural Hazards 113 (3), 1675–1696. https://doi.org/10.1007/s11069-022-05363-2.

Eckhardt, K. 2008 A comparison of base flow indices, which were calculated with seven different base flow separation methods. Journal of Hydrology 352 (1–2), 168–173. https://doi.org/10.1016/j.jhydrol.2008.01.005.

Fang, J. J., Yang, L. S., Wen, X. H., Li, W. D., Yu, H. J. & Zhou, T. 2024 A deep learning-based hybrid approach for multi-time-ahead streamflow prediction in an arid region of Northwest China. Hydrology Research 55 (2), 180–204. https://doi.org/10.2166/nh.2024.124.

Goyal, M. K. 2014 Modeling of sediment yield prediction using M5 model tree algorithm and wavelet regression. Water Resources Management 28 (7), 1991–2003. https://doi.org/10.1007/s11269-014-0590-6.

Han, H. & Morrison, R. R. 2022 Improved runoff forecasting performance through error predictions using a deep-learning approach. Journal of Hydrology 608. https://doi.org/10.1016/j.jhydrol.2022.127653.

He, S., Sang, X. F., Yin, J. X., Zheng, Y. & Chen, H. T. 2023 Short-term runoff prediction optimization method based on BGRU-BP and BLSTM-BP neural networks. Water Resources Management 37 (2), 747–768. https://doi.org/10.1007/s11269-022-03401-z.

Hu, C. H., Wu, Q., Li, H., Jian, S. Q., Li, N. & Lou, Z. Z. 2018 Deep learning with a long short-term memory networks approach for rainfall-runoff simulation. Water 10 (11). https://doi.org/10.3390/w10111543.

Hu, C. H., Zhao, D. & Jian, S. Q. 2021 Baseflow estimation in typical catchments in the Yellow River Basin, China. Water Supply 21 (2), 648–667. https://doi.org/10.2166/ws.2020.338.

Huang, Z. Q., Zhao, T. T. G., Liu, Y., Zhang, Y. Y., Jiang, T., Lin, K. R. & Chen, X. H. 2020 Differing roles of base and fast flow in ensemble seasonal streamflow forecasting: An experimental investigation. Journal of Hydrology 591. https://doi.org/10.1016/j.jhydrol.2020.125272.

Jiahai, L. & Aiyun, L. 2019 Monthly runoff prediction using wavelet transform and fast resource optimization network (FRON) algorithm. Journal of Physics: Conference Series 1302, 042005. https://doi.org/10.1088/1742-6596/1302/4/042005.

Jin, H. X., Lu, H. P., Zhao, Y., Zhu, Z. Z., Yan, W. J., Yang, Q. Q. & Zhang, S. L. 2024 Integration of an improved transformer with physical models for the spatiotemporal simulation of urban flooding depths. Journal of Hydrology: Regional Studies 51. https://doi.org/10.1016/j.ejrh.2023.101627.

Kisi, O. 2011 Wavelet regression model as an alternative to neural networks for river stage forecasting. Water Resources Management 25 (2), 579–600. https://doi.org/10.1007/s11269-010-9715-8.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K. & Herrnegger, M. 2018 Rainfall-runoff modelling using long short-term memory (LSTM) networks. Hydrology and Earth System Sciences 22 (11), 6005–6022. https://doi.org/10.5194/hess-22-6005-2018.

Lee, J., Abbas, A., McCarty, G. W., Zhang, X. S., Lee, S. C. & Cho, K. H. 2023 Estimation of base and surface flow using deep neural networks and a hydrologic model in two watersheds of the Chesapeake Bay. Journal of Hydrology 617. https://doi.org/10.1016/j.jhydrol.2022.128916.

Liu, C. F., Liu, D. R. & Mu, L. 2022a Improved transformer model for enhanced monthly streamflow predictions of the Yangtze River. IEEE Access 10, 58240–58253. https://doi.org/10.1109/access.2022.3178521.

Liu, G. J., Tang, Z. Y., Qin, H., Liu, S., Shen, Q., Qu, Y. H. & Zhou, J. Z. 2022b Short-term runoff prediction using deep learning multi-dimensional ensemble method. Journal of Hydrology 609. https://doi.org/10.1016/j.jhydrol.2022.127762.

Masafu, C. & Williams, R. 2024 Satellite video remote sensing for flood model validation. Water Resources Research 60 (1). https://doi.org/10.1029/2023wr034545.

Miao, J. D., Zhang, X. M., Zhao, Y., Wei, T. X., Yang, Z., Li, P., Zhang, Y. E., Chen, Y. X. & Wang, Y. S. 2022 Evolution patterns and spatial sources of water and sediment discharge over the last 70 years in the Yellow River, China: A case study in the Ningxia Reach. Science of the Total Environment 838. https://doi.org/10.1016/j.scitotenv.2022.155952.

Qin, J., Liang, J., Chen, T., Lei, X. & Kang, A. 2019 Simulating and predicting of hydrological time series based on TensorFlow deep learning. Polish Journal of Environmental Studies 28 (2), 795–802. https://doi.org/10.15244/pjoes/81557.

Robertson, D. E., Pokhrel, P. & Wang, Q. J. 2013 Improving statistical forecasts of seasonal streamflows using hydrological model output. Hydrology and Earth System Sciences 17 (2), 579–593. https://doi.org/10.5194/hess-17-579-2013.

Samantaray, S., Das, S. S., Sahoo, A. & Satapathy, D. P. 2022 Monthly runoff prediction at Baitarani river basin by support vector machine based on Salp swarm algorithm. Ain Shams Engineering Journal 13 (5). https://doi.org/10.1016/j.asej.2022.101732.

Senthil Kumar, A. R., Sudheer, K. P., Jain, S. K. & Agarwal, P. K. 2005 Rainfall-runoff modelling using artificial neural networks: Comparison of network types. Hydrological Processes 19 (6), 1277–1291. https://doi.org/10.1002/hyp.5581.

Serinaldi, F., Zunino, L. & Rosso, O. A. 2014 Complexity-entropy analysis of daily stream flow time series in the continental United States. Stochastic Environmental Research and Risk Assessment 28 (7), 1685–1708. https://doi.org/10.1007/s00477-013-0825-8.

Shen, Y. T., Li, Y., Sun, J., Ding, W. K., Shi, X. J., Zhang, L., Shen, X. J. & He, J. 2019 Hashtag recommendation using LSTM networks with self-attention. Computers, Materials & Continua 61 (3), 1261–1269. https://doi.org/10.32604/cmc.2019.06104.

Tongal, H. & Booij, M. J. 2018 Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. Journal of Hydrology 564, 266–282. https://doi.org/10.1016/j.jhydrol.2018.07.004.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. & Polosukhin, I. 2017 Attention is all you need. In: 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA.

Wang, Z., Tian, J. & Feng, K. 2019 Research on runoff simulation in Ningxia section of the Yellow River basin based on improved SWAT model. Applied Ecology and Environmental Research 17 (2), 3483–3497. https://doi.org/10.15666/aeer/1702_34833497.

Wang, X., Wang, Y., Yuan, P., Wang, L. & Cheng, D. 2021 An adaptive daily runoff forecast model using VMD-LSTM-PSO hybrid approach. Hydrological Sciences Journal 66 (9), 1488–1502. https://doi.org/10.1080/02626667.2021.1937631.

Xie, T., Zhang, G., Hou, J., Xie, J., Lv, M. & Liu, F. 2019 Hybrid forecasting model for non-stationary daily runoff series: A case study in the Han River Basin, China. Journal of Hydrology 577. https://doi.org/10.1016/j.jhydrol.2019.123915.

Xu, Y. H., Hu, C. H., Wu, Q., Jian, S. Q., Li, Z. C., Chen, Y. Q., Zhang, G. D., Zhang, Z. X. & Wang, S. L. 2022 Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. Journal of Hydrology 608. https://doi.org/10.1016/j.jhydrol.2022.127553.

Xu, Y. H., Lin, K. R., Hu, C. H., Wang, S. L., Wu, Q., Zhang, L. & Ran, G. 2023 Deep transfer learning based on transformer for flood forecasting in data-sparse basins. Journal of Hydrology 625. https://doi.org/10.1016/j.jhydrol.2023.129956.

Yin, H. L., Guo, Z. L., Zhang, X. W., Chen, J. J. & Zhang, Y. N. 2022a RR-Former: Rainfall-runoff modeling based on transformer. Journal of Hydrology 609. https://doi.org/10.1016/j.jhydrol.2022.127781.

Yin, H. L., Wang, F. D., Zhang, X. W., Zhang, Y. N., Chen, J. J., Xia, R. L. & Jin, J. 2022b Rainfall-runoff modeling using long short-term memory based step-sequence framework. Journal of Hydrology 610. https://doi.org/10.1016/j.jhydrol.2022.127901.

Yoon, H., Jun, S. C., Hyun, Y., Bae, G. O. & Lee, K. K. 2011 A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. Journal of Hydrology 396 (1–2), 128–138. https://doi.org/10.1016/j.jhydrol.2021.126067.

Zhang, C., Biś, D., Liu, X. & He, Z. 2019 Biomedical word sense disambiguation with bidirectional long short-term memory and attention-based neural networks. BMC Bioinformatics 20. https://doi.org/10.1186/s12859-019-3079-8.

Zhang, J., Chen, X., Khan, A., Zhang, Y. K., Kuang, X., Liang, X. & Nuttall, J. 2021 Daily runoff forecasting by deep recursive neural network. Journal of Hydrology 596. https://doi.org/10.1016/j.jhydrol.2021.126067.

Zhang, J. P., Xiao, H. L. & Fang, H. Y. 2022a Component-based reconstruction prediction of runoff at multi-time scales in the source area of the Yellow River based on the ARMA model. Water Resources Management 36 (1), 433–448. https://doi.org/10.1007/s11269-021-03035-7.

Zhang, Y., He, Y., Mu, X., Jia, L. & Li, Y. 2022b Base-flow separation and character analysis of the Huangfuchuan Basin in the middle reaches of the Yellow River, China. Frontiers in Environmental Science 10. https://doi.org/10.3389/fenvs.2022.8311.

Zhao, T., Schepen, A. & Wang, Q. J. 2016 Ensemble forecasting of sub-seasonal to seasonal streamflow by a Bayesian joint probability modelling approach. Journal of Hydrology 541, 839–849. https://doi.org/10.1016/j.jhydrol.2016.07.040.

Zhao, G. Z., Kong, L. Y., Li, Y. L., Xu, Y. Z. & Li, Z. P. 2022 Investigating historical baseflow characteristics and variations in the Upper Yellow River Basin, China. International Journal of Environmental Research and Public Health 19 (15). https://doi.org/10.3390/ijerph19159267.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).