Accurate streamflow prediction is crucial for effective water resource management. However, reliable prediction remains a considerable challenge because of the highly complex, non-stationary, and non-linear processes that contribute to streamflow at various spatial and temporal scales. In this study, we developed a convolutional neural network (CNN)–Transformer–long short-term memory (LSTM) (CTL) model for streamflow prediction, which replaced the Transformer's embedding layer with a CNN layer to extract partial hidden features and added an LSTM layer to extract correlations on a temporal scale. The CTL model incorporated Transformer's ability to extract global information, CNN's ability to extract hidden features, and LSTM's ability to capture temporal correlations. To validate its effectiveness, we applied it to streamflow prediction in the Shule River basin in northwest China across 1-, 3-, and 6-month horizons and compared its performance with Transformer, CNN, LSTM, CNN–Transformer, and Transformer–LSTM. The results demonstrated that CTL outperformed all other models in predictive accuracy, with Nash–Sutcliffe efficiency coefficient (NSE) values of 0.964, 0.912, and 0.856 for 1-, 3-, and 6-month-ahead predictions; the best results among the five comparative models were 0.908, 0.824, and 0.778, respectively. This indicates that CTL is an outstanding alternative technique for streamflow prediction where surface data are limited.

  • CTL integrated and absorbed the respective merits of Transformer, CNN, and LSTM.

  • CTL achieved exceptional accuracy, surpassing that of the benchmarked models.

  • CTL effectively predicted multi-time-ahead streamflow in arid northwest China.

Predicting streamflow accurately is critical for water resources management, electricity generation, and ecosystem conservation (Evaristo & McDonnell 2019; Wagena et al. 2020; Messager et al. 2021). In arid regions, streamflow is often the only source of water supply (Lan et al. 2012; Merritt et al. 2021; Wang et al. 2021). However, streamflow, influenced by factors such as geographic location, topographic conditions, climate change, and human activities, is highly complex, non-linear, non-stationary, and multi-scale (Chang & Tsai 2016; Jahangir et al. 2023). Moreover, the scarcity of meteorological and hydrological stations in arid regions results in a lack of data, which impacts the accuracy of streamflow prediction (Gao et al. 2021). As a result, streamflow prediction can be extremely difficult and challenging in arid regions.

Several methods have been used to predict streamflow. Physical models, such as SWAT, TOPMODEL, VIC, and WRF-Hydro, are commonly applied. Physical models depict hydrological processes precisely by utilizing complicated mathematical techniques and a thorough grasp of the local hydrological system (Qi et al. 2020; Myers et al. 2021). Consequently, the accuracy of streamflow prediction is strongly determined by the physical parameters of the model and the initial configuration of the region (Tan et al. 2020; Yang et al. 2020b). Streamflow, on the other hand, is a complicated, non-linear, and non-stationary process driven by meteorological elements such as precipitation, temperature, and evapotranspiration, as well as underlying surface features such as landform, slope, and land use (Wen et al. 2019; Wang et al. 2023). It may be challenging for physical models to accurately model streamflow using only the simplified water balance equation (Fidal & Kjeldsen 2020; Huo et al. 2020). The use of physical models may also be hampered in places with variable underlying surface characteristics and a scarcity of meteorological stations (Cho & Kim 2022). In the arid regions of northwestern China, where glaciers cover a considerable area, streamflow prediction is further complicated by the presence of glaciers and permafrost (Zhou et al. 2021b). Existing physical hydrological models have a high degree of uncertainty in describing the hydrological effects of glaciers and permafrost, resulting in substantial errors in streamflow prediction (Zhang et al. 2018).

Machine learning techniques such as support vector regression, neural networks, and extreme learning machines have been widely applied to streamflow prediction in recent decades (Gharib & Davies 2021; Ibrahim et al. 2022). These machine learning models often outperform physical models in modeling the non-linear streamflow process without requiring knowledge of the local hydrological system (Wu et al. 2021; Qiu et al. 2022). However, machine learning models are often too simple to perform deep feature extraction (Han et al. 2021b; Mei et al. 2022). Thus, they are still considered 'shallow learning' models with a limited ability to handle the stochastic features of the streamflow time series (Liu et al. 2022a).

With the rapid development of artificial intelligence, the 'shallow learning' problem has been addressed by deep learning algorithms, a newer branch of machine learning (Li et al. 2022a). Deep learning models, such as the convolutional neural network (CNN) and long short-term memory (LSTM), have been extensively applied to more complex tasks in pattern recognition, signal processing, and time series prediction (Apaydin et al. 2021; Zhou & Jiao 2022). Various deep learning models have been applied to streamflow prediction and show great potential (Xu et al. 2022; Zhang & Yan 2023). CNN and LSTM are well-known time series modeling algorithms (Barzegar et al. 2021; Jiang & Jafarpour 2021; Wang & Zai 2023). CNN has a powerful partial feature extraction capability and is sensitive to sparse data, allowing it to effectively extract more complex and deeper hidden data features (Kabir et al. 2020; Wang et al. 2020; Adaryani et al. 2022). LSTM has an embedded recurrent structure, which allows it to remember previous information and capture temporal dynamics (Cho & Kim 2022; Ren et al. 2022; Xu et al. 2022). Moreover, LSTM regulates the information flow of the memory cells via non-linear gating units and relies on the learned weights to control the length of dependence, effectively alleviating the gradient disappearance problem (Kao et al. 2020; Yin et al. 2022b).

Nowadays, a newly developed deep learning model called Transformer has attracted attention due to its unprecedented performance in machine translation (Huang et al. 2021; Nguyen et al. 2021). Relying on the attention mechanism, Transformer has a longer memory than CNN and LSTM, allowing it to effectively capture global information, and its multi-headed attention mechanism gives the model strong expressive power (Han et al. 2021a; Bai & Tahmasebi 2022). Transformer is more flexible in extracting highly complex, non-linear, and non-stationary features from time series and has achieved good performance in streamflow prediction. For example, Yin et al. (2022a) applied Transformer to multi-daily-ahead streamflow prediction on the Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS) dataset and obtained good results. However, Transformer also has notable disadvantages. First, it is relatively weak at capturing partial features because it completely abandons convolutions (Li et al. 2019; Hua et al. 2022). Second, it cannot substitute for modeling on the temporal scale, since positional coding is an artificially created index that does not reasonably characterize positional information (Zhou et al. 2021a; Yang 2022). Third, the layer normalization between residuals truncates the gradient flow, which may cause gradient disappearance when applying Transformer (Liu et al. 2020; Xu et al. 2020). Thus, improving the structure of the standalone Transformer may not be enough to solve the problems caused by these structural characteristics. Moreover, streamflow sequences usually show a high degree of non-stationarity, mainly in the form of obvious shifts in mean, variance, or shape (Slater et al. 2021; Liu et al. 2023). Standalone deep learning models are often unable to effectively extract such highly complex and non-smooth information (Zuo et al. 2020; Liu et al. 2022b; Zhou et al. 2023). When optimizing and improving a standalone model does not improve predictive accuracy, reasonably combining several standalone models so that each contributes its strengths, thereby improving the overall predictive performance, can be a practical and feasible approach (Huang & Kuo 2018; Atila & Sengur 2021; Mei et al. 2022).

This prompted us to ask whether it is possible to integrate the advantages of CNN and LSTM with Transformer to establish an improved yet powerful model for streamflow prediction. Specifically, we first replaced the embedding layer of the Transformer with a CNN layer, enabling partial hidden feature extraction before input to the encoder and decoder; second, we added an LSTM layer before the output of the decoder, incorporating temporal-dimension modeling and mitigating the gradient disappearance problem; third, we integrated Transformer with CNN and LSTM for hydrological modeling and proposed the CNN–Transformer–LSTM (CTL) model. The proposed CTL model combined the global information extraction capability of Transformer, the partial hidden feature extraction capability of CNN, and the temporal correlation modeling capability of LSTM, thus enabling the modeling of highly complex, non-linear, and multi-scale streamflow processes. Accordingly, the following improvements could be theoretically achieved: (1) by replacing the Transformer's embedding layer with a CNN layer, the comparatively limited ability of the Transformer to extract partial information could be addressed, so that the non-linear, non-stationary information of streamflow on the spatial scale could be captured more effectively; (2) by adding an LSTM layer before the output of the decoder, the lack of modeling on the temporal scale and the gradient disappearance in the Transformer could be addressed, making it possible to stably extract the trend and periodic features of the streamflow series.

Although Transformer has been applied to streamflow prediction, several gaps remain: (1) Transformer originated in natural language processing and is not a natural time series prediction model (Yin et al. 2023); most studies focus only on its advantages without analyzing and discussing its disadvantages (Mellouli et al. 2022; Xu et al. 2023a, 2023b). (2) Research on combining Transformer with other classical deep learning models is still limited. In this study, we took model hybridization as the theoretical basis and combined models after analyzing the advantages and disadvantages of Transformer, CNN, and LSTM in detail, to explore the optimal combination of deep learning models. Therefore, the purpose of this study is to establish a deep learning-based CTL approach for multi-time-ahead streamflow prediction in an arid region. To achieve this:

  • (1) A hybrid CTL model was constructed by combining the Transformer with CNN and LSTM.

  • (2) The potential of CTL in 1-, 3-, and 6-month-ahead streamflow prediction in an arid region was tested in the upstream of the Shule River, northwest China.

  • (3) The efficiency of the CTL model was compared with two hybrid models, the CNN-coupled Transformer model (CNN–Transformer, CT) and the Transformer-coupled LSTM model (Transformer–LSTM, TL), as well as three standalone models: Transformer, CNN, and LSTM.

Study area

The Shule River, which originates from the Qilian Mountains on the northeastern Tibetan Plateau, is the second-largest river in the Hexi Corridor of Gansu Province, northwest China. The Shule River basin is located in an arid region with an average annual rainfall of less than 60 mm and evaporation of 1,500–3,000 mm (Ma et al. 2019). The basin can be split into distinct zones of streamflow generation, use, and dissipation owing to the specific hydrological characteristics of arid regions (He et al. 2015). The upper reach of the Shule River basin (38.3°–39.9°N, 96.6°–99.0°E) refers to the river's mountainous area, with an elevation of about 2,100–5,750 m (Figure 1).
Figure 1

Location of the Shule River basin, the hydrological and meteorological stations.


Streamflow in the Shule River is predominantly replenished by glacial meltwater and precipitation (Zhang et al. 2022). As modern glaciers are widely developed in mountains above 4,500 m, the upper reach is the main streamflow-generating area. Permafrost covers 9,447 km² of the upper Shule River basin, accounting for 83% of the entire basin area (Zhou et al. 2021b). Glacial meltwater contributes over 80% of the yearly streamflow, with a maximum daily streamflow of 415 m³/s (Lan et al. 2012). The presence of glaciers and permafrost complicates the streamflow generation processes, leading to more non-linear and non-stationary streamflow series on spatial and temporal scales (Zhou et al. 2021b). Coupled with the effects of precipitation, streamflow in the upper Shule River reaches its maximum in July and August (Zhang et al. 2022).

The Shule River sustains local livelihoods and society, forming the famous Dunhuang Oasis and nourishing the pearl cities (i.e., Dunhuang, Yumen, Guazhou) on the Silk Road Economic Belt (Guo et al. 2015; Xie et al. 2022). The quantity and quality of water in the upper Shule River have a significant impact on the long-term development of the oases and cities in the middle and lower reaches (Zhang et al. 2003; He et al. 2015). Given the importance of these cities on the Silk Road Economic Belt, efficient water resource management and long-term environmental protection are critical (Zhang et al. 2009), and the key step toward achieving these goals is accurate prediction of the upper Shule River streamflow.

Transformer

The Transformer model is a deep learning algorithm widely used in natural language processing. Unlike convolutional and recurrent neural networks, Transformer is based entirely on the attention mechanism (Li et al. 2020). This mechanism calculates an attention value, via an attention function that acts like a weight, for each element of the input, which strengthens or weakens the connection between two arbitrary positions in the time series (Sun et al. 2021) (Figure 2).
Figure 2

Calculation process of the attention function.

Before computing the attention value, a set of query vectors, key vectors, and value vectors based on the input vectors needs to be created (Yin et al. 2022a):
$$Q = XW^{Q}$$
(1)
$$K = XW^{K}$$
(2)
$$V = XW^{V}$$
(3)
where $X$ denotes the input vector; $Q$ and $W^{Q}$ denote the query vector and the corresponding weight matrix; $K$ and $W^{K}$ denote the key vector and the corresponding weight matrix; and $V$ and $W^{V}$ denote the value vector and the corresponding weight matrix.
The calculation of attention values can be expressed as a matrix operation (Vaswani et al. 2017):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
(4)
where $d_k$ is the dimension of each element in the input vector, i.e., the dimension of each word embedding obtained in positional encoding and embedding. Note that if the query, key, and value vectors come from the same sequence, the attention mechanism is called self-attention; otherwise, it is called cross-attention. If there are multiple sets of query, key, and value vectors, the generated attention values are concatenated and given an additional weight matrix to maintain dimensional consistency, which is known as the multi-headed attention mechanism.
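As an illustration, Equations (1)–(4) can be sketched in a few lines of NumPy. The dimensions, random weights, and inputs below are illustrative only, not the configuration used in this study:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Equations (1)-(3): project the input into query, key, and value vectors
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dk = Q.shape[-1]
    # Equation (4): scaled dot-product attention
    scores = Q @ K.T / np.sqrt(dk)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))                     # 6 time steps, 4 features
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 4)
```

Because the inputs here form a single sequence, this is the self-attention case; a multi-headed variant would repeat the projection with several weight sets and concatenate the results.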

The CNN

The CNN is a deep feedforward neural network built on convolutional operations with sparse connectivity and weight-sharing characteristics. The CNN consists of three layers: the convolutional layer, which extracts partial features from the inputs; the pooling layer, which reduces the dimensionality of the target variable; and the fully connected layer, which outputs the desired results and is similar to its counterpart in a traditional neural network (Adaryani et al. 2022).

The input of a convolutional layer typically contains multiple feature planes, each of which contains rectangularly arranged neurons; the neurons in the same feature plane share weights, which are known as convolutional kernels (or filters). The convolution operation is as follows (Li et al. 2022b):
$$M = W \otimes X$$
(5)
$$Y = f(M + b)$$
(6)
where $X$ and $Y$ denote the input and output matrices, respectively; $W$ is the weight matrix of the convolution kernel; $\otimes$ denotes the convolution operation; $M$ is the matrix generated by the convolution; $f$ represents the activation function; and $b$ is the bias matrix.
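A minimal sketch of Equations (5) and (6) for a one-dimensional input follows; the kernel values and the identity activation are chosen purely for clarity and are not from the paper:

```python
import numpy as np

def conv1d_valid(X, W, b, f=np.tanh):
    # Equation (5): slide the kernel W over the input X to build M
    k = len(W)
    M = np.array([np.sum(X[i:i + k] * W) for i in range(len(X) - k + 1)])
    # Equation (6): add the bias and apply the activation f
    return f(M + b)

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.array([0.5, -0.5])                        # a single convolution kernel
Y = conv1d_valid(X, W, b=0.0, f=lambda m: m)     # identity activation for clarity
print(Y)  # [-0.5 -0.5 -0.5 -0.5]
```

The shared kernel slides across the series, which is the weight-sharing property described above; a real CNN layer would learn many such kernels and follow them with pooling.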

Long short-term memory

LSTM is a variant of the general recurrent neural network that implements a unique gate mechanism consisting of the input gate, output gate, and forget gate. This mechanism helps manage the back-propagated gradient flow and the long- and short-term dependencies of information. As a result, LSTM can remember previous information and capture temporal dynamics (Cho & Kim 2022; Xu et al. 2022). Moreover, LSTM can effectively alleviate the problem of gradient disappearance during the training of long time sequences (Yin et al. 2022b) (Figure 3). The information flow transfer mechanism of LSTM can be described as follows (Alizadeh et al. 2021):
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, X_t] + b_f\right)$$
(7)
where $f_t$ determines the retention ratio of the cell state of the previous moment, $X_t$ denotes the input of the current moment, and $h_{t-1}$ is the hidden state of the previous moment.
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, X_t] + b_i\right)$$
(8)
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, X_t] + b_C\right)$$
(9)
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
(10)
Figure 3

The architecture of the LSTM network.

The input gate includes $\tilde{C}_t$ and $i_t$: $\tilde{C}_t$ denotes the candidate state information of the current moment, $i_t$ determines the retention ratio of the current state information, and $C_{t-1}$ and $C_t$ represent the cell states of the previous moment and the current moment, respectively.
$$O_t = \sigma\left(W_o \cdot [h_{t-1}, X_t] + b_o\right)$$
(11)
$$h_t = O_t \odot \tanh(C_t)$$
(12)
where $h_t$ is the hidden state of the current moment; the output gate includes $O_t$, and $O_t$ and $C_t$ together form the hidden state output of the current moment; $\sigma$ is the sigmoid activation function; $\tanh$ denotes the hyperbolic tangent activation function; $\cdot$ is conventional matrix multiplication; $\odot$ denotes the Hadamard product of matrices; and $W_f$, $W_i$, $W_C$, $W_o$ and $b_f$, $b_i$, $b_C$, $b_o$ are the weight matrices and bias vectors of the respective gates.
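The gate updates of Equations (7)–(12) can be sketched as a single NumPy step; the weight shapes, random initialization, and sizes below are illustrative, not the tuned model of this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following Equations (7)-(12); W and b hold the gate parameters."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate, Eq. (7)
    i_t = sigmoid(W["i"] @ z + b["i"])           # input gate, Eq. (8)
    C_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate state, Eq. (9)
    C_t = f_t * C_prev + i_t * C_tilde           # cell state update, Eq. (10)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate, Eq. (11)
    h_t = o_t * np.tanh(C_t)                     # hidden state, Eq. (12)
    return h_t, C_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, C = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, C.shape)  # (4,) (4,)
```

The additive update in Equation (10) is what lets gradients flow through the cell state without vanishing over long sequences.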

CNN–Transformer–LSTM

The framework of the CTL model is shown in Figure 4. It included the following three main steps:
Figure 4

The architecture of the CTL network.


Step 1: The embedding layer of the Transformer was replaced with a CNN layer to enhance the extraction of partial hidden features. This resulted in a new CT model that combined the Transformer model with the CNN model.

Step 2: The Transformer model was modified for streamflow prediction, with the predicted streamflow set as the final output instead of a probability distribution. The encoder of the Transformer received historical meteorological data as input, while the decoder received historical streamflow data. This model, with streamflow as the output, was considered the standard Transformer in this study.

Step 3: The output of Step 2 was used as input for the LSTM model. The final streamflow predictions were obtained using the linear layer and the sigmoid activation function. The LSTM layer can extract trend and periodic features of streamflow data on a temporal scale. If Step 1 was omitted and only the standard Transformer and the LSTM were considered, the model was called TL.

All models were implemented using the PyTorch framework in Python 3.9. We employed grid search to determine the hyperparameters, and the best sets of parameters are shown in the Supplementary material, Appendix.
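A structural sketch of the three steps in PyTorch may make the data flow concrete. The layer sizes, kernel widths, and head counts below are illustrative assumptions, not the grid-searched hyperparameters reported in the Supplementary material, and positional encoding and the training loop are omitted for brevity:

```python
import torch
import torch.nn as nn

class CTL(nn.Module):
    """Sketch of the CTL pipeline: CNN embedding -> Transformer -> LSTM -> linear."""
    def __init__(self, n_met=4, d_model=32):
        super().__init__()
        # Step 1: Conv1d layers replace the Transformer embedding layer
        self.embed_src = nn.Conv1d(n_met, d_model, kernel_size=3, padding=1)
        self.embed_tgt = nn.Conv1d(1, d_model, kernel_size=3, padding=1)
        # Step 2: the encoder takes meteorological inputs, the decoder past streamflow
        self.transformer = nn.Transformer(d_model, nhead=4, num_encoder_layers=2,
                                          num_decoder_layers=2, batch_first=True)
        # Step 3: an LSTM layer extracts temporal features from the decoder output
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, met, q):          # met: (B, L, n_met), q: (B, L, 1)
        src = self.embed_src(met.transpose(1, 2)).transpose(1, 2)
        tgt = self.embed_tgt(q.transpose(1, 2)).transpose(1, 2)
        dec = self.transformer(src, tgt)
        out, _ = self.lstm(dec)
        return self.head(out[:, -1])    # normalized streamflow, one step ahead

met = torch.randn(8, 6, 4)   # batch of 8, lag window of 6 months, 4 met variables
q = torch.randn(8, 6, 1)     # corresponding lagged streamflow
pred = CTL()(met, q)
print(pred.shape)  # torch.Size([8, 1])
```

Dropping the two Conv1d embeddings and using plain linear projections would recover the TL variant described in Step 3.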

Data collection and pre-processing

Streamflow in the upper reach of the Shule River is a complex phenomenon influenced by a range of factors, including temperature, precipitation, glaciers, frozen soil, and others (Hong et al. 2016). Therefore, it is necessary to use related variables as forcing data to develop models. However, due to the challenging climate of the upper Shule River basin and the limits of observational techniques, data collection is difficult (Zhang et al. 2018). Therefore, the modeling in this study, consistent with many previous studies, used only streamflow and meteorological data (Gao et al. 2021; Lin et al. 2021; Cho & Kim 2022).

Historical monthly streamflow data (Q) from January 1958 to December 2017 were collected from the Changmabao hydrological station. Meteorological parameters with the same timespan as the streamflow, including air temperature (T), ground temperature at 0 m (E), sunshine duration (S), and precipitation (P) at the Tuole station, were gathered from the National Weather Science Data Center of China (http://data.cma.cn/). Note that since all the derived data were at a daily scale, the streamflow, precipitation, air temperature, ground temperature at 0 m, and sunshine duration were averaged to obtain the respective monthly values to meet the monthly streamflow prediction purpose.
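The daily-to-monthly aggregation can be sketched with pandas; the column names and synthetic values below are hypothetical placeholders for the station records:

```python
import numpy as np
import pandas as pd

# Hypothetical daily records; the values are synthetic and illustrative only.
rng = np.random.default_rng(0)
idx = pd.date_range("1958-01-01", "1958-12-31", freq="D")
daily = pd.DataFrame({"Q": rng.gamma(2.0, 10.0, len(idx)),
                      "T": rng.normal(-2.5, 10.0, len(idx))}, index=idx)

# Aggregate the daily series to monthly means, as done for all five variables
monthly = daily.resample("MS").mean()
print(len(monthly))  # 12
```

The same `resample` call applies uniformly to each variable, so one pass yields the monthly series used for modeling.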

The whole dataset was divided into a training set, ranging from January 1958 to December 2007, and a testing set, ranging from January 2008 to December 2017. Table 1 presents the statistical characteristics of the streamflow and meteorological data for the whole, training, and testing datasets. The statistical parameters demonstrated that no significant difference existed between the training and testing sets, indicating the similarity between the two datasets. Additionally, the statistical features of the training dataset were consistent with those of the whole dataset, indicating that the training dataset captured enough reliable information about the hydrological system being modeled and was capable of training predictive models.

Table 1

Statistical parameters of streamflow and meteorological data in each dataset

Variable                        Dataset    Max      Min      Mean    Std     SK      CV
Streamflow (m³/s)               All        222      4.16     32.09   32.96   2.15    1.03
                                Training   175      4.16     29.70   30.37   2.13    1.02
                                Testing    222      10.50    44.16   41.95   1.86    0.95
Air temperature (°C)            All        13.41    −23.72   −2.46   9.96    −0.18   −4.04
                                Training   13.02    −23.72   −2.66   9.97    −0.18   −3.75
                                Testing    13.41    −18.63   −1.39   9.86    −0.20   −7.09
Ground temperature at 0 m (°C)  All        19.47    −23.64   1.38    11.25   −0.25   8.12
                                Training   18.17    −23.64   1.15    11.29   −0.24   9.80
                                Testing    19.47    −16.99   2.67    10.97   −0.29   4.09
Sunshine duration (h)           All        10.81    5.75     8.17    0.86    0.08    0.11
                                Training   10.81    5.75     8.15    0.86    0.07    0.10
                                Testing    10.34    6.39     8.27    0.87    0.13    0.10
Precipitation (mm)              All        164.60   –        24.07   31.55   1.45    1.31
                                Training   132.40   –        23.03   29.95   1.39    1.30
                                Testing    164.60   –        29.53   38.37   1.42    1.29

Note: Max is the maximum; Min is the minimum; Std is the standard deviation; SK is the skewness; CV is the coefficient of variation.

To eliminate dimensional differences, each variable (Q, T, E, S, P) in the training and testing sets was separately scaled before model training by using the following equation:
$$W^{*} = \frac{W - W_{\min}}{W_{\max} - W_{\min}}$$
(13)
where $W$ and $W^{*}$ are the original and normalized data series, respectively, and $W_{\min}$ and $W_{\max}$ are the minimum and maximum values of the original data series, respectively.
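Equation (13) amounts to a one-line min–max scaling, applied to each variable and each set separately; the streamflow values below are illustrative:

```python
import numpy as np

def minmax_scale(w):
    # Equation (13): W* = (W - Wmin) / (Wmax - Wmin)
    return (w - w.min()) / (w.max() - w.min())

q = np.array([4.16, 29.70, 175.0, 32.09])   # illustrative streamflow values (m³/s)
q_scaled = minmax_scale(q)
print(q_scaled.min(), q_scaled.max())  # 0.0 1.0
```

Predictions made in this normalized space are mapped back to streamflow by inverting the same transform with the stored minimum and maximum.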

Determining inputs

Streamflow and meteorological data, including Q, T, E, S, and P, from January 1958 to December 2017, were combined as inputs for the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-, 3-, and 6-month ahead streamflow predictions.

The selection of an appropriate input combination is an important step in machine learning-based modeling, as the input contains the basic information of the system. Although many researchers have proposed methods for determining the lag of inputs, there is no universal method. In this study, the maximum time lag of the inputs was set to 6, meaning that input data lagged from 6 months (t − 5) to the current month (t) were used for 1-, 3-, and 6-month-ahead streamflow predictions. The input structure of the CTL, Transformer, LSTM, CNN, CT, and TL models was then set accordingly:
$$Q_{t+n} = f\left(Q_{t-5}, \ldots, Q_{t},\; T_{t-5}, \ldots, T_{t},\; E_{t-5}, \ldots, E_{t},\; S_{t-5}, \ldots, S_{t},\; P_{t-5}, \ldots, P_{t}\right)$$
(14)
where $Q_{t+n}$ is the predicted streamflow at n months ahead (m³/s), n = 1, 3, 6; T is the air temperature (°C); E represents the ground temperature at 0 m (°C); S denotes the sunshine duration (h); and P is the precipitation (mm).
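Building the lagged input–target pairs of Equation (14) can be sketched as follows; the series are synthetic and the lengths are illustrative:

```python
import numpy as np

def build_samples(series_dict, target, lag=6, horizon=1):
    """Stack lag-window inputs (t-5 ... t) from each variable and pair them
    with the streamflow at t + horizon, following Equation (14)."""
    n = len(target)
    X, y = [], []
    for t in range(lag - 1, n - horizon):
        window = [series[t - lag + 1:t + 1] for series in series_dict.values()]
        X.append(np.concatenate(window))
        y.append(target[t + horizon])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
data = {v: rng.standard_normal(24) for v in ("Q", "T", "E", "S", "P")}
X, y = build_samples(data, data["Q"], lag=6, horizon=1)
print(X.shape, y.shape)  # (18, 30) (18,)
```

Setting `horizon` to 3 or 6 produces the pairs for the 3- and 6-month-ahead cases from the same lag-6 windows.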

Performance evaluation

In this study, four quantitative metrics, including the Nash–Sutcliffe efficiency coefficient (NSE), Pearson correlation coefficient (R), mean absolute error (MAE), and root mean square error (RMSE), were used to evaluate the predictive performance of the CTL, Transformer, LSTM, CNN, CT, and TL models. NSE ranges between −∞ and 1, indicating the predictive ability of a model relative to the mean of the observations; the higher the NSE value, the better the predictive ability of the model. Typically, a prediction is acceptable when NSE ≥ 0.5 (Moriasi et al. 2007). R measures the linear relation between the predicted and observed streamflow, with values ranging from −1.0 to 1.0; the closer the absolute value of R is to 1, the higher the degree of linear correlation. RMSE reflects the overall fitness of the predictive model, while MAE provides a more balanced perspective on the goodness-of-fit of the model. The closer the RMSE and MAE values are to 0, the better the fit. A model is considered best when R = 1, NSE = 1, RMSE = 0, and MAE = 0 (Moriasi et al. 2007). The following equations were used to calculate these metrics:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(Q_{o,i} - Q_{p,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{o,i} - \overline{Q_{o}}\right)^{2}}$$
(15)
$$R = \frac{\sum_{i=1}^{n}\left(Q_{o,i} - \overline{Q_{o}}\right)\left(Q_{p,i} - \overline{Q_{p}}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_{o,i} - \overline{Q_{o}}\right)^{2}\sum_{i=1}^{n}\left(Q_{p,i} - \overline{Q_{p}}\right)^{2}}}$$
(16)
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{o,i} - Q_{p,i}\right|$$
(17)
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Q_{o,i} - Q_{p,i}\right)^{2}}$$
(18)
where $n$ is the number of data points; $Q_{o,i}$ and $Q_{p,i}$ denote the observed and predicted streamflow, respectively; and $\overline{Q_{o}}$ and $\overline{Q_{p}}$ represent the means of the observed and predicted streamflow, respectively.
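For reference, the four metrics of Equations (15)–(18) can be computed directly; the observed and predicted values below are illustrative only:

```python
import numpy as np

def nse(qo, qp):
    # Equation (15): 1 - sum((Qo - Qp)^2) / sum((Qo - mean(Qo))^2)
    return 1 - np.sum((qo - qp) ** 2) / np.sum((qo - qo.mean()) ** 2)

def metrics(qo, qp):
    r = np.corrcoef(qo, qp)[0, 1]                 # Equation (16)
    mae = np.mean(np.abs(qo - qp))                # Equation (17)
    rmse = np.sqrt(np.mean((qo - qp) ** 2))       # Equation (18)
    return nse(qo, qp), r, mae, rmse

qo = np.array([10.0, 30.0, 50.0, 40.0, 20.0])    # illustrative observed streamflow
qp = np.array([12.0, 28.0, 47.0, 42.0, 21.0])    # illustrative predicted streamflow
s_nse, r, mae, rmse = metrics(qo, qp)
print(round(s_nse, 3), round(r, 3))  # 0.978 0.992
```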
Additionally, various plots, such as hydrographs, scatter plots, boxplots, and Taylor diagrams, were also presented to illustrate the performance of these models. Firstly, we visually evaluated the temporal correspondence between the observed and predicted streamflow using hydrographs and scatter plots, and boxplots were used to describe the variability and dispersion of the errors. Secondly, we further illustrated the performance metrics (including the central RMSE, Pearson correlation coefficient, and standard deviation ratio) in a polar coordinate space (Taylor 2001). In a Taylor diagram, the radius (polar diameter) indicates the standard deviation ratio between the observed and predicted data; the arc represents the correlation coefficient; and the arc centered on 'REF' corresponds to the central RMSE. The formula of the central RMSE is as follows:
$$E' = \sqrt{\sigma_{o}^{2} + \sigma_{p}^{2} - 2\,\sigma_{o}\,\sigma_{p}\,R}$$
(19)
where $\sigma_{o}$ and $\sigma_{p}$ represent the standard deviations of the observed and predicted data, respectively, and $R$ is the Pearson correlation coefficient. The closer the central RMSE is to 0, the better the agreement between the observed and predicted series once the mean bias is removed.
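Equation (19) is algebraically equivalent to the RMSE of the mean-removed series, which a short numerical check confirms (illustrative data):

```python
import numpy as np

def central_rmse(qo, qp):
    # Equation (19): E'^2 = sigma_o^2 + sigma_p^2 - 2 * sigma_o * sigma_p * R
    so, sp = np.std(qo), np.std(qp)
    r = np.corrcoef(qo, qp)[0, 1]
    return np.sqrt(so ** 2 + sp ** 2 - 2 * so * sp * r)

qo = np.array([10.0, 30.0, 50.0, 40.0, 20.0])    # illustrative observed values
qp = np.array([12.0, 28.0, 47.0, 42.0, 21.0])    # illustrative predicted values
e = central_rmse(qo, qp)
# identical to the RMSE after removing each series' mean
check = np.sqrt(np.mean(((qp - qp.mean()) - (qo - qo.mean())) ** 2))
print(np.isclose(e, check))  # True
```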

In this study, the CTL, Transformer, CNN, LSTM, CT, and TL models were employed for streamflow prediction at the Changmabao hydrological station in the Shule River basin. Tables 2–4 show the statistical values of the performance metrics derived by the models for 1-, 3-, and 6-month-ahead streamflow predictions in the training and testing periods. The performance in the training and testing periods was found to be remarkably similar. Considering the predictive purpose of this study, this section mainly focused on the testing phase.

Table 2

Performance of the CTL, Transformer, CNN, LSTM, CT and TL models in training and testing periods for 1-month ahead streamflow prediction

             Training period                          Testing period
Model        NSE     R      MAE (m³/s)  RMSE (m³/s)   NSE     R      MAE (m³/s)  RMSE (m³/s)
CTL          0.977   0.989  2.864       4.518         0.964   0.983  4.930       7.717
Transformer  0.825   0.908  7.009       12.540        0.783   0.892  10.228      19.028
CNN          0.830   0.911  7.310       12.377        0.756   0.902  12.486      20.170
LSTM         0.851   0.922  6.585       11.582        0.815   0.908  10.349      17.563
CT           0.859   0.927  6.282       11.264        0.845   0.925  9.315       16.051
TL           0.951   0.975  4.547       6.636         0.908   0.958  8.496       12.403

Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.

Table 3

Performance of the CTL, Transformer, CNN, LSTM, CT and TL models in training and testing periods for 3-month ahead streamflow prediction

             Training period                          Testing period
Model        NSE     R      MAE (m³/s)  RMSE (m³/s)   NSE     R      MAE (m³/s)  RMSE (m³/s)
CTL          0.933   0.966  4.461       7.779         0.912   0.956  7.520       12.354
Transformer  0.796   0.892  7.193       13.581        0.735   0.875  11.355      21.489
CNN          0.809   0.900  7.764       13.132        0.739   0.872  12.557      21.320
LSTM         0.808   0.899  7.147       13.166        0.755   0.884  11.528      20.639
CT           0.838   0.916  6.319       12.091        0.788   0.893  11.083      19.222
TL           0.879   0.938  6.035       10.467        0.824   0.923  11.626      17.487

Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.

Table 4

Performance of the CTL, Transformer, CNN, LSTM, CT and TL models in training and testing periods for 6-month ahead streamflow prediction

             Training period                          Testing period
Model        NSE     R      MAE (m³/s)  RMSE (m³/s)   NSE     R      MAE (m³/s)  RMSE (m³/s)
CTL          0.864   0.930  6.379       11.121        0.856   0.929  9.703       15.818
Transformer  0.780   0.884  7.935       14.128        0.717   0.882  12.241      22.151
CNN          0.808   0.899  8.086       13.210        0.679   0.878  15.212      23.605
LSTM         0.781   0.884  7.871       14.082        0.733   0.874  11.744      21.550
CT           0.816   0.903  7.316       12.919        0.778   0.891  11.325      19.634
TL           0.818   0.904  7.162       12.852        0.777   0.885  12.742      19.697

Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.

Performance of the CTL model in 1-, 3-, 6-month ahead streamflow predictions

At a 1-month prediction horizon, the CTL model performed excellently, with high values of NSE and R, while maintaining relatively small RMSE and MAE values (Table 2). Specifically, the CTL model achieved an NSE value of 0.964, far exceeding the acceptable limit of 0.5; the R-value reached 0.983, indicating a very high correlation between the observed and predicted streamflow; and the MAE and RMSE values were only 4.930 and 7.717 m³/s, respectively, indicating a small error between the observed and predicted streamflow.

However, the accuracy of the 3- and 6-month ahead predictions was considerably lower than that of the 1-month ahead prediction (Tables 2–4), indicating that the accuracy of CTL deteriorated as the predictive horizon increased. This phenomenon is consistent with similar machine learning-based hydrological prediction studies at multiple timescales (Barzegar et al. 2020; Yin et al. 2022a). It probably arises because extracting the non-linear relationships among the variables becomes more difficult as the correlation between the historical inputs and the target streamflow weakens with increasing lead time (Barzegar et al. 2020). Although performance deteriorated, the CTL results for 3- and 6-month ahead streamflow predictions still greatly surpassed the acceptable performance threshold (NSE = 0.5). In summary, the CTL model can be regarded as an excellent model for 1-, 3-, and 6-month streamflow predictions.

The hydrographs and scatter plots help to visually assess the temporal correspondence between the observed and predicted streamflow. As shown in Figures 5 to 7, the predictions of the CTL model followed the same trend as the observed streamflow, indicating that the CTL model could capture the changing pattern of the streamflow. In addition, the CTL scatter points were tightly clustered around the 1:1 line: the fitted slope k (>0.861) and coefficient of determination R² (>0.863) were close to 1 and the intercept b (<2.608) was close to 0, demonstrating a strong correlation between the predicted and observed streamflow and a high level of prediction accuracy.
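The slope k, intercept b, and R² quoted for the scatter plots come from an ordinary least-squares fit of the predictions against the observations. A sketch of how such a fit is obtained (function name is ours):

```python
import numpy as np

def linear_fit(obs, pred):
    """Least-squares fit pred = k*obs + b, plus the R² of the relationship."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    k, b = np.polyfit(obs, pred, 1)                # degree-1 polynomial fit
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2         # coefficient of determination
    return k, b, r2
```

Values of k and R² near 1 and b near 0 mean the scatter points lie close to the 1:1 line, as reported for CTL.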
Figure 5

The observed and predicted streamflows of the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-month ahead prediction obtained in the testing period.

Figure 6

The observed and predicted streamflows of the CTL, Transformer, CNN, LSTM, CT, and TL models for 3-month ahead prediction obtained in the testing period.

Figure 7

The observed and predicted streamflows of the CTL, Transformer, CNN, LSTM, CT, and TL models for 6-month ahead prediction obtained in the testing period.


Efficient water resource management requires accurate peak streamflow prediction. To evaluate the performance of the developed models for peak streamflow prediction, the mean values of the top ten ranked streamflow data points were compared. According to Tables 5–7, CTL underpredicted the peak flow in most cases. For 1-month ahead prediction, the mean of the top ten observed streamflow values exceeded the mean CTL prediction by 5.44 m³/s. The underprediction persisted at longer horizons, with CTL underestimating by about 11.41 and 20.50 m³/s for the 3- and 6-month ahead predictions, respectively. These results demonstrate that the CTL model produced only small errors in the peak flows at the 1-, 3-, and 6-month timescales.
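The peak-flow comparison above can be reproduced by averaging the n largest observed flows and the predictions made for those same time steps. A sketch (the function name `peak_bias` is ours):

```python
import numpy as np

def peak_bias(obs, pred, n=10):
    """Mean of the n largest observed flows minus the mean of the predictions
    at those same time steps (positive = underprediction of the peaks)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    idx = np.argsort(obs)[-n:]     # indices of the top-n observed flows
    return obs[idx].mean() - pred[idx].mean()
```

Applied to the Table 5 observations and CTL predictions, this reproduces the roughly 5.44 m³/s underprediction reported for the 1-month horizon.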

Table 5

Top 10 ranked streamflow observations and the corresponding predictions derived by the CTL, Transformer, CNN, LSTM, CT, and TL models in the testing phase for 1-month ahead prediction

Observed streamflow (m³/s)   Predicted streamflow (m³/s)
                             CTL      Transformer  CNN      LSTM     CT       TL
222                          186.08   132.91       123.52   117.40   142.44   169.55
184                          170.42   101.74       119.21   125.84   130.93   169.29
143                          135.17   113.95        75.29   114.01   158.99   137.01
142                          146.89    43.09        96.70    93.29   126.11   144.69
138                          132.48   109.56        83.43   108.58    94.47   126.97
135                          134.32   155.99       136.31   148.73   140.15   140.13
130                          129.33   118.59       105.76   119.75   149.57   125.80
130                          128.12   118.14       106.69   107.57   122.66   121.58
123                          127.26    92.88        85.91   106.65    83.99   125.38
117                          119.58   152.38       135.43    77.31   113.02   122.40
Mean: 146.4                  140.96   113.92       106.83   111.91   126.23   138.28
Difference                     5.44    32.48        39.57    34.49    20.17     8.12
Table 6

Top 10 ranked streamflow observations and the corresponding predictions derived by the CTL, Transformer, CNN, LSTM, CT, and TL models in the testing phase for 3-month ahead prediction

Observed streamflow (m³/s)   Predicted streamflow (m³/s)
                             CTL      Transformer  CNN      LSTM     CT       TL
222                          167.08    90.79       101.08   104.37   137.55   154.25
184                          154.89   112.50       115.27   131.70   136.72   175.32
161                          160.04   106.61       119.93    96.38   105.53    88.47
142                          117.13   116.45       111.30   131.97   144.11    98.97
141                          149.93   131.16       146.31    73.11   138.96   105.33
138                          127.38    85.63       114.13   116.40   110.92    83.41
135                          169.39   123.26       130.21   146.45   142.67   182.92
130                          141.66    92.34        97.34   108.83   108.24   121.95
130                          101.61   107.14        90.19   129.57   132.84   136.70
123                          102.80   119.67       114.85    72.64   152.62   142.53
Mean: 150.6                  139.19   108.55       114.06   111.14   131.02   128.99
Difference                    11.41    42.05        36.54    39.46    19.58    21.61
Table 7

Top 10 ranked streamflow observations and the corresponding predictions derived by the CTL, Transformer, CNN, LSTM, CT, and TL models in the testing phase for 6-month ahead prediction

Observed streamflow (m³/s)   Predicted streamflow (m³/s)
                             CTL      Transformer  CNN      LSTM     CT       TL
222                          175.03   102.20       123.98   114.10   153.55   166.37
184                          137.46    95.23        82.72   102.35   154.48   152.33
161                          136.47   105.38        57.30    54.07    57.46   135.97
142                           91.16    89.43        96.55    99.52   134.24    81.18
141                          152.08   113.43       105.91   128.68   150.28   151.64
138                          111.18    91.95       112.40   138.60   119.79    91.10
135                          151.43   109.72       119.98   140.81   148.20   178.95
130                          104.11    91.44       112.90   120.79   102.30    87.67
130                           84.94    93.56        79.75    99.43   108.59    91.06
123                          157.17    97.62        71.59    77.01   154.79   145.67
Mean: 150.6                  130.10    99.00        96.31   107.53   128.37   128.19
Difference                    20.50    51.60        54.29    43.07    22.23    22.41

Comparative analysis with both the standalone and hybrid models

To explore the potential of the CTL model in improving the efficiency of multi-time-ahead streamflow prediction, we compared its accuracy with that of three standalone models (i.e., Transformer, CNN, LSTM) and two hybrid models (i.e., CT, TL) for 1-, 3-, and 6-month ahead streamflow predictions. The results indicated that the CTL model performed significantly better than the other standalone and hybrid models.

In terms of statistical performance metrics, taking 1-month ahead prediction as an example (Table 2), the NSE and R values of the CTL model increased by 23.116 and 10.202% compared to the Transformer model, 27.513 and 8.980% compared to the CNN model, 18.282 and 8.260% compared to the LSTM model, 14.083 and 6.270% compared to the CT model, and 6.167 and 2.610% compared to the TL model, respectively. Meanwhile, the MAE and RMSE decreased by 5.298 and 11.311 m³/s relative to the Transformer model, 7.556 and 12.453 m³/s relative to the CNN model, 5.419 and 9.846 m³/s relative to the LSTM model, 4.385 and 8.334 m³/s relative to the CT model, and 3.566 and 4.686 m³/s relative to the TL model, respectively. Furthermore, for the 3- and 6-month ahead predictions (Tables 3 and 4), the CTL model also produced superior results, although poorer than for the 1-month ahead prediction: its NSE values reached 0.912 and 0.856, respectively, exceeding the best of the other five models at 0.824 (TL) and 0.778 (CT). Consequently, we conclude that the CTL model can provide more accurate streamflow predictions.
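The percentage improvements quoted here are simple relative changes. For example, the 23.116% NSE gain over the Transformer follows from the reported NSE values of 0.964 (CTL) and 0.783 (standalone Transformer); a one-line sketch:

```python
def pct_gain(new, old):
    """Relative improvement of metric `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Reported 1-month NSE values: CTL = 0.964, standalone Transformer = 0.783.
gain = pct_gain(0.964, 0.783)   # matches the 23.116% quoted in the text
```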

While the statistical metrics presented so far have assessed the skill of the CTL model, hydrographs and scatter plots are also helpful in evaluating the temporal correspondence of the observed and predicted values. It is clear that the streamflow values estimated by the CTL are closer to the corresponding observed values than those estimated by other models. For 1-month ahead prediction, the overall fit of the CTL model was significantly better than other models (Figure 5). The CTL model provided the largest value of k (0.952 vs. 0.791 vs. 0.785 vs. 0.724 vs. 0.847 vs. 0.950) and the smallest value of b (0.791 vs. 5.536 vs. 4.784 vs. 3.090 vs. 2.668 vs. 1.681) in the least square equation as well as the largest R² (0.965 vs. 0.824 vs. 0.795 vs. 0.814 vs. 0.855 vs. 0.918) compared with Transformer, CNN, LSTM, CT, and TL models (Figure 5). The linear analysis results for 3- and 6-month ahead predictions were similar to those for 1-month ahead, with the CTL models outperforming the Transformer, CNN, LSTM, CT, and TL models (Figures 6 and 7).

Figure 8 shows the mean values of the top 10 streamflow data points on the testing phase. The best results of the Transformer, CNN, LSTM, CT, and TL models underestimated about 8.12 m³/s (TL), 19.58 m³/s (CT), and 22.23 m³/s (CT) for 1-, 3-, and 6-month ahead streamflow predictions, respectively, while the CTL model underestimated about 5.44, 11.41, and 20.5 m³/s.
Figure 8

Errors between the mean of the top 10 ranked streamflow predicted by the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-, 3-, 6-month ahead and the corresponding mean observed streamflow obtained in the testing period.

When evaluating a predictive model's feasibility for streamflow prediction, it is important to assess the distribution of errors across the entire test dataset. Therefore, to test the robustness of the developed models, Figure 9 shows boxplots of the error distributions of the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-, 3-, and 6-month ahead streamflow prediction, illustrating the degree of dispersion and skewness of the errors. The CTL model generated smaller errors and dimensionless residuals than the compared models, not only for the 1-month ahead prediction but also for the 3- and 6-month ahead predictions. These results show that the CTL model is superior to the Transformer, CNN, LSTM, CT, and TL models.
Figure 9

Boxplots of the predicted error for the 1-, 3-, and 6-month ahead streamflow generated by the CTL, Transformer, CNN, LSTM, CT, TL models in the testing period. The top and bottom of the box represent the 75 and 25% quantiles of errors, respectively. The line inside the box indicates the average value.

Figure 10 shows the hydrographs of errors between the observed and predicted streamflow generated by the CTL, Transformer, CNN, LSTM, CT, and TL models in the testing period. The CTL model had smaller errors for the 1-, 3-, and 6-month ahead streamflow predictions in most test cases, indicating higher accuracy and a significantly better fit than the Transformer, CNN, LSTM, CT, and TL models. This reaffirms that the CTL model has better predictive skill than the comparative models.
Figure 10

The errors between the observed and predicted streamflow of the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-, 3-, 6-month ahead predictions obtained in the testing period.

A Taylor diagram further enables a comparison of the predictive ability of the developed models (Wen et al. 2019). The Taylor diagrams showed that the CTL model reproduced the observed variability better than the Transformer, CNN, LSTM, CT, and TL models for the 1-, 3-, and 6-month ahead predictions (Figure 11). Based on the combined results of the correlation coefficient, standard deviation ratio, and centred RMSE, the CTL model lay closer to the reference point on the x-axis than the three standalone (i.e., Transformer, CNN, LSTM) and two hybrid (i.e., CT, TL) models in all cases, indicating that CTL was the more realistic model for 1-, 3-, and 6-month ahead streamflow predictions.
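A Taylor diagram summarises three statistics of each model's predictions against the observations: the correlation coefficient, the ratio of standard deviations, and the centred RMSE. A sketch of how they are computed (function name is ours):

```python
import numpy as np

def taylor_stats(obs, pred):
    """Correlation, std-dev ratio, and centred RMSE: the three quantities
    summarised by a single point on a Taylor diagram."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]
    sigma_ratio = pred.std() / obs.std()
    # centred RMSE: RMSE after removing each series' mean
    crmse = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return r, sigma_ratio, crmse
```

A model plotted at the reference point has r = 1, a std-dev ratio of 1, and zero centred RMSE, which is why proximity to that point indicates a more realistic model.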
Figure 11

Taylor diagram of the CTL, Transformer, CNN, LSTM, CT and TL models for 1-, 3-, and 6-month ahead streamflow predictions in the testing period.


Superiority of the CTL model

In this study, we established a CTL approach for multi-time-ahead streamflow prediction in the upper reaches of the Shule River, northwest China. The results indicated that CTL achieved excellent performance for 1-, 3-, and 6-month streamflow predictions, outperforming both the standalone models and the other hybrid models. This superiority can be attributed mainly to three reasons.

First, the deep learning-based hybrid CTL model incorporates the Transformer. Transformer can strengthen or weaken the connection between two arbitrary locations to capture the global relationship between streamflow and its influencing factors, thereby resolving the problem of long distances between information locations in the streamflow sequence (Yin et al. 2022a). This enables Transformer to extract more complicated information on both temporal and spatial scales. Specifically, Transformer can capture the correlations between one element of the input and all other elements (including historical precipitation, sunshine duration, air temperature, 0-m ground temperature, and streamflow). This may be why the performance of the standalone Transformer far exceeded the acceptable NSE limit of 0.5 in streamflow prediction (0.783, 0.735, and 0.717 for 1-, 3-, and 6-month ahead, respectively).

Second, CNN is utilized in the CTL model to extract deeper and more complex features between variables. The merit of CNN lies in extracting the partial hidden correlations between streamflow and its impact factors, so that useful information can be well captured. The good predictive results of the standalone CNN model further verified the effectiveness of the CNN algorithm, with NSE values reaching 0.756, 0.739, and 0.679 for 1-, 3-, and 6-month ahead predictions, respectively. Combining Transformer with CNN (CT) strengthens the extraction of both partial hidden features and global information from the streamflow and meteorological (i.e., precipitation, sunshine duration, air temperature, and 0-m ground temperature) data. Accordingly, the CT model outperformed the standalone models, with NSE values increasing by 7.918, 7.211, and 8.508% compared to the Transformer model and by 11.711, 6.631, and 14.580% compared to the CNN model at the 1-, 3-, and 6-month prediction horizons, respectively.

Third, by incorporating the LSTM model into the CTL model, relevant information on the temporal scale, such as trends and periods in the streamflow and meteorological series, could be further extracted. This is supported by the good predictive outcome of the standalone LSTM model, whose NSE values reached 0.815, 0.755, and 0.733 for 1-, 3-, and 6-month ahead predictions, respectively. The TL model combines the Transformer with the LSTM, improving the capture of both temporal correlations and global information in the streamflow process. Thus, the TL model outperformed the Transformer and LSTM models. Specifically, the NSE values of the TL model were 15.964, 12.109, and 8.368% higher than those of the Transformer model, and 11.411, 9.139, and 6.003% higher than those of the LSTM model at the 1-, 3-, and 6-month prediction horizons, respectively.

Consequently, the CTL, as a deep learning-based model, combines the ability of CNN to extract partial hidden features (Khosravi et al. 2020; Panahi et al. 2020; Wang et al. 2020; Pyo et al. 2021), the ability of Transformer to extract global information (directly calculating the correlation between two features instead of passing it through the network) (Baek et al. 2022; Peng et al. 2022), and the sensitivity of LSTM to correlations on the temporal scale (Xu et al. 2022; Yin et al. 2022b), and can therefore well capture the highly complex, non-stationary, non-linear, and multi-scale correlations between streamflow and meteorological parameters. CTL showed outstanding streamflow prediction efficiency, with NSE values of 0.964, 0.912, and 0.856 at the 1-, 3-, and 6-month prediction horizons, respectively.
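The CTL stack described here (a CNN in place of the embedding layer, a Transformer encoder for global information, then an LSTM for temporal correlations) can be sketched in PyTorch. This is an illustrative implementation under our own assumptions: all layer sizes and hyperparameters are hypothetical, and only the five input variables named in the paper are taken from the text.

```python
import torch
import torch.nn as nn

class CTL(nn.Module):
    """Sketch of a CNN-Transformer-LSTM stack for monthly streamflow.
    The 5 input variables (precipitation, sunshine duration, air temperature,
    0-m ground temperature, streamflow) follow the paper; the layer sizes
    here are illustrative assumptions, not the authors' configuration."""

    def __init__(self, n_vars=5, d_model=32, n_heads=4, hidden=32):
        super().__init__()
        # CNN replaces the embedding layer: extracts local hidden features
        self.cnn = nn.Conv1d(n_vars, d_model, kernel_size=3, padding=1)
        # Transformer encoder: global dependencies across the input window
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=64,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        # LSTM: temporal correlations in the encoded sequence
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # next-month streamflow

    def forward(self, x):                      # x: (batch, time, n_vars)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, time, d_model)
        z = self.transformer(z)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])           # predict from the last time step
```

A 12-month input window of the five variables, shaped `(batch, 12, 5)`, yields one streamflow value per sample.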

Hydrological implication

Efficient water resource management through accurate streamflow prediction is crucial for arid regions (Yang et al. 2020a; Paul et al. 2021). However, predicting streamflow is challenging due to its highly complex, non-stationary, non-linear, and multi-scale nature. Some studies have sought to improve standalone prediction models; however, owing to structural constraints, each standalone model has deficiencies as well as advantages, which limits the improvement in prediction it can deliver (Mei et al. 2022). To fully exploit the advantages of each deep learning model and improve overall prediction efficiency, hybrid models have frequently been employed (Barzegar et al. 2020, 2021; Cho & Kim 2022). In this study, we established a hybrid CTL model for 1-, 3-, and 6-month ahead streamflow prediction in an arid region of northwest China and obtained excellent performance. The results indicated that CTL is an appropriate alternative approach for streamflow prediction where surface data are scarce.

Hybrid models have also been used in numerous other hydrological research fields, including water quality prediction, lake level prediction, and evaporation prediction (Barzegar et al. 2021; Lakmini Prarthana Jayasinghe et al. 2022; Mei et al. 2022); all of these studies showed that hybrid models provide highly accurate predictions. For example, Mei et al. (2022) proposed a CNN-GRU-Attention (CGA) model to predict the daily water quality of the Bayi Waterworks in Xiaogan, Hubei province, China, and the results showed that the CGA model outperformed standalone LSTM and GRU models. Therefore, the CTL, as a hybrid deep learning-based model combining the advantages of three deep learning models (i.e., Transformer, CNN, LSTM), can also be applied to hydrological applications other than streamflow prediction.

While a standalone deep learning model can yield good results, the pursuit of more accurate prediction drives scientific progress. Combining multiple deep learning models to fully leverage their individual advantages can therefore increase robustness and reliability, leading to more accurate and stable predictions. As technology continues to develop and the number of available deep learning models grows, investigating suitable combinations of models will become increasingly important for improving prediction accuracy.

Limitations and future work

The CTL model established in this study, however, had certain drawbacks. First, it demonstrated poor performance in predicting summer peaks, particularly for 6-month ahead streamflow prediction, which is consistent with the findings of other studies (Wen et al. 2019; Gao et al. 2020; Alizadeh et al. 2021; Lin et al. 2021). This may be because meltwater is a vital source of streamflow here, with glacial meltwater making up approximately 80% of the annual streamflow in the Shule River basin (Lan et al. 2012). Additionally, freezing and thawing effects are prevalent in this region, so soil infiltration and evaporation need to be considered when converting rainfall to streamflow (Wang et al. 2022). However, the model in this study only considered precipitation, sunshine duration, air temperature, and 0-m ground temperature. To improve peak prediction in the future, we will also incorporate soil and snow data. Second, hyperparameters were determined using grid search, which introduced uncertainty into our optimal results. To avoid this, we will instead use optimization algorithms (e.g., crisscross optimization (CSO) and particle swarm optimization (PSO)) for parameter tuning in the future. Finally, despite its excellent results, the CTL model is time-consuming, like other deep learning models; searching for simpler and more efficient model structures is therefore a direction of future research.
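Grid search, as used here for hyperparameter tuning, simply evaluates every combination in a predefined grid and keeps the best one. A minimal sketch (the function name, the example grid, and the scoring callback are ours):

```python
from itertools import product

def grid_search(train_eval, grid):
    """Exhaustive hyperparameter search: evaluate every combination in `grid`
    and return the best one. `train_eval` maps a config dict to a validation
    score where higher is better (e.g. NSE)."""
    best_cfg, best_score = None, float("-inf")
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_eval(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The cost grows multiplicatively with each added hyperparameter, which is one reason metaheuristics such as PSO are attractive alternatives.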

For the rational use of water resources, streamflow prediction must be accurate. In this study, we established a hybrid model for streamflow prediction called CTL, which combined Transformer with CNN and LSTM, using the CNN to extract partial hidden features, the Transformer to capture global information, and the LSTM to model temporal correlations. We applied the CTL, as well as the Transformer, CNN, LSTM, CT, and TL models, to predict streamflow at the Changmabao hydrological station in the Shule River basin. To examine the models' predictive abilities, we utilized statistical performance metrics, hydrographs, peak prediction, scatter plots, boxplot diagrams, and Taylor diagrams. The conclusions drawn from this study are as follows:

  • (1)

    Our experiments showed that the CTL model provided accurate 1-, 3-, and 6-month ahead streamflow predictions in an arid region, taking the Shule River basin as an example.

  • (2)

    The CTL model outperformed the standalone models (i.e., Transformer, CNN, and LSTM) and the hybrid models (i.e., CT and TL) that lack the CNN or LSTM module. This validated the stability of the CTL framework and demonstrated its applicability for streamflow prediction in an arid region. This study thus supports the accurate prediction of streamflow and helps us to better predict and manage water resources where observational data are lacking.

This study was funded by the National Natural Science Foundation of China (42130113 and 42001035), the Young Elite Scientist Sponsorship Program of China Association for Science and Technology (Grant No. YESS20200089), and the Youth Innovation Promotion Association of Chinese Academy of Sciences (Grant No. 2022435).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Adaryani F. R., Jamshid Mousavi S. & Jafari F. 2022 Short-term rainfall forecasting using machine learning-based approaches of PSO-SVR, LSTM and CNN. Journal of Hydrology 614, 128463. https://doi.org/10.1016/j.jhydrol.2022.128463.

Alizadeh B., Bafti A. G., Kamangir H., Zhang Y., Wright D. B. & Franz K. J. 2021 A novel attention-based LSTM cell post-processor coupled with Bayesian optimization for streamflow prediction. Journal of Hydrology 601, 126526. https://doi.org/10.1016/j.jhydrol.2021.126526.

Apaydin H., Sattari M. T., Falsafian K. & Prasad R. 2021 Artificial intelligence modelling integrated with Singular Spectral analysis and Seasonal-Trend decomposition using Loess approaches for streamflow predictions. Journal of Hydrology 600, 126506. https://doi.org/10.1016/j.jhydrol.2021.126506.

Atila O. & Sengur A. 2021 Attention guided 3D CNN-LSTM model for accurate speech based emotion recognition. Applied Acoustics 182 (1), 108260. https://doi.org/10.1016/j.apacoust.2021.108260.

Baek S. S., Jung E. Y., Pyo J., Pachepsky Y., Son H. & Cho K. H. 2022 Hierarchical deep learning model to simulate phytoplankton at phylum/class and genus levels and zooplankton at the genus level. Water Research 218, 118494. https://doi.org/10.1016/j.watres.2022.118494.

Bai T. & Tahmasebi P. 2022 Characterization of groundwater contamination: A transformer-based deep learning model. Advances in Water Resources 164, 104217. https://doi.org/10.1016/j.advwatres.2022.104217.

Barzegar R., Aalami M. T. & Adamowski J. 2020 Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model. Stochastic Environmental Research and Risk Assessment 34 (8), 1–19. https://doi.org/10.1007/s00477-020-01776-2.

Barzegar R., Aalami M. T. & Adamowski J. 2021 Coupling a hybrid CNN-LSTM deep learning model with a boundary corrected maximal overlap discrete wavelet transform for multiscale lake water level forecasting. Journal of Hydrology 598, 126196. https://doi.org/10.1016/j.jhydrol.2021.126196.

Chang F.-J. & Tsai M.-J. 2016 A nonlinear spatio-temporal lumping of radar rainfall for modeling multi-step-ahead inflow forecasts by data-driven techniques. Journal of Hydrology 535, 256–269. https://doi.org/10.1016/j.jhydrol.2016.01.056.

Cho K. & Kim Y. 2022 Improving streamflow prediction in the WRF-Hydro model with LSTM networks. Journal of Hydrology 605, 127297. https://doi.org/10.1016/j.jhydrol.2021.127297.

Evaristo J. & Mcdonnell J. J. 2019 RETRACTED ARTICLE: Global analysis of streamflow response to forest management. Nature 570 (7762), 455–461. https://doi.org/10.1038/s41586-019-1306-0.

Fidal J. & Kjeldsen T. 2020 Accounting for soil moisture in rainfall-runoff modelling of urban areas. Journal of Hydrology 589, 125122. https://doi.org/10.1016/j.jhydrol.2020.125122.

Gao S., Huang Y., Zhang S., Han J., Wang G., Zhang M. & Lin Q. 2020 Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. Journal of Hydrology 589, 125188. https://doi.org/10.1016/j.jhydrol.2020.125188.

Gao G. Y., Ning Z., Li Z. W. & Fu B. J. 2021 Prediction of long-term inter-seasonal variations of streamflow and sediment load by state-space model in the Loess Plateau of China. Journal of Hydrology 600, 126534. https://doi.org/10.1016/j.jhydrol.2021.126534.

Gharib A. & Davies E. G. R. 2021 A workflow to address pitfalls and challenges in applying machine learning models to hydrology. Advances in Water Resources 152, 103920. https://doi.org/10.1016/j.advwatres.2021.103920.

Guo X., Feng Q., Liu W., Li Z., Wen X., Si J., Xi H., Guo R. & Jia B. 2015 Stable isotopic and geochemical identification of groundwater evolution and recharge sources in the arid Shule River Basin of Northwestern China. Hydrological Processes 29 (22), 4703–4718. https://doi.org/10.1002/hyp.10495.

Han K., Xiao A., Wu E., Guo J., Xu C. & Wang Y. 2021a Transformer in transformer. Advances in Neural Information Processing Systems 34, 15908–15919.

Han X., Wei Z., Zhang B. Z., Li Y. N., Du T. S. & Chen H. 2021b Crop evapotranspiration prediction by considering dynamic change of crop coefficient and the precipitation effect in back-propagation neural network model. Journal of Hydrology 596 (3–4), 126104. https://doi.org/10.1016/j.jhydrol.2021.126104.

He J., Ma J., Zhao W. & Sun S. 2015 Groundwater evolution and recharge determination of the Quaternary aquifer in the Shule River basin, Northwest China. Hydrogeology Journal 23 (8), 1745–1759. https://doi.org/10.1007/s10040-015-1311-9.

Hong M., Wang D., Wang Y., Zeng X., Ge S., Yan H. & Singh V. P. 2016 Mid- and long-term runoff predictions by an improved phase-space reconstruction model. Environmental Research 148, 560–573. https://doi.org/10.1016/j.envres.2015.11.024.

Hua W., Dai Z., Liu H. & Le Q. 2022 Transformer quality in linear time. In: International Conference on Machine Learning, Baltimore, MD. PMLR, pp. 9099–9117.

Huang C. J. & Kuo P. H. 2018 A deep CNN-LSTM model for Particulate Matter (PM2.5) forecasting in smart cities. Sensors 18 (7), 2220. https://doi.org/10.3390/s18072220.

Huang L., Chen W. & Qu H. 2021 Accelerating transformer for neural machine translation. In: 2021 13th International Conference on Machine Learning and Computing, pp. 191–197.

Huo W., Li Z., Zhang K., Wang J. & Yao C. 2020 GA-PIC: An improved Green-ampt rainfall-runoff model with a physically based infiltration distribution curve for semi-arid basins. Journal of Hydrology 586, 124900. https://doi.org/10.1016/j.jhydrol.2020.124900.

Ibrahim K. S. M. H., Huang Y. F., Ahmed A. N., Koo C. H. & El-Shafie A. 2022 A review of the hybrid artificial intelligence and optimization modelling of hydrological streamflow forecasting. Alexandria Engineering Journal 61 (1), 279–303. https://doi.org/10.1016/j.aej.2021.04.100.

Jahangir M. S., You J. & Quilty J. 2023 A quantile-based encoder-decoder framework for multi-step ahead runoff forecasting. Journal of Hydrology 619, 129269. https://doi.org/10.1016/j.jhydrol.2023.129269.

Jiang A. & Jafarpour B. 2021 Inverting subsurface flow data for geologic scenarios selection with convolutional neural networks. Advances in Water Resources 149, 103840. https://doi.org/10.1016/j.advwatres.2020.103840.

Kabir S., Patidar S., Xia X. L., Liang Q. H., Neal J. & Pender G. 2020 A deep convolutional neural network model for rapid prediction of fluvial flood inundation. Journal of Hydrology 590, 125481. https://doi.org/10.1016/j.jhydrol.2020.125481.

Kao I. F., Zhou Y. L., Chang L. C. & Chang F. J. 2020 Exploring a Long Short-Term Memory based Encoder-Decoder framework for multi-step-ahead flood forecasting. Journal of Hydrology 583, 124631. https://doi.org/10.1016/j.jhydrol.2020.124631.

Khosravi K., Panahi M., Golkarian A., Keesstra S. D., Saco P. M., Bui D. T. & Lee S. 2020 Convolutional neural network approach for spatial prediction of flood hazard at national scale of Iran. Journal of Hydrology 591, 125552. https://doi.org/10.1016/j.jhydrol.2020.125552.

Lakmini Prarthana Jayasinghe W. J. M., Deo R. C., Ghahramani A., Ghimire S. & Raj N. 2022 Development and evaluation of hybrid deep learning long short-term memory network model for pan evaporation estimation trained with satellite and ground-based data. Journal of Hydrology 607, 127534. https://doi.org/10.1016/j.jhydrol.2022.127534.

Lan Y., Xinglin H. U., Hongwei D., Chengfang L. A. & Song J. 2012 Variation of water cycle factors in the Western Qilian Mountain area under climate warming: Taking the mountain watershed of the main stream of Shule River basin for example. Journal of Mountain Science 6, 675–680.

Li S., Jin X., Xuan Y., Zhou X., Chen W., Wang Y.-X. & Yan X. 2019 Enhancing the locality and breaking the memory bottleneck of Transformer on time series forecasting. Advances in Neural Information Processing Systems 32, 5244–5254.

Li L. L., Yang B., Liang M., Zeng W., Ren M., Segal S. & Urtasun R. 2020 End-to-end contextual perception and prediction with interaction transformer. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 5784–5791. https://doi.org/10.1109/IROS45743.2020.9341392.

Li P., Zhang J. & Krebs P. 2022a Prediction of flow based on a CNN-LSTM combined deep learning approach. Water 14 (6), 993. https://doi.org/10.3390/w14060993.

Li W., Pan B., Xia J. & Duan Q. 2022b Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. Journal of Hydrology 605, 127301. https://doi.org/10.1016/j.jhydrol.2021.127301.

Lin Y., Wang D., Wang G., Qiu J., Long K., Du Y., Xie H., Wei Z., Shangguan W. & Dai Y. 2021 A hybrid deep learning algorithm and its application to streamflow prediction. Journal of Hydrology 601, 126636. https://doi.org/10.1016/j.jhydrol.2021.126636.

Liu L., Liu X., Gao J., Chen W. & Han J. 2020 Understanding the difficulty of training transformers. arXiv preprint arXiv:2004.08249. https://doi.org/10.48550/arXiv.2004.08249.

Liu G. J., Tang Z. Y., Qin H., Liu S., Shen Q., Qu Y. H. & Zhou J. Z. 2022a Short-term runoff prediction using deep learning multi-dimensional ensemble method. Journal of Hydrology 609, 127762. https://doi.org/10.1016/j.jhydrol.2022.127762.

Liu Y.,
Wu
H.
,
Wang
J.
&
Long
M.
2022b
Non-stationary transformers: Exploring the stationarity in time series forecasting
.
Advances in Neural Information Processing Systems
35
,
9881
9893
.
Liu
G.
,
Ouyang
S.
,
Qin
H.
,
Liu
S.
,
Shen
Q.
,
Qu
Y.
,
Zheng
Z.
,
Sun
H.
&
Zhou
J.
2023
Assessing spatial connectivity effects on daily streamflow forecasting using Bayesian-based graph neural network
.
Science of The Total Environment
855
,
158968
.
https://doi.org/10.1016/j.scitotenv.2022.158968
.
Ma
L.
,
Bo
J.
,
Li
X.
,
Fang
F.
&
Cheng
W.
2019
Identifying key landscape pattern indices influencing the ecological security of Inland River Basin: The middle and lower reaches of Shule River Basin as an example
.
Science of the Total Environment
674
,
424
438
.
https://doi.org/10.1016/j.scitotenv.2019.04.107
.
Mei
P.
,
Li
M.
,
Zhang
Q.
,
Li
G.
&
Song
L.
2022
Prediction model of drinking water source quality with potential industrial-agricultural pollution based on CNN-GRU-Attention
.
Journal of Hydrology
610
,
127934
.
https://doi.org/10.1016/j.jhydrol.2022.127934
.
Mellouli
N.
,
Rabah
M. L.
&
Farah
I. R.
2022
Transformers-based time series forecasting for piezometric level prediction
. In:
2022 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)
, pp.
1
6
.
doi:10.1109/EAIS51927.2022.9787530
.
Merritt
A. M.
,
Lane
B.
&
Hawkins
C. P.
2021
Classification and prediction of natural streamflow regimes in arid regions of the USA
.
Water
13
(
3
),
380
.
https://doi.org/10.3390/w13030380
.
Messager
M. L.
,
Lehner
B.
,
Cockburn
C.
,
Lamouroux
N.
,
Pella
H.
,
Snelder
T.
,
Tockner
K.
,
Trautmann
T.
,
Watt
C.
&
Datry
T.
2021
Global prevalence of non-perennial rivers and streams
.
Nature
594
(
7863
),
391
397
.
https://doi.org/10.1038/s41586-021-03565-5
.
Moriasi
D. N.
,
Arnold
J. G.
,
Liew
M.
,
Bingner
R. L.
,
Harmel
R. D.
&
Veith
T. L.
2007
Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
.
Transactions of the ASABE
50
(
3
),
885
900
.
Myers
D. T.
,
Ficklin
D. L.
&
Robeson
S. M.
2021
Incorporating rain-on-snow into the SWAT model results in more accurate simulations of hydrologic extremes
.
Journal of Hydrology
603
,
126972
.
https://doi.org/10.1016/j.jhydrol.2021.126972
.
Nguyen
T.
,
Nguyen
L.
,
Tran
P.
&
Nguyen
H.
2021
Improving transformer-based neural machine translation with prior alignments
.
Complexity
2021
,
1
10
.
https://doi.org/10.1155/2021/5515407
.
Panahi
M.
,
Sadhasivam
N.
,
Pourghasemi
H. R.
,
Rezaie
F.
&
Lee
S.
2020
Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR)
.
Journal of Hydrology
588
,
125033
.
https://doi.org/10.1016/j.jhydrol.2020.125033
.
Paul
M.
,
Rajib
A.
,
Negahban-Azar
M.
,
Shirmohammadi
A.
&
Srivastava
P.
2021
Improved agricultural water management in data-scarce semi-arid watersheds: Value of integrating remotely sensed leaf area index in hydrological modeling
.
Science of the Total Environment
791
,
148177
.
https://doi.org/10.1016/j.scitotenv.2021.148177
.
Peng
L.
,
Wu
H.
,
Gao
M.
,
Yi
H.
,
Xiong
Q.
,
Yang
L.
&
Cheng
S.
2022
TLT: Recurrent fine-tuning transfer learning for water quality long-term prediction
.
Water Research
225
,
119171
.
https://doi.org/10.1016/j.watres.2022.119171
.
Pyo
J.
,
Cho
K. H.
,
Kim
K.
,
Baek
S. S.
,
Nam
G.
&
Park
S.
2021
Cyanobacteria cell prediction using interpretable deep learning model with observed, numerical, and sensing data assemblage
.
Water Research
203
,
117483
.
https://doi.org/10.1016/j.watres.2021.117483
.
Qi
J.
,
Zhang
X.
,
Yang
Q.
,
Srinivasan
R.
,
Arnold
J. G.
,
Li
J.
,
Waldholf
S. T.
&
Cole
J.
2020
SWAT ungauged: Water quality modeling in the Upper Mississippi River Basin
.
Journal of Hydrology
584
,
124601
.
https://doi.org/10.1016/j.jhydrol.2020.124601
.
Qiu
R.
,
Li
L.
,
Wu
L.
,
Agathokleous
E.
,
Liu
C.
&
Zhang
B.
2022
Comparison of machine learning and dynamic models for predicting actual vapour pressure when psychrometric data are unavailable
.
Journal of Hydrology
610
,
127989
.
https://doi.org/10.1016/j.jhydrol.2022.127989
.
Ren
J.
,
Yu
Z.
,
Gao
G.
,
Yu
G.
&
Yu
J.
2022
A CNN-LSTM-LightGBM based short-term wind power prediction method based on attention mechanism
.
Energy Reports
8
,
437
443
.
https://doi.org/10.1016/j.egyr.2022.02.206
.
Slater
L. J.
,
Anderson
B.
,
Buechel
M.
,
Dadson
S.
,
Han
S.
,
Harrigan
S.
,
Kelder
T.
,
Kowal
K.
,
Lees
T.
,
Matthews
T.
,
Murphy
C.
&
Wilby
R. L.
2021
Nonstationary weather and water extremes: A review of methods for their detection, attribution, and management
.
Hydrology and Earth System Sciences
25
(
7
),
3897
3935
.
https://doi.org/10.5194/hess-25-3897-2021
.
Sun
Z.
,
Cao
S.
,
Yang
Y.
&
Kitani
K. M.
2021
Rethinking transformer-based set prediction for object detection
. In:
Proceedings of the IEEE/CVF International Conference on Computer Vision
, pp.
3611
3620
.
Tan
M. L.
,
Gassman
P. W.
,
Yang
X.
&
Haywood
J.
2020
A review of SWAT applications, performance and future needs for simulation of hydro-climatic extremes
.
Advances in Water Resources
143
,
103662
.
https://doi.org/10.1016/j.advwatres.2020.103662
.
Taylor
K. E.
2001
Summarizing multiple aspects of model performance in a single diagram
.
Journal of Geophysical Research Atmospheres
106
(
D7
),
7183
7192
.
https://doi.org/10.1029/2000JD900719
.
Vaswani
A.
,
Shazeer
N.
,
Parmar
N.
,
Uszkoreit
J.
,
Jones
L.
,
Gomez
A. N.
,
Kaiser
Ł.
&
Polosukhin
I.
2017
Attention is all you Need, Advances in Neural Information Processing Systems
.
Wagena
M. B.
,
Goering
D.
,
Collick
A. S.
,
Bock
E.
,
Fuka
D. R.
,
Buda
A.
&
Easton
Z. M.
2020
Comparison of short-term streamflow forecasting using stochastic time series, neural networks, process-based, and Bayesian models
.
Environmental Modelling & Software
126
,
104669
.
https://doi.org/10.1016/j.envsoft.2020.104669
.
Wang
F.
&
Zai
Y.
2023
Image segmentation and flow prediction of digital rock with U-net network
.
Advances in Water Resources
172
,
104384
.
https://doi.org/10.1016/j.advwatres.2023.104384
.
Wang
Y.
,
Fang
Z. C.
,
Hong
H. Y.
&
Peng
L.
2020
Flood susceptibility mapping using convolutional neural network frameworks
.
Journal of Hydrology
582
,
124482
.
https://doi.org/10.1016/j.jhydrol.2019.124482
.
Wang
S.
,
Zhao
Q.
&
Pu
T.
2021
Assessment of water stress level about global glacier-covered arid areas: A case study in the Shule River Basin, northwestern China
.
Journal of Hydrology: Regional Studies
37
,
100895
.
https://doi.org/10.1016/j.ejrh.2021.100895
.
Wang
Z.
,
He
Y.
,
Li
W.
,
Chen
X.
,
Yang
P.
&
Bai
X.
2022
A generalized reservoir module for SWAT applications in watersheds regulated by reservoirs
.
Journal of Hydrology
128770
.
https://doi.org/10.1016/j.jhydrol.2022.128770
.
Wang
H. J.
,
Merz
R.
,
Yang
S.
,
Tarasova
L.
&
Basso
S.
2023
Emergence of heavy tails in streamflow distributions: The role of spatial rainfall variability
.
Advances in Water Resources
171
,
104359
.
https://doi.org/10.1016/j.advwatres.2022.104359
.
Wen
X.
,
Feng
Q.
,
Deo
R. C.
,
Wu
M.
,
Yin
Z.
,
Yang
L.
&
Singh
V. P.
2019
Two-phase extreme learning machines integrated with the complete ensemble empirical mode decomposition with adaptive noise algorithm for multi-scale runoff prediction problems
.
Journal of Hydrology
570
,
167
184
.
https://doi.org/10.1016/j.jhydrol.2018.12.060
.
Xie
C.
,
Zhao
L. J.
,
Eastoe
C. J.
,
Wang
N. L.
&
Dong
X. Y.
2022
An isotope study of the Shule River Basin, Northwest China: Sources and groundwater residence time, sulfate sources and climate change
.
Journal of Hydrology
612
,
128043
.
https://doi.org/10.1016/j.jhydrol.2022.128043
.
Xu
H.
,
van Genabith
J.
,
Xiong
D.
&
Liu
Q.
2020
Dynamically adjusting transformer batch size by monitoring gradient direction change. arXiv preprint arXiv:2005.02008. https://doi.org/10.48550/arXiv.2005.02008
.
Xu
Y. H.
,
Hu
C. H.
,
Wu
Q.
,
Jian
S. Q.
,
Li
Z. C.
,
Chen
Y. Q.
,
Zhang
G. D.
,
Zhang
Z. X.
&
Wang
S. L.
2022
Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation
.
Journal of Hydrology
608
,
127553
.
https://doi.org/10.1016/j.jhydrol.2022.127553
.
Xu
J.
,
Fan
H.
,
Luo
M.
,
Li
P.
,
Jeong
T.
&
Xu
L.
2023a
Transformer based water level prediction in Poyang Lake, China
.
Water
.
doi:10.3390/w15030576
.
Xu
Y.
,
Lin
K.
,
Hu
C.
,
Wang
S.
,
Wu
Q.
,
Zhang
L.
&
Ran
G.
2023b
Deep transfer learning based on transformer for flood forecasting in data-sparse basins
.
Journal of Hydrology
625
,
129956
.
https://doi.org/10.1016/j.jhydrol.2023.129956
.
Yang
M.
2022
Visual Transformer for Object Detection. arXiv preprint arXiv:2206.06323. DOI:https://doi.org/10.48550/arXiv.2206.06323
Yang
L.
,
Feng
Q.
,
Adamowski
J. F.
,
Deo
R. C.
,
Yin
Z.
,
Wen
X.
,
Tang
X.
&
Wu
M.
2020a
Causality of climate, food production and conflict over the last two millennia in the Hexi Corridor, China
.
Science of the Total Environment
713
,
136587
.
https://doi.org/10.1016/j.scitotenv.2020.136587
.
Yang
S. Y.
,
Yang
D. W.
,
Chen
J. S.
,
Santisirisomboon
J.
,
Lu
W. W.
&
Zhao
B. X.
2020b
A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data
.
Journal of Hydrology
590
,
125206
.
https://doi.org/10.1016/j.jhydrol.2020.125206
.
Yin
H. L.
,
Guo
Z. L.
,
Zhang
X. W.
,
Chen
J. J.
&
Zhang
Y. N.
2022a
RR-Former: Rainfall-runoff modeling based on Transformer
.
Journal of Hydrology
609
,
127781
.
https://doi.org/10.1016/j.jhydrol.2022.127781
.
Yin
H. L.
,
Wang
F. D.
,
Zhang
X. W.
,
Zhang
Y. N.
,
Chen
J. J.
,
Xia
R. L.
&
Jin
J.
2022b
Rainfall-runoff modeling using long short-term memory based step-sequence framework
.
Journal of Hydrology
610
,
127901
.
https://doi.org/10.1016/j.jhydrol.2022.127901
.
Yin
H.
,
Zhu
W.
,
Zhang
X.
,
Xing
Y.
,
Xia
R.
,
Liu
J.
&
Zhang
Y.
2023
Runoff predictions in new-gauged basins using two transformer-based models
.
Journal of Hydrology
622
,
129684
.
https://doi.org/10.1016/j.jhydrol.2023.129684
.
Zhang
J.
&
Yan
H.
2023
A long short-term components neural network model with data augmentation for daily runoff forecasting
.
Journal of Hydrology
617
,
128853
.
https://doi.org/10.1016/j.jhydrol.2022.128853
.
Zhang
Y.
,
Wang
L.
,
Zhang
H.
&
Xiangyun
L. I.
2003
An analysis on land use changes and their driving factors in Shule River:an example from Anxi County
.
Progress in Geography
22
(
3
),
53
61
.
Zhang
H.
,
Wang
C.
,
Shanzhen
Y.
&
Jiang
Q.
2009
Geostatistical analysis of spatial and temporal variations of groundwater depth in Shule River
. In
2009 WASE International Conference on Information Engineering
.
IEEE
, pp.
453
457
.
https://doi.org/10.1109/ICIE.2009.213
.
Zhang
F.
,
Zeng
C.
,
Wang
G. X.
,
Wang
L.
&
Shi
X. N.
2022
Runoff and sediment yield in relation to precipitation, temperature and glaciers on the Tibetan plateau
.
International Soil and Water Conservation Research
10
(
2
),
197
207
.
https://doi.org/10.1016/j.iswcr.2021.09.004
.
Zhou
Y.
&
Jiao
X.
2022
Intelligent analysis system for signal processing tasks based on LSTM recurrent neural network algorithm
.
Neural Computing and Applications
34
(
15
),
12257
12269
.
https://doi.org/10.1007/s00521-021-06478-6
.
Zhou
H.
,
Zhang
S.
,
Peng
J.
,
Zhang
S.
,
Li
J.
,
Xiong
H.
&
Zhang
W.
2021a
Informer: Beyond efficient transformer for long sequence time-series forecasting
. In:
Proceedings of the AAAI Conference on Artificial Intelligence
, pp.
11106
11115
.
https://doi.org/10.1609/aaai.v35i12.17325
.
Zhou
J.
,
Ding
Y.
,
Wu
J.
,
Liu
F.
&
Wang
S.
2021b
Streamflow generation in semi-arid, glacier-covered, montane catchments in the upper Shule River, Qilian Mountains, northeastern Tibetan plateau
.
Hydrological Processes
35
(
8
),
e14276
.
https://doi.org/10.1002/hyp.14276
.
Zhou
G.
,
Guo
Z.
,
Sun
S.
&
Jin
Q.
2023
A CNN-BiGRU-AM neural network for AI applications in shale oil production prediction
.
Applied Energy
344
,
121249
.
https://doi.org/10.1016/j.apenergy.2023.121249
.
Zuo
G.
,
Luo
J.
,
Wang
N.
,
Lian
Y.
&
He
X.
2020
Decomposition ensemble model based on variational mode decomposition and long short-term memory for streamflow forecasting
.
Journal of Hydrology
585
,
124776
.
https://doi.org/10.1016/j.jhydrol.2020.124776
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data