Abstract
Accurate streamflow prediction is crucial for effective water resource management. However, reliable prediction remains a considerable challenge because of the highly complex, non-stationary, and non-linear processes that contribute to streamflow at various spatial and temporal scales. In this study, we developed a convolutional neural network (CNN)–Transformer–long short-term memory (LSTM) (CTL) model for streamflow prediction, in which the Transformer's embedding layer was replaced with a CNN layer to extract partial hidden features, and an LSTM layer was added to capture correlations on a temporal scale. The CTL model thus incorporated Transformer's ability to extract global information, CNN's ability to extract hidden features, and LSTM's ability to capture temporal correlations. To validate its effectiveness, we applied it to streamflow prediction in the Shule River basin in northwest China across 1-, 3-, and 6-month horizons and compared its performance with that of Transformer, CNN, LSTM, CNN–Transformer, and Transformer–LSTM models. The results demonstrated that CTL outperformed all other models in predictive accuracy, with Nash–Sutcliffe efficiency (NSE) values of 0.964, 0.912, and 0.856 for 1-, 3-, and 6-month-ahead predictions, respectively. The best results among the five comparative models were 0.908, 0.824, and 0.778, respectively. This indicates that CTL is an outstanding alternative technique for streamflow prediction where surface data are limited.
HIGHLIGHTS
CTL integrated the respective merits of Transformer, CNN, and LSTM.
CTL achieved exceptional accuracy, surpassing that of the benchmarked models.
CTL effectively predicted multi-time-ahead streamflow in arid northwest China.
INTRODUCTION
Predicting streamflow accurately is critical for water resources management, electricity generation, and ecosystem conservation (Evaristo & McDonnell 2019; Wagena et al. 2020; Messager et al. 2021). In arid regions, streamflow is often the only source of water supply (Lan et al. 2012; Merritt et al. 2021; Wang et al. 2021). However, streamflow, influenced by factors such as geographic location, topographic conditions, climate change, and human activities, is highly complex, non-linear, non-stationary, and multi-scale (Chang & Tsai 2016; Jahangir et al. 2023). Moreover, the scarcity of meteorological and hydrological stations in arid regions results in a lack of data, which impacts the accuracy of streamflow prediction (Gao et al. 2021). As a result, streamflow prediction can be extremely difficult and challenging in arid regions.
Several methods have been used to predict streamflow. Physical models, such as SWAT, TOPMODEL, VIC, and WRF-Hydro, are commonly applied. Physical models precisely depict hydrological processes by utilizing complicated mathematical techniques and a thorough grasp of the local hydrological system (Qi et al. 2020; Myers et al. 2021). Consequently, their predictive accuracy depends strongly on the physical parameters of the model and the initial configuration of the region (Tan et al. 2020; Yang et al. 2020b). Streamflow, on the other hand, is a complicated, non-linear, and non-stationary process driven by meteorological elements such as precipitation, temperature, and evapotranspiration, as well as underlying surface features such as landform, slope, and land use (Wen et al. 2019; Wang et al. 2023). It may be challenging for physical models to accurately model streamflow using only the simplified water balance equation (Fidal & Kjeldsen 2020; Huo et al. 2020). The use of physical models may also be hampered in places with variable underlying surface characteristics and a scarcity of meteorological stations (Cho & Kim 2022). In the arid regions of northwestern China, where glaciers cover a considerable area, streamflow prediction is further complicated by the presence of glacial permafrost (Zhou et al. 2021b). Existing physical hydrological models have a high degree of uncertainty in describing the hydrological effects of glaciers and permafrost, resulting in substantial errors in streamflow prediction (Zhang et al. 2018).
Machine learning techniques such as support vector regression, neural networks, and extreme learning machines have been widely applied in streamflow prediction in recent decades (Gharib & Davies 2021; Ibrahim et al. 2022). These machine learning models often outperform physical models in modeling the non-linear streamflow process without requiring knowledge of the local hydrological system (Wu et al. 2021; Qiu et al. 2022). However, such models are often too simple to perform deep feature extraction (Han et al. 2021b; Mei et al. 2022). Thus, they are still considered 'shallow learning' models with limited ability to handle the stochastic features of streamflow time series (Liu et al. 2022a).
With the rapid development of artificial intelligence, the 'shallow learning' problem has been addressed by deep learning algorithms (a new branch of machine learning) (Li et al. 2022a). Deep learning models (e.g., the convolutional neural network (CNN) and long short-term memory (LSTM)) have been extensively applied to more complex tasks in pattern recognition, signal processing, and time series prediction (Apaydin et al. 2021; Zhou & Jiao 2022). Various deep learning models have been applied to streamflow prediction and show great potential (Xu et al. 2022; Zhang & Yan 2023). CNN and LSTM are well-known time series modeling algorithms (Barzegar et al. 2021; Jiang & Jafarpour 2021; Wang & Zai 2023). CNN has powerful partial feature extraction capability and is sensitive to sparse data, allowing it to effectively extract more complex and deeper hidden data features (Kabir et al. 2020; Wang et al. 2020; Adaryani et al. 2022). LSTM has an embedded recurrent structure, which allows it to remember previous information and capture temporal dynamics (Cho & Kim 2022; Ren et al. 2022; Xu et al. 2022). Moreover, LSTM regulates the memory cell's information flow via non-linear gating units and relies on the learned weights to control the length of dependence, effectively alleviating the gradient disappearance problem (Kao et al. 2020; Yin et al. 2022b).
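The gating behavior described above can be illustrated with a single LSTM cell step. The pure-Python sketch below uses toy scalar weights; it is an illustration of the standard LSTM gate equations, not the configuration used in this study:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input and state; the gates regulate the memory cell.

    w maps each gate ('f', 'i', 'o', 'g') to (input weight, hidden weight, bias).
    """
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])    # forget gate
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])    # input gate
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])    # output gate
    g = math.tanh(w['g'][0] * x + w['g'][1] * h_prev + w['g'][2])  # candidate memory
    c = f * c_prev + i * g   # additive cell update eases gradient flow
    h = o * math.tanh(c)     # hidden state exposes the gated memory
    return h, c

# Toy weights; run the cell over a short streamflow-like sequence.
w = {k: (0.5, 0.3, 0.0) for k in ('f', 'i', 'o', 'g')}
h, c = 0.0, 0.0
for x in [0.2, 0.8, 0.5]:
    h, c = lstm_cell_step(x, h, c, w)
```

The additive form of the cell update (`c = f * c_prev + i * g`) is what allows gradients to propagate over long lags without vanishing.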
Nowadays, a newly developed deep learning model called Transformer has attracted attention due to its unprecedented performance in machine translation (Huang et al. 2021; Nguyen et al. 2021). Relying on the attention mechanism, Transformer has a longer memory than CNN and LSTM, allowing it to effectively obtain global information, and the multi-headed attention mechanism also gives the model strong expressive power (Han et al. 2021a; Bai & Tahmasebi 2022). Transformer is more flexible in extracting highly complex, non-linear, and non-stationary features from time series, and has achieved good performance in streamflow prediction. For example, Yin et al. (2022a) applied Transformer to multi-daily-ahead streamflow prediction on the Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS) dataset and obtained good prediction results. However, the disadvantages of Transformer are also notable. First, Transformer is relatively weak at capturing partial features because it completely abandons convolutional structures (Li et al. 2019; Hua et al. 2022). Second, Transformer cannot substitute for modeling on the temporal scale, since positional encoding is an artificially created index that does not reasonably characterize positional information (Zhou et al. 2021a; Yang 2022). Third, the layer normalization between residuals truncates the gradient flow, which may cause gradient disappearance when applying Transformer (Liu et al. 2020; Xu et al. 2020). Thus, improving the structure of the standalone Transformer may not be enough to solve the problems caused by these structural characteristics. Moreover, streamflow sequences usually show a high degree of non-stationarity, mainly in the form of obvious shifts in mean, variance, or shape (Slater et al. 2021; Liu et al. 2023). Standalone deep learning models are often unable to effectively extract such highly complex and non-stationary information (Zuo et al. 2020; Liu et al. 2022b; Zhou et al. 2023).
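The mechanism behind these properties is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, in which every query attends to every key and so gathers global context. A minimal pure-Python sketch (toy vectors, single head):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are lists of d_k-dimensional vectors (lists of floats).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # each query attends to every key: global context
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Each output row is a convex combination of the value vectors.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = scaled_dot_product_attention(Q, K, V)
```

Because the attention weights are computed between all positions at once, the effective "memory" is not limited by recurrence depth, which is the source of the longer memory noted above.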
When optimizing and improving a standalone model no longer raises predictive accuracy, reasonably combining several standalone models, so that each contributes its own strengths, can be a practical and feasible way to improve the overall predictive performance (Huang & Kuo 2018; Atila & Sengur 2021; Mei et al. 2022).
This prompted us to ask whether the advantages of CNN and LSTM could be integrated with those of Transformer to establish an improved and more powerful model for streamflow prediction. Specifically, we first replaced the embedding layer of the Transformer with a CNN layer, enabling partial hidden feature extraction before input to the encoder and decoder; second, we added an LSTM layer before the output of the decoder, incorporating temporal-dimension modeling and mitigating the gradient disappearance problem; third, we integrated Transformer with CNN and LSTM for hydrological modeling and proposed the CNN–Transformer–LSTM (CTL) model. The proposed CTL model combined the global information extraction capability of Transformer, the partial hidden feature extraction capability of CNN, and the temporal correlation modeling capability of LSTM, thus enabling the modeling of highly complex, non-linear, and multi-scale streamflow processes. Accordingly, the following improvements could theoretically be achieved: (1) by replacing the Transformer's embedding layer with a CNN layer, the comparatively limited ability of the Transformer to extract partial information could be addressed, so that the non-linear, non-stationary information of streamflow on the spatial scale could be more effectively captured; (2) by adding an LSTM layer before the output of the decoder, the lack of temporal-scale modeling and the gradient disappearance in the Transformer could be addressed, making it possible to stably extract the trend and periodic aspects of the streamflow series.
Although Transformer has seen some application in streamflow prediction, several gaps remain: (1) Transformer originates from natural language processing and is not a natural time series prediction model (Yin et al. 2023); most studies focus only on its advantages without analyzing and discussing its disadvantages (Mellouli et al. 2022; Xu et al. 2023a, 2023b). (2) Research on combining Transformer with other classical deep learning models remains limited. In this study, taking model hybridization as the theoretical basis, we combined models after analyzing the advantages and disadvantages of Transformer, CNN, and LSTM in detail, to explore the optimal combination of deep learning models. Therefore, the purpose of this study is to establish a deep learning-based CTL approach for multi-time-ahead streamflow prediction in an arid region. To achieve this:
- (1) A hybrid CTL model was constructed by combining the Transformer with CNN and LSTM.
- (2) The potential of CTL in 1-, 3-, and 6-month-ahead streamflow prediction in an arid region was tested in the upstream of the Shule River, northwest China.
- (3) The efficiency of the CTL model was compared with two hybrid models, the CNN-coupled Transformer model (CNN–Transformer, CT) and the Transformer-coupled LSTM model (Transformer–LSTM, TL), as well as three standalone models: Transformer, CNN, and LSTM.
MATERIALS AND METHODS
Study area
Streamflow in the Shule River is predominantly replenished by glacial meltwater and precipitation (Zhang et al. 2022). As modern glaciers are widely distributed in the mountains above 4,500 m, the upper reach is responsible for generating streamflow. Permafrost covers 9,447 km² of the upper Shule River basin, accounting for 83% of the entire basin area (Zhou et al. 2021b). Glacial meltwater contributes over 80% of yearly streamflow, with a maximum daily streamflow of 415 m³/s (Lan et al. 2012). The presence of glaciers and permafrost complicates the streamflow-generating processes, leading to more non-linear and non-stationary streamflow series on spatial and temporal scales (Zhou et al. 2021b). Coupled with the effects of precipitation, streamflow in the upper Shule River reaches its maximum in July and August (Zhang et al. 2022).
The Shule River sustains local livelihoods and society, forming the famous Dunhuang Oasis and nourishing the pearl cities (i.e., Dunhuang, Yumen, Guazhou) on the Silk Road Economic Belt (Guo et al. 2015; Xie et al. 2022). The quantity and quality of water in the upper Shule River have a significant impact on the long-term development of the oases and cities in the middle and lower reaches (Zhang et al. 2003; He et al. 2015). Given the importance of these cities on the Silk Road Economic Belt, efficient water resource management and long-term environmental promotion are critical (Zhang et al. 2009). The key step toward achieving these goals is accurate prediction of streamflow in the upper Shule River.
METHODOLOGY
Transformer
The CNN
The CNN is a deep feedforward neural network that contains convolutional operations with sparse connectivity and weight-sharing characteristics. The CNN consists of three layers: the convolutional layer, which extracts partial features from the inputs; the pooling layer, which reduces the dimensionality of the target variable; and the fully connected layer, which outputs the desired results and is similar to that of a traditional neural network (Adaryani et al. 2022).
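The convolution and pooling operations can be illustrated in a few lines. This toy sketch applies a 'valid' 1-D convolution and non-overlapping max pooling to a short streamflow-like window; the kernel values are arbitrary illustrative choices:

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation): sparse, weight-shared partial features."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, size):
    """Non-overlapping max pooling to reduce dimensionality."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

# Toy streamflow window; the kernel here is a simple two-point moving average.
flow = [4.0, 6.0, 9.0, 7.0, 3.0, 2.0]
features = conv1d(flow, [0.5, 0.5])  # local (partial) features
pooled = max_pool(features, 2)       # reduced-dimension summary
```

In a trained CNN the kernel weights are learned rather than fixed, but the sparse, sliding application shown here is exactly what gives the layer its local feature extraction behavior.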
Long short-term memory
CNN–Transformer–LSTM
Step 1: The embedding layer of the Transformer was replaced with a CNN layer to enhance the extraction of partial hidden features. This resulted in a new CT model that combined the Transformer model with the CNN model.
Step 2: The Transformer model was modified for streamflow prediction, with the predicted streamflow set as the final output instead of the probability. The encoder of the Transformer received historical meteorological data as input, while the decoder received historical streamflow data. This model, with streamflow as the output, was considered the standard Transformer in this study.
Step 3: The output of Step 2 was used as input for the LSTM model. The final streamflow prediction results were obtained using the linear layer and the sigmoid activation function. The LSTM layer can extract trends and periodic features of streamflow data on a temporal scale. Without Step 1, i.e., with only the standard Transformer and the LSTM combined, the model was called TL.
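The three steps can be summarized as a single schematic forward pass. The sketch below is purely illustrative: the convolution kernel, the single-head scalar attention, and the simplified gated readout (`recurrent_readout`, a stand-in for the LSTM layer) are all toy assumptions, not the study's architecture or trained weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv_embed(seq, kernel):
    """Step 1 (CNN): replace the embedding layer with a convolution over the inputs."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def attend(query, keys, values):
    """Step 2 (Transformer core): one scalar attention read over the encoded sequence."""
    scores = [query * k_ for k_ in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return sum(wi / z * v for wi, v in zip(w, values))

def recurrent_readout(seq, decay=0.5):
    """Step 3 (LSTM-like): a gated running memory over the decoder output."""
    h = 0.0
    for x in seq:
        h = decay * h + (1.0 - decay) * math.tanh(x)
    return h

def ctl_sketch(meteo, flows):
    feats = conv_embed(meteo, [0.5, 0.5])                         # partial hidden features
    ctx = [attend(f, feats, flows[:len(feats)]) for f in feats]   # global context
    return sigmoid(recurrent_readout(ctx))                        # temporal readout in (0, 1)

y = ctl_sketch([0.1, 0.4, 0.3, 0.2], [0.2, 0.5, 0.3, 0.1])
```

The point of the sketch is the ordering of the stages (convolutional embedding → attention → recurrent readout → sigmoid output), which mirrors Steps 1–3 above.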
All models were implemented using the PyTorch framework in Python 3.9. We employed grid search to determine the hyperparameters, and the best sets of parameters are shown in the Supplementary material, Appendix.
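Grid search itself is an exhaustive enumeration of hyperparameter combinations. The sketch below is illustrative only: the grid values and the `validation_error` function are hypothetical stand-ins (the actual hyperparameters used in this study are those listed in the Supplementary material):

```python
from itertools import product

# Hypothetical hyperparameter grid, not the study's actual search space.
grid = {
    'hidden_size': [32, 64],
    'learning_rate': [1e-3, 1e-4],
    'num_layers': [1, 2],
}

def validation_error(params):
    """Stand-in for training a model and scoring it on a validation split."""
    return params['hidden_size'] * params['learning_rate'] / params['num_layers']

# Enumerate every combination and keep the one with the lowest validation error.
best_params, best_err = None, float('inf')
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    err = validation_error(params)
    if err < best_err:
        best_params, best_err = params, err
```

With the toy objective above, the search visits all 2 × 2 × 2 = 8 combinations; in practice each evaluation would involve a full training run, which is why grids are usually kept small.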
Data collection and pre-processing
Streamflow in the upper reach of the Shule River is a complex phenomenon, influenced by a range of factors including temperature, precipitation, glaciers, frozen soil, and others (Hong et al. 2016). Therefore, it is necessary to use related variables as forcing data to develop models. However, due to the challenging climate of the upper Shule River basin and the limits of observational techniques, data collection is difficult (Zhang et al. 2018). Therefore, the modeling in this study, consistent with many previous studies, used only streamflow and meteorological data (Gao et al. 2021; Lin et al. 2021; Cho & Kim 2022).
Historical monthly streamflow data (Q) from January 1958 to December 2017 were collected from the Changmabao hydrological station. Meteorological parameters of the Tuole station over the same timespan, including air temperature (T), ground temperature at 0 m (E), sunshine duration (S), and precipitation (P), were gathered from the National Weather Science Data Center of China (http://data.cma.cn/). Since all the derived data were at a daily scale, the streamflow, precipitation, air temperature, ground temperature at 0 m, and sunshine duration were averaged to obtain the respective monthly values for the monthly streamflow prediction purpose.
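The daily-to-monthly averaging described above can be sketched as a simple mean per calendar month. The records below are toy values, not the station data:

```python
from collections import defaultdict
from statistics import mean

def monthly_means(daily_records):
    """Average daily observations into monthly values, keyed by (year, month).

    daily_records: iterable of ((year, month, day), value) pairs.
    """
    buckets = defaultdict(list)
    for (year, month, _day), value in daily_records:
        buckets[(year, month)].append(value)
    return {ym: mean(vals) for ym, vals in sorted(buckets.items())}

# Toy daily streamflow spanning two months.
records = [((1958, 1, 1), 10.0), ((1958, 1, 2), 14.0), ((1958, 2, 1), 20.0)]
monthly = monthly_means(records)
```

The same grouping applies to each of the five variables; only the value column changes.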
The whole dataset was divided into a training set, ranging from January 1958 to December 2007, and a testing set, ranging from January 2008 to December 2017. Table 1 presents the summary statistics of the streamflow and meteorological data for the whole, training, and testing datasets. The statistical parameters clearly demonstrated that no significant difference existed between the training and testing sets, indicating the similarity between the two datasets. Additionally, the statistical features of the training dataset were consistent with those of the whole dataset, indicating that the training dataset captured enough reliable information about the hydrological system being modeled and was capable of training predictive models.
| Variable | Dataset | Max | Min | Mean | Std | SK | CV |
|---|---|---|---|---|---|---|---|
| Streamflow (m³/s) | All | 222 | 4.16 | 32.09 | 32.96 | 2.15 | 1.03 |
| | Training | 175 | 4.16 | 29.70 | 30.37 | 2.13 | 1.02 |
| | Testing | 222 | 10.50 | 44.16 | 41.95 | 1.86 | 0.95 |
| Air temperature (°C) | All | 13.41 | −23.72 | −2.46 | 9.96 | −0.18 | −4.04 |
| | Training | 13.02 | −23.72 | −2.66 | 9.97 | −0.18 | −3.75 |
| | Testing | 13.41 | −18.63 | −1.39 | 9.86 | −0.20 | −7.09 |
| Ground temperature at 0 m (°C) | All | 19.47 | −23.64 | 1.38 | 11.25 | −0.25 | 8.12 |
| | Training | 18.17 | −23.64 | 1.15 | 11.29 | −0.24 | 9.80 |
| | Testing | 19.47 | −16.99 | 2.67 | 10.97 | −0.29 | 4.09 |
| Sunshine duration (h) | All | 10.81 | 5.75 | 8.17 | 0.86 | 0.08 | 0.11 |
| | Training | 10.81 | 5.75 | 8.15 | 0.86 | 0.07 | 0.10 |
| | Testing | 10.34 | 6.39 | 8.27 | 0.87 | 0.13 | 0.10 |
| Precipitation (mm) | All | 164.60 | 0 | 24.07 | 31.55 | 1.45 | 1.31 |
| | Training | 132.40 | 0 | 23.03 | 29.95 | 1.39 | 1.30 |
| | Testing | 164.60 | 0 | 29.53 | 38.37 | 1.42 | 1.29 |
Note: Max is the maximum; Min is the minimum; Std is the standard deviation; SK is the skewness; CV is the coefficient of variation.
Determining inputs
Streamflow and meteorological data, including Q, T, E, S, and P, from January 1958 to December 2017, were combined as inputs for the CTL, Transformer, CNN, LSTM, CT, and TL models for 1-, 3-, and 6-month ahead streamflow predictions.
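Inputs of this kind are typically arranged into supervised learning samples with a sliding window over the series. The sketch below is illustrative: the lag length of three months is an arbitrary assumption, not the study's setting:

```python
def make_supervised(series, lag, horizon):
    """Pair each window of `lag` past values with the value `horizon` steps ahead."""
    X, y = [], []
    for t in range(lag, len(series) - horizon + 1):
        X.append(series[t - lag:t])        # predictors: the last `lag` observations
        y.append(series[t + horizon - 1])  # target: streamflow `horizon` steps ahead
    return X, y

# Toy monthly streamflow; real inputs would stack Q, T, E, S, and P per month.
flow = [10, 12, 15, 30, 45, 40, 22, 14]
X1, y1 = make_supervised(flow, lag=3, horizon=1)  # 1-month-ahead samples
X3, y3 = make_supervised(flow, lag=3, horizon=3)  # 3-month-ahead samples
```

Longer horizons simply shift the target further ahead, which also shrinks the number of usable samples.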
Performance evaluation
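The four metrics reported in the results (NSE, R, MAE, and RMSE; see the table notes) are standard, and a minimal reference implementation is:

```python
import math

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def correlation(obs, sim):
    """Pearson correlation coefficient R."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)

def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

# Toy observed/simulated pair for demonstration.
obs = [30.0, 45.0, 60.0, 40.0]
sim = [28.0, 47.0, 58.0, 41.0]
```

NSE = 1 indicates a perfect fit, NSE = 0 means the model is no better than the observed mean, and values above 0.5 are the commonly cited acceptability threshold used later in this section.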
RESULTS AND DISCUSSION
In this study, the CTL, Transformer, CNN, LSTM, CT, and TL models were employed for streamflow prediction at the Changmabao hydrological station in the Shule River basin. Tables 2–4 show the statistical values of the performance metrics derived by the models for 1-, 3-, and 6-month-ahead streamflow predictions in the training and testing periods. The performance in the training and testing periods was found to be remarkably similar. Considering the predictive purpose of this study, this section mainly focuses on the testing phase.
| Model | Training NSE | Training R | Training MAE (m³/s) | Training RMSE (m³/s) | Testing NSE | Testing R | Testing MAE (m³/s) | Testing RMSE (m³/s) |
|---|---|---|---|---|---|---|---|---|
| CTL | 0.977 | 0.989 | 2.864 | 4.518 | 0.964 | 0.983 | 4.930 | 7.717 |
| Transformer | 0.825 | 0.908 | 7.009 | 12.540 | 0.783 | 0.892 | 10.228 | 19.028 |
| CNN | 0.830 | 0.911 | 7.310 | 12.377 | 0.756 | 0.902 | 12.486 | 20.170 |
| LSTM | 0.851 | 0.922 | 6.585 | 11.582 | 0.815 | 0.908 | 10.349 | 17.563 |
| CT | 0.859 | 0.927 | 6.282 | 11.264 | 0.845 | 0.925 | 9.315 | 16.051 |
| TL | 0.951 | 0.975 | 4.547 | 6.636 | 0.908 | 0.958 | 8.496 | 12.403 |
Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.
| Model | Training NSE | Training R | Training MAE (m³/s) | Training RMSE (m³/s) | Testing NSE | Testing R | Testing MAE (m³/s) | Testing RMSE (m³/s) |
|---|---|---|---|---|---|---|---|---|
| CTL | 0.933 | 0.966 | 4.461 | 7.779 | 0.912 | 0.956 | 7.520 | 12.354 |
| Transformer | 0.796 | 0.892 | 7.193 | 13.581 | 0.735 | 0.875 | 11.355 | 21.489 |
| CNN | 0.809 | 0.900 | 7.764 | 13.132 | 0.739 | 0.872 | 12.557 | 21.320 |
| LSTM | 0.808 | 0.899 | 7.147 | 13.166 | 0.755 | 0.884 | 11.528 | 20.639 |
| CT | 0.838 | 0.916 | 6.319 | 12.091 | 0.788 | 0.893 | 11.083 | 19.222 |
| TL | 0.879 | 0.938 | 6.035 | 10.467 | 0.824 | 0.923 | 11.626 | 17.487 |
Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.
| Model | Training NSE | Training R | Training MAE (m³/s) | Training RMSE (m³/s) | Testing NSE | Testing R | Testing MAE (m³/s) | Testing RMSE (m³/s) |
|---|---|---|---|---|---|---|---|---|
| CTL | 0.864 | 0.930 | 6.379 | 11.121 | 0.856 | 0.929 | 9.703 | 15.818 |
| Transformer | 0.780 | 0.884 | 7.935 | 14.128 | 0.717 | 0.882 | 12.241 | 22.151 |
| CNN | 0.808 | 0.899 | 8.086 | 13.210 | 0.679 | 0.878 | 15.212 | 23.605 |
| LSTM | 0.781 | 0.884 | 7.871 | 14.082 | 0.733 | 0.874 | 11.744 | 21.550 |
| CT | 0.816 | 0.903 | 7.316 | 12.919 | 0.778 | 0.891 | 11.325 | 19.634 |
| TL | 0.818 | 0.904 | 7.162 | 12.852 | 0.777 | 0.885 | 12.742 | 19.697 |
Note: NSE is Nash–Sutcliffe efficiency coefficient; R is correlation coefficient; MAE is mean absolute error; RMSE is root mean square error.
Performance of the CTL model in 1-, 3-, and 6-month-ahead streamflow predictions
At a 1-month prediction horizon, the CTL model performed excellently, with high values of NSE and R, while maintaining relatively small RMSE and MAE values (Table 2). Specifically, the CTL model achieved an NSE value of 0.964, far exceeding the acceptable limit of 0.5; the R-value reached 0.983, indicating a very high correlation between the observed and predicted streamflow; and the MAE and RMSE values were only 4.930 and 7.717 m³/s, respectively, indicating a small error between the observed and predicted streamflow.
However, when considering predictive performance at longer horizons, the accuracy of 3- and 6-month-ahead predictions was considerably lower than that of 1-month-ahead prediction (Tables 2–4). This indicates that the accuracy of CTL deteriorated as the predictive horizon increased, a phenomenon consistent with similar machine learning-based hydrological prediction studies at multi-time scales (Barzegar et al. 2020; Yin et al. 2022a). This is probably because extracting the non-linear relationships among the variables became more difficult as the predictive lead time increased and the demands on the correlation of the historical input data grew (Barzegar et al. 2020). Although the predictive performance deteriorated, the results of CTL for 3- and 6-month-ahead streamflow predictions still greatly surpassed the acceptable performance threshold (NSE = 0.5). In summary, the CTL model can be regarded as an excellent model for 1-, 3-, and 6-month-ahead streamflow predictions.
Efficient water resource management requires accurate peak streamflow prediction. To evaluate the performance of the developed models for peak streamflow prediction, the mean values of the top ten ranked streamflow data points were selected. According to Tables 5–7, CTL underpredicted the peak flow in most cases. For 1-month-ahead streamflow prediction, the average of the top ten ranked observed streamflow values was 5.44 m³/s higher than the corresponding CTL prediction. For the 3- and 6-month-ahead predictions, the underprediction persisted, with CTL underestimating by about 11.41 and 20.50 m³/s, respectively. These results demonstrated that the CTL model produced relatively small errors in the peak flows at the 1-, 3-, and 6-month timescales.
| | Observed streamflow (m³/s) | CTL (m³/s) | Transformer (m³/s) | CNN (m³/s) | LSTM (m³/s) | CT (m³/s) | TL (m³/s) |
|---|---|---|---|---|---|---|---|
| | 222 | 186.08 | 132.91 | 123.52 | 117.40 | 142.44 | 169.55 |
| | 184 | 170.42 | 101.74 | 119.21 | 125.84 | 130.93 | 169.29 |
| | 143 | 135.17 | 113.95 | 75.29 | 114.01 | 158.99 | 137.01 |
| | 142 | 146.89 | 43.09 | 96.70 | 93.29 | 126.11 | 144.69 |
| | 138 | 132.48 | 109.56 | 83.43 | 108.58 | 94.47 | 126.97 |
| | 135 | 134.32 | 155.99 | 136.31 | 148.73 | 140.15 | 140.13 |
| | 130 | 129.33 | 118.59 | 105.76 | 119.75 | 149.57 | 125.80 |
| | 130 | 128.12 | 118.14 | 106.69 | 107.57 | 122.66 | 121.58 |
| | 123 | 127.26 | 92.88 | 85.91 | 106.65 | 83.99 | 125.38 |
| | 117 | 119.58 | 152.38 | 135.43 | 77.31 | 113.02 | 122.40 |
| Mean | 146.4 | 140.96 | 113.92 | 106.83 | 111.91 | 126.23 | 138.28 |
| Difference | — | 5.44 | 32.48 | 39.57 | 34.49 | 20.17 | 8.12 |
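The peak-flow comparison for 1-month-ahead prediction can be reproduced directly from the table above; this short sketch recomputes the mean underprediction of CTL (note that the reported 5.44 m³/s uses means rounded to two decimals):

```python
# Top-ten observed peaks and the CTL predictions for 1-month-ahead prediction (Table 5).
observed = [222, 184, 143, 142, 138, 135, 130, 130, 123, 117]
ctl_pred = [186.08, 170.42, 135.17, 146.89, 132.48, 134.32,
            129.33, 128.12, 127.26, 119.58]

mean_obs = sum(observed) / len(observed)   # 146.4 m³/s
mean_ctl = sum(ctl_pred) / len(ctl_pred)   # ≈ 140.96 m³/s
underestimate = mean_obs - mean_ctl        # ≈ 5.44 m³/s mean peak underprediction
```

The same arithmetic applied to Tables 6 and 7 yields the 3- and 6-month-ahead differences.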
| | Observed streamflow (m³/s) | CTL (m³/s) | Transformer (m³/s) | CNN (m³/s) | LSTM (m³/s) | CT (m³/s) | TL (m³/s) |
|---|---|---|---|---|---|---|---|
| | 222 | 167.08 | 90.79 | 101.08 | 104.37 | 137.55 | 154.25 |
| | 184 | 154.89 | 112.50 | 115.27 | 131.70 | 136.72 | 175.32 |
| | 161 | 160.04 | 106.61 | 119.93 | 96.38 | 105.53 | 88.47 |
| | 142 | 117.13 | 116.45 | 111.30 | 131.97 | 144.11 | 98.97 |
| | 141 | 149.93 | 131.16 | 146.31 | 73.11 | 138.96 | 105.33 |
| | 138 | 127.38 | 85.63 | 114.13 | 116.40 | 110.92 | 83.41 |
| | 135 | 169.39 | 123.26 | 130.21 | 146.45 | 142.67 | 182.92 |
| | 130 | 141.66 | 92.34 | 97.34 | 108.83 | 108.24 | 121.95 |
| | 130 | 101.61 | 107.14 | 90.19 | 129.57 | 132.84 | 136.70 |
| | 123 | 102.80 | 119.67 | 114.85 | 72.64 | 152.62 | 142.53 |
| Mean | 150.6 | 139.19 | 108.55 | 114.06 | 111.14 | 131.02 | 128.99 |
| Difference | — | 11.41 | 42.05 | 36.54 | 39.46 | 19.58 | 21.61 |
| | Observed streamflow (m³/s) | CTL (m³/s) | Transformer (m³/s) | CNN (m³/s) | LSTM (m³/s) | CT (m³/s) | TL (m³/s) |
|---|---|---|---|---|---|---|---|
| | 222 | 175.03 | 102.20 | 123.98 | 114.10 | 153.55 | 166.37 |
| | 184 | 137.46 | 95.23 | 82.72 | 102.35 | 154.48 | 152.33 |
| | 161 | 136.47 | 105.38 | 57.30 | 54.07 | 57.46 | 135.97 |
| | 142 | 91.16 | 89.43 | 96.55 | 99.52 | 134.24 | 81.18 |
| | 141 | 152.08 | 113.43 | 105.91 | 128.68 | 150.28 | 151.64 |
| | 138 | 111.18 | 91.95 | 112.40 | 138.60 | 119.79 | 91.10 |
| | 135 | 151.43 | 109.72 | 119.98 | 140.81 | 148.20 | 178.95 |
| | 130 | 104.11 | 91.44 | 112.90 | 120.79 | 102.30 | 87.67 |
| | 130 | 84.94 | 93.56 | 79.75 | 99.43 | 108.59 | 91.06 |
| | 123 | 157.17 | 97.62 | 71.59 | 77.01 | 154.79 | 145.67 |
| Mean | 150.6 | 130.10 | 99.00 | 96.31 | 107.53 | 128.37 | 128.19 |
| Difference | — | 20.50 | 51.60 | 54.29 | 43.07 | 22.23 | 22.41 |
Comparative analysis with both the standalone and hybrid models
To explore the potential of the CTL model in improving the efficiency of multi-time-ahead streamflow prediction, we compared its accuracy with that of three standalone models (i.e., Transformer, CNN, LSTM) and two hybrid models (i.e., CT, TL) for 1-, 3-, and 6-month ahead streamflow predictions. The results indicated that the CTL model performed significantly better than the other standalone and hybrid models.
In terms of statistical performance metrics, taking the 1-month ahead prediction as an example (Table 2), the NSE and R values of the CTL model increased by 23.116% and 10.202% relative to the Transformer model, 27.513% and 8.980% relative to the CNN model, 18.282% and 8.260% relative to the LSTM model, 14.083% and 6.270% relative to the CT model, and 6.167% and 2.610% relative to the TL model. Meanwhile, the MAE and RMSE decreased by 5.298 and 11.311 m³/s relative to the Transformer model, 7.556 and 12.453 m³/s relative to the CNN model, 5.419 and 9.846 m³/s relative to the LSTM model, 4.385 and 8.334 m³/s relative to the CT model, and 3.566 and 4.686 m³/s relative to the TL model. For the 3- and 6-month ahead predictions (Tables 3 and 4), the CTL model again produced the best results, albeit poorer than for the 1-month horizon: its NSE values reached 0.912 and 0.856, respectively, exceeding the best of the other five models (0.824 for TL and 0.778 for CT). Consequently, we conclude that the CTL model yields more accurate streamflow predictions.
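For reference, the evaluation metrics reported above follow their standard definitions; a minimal sketch (illustrative, not the authors' code) might look like:

```python
import math

def nse(obs, sim):
    """Nash–Sutcliffe efficiency: 1 minus the ratio of squared error
    to the variance of the observations (1 = perfect, 0 = mean model)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    svar = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / svar

def mae(obs, sim):
    """Mean absolute error, in the units of the series (here m³/s)."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    """Root mean square error; penalizes large peak misses more than MAE."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
```

A model that merely predicts the observed mean scores NSE = 0, which is why the 0.5 threshold cited later is regarded as the lower bound of acceptable skill.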
While the statistical metrics presented so far assess the overall skill of the CTL model, hydrographs and scatter plots help evaluate the temporal correspondence between observed and predicted values. The streamflow values estimated by the CTL model were closer to the corresponding observations than those estimated by the other models. For the 1-month ahead prediction, the overall fit of the CTL model was markedly better than that of the other models (Figure 5). In the least-squares equation, the CTL model gave the largest slope k (0.952 vs. 0.791, 0.785, 0.724, 0.847, and 0.950), the smallest intercept b (0.791 vs. 5.536, 4.784, 3.090, 2.668, and 1.681), and the largest R² (0.965 vs. 0.824, 0.795, 0.814, 0.855, and 0.918) compared with the Transformer, CNN, LSTM, CT, and TL models, respectively (Figure 5). The linear analysis results for the 3- and 6-month ahead predictions were similar, with the CTL model again outperforming the Transformer, CNN, LSTM, CT, and TL models (Figures 6 and 7).
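The slope k, intercept b, and R² in this scatter-plot analysis come from an ordinary least-squares fit of predictions against observations; a self-contained sketch of that computation (standard formulas, not the authors' code):

```python
def linear_fit(x, y):
    """Ordinary least squares y ≈ k*x + b, plus the coefficient of
    determination R² of the fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    k = sxy / sxx                 # slope: 1.0 means unbiased scaling
    b = my - k * mx               # intercept: 0.0 means no systematic offset
    r2 = sxy * sxy / (sxx * syy)  # squared Pearson correlation
    return k, b, r2
```

A slope near 1 and intercept near 0, as reported for CTL, indicate that predictions track observations without systematic over- or under-estimation.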
DISCUSSION
Superiority of the CTL model
In this study, we established a CTL approach for multi-time-ahead streamflow prediction in the upstream reaches of the Shule River, northwest China. The results indicated that the CTL model achieved excellent performance for 1-, 3-, and 6-month ahead streamflow predictions, outperforming both the standalone models and the other hybrid models. This superiority can be attributed to three main reasons.
First, the deep learning-based hybrid CTL model incorporates the Transformer. The Transformer can strengthen or weaken the connection between two arbitrary positions to capture the global relationship between streamflow and its influencing factors, thereby resolving the problem of long distances between information locations in the streamflow sequence (Yin et al. 2022a). This allows the Transformer to extract more complicated information on both temporal and spatial scales. Specifically, the Transformer can capture the correlations between any one element of the input and all the others (including historical precipitation, sunshine duration, air temperature, 0-m ground temperature, and streamflow). This may be why the performance of the standalone Transformer far exceeded the acceptable NSE limit of 0.5 for streamflow prediction (0.783, 0.735, and 0.717 for 1-, 3-, and 6-month ahead predictions, respectively).
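The global-correlation mechanism described here is scaled dot-product self-attention: every time step attends to every other, however far apart. A toy numpy sketch (the query/key/value projections are collapsed to the identity for brevity, so this is illustrative rather than a faithful Transformer layer):

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with identity Q/K/V projections.
    X: (seq_len, d) array of per-time-step feature vectors."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                     # similarity of every pair of time steps
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                 # row-wise softmax: attention weights
    return w @ X                                      # each output mixes ALL positions
```

Because the attention weights connect each position directly to all others, no information has to be relayed step by step through a recurrence, which is the property the paragraph above attributes to the Transformer.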
Second, the CNN is utilized in the CTL model to extract deeper and more complex features between variables. The merit of the CNN lies in extracting the partial hidden correlations between streamflow and its impact factors, so that useful information can be well captured. The good predictive results of the standalone CNN model further confirmed the effectiveness of the CNN algorithm, with NSE values reaching 0.756, 0.739, and 0.679 for 1-, 3-, and 6-month ahead predictions, respectively. Combining the Transformer with the CNN (CT) strengthens the extraction of partial hidden features and global information from the streamflow and meteorological (i.e., precipitation, sunshine duration, air temperature, and 0-m ground temperature) data. Therefore, the CT model outperformed the Transformer model, with NSE values increasing by 7.918%, 7.211%, and 8.508%, and outperformed the CNN model, with NSE values increasing by 11.711%, 6.631%, and 14.580%, at the 1-, 3-, and 6-month prediction horizons, respectively.
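The "partial hidden features" a CNN extracts are local weighted patterns over adjacent time steps. A minimal 1-D convolution (valid padding, single channel; purely illustrative, not the model's actual layer):

```python
import numpy as np

def conv1d(x, kernel):
    """Slide a (learned) kernel over the series; each output value is a
    weighted local feature of the neighbouring time steps it covers."""
    x, kernel = np.asarray(x, float), np.asarray(kernel, float)
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])
```

In a trained CNN many such kernels run in parallel, each learning a different local pattern (e.g., a rise, a dip, a lagged response) in the precipitation and temperature series.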
Third, by incorporating the LSTM into the CTL model, relevant information on the temporal scale, such as trends and periods in the streamflow and meteorological series, could be further extracted. This is supported by the good predictive outcome of the standalone LSTM model, whose NSE values reached 0.815, 0.755, and 0.733 for 1-, 3-, and 6-month ahead predictions, respectively. The TL model combined the Transformer with the LSTM, improving the capture of both temporal correlations and global information in the streamflow process. Thus, the TL model outperformed the Transformer and LSTM models: its NSE values were 15.964%, 12.109%, and 8.368% higher than those of the Transformer model, and 11.411%, 9.139%, and 6.003% higher than those of the LSTM model, at the 1-, 3-, and 6-month prediction horizons, respectively.
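The temporal gating that lets an LSTM carry trends and periodicities forward can be sketched as a single cell update (standard LSTM equations in numpy; the stacked gate layout in W/U/b is a convention of this sketch, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W (4n, m), U (4n, n), b (4n,) stack the
    input, forget, output, and candidate gates along the first axis."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c_new = f * c + i * g          # forget part of the old state, admit the new candidate
    h_new = o * np.tanh(c_new)     # expose a gated view of the cell state
    return h_new, c_new
```

The cell state `c` is what persists across months, letting the model remember, for example, a wet antecedent season when predicting several steps ahead.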
Consequently, the deep learning-based CTL model combines the ability of the CNN to extract partial hidden features (Khosravi et al. 2020; Panahi et al. 2020; Wang et al. 2020; Pyo et al. 2021), the ability of the Transformer to extract global information (directly calculating the correlation between two features rather than passing it through the network) (Baek et al. 2022; Peng et al. 2022), and the sensitivity of the LSTM to correlations on the temporal scale (Xu et al. 2022; Yin et al. 2022b), and can therefore capture the highly complex, non-stationary, non-linear, and multi-scale correlations between streamflow and meteorological parameters. The CTL showed outstanding streamflow prediction efficiency, with NSE values of 0.964, 0.912, and 0.856 at the 1-, 3-, and 6-month prediction horizons, respectively.
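The overall data flow of a CNN→Transformer→LSTM stack can be caricatured in a few lines. The sketch below is a deliberately simplified stand-in (a fixed smoothing kernel, identity attention projections, and a toy recurrence in place of a trained LSTM head), not the authors' architecture:

```python
import numpy as np

def ctl_forward(X, kernel):
    """Toy CTL data flow. X: (T, d) window of meteorological/streamflow
    features; kernel: (k,) 1-D convolution weights shared across features."""
    # 1) CNN stage: 1-D convolution over time extracts local hidden features.
    k = len(kernel)
    feats = np.array([kernel @ X[i:i + k] for i in range(len(X) - k + 1)])
    # 2) Transformer stage: self-attention mixes information globally.
    d = feats.shape[1]
    s = feats @ feats.T / np.sqrt(d)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    mixed = w @ feats
    # 3) LSTM-like stage: gated recurrence accumulates temporal state
    #    (fixed forget/input weights of 0.5 stand in for learned gates).
    h = np.zeros(d)
    for t in range(len(mixed)):
        h = 0.5 * h + 0.5 * np.tanh(mixed[t])
    # 4) A head maps the final state to one streamflow value (mean as placeholder).
    return h.mean()
```

Each stage hands its output to the next, which is the sense in which the hybrid "absorbs" the three models' respective merits rather than averaging their separate predictions.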
Hydrological implication
Efficient water resource management through accurate streamflow prediction is crucial for arid regions (Yang et al. 2020a; Paul et al. 2021). However, predicting streamflow is challenging due to its highly complex, non-stationary, non-linear, and multi-scale nature. Some studies have sought to improve standalone prediction models; however, owing to structural constraints, each standalone model has deficiencies as well as advantages, which limits how much prediction can be improved (Mei et al. 2022). To fully exploit the advantages of each deep learning model and improve overall prediction efficiency, hybrid models have frequently been employed (Barzegar et al. 2020, 2021; Cho & Kim 2022). In this study, we established a hybrid CTL model for 1-, 3-, and 6-month ahead streamflow prediction in an arid region of northwest China and obtained excellent performance. The results indicated that CTL is an appropriate alternative approach for streamflow prediction where surface data are scarce.
Hybrid models have also been used in numerous other hydrological research fields, including water quality prediction, lake level prediction, and evaporation prediction (Barzegar et al. 2021; Lakmini Prarthana Jayasinghe et al. 2022; Mei et al. 2022); all of these studies showed that hybrid models provide highly accurate predictions. For example, Mei et al. (2022) proposed a CNN-GRU-Attention (CGA) model to predict the daily water quality of the Bayi Waterworks in Xiaogan, Hubei province, China, and showed that the CGA model outperformed standalone LSTM and GRU models. Therefore, the CTL, as a hybrid deep learning-based model combining the advantages of three deep learning models (i.e., Transformer, CNN, and LSTM), can also be applied to hydrological applications other than streamflow prediction.
While a standalone deep learning model can yield good results, the pursuit of more accurate prediction drives scientific progress. Combining multiple deep learning models to fully leverage their individual advantages can increase robustness and reliability, leading to more accurate and stable predictions. As technology continues to develop and the number of deep learning models grows, investigating suitable combinations of them will become increasingly important for further improving prediction accuracy.
Limitations and future work
The CTL model established in this study, however, has certain drawbacks. First, it performed poorly in predicting summer peaks, particularly for the 6-month ahead streamflow prediction, which is consistent with the findings of other studies (Wen et al. 2019; Gao et al. 2020; Alizadeh et al. 2021; Lin et al. 2021). This may be because meltwater, a vital source of streamflow, was not represented in the inputs: glacial meltwater makes up approximately 80% of the annual streamflow in the Shule River basin (Lan et al. 2012). Additionally, freezing and thawing effects are prevalent in this region, so soil infiltration and evaporation need to be considered when converting rainfall to streamflow (Wang et al. 2022). The model in this study considered only precipitation, sunshine duration, air temperature, and 0-m ground temperature; to improve peak prediction in the future, we will also consider soil and snow data. Second, hyperparameters were determined using grid search, which introduced uncertainty into the optimal results. To avoid this, we will instead use optimization algorithms (e.g., crisscross optimization (CSO) and particle swarm optimization (PSO)) for parameter tuning in the future. Finally, despite its excellent results, the CTL model is time-consuming, like other deep learning models. Searching for simpler and more efficient model structures is therefore a direction for future research.
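For context, grid search exhaustively scores every combination in the hyperparameter grid, which is why it scales poorly compared with population-based optimizers such as PSO or CSO. A generic sketch (the parameter names and grid values are placeholders, not the study's actual search space):

```python
from itertools import product

def grid_search(score_fn, grid):
    """Exhaustively evaluate every combination in `grid` (a dict of
    parameter name -> list of candidate values) and keep the best score."""
    best, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        s = score_fn(params)          # e.g., validation NSE of a trained model
        if s > best_score:
            best, best_score = params, s
    return best, best_score
```

The cost grows multiplicatively with each added hyperparameter, so for a model as large as CTL even a coarse grid quickly becomes expensive, motivating the switch to optimization algorithms.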
CONCLUSION
For the rational use of water resources, streamflow prediction must be accurate. In this study, we established a hybrid model for streamflow prediction called CTL, which combined the Transformer with a CNN and an LSTM: the CNN extracts partial hidden features, the Transformer extracts global information, and the LSTM captures temporal correlations. We applied the CTL, as well as the Transformer, CNN, LSTM, CT, and TL models, to predict streamflow at the Changmabao hydrological station in the Shule River basin. To examine the models' predictive abilities, we utilized statistical performance metrics, hydrographs, peak prediction, scatter plots, boxplot diagrams, and Taylor diagrams. The conclusions drawn from this study are as follows:
- (1) Our experiments showed that the CTL model provided accurate 1-, 3-, and 6-month ahead streamflow predictions in an arid region, taking the Shule River as an example.
- (2) The CTL model outperformed the standalone models (i.e., Transformer, CNN, and LSTM) and the hybrid models lacking the LSTM or the CNN module (i.e., CT and TL). This validated the stability of the CTL framework and demonstrated its applicability for streamflow prediction in an arid region. This study contributes to the accurate prediction of streamflow and helps to better predict and manage water resources in the absence of observational data.
ACKNOWLEDGEMENTS
This study was funded by the National Natural Science Foundation of China (42130113 and 42001035), the Young Elite Scientist Sponsorship Program of China Association for Science and Technology (Grant No. YESS20200089), and the Youth Innovation Promotion Association of Chinese Academy of Sciences (Grant No. 2022435).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.