With the widespread application of machine learning in various fields, enhancing its accuracy in hydrological forecasting has become a focal point of interest for hydrologists. This study, set against the backdrop of the Haihe River Basin, focuses on daily-scale streamflow and explores the application of the Lasso feature selection method alongside three machine learning models (long short-term memory, LSTM; transformer for time series, TTS; random forest, RF) in short-term streamflow prediction. Through comparative experiments, we found that the Lasso method significantly enhances model performance, improving the generalization capabilities of the three models by 21, 12, and 14%, respectively. Among the selected features, lagged streamflow and precipitation play dominant roles, with the streamflow closest to the prediction date consistently being the most crucial feature. Compared to the TTS and RF models, the LSTM model demonstrates superior performance and generalization in streamflow prediction at lead times of 1–7 days, making it more suitable for practical hydrological forecasting in the Haihe River Basin and similar regions. Overall, this study deepens our understanding of feature selection and machine learning models in hydrology, providing valuable insights for hydrological simulations under the influence of complex human activities.

  • Pioneering application of the Lasso method integrated with machine learning for hydrological prediction.

  • The generalization ability of the LSTM model is significantly better than that of the TTS and RF models.

  • Streamflow closest to the prediction date is consistently the most crucial feature.

The accuracy and adaptability of hydrological forecasting are crucial for effective water resource management. As machine learning technologies rapidly advance across various disciplines, their increasing application in hydrology has garnered widespread attention among researchers. In the face of complex environmental influences such as climate change and human activities, traditional hydrological models encounter challenges in delivering precise predictions. Consequently, researchers are exploring the integration of machine learning methods to enhance the accuracy and adaptability of hydrological forecasting. Machine learning directly explores the connection between hydrological factors and streamflow, accommodating unpredictable hydrologic conditions (Alizadeh et al. 2021; Willard et al. 2023). With the advancement of computer technology and increased data availability, data-driven machine learning techniques have achieved remarkable accuracy in hydrological simulations.

Over the last two decades, many types of data-based machine learning algorithms, like recurrent neural network (RNN), support vector machine (SVM), long short-term memory (LSTM), and random forest (RF), have been widely employed in hydrological forecasting (Li et al. 2020; Lee et al. 2023). LSTM can capture the periodic and chaotic nature of sequential data, replicate long-term correlations more accurately than traditional neural networks, and describe highly nonlinear and complicated systems in terms of data (Yin et al. 2022b). In applications such as hydrological forecasting and groundwater dynamics prediction, LSTM has demonstrated superiority over state-of-the-art process-based models and classical machine learning models (Jing et al. 2023). Dtissibe et al. (2024) applied the LSTM model for flood forecasting in the Far North region of Cameroon, achieving more satisfactory performance in both short-term and long-term forecasts compared to the CNN model. Tursun et al. (2024) conducted flow estimations for 40 catchment areas in the Yellow River Basin and highlighted that models based on LSTM demonstrate immense potential in reconstructing river flow in arid, human-regulated basins. Tounsi et al. (2023) confirmed these findings by showcasing the performance of LSTM models on the U.S. MOPEX dataset. Arsenault et al. (2023) demonstrated that, in regional experiments conducted across over 148 North American basins, LSTM consistently outperformed traditional hydrological models. RF is a machine learning model that uses bagging techniques to ensemble classification or regression trees. In contrast to linear and other machine learning models, RF is regarded as a robust modeling method for hydrological forecasting since it allows for non-normally distributed input data and is largely insensitive to outliers and noise (Khaledi et al. 2022). Mangukiya & Sharma (2024) constructed a sophisticated flow prediction model under different physio-meteorological conditions using the RF algorithm. Their findings suggest that the RF model outperforms traditional methods of flood frequency analysis, offering more accurate and reliable flood estimations, especially valuable in areas constrained by sparse data.

In 2017, the transformer was proposed: a novel architecture that entirely abandons recursion and convolution and relies solely on the attention mechanism (Vaswani et al. 2017). The transformer has achieved notable success in Natural Language Processing, as in the Chat Generative Pre-trained Transformer (ChatGPT), by processing the entire input sequence simultaneously and effectively resolving dependencies between long-sequence data and target values (Bai & Tahmasebi 2022). To extend transformer models to time series problems, several variants have been proposed, each exhibiting distinctive performance characteristics; examples include Informer (Zhou et al. 2020) and Autoformer (Wu et al. 2021). These methods have demonstrated acceptable forecasting ability for energy, electricity, and weather, leading to increased interest in transformer models for time series problems (Wen et al. 2022). However, their application in the field of hydrology is relatively limited (Tripathy & Mishra 2023). Yin et al. (2022a) introduced the RR-Former model for precipitation-streamflow prediction, highlighting its significantly superior performance over LSTM models on the CAMELS dataset. In the context of soil moisture and evapotranspiration prediction, Li & Lu (2023) found that transformer-based models, as compared to LSTM models, can learn more effective parameter mappings. Nevertheless, some studies have suggested that the performance of transformer models has not conclusively surpassed that of LSTM models, emphasizing the need for further research through a more comprehensive investigation (Hindersland 2023; Wei et al. 2023).

With the rapid growth of data volume, handling high-dimensional datasets has become a significant challenge in the field of machine learning. Feature selection, as a means of dimensionality reduction, has proven to be highly effective in dealing with high-dimensional data and improving training efficiency (Cai et al. 2018; Ghobadi et al. 2024). Feature selection refers to the process of obtaining a subset from the original feature set based on certain selection criteria, thereby selecting relevant features from the dataset. It plays a role in reducing the scale of data processing by removing redundant and unrelated features. Well-executed feature selection results can enhance learning accuracy, shorten learning time, and prevent overfitting (Muthukrishnan & Rohini 2016). In order to design optimal feature selection and feature extraction techniques, considerable effort has been devoted, with notable methods including least absolute shrinkage and selection operator (Lasso), recursive feature elimination (RFE), principal component analysis (PCA), and others (Khalid et al. 2014). In forecasting the likelihood of significant flooding within the Peace-Athabasca Delta, Smith et al. (2024) employed PCA to identify the most critical predictive factors, thereby enhancing the efficacy of the model's forecasts. In practical applications, choosing the appropriate feature selection method is crucial based on the specific problem at hand. This is because different problems and datasets may require different strategies.

Due to accelerated urbanization in recent decades, the Haihe River basin, centered on the Beijing–Tianjin–Hebei region, has experienced a dramatic decline in water volume accompanied by significant river process changes, which makes it difficult for traditional hydrological models to predict daily river flows. Fortunately, machine learning incorporating lagged streamflow has proven to be highly effective in basins where quantifying human activity is challenging (Xu et al. 2022). Therefore, the primary objective of this study is to utilize the most widely recognized machine learning methods (namely, LSTM, TTS, and RF) to forecast short-term, daily-scale streamflow in the Haihe River Basin, an area significantly altered by human activities, assessing the applicability of these models in such a complex hydrological environment. A significant innovation in our approach is the integration of the Lasso feature selection technique to identify key predictive features for streamflow forecasting. This technique aims to refine input features, thereby enhancing the predictive accuracy of the machine learning models. Our research not only seeks to improve the precision of hydrological predictions in urban-impacted areas but also intends to expand the knowledge base on the application of machine learning in hydrology, especially in navigating the complexities introduced by anthropogenic changes.

Study area and hydrometeorological data

The Haihe River Basin is situated in the hinterland of northern China's Bohai Sea Rim and serves as the political and economic center of the country. The overall topography of the Haihe River Basin is characterized by higher elevations in the west and lower elevations in the east, with rivers primarily flowing from the mountainous regions in the west and north towards the plains in the east and south, ultimately draining into Bohai Bay. This region falls within the temperate East Asian monsoon climate zone, with average annual temperatures of 1.5–14 °C and average annual rainfall of 539 mm. Due to the rapid expansion of the local social economy, certain human activities, including the overexploitation of water resources, construction of numerous reservoirs, and implementation of water diversion projects, have exerted an impact on the natural state of the basin, leading to changes in streamflow processes and other characteristics (Bin et al. 2018).

This study primarily focuses on 10 hydrological stations in the Haihe River Basin, with five located in mountainous areas and five situated in plain regions, as illustrated in Figure 1. The flow data for three stations (2009–2017) were sourced from the China Hydrological Yearbook. Meteorological data were obtained from the Chinese Meteorological Administration's China Surface Climate Data Daily dataset V3.0, which includes precipitation, temperature, sunshine duration, and wind speed data from 2008 to 2017. Based on the collected data, the average correlation between monthly streamflow and precipitation across all stations is only around 0.3, which suggests that the streamflow processes in this region have deviated from the typical natural relationship between precipitation and streamflow. Without extensive data on actual human water consumption and reservoir operations, utilizing traditional hydrological models becomes challenging for effectively simulating streamflow.
Figure 1: Overview of the study area and distribution of hydrological stations.

Method introduction

Least absolute shrinkage and selection operator (Lasso)

Least absolute shrinkage and selection operator (Lasso) method is a regularization technique used for linear regression and variable selection. It was first introduced by the statistician Robert Tibshirani in 1996 and has achieved significant success in the analysis of high-dimensional data (Tibshirani 1996). The core idea of the Lasso method is to introduce an L1 regularization term into the loss function of linear regression and then find a sparse solution by minimizing the sum of the loss function and the regularization term. The loss function can be expressed as:
$$\min_{\beta_0,\,\beta}\left\{\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\Big)^2+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right\} \quad (1)$$

where $y_i$ is the observed value; $x_{ij}$ is the $j$-th feature of the $i$-th observation; $\beta_j$ is the corresponding coefficient; $\beta_0$ is the constant coefficient; $p$ is the number of candidate features; and $\lambda$ is the regularization parameter that controls sparsity.

The introduction of the Lasso method results in certain feature coefficients becoming zero, thereby achieving feature selection and effectively addressing high-dimensional data and multicollinearity issues. Additionally, Lasso possesses the ability to handle noise in the data, contributing to improved model generalization performance. The parameter λ plays a crucial role in the Lasso method, filtering model features by comparing its value with the absolute covariance between features and outputs. Cross-validation methods, such as LassoCV, provide a convenient way to select the optimal λ, employing an exhaustive approach to automatically train within a given range of values (Roberts & Nowak 2014). In this study, LassoCV, a combination of the Lasso method with cross-validation, was employed. In the context of streamflow prediction, regression coefficients between input features and target flow values are minimized, with some set to zero, aiming to select important features (Chu et al. 2020).
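As a minimal sketch of how such LassoCV-based screening can be set up with scikit-learn (the library named later in the experimental setup), consider the following; the names X, y, and feature_names are illustrative placeholders, not from the paper:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def select_top_features(X, y, feature_names, n_features=9, cv=5):
    """Rank features by |Lasso coefficient| and return the n_features best."""
    X_scaled = StandardScaler().fit_transform(X)             # Lasso is scale-sensitive
    model = LassoCV(cv=cv, random_state=0).fit(X_scaled, y)  # lambda chosen by cross-validation
    ranking = np.argsort(np.abs(model.coef_))[::-1]          # largest |coefficient| first
    return [feature_names[i] for i in ranking[:n_features]]
```

Features whose coefficients are shrunk exactly to zero fall to the bottom of this ranking, which is the sparsity property exploited in this study.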

Transformers for time series

Transformer is a novel and simplified network architecture entirely composed of attention mechanisms (Vaswani et al. 2017). The idea of attention is inspired by human vision, allocating different weights to different parts of the input sequence to prioritize important components. A distinctive feature of the Transformer is the use of self-attention mechanisms throughout the entire architecture, enabling connections between different positions in a single sequence to understand their internal relationships. Moreover, the Transformer algorithm employs ‘Multi-Head Attention’ to parallelly compute attention for different parts of the input data, simultaneously learning different features (Bai & Tahmasebi 2022). The Transformer model primarily consists of an encoder and a decoder, where the encoder processes the input sequence, and the decoder generates the output sequence.

Since a typical transformer is designed for Natural Language Processing tasks, the TTS constructed in this study partially modifies the model framework to better suit the simulation of time series such as streamflow. In the first step of processing the input data, a standard linear layer, rather than positional encoding, is used to map the entire sequence more appropriately. To avoid negative outputs, the token embedding is likewise replaced with a generic linear layer, and a sigmoid activation function replaces softmax in the final layer (Lee et al. 2023).
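A hedged, encoder-only PyTorch sketch of these modifications follows (the paper's TTS also includes a decoder; apart from d_model = 720 from Table 1, the layer sizes are illustrative assumptions, and flows are assumed to be scaled to [0, 1] so the sigmoid output is meaningful):

```python
import torch
import torch.nn as nn

class TTSSketch(nn.Module):
    def __init__(self, n_features, d_model=720, nhead=8, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)  # linear layer in place of embedding + positional encoding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)            # project back to a single flow value

    def forward(self, x):                            # x: (batch, seq_len, n_features)
        h = self.encoder(self.embed(x))
        return torch.sigmoid(self.head(h[:, -1, :])) # sigmoid keeps the output non-negative
```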

Long short-term memory

LSTM is an improved RNN model that introduces memory and forget units to control information flow. It stores previous input data in the network, influencing the model's learning process (Hochreiter & Schmidhuber 1997; Kratzert et al. 2018). As a result, LSTM can capture long-term correlations between features over extended time spans, avoiding the gradient vanishing issues present in traditional RNNs (Yang & Yang 2020). LSTM has demonstrated reliable performance in various time series research areas, including image recognition, power prediction, and emerging advantages in hydrological forecasting studies (Xu et al. 2022). The main parameter settings for the LSTM model align with the basic parameter settings of the TTS model, both of which are introduced in the experimental design section.
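For comparison, a minimal PyTorch LSTM regressor of the kind described here might look as follows; hidden_size and num_layers are placeholders, since Table 1 leaves the tuned values unspecified:

```python
import torch.nn as nn

class LSTMSketch(nn.Module):
    def __init__(self, n_features, hidden_size=64, num_layers=2, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)  # gated memory cells carry long-range information
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # predict streamflow from the last time step
```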

Random forest

RF is a machine learning model that integrates classification trees or regression trees. Essentially, it is a method that combines bagging ensemble learning theory with the concept of a random subspace in classifier ensembles. This approach addresses the overfitting problem of individual decision trees while maintaining prediction accuracy (Safari et al. 2020; Schoppa et al. 2020). RF is characterized by its simplicity of operation and fast computation, yielding good results in studies related to hydrology, making it a viable alternative to traditional hydrological models (Tyralis et al. 2019).

Among all the hyperparameters of random forests, the number of trained trees (n_estimators) and the maximum number of considered features (max_features) are regarded as the parameters that most markedly affect model performance (Boulesteix et al. 2012). max_features refers to the number of features considered when splitting a node: the smaller the value, the smaller the variance and the larger the bias. n_estimators is the number of submodels (trees), and RF performance generally improves as it increases; however, the gain becomes very limited once n_estimators exceeds 250 (Probst & Boulesteix 2018).
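A sketch of this setup with scikit-learn is given below; the grid bounds follow Table 1, max_features=1.0 corresponds to the 'auto'/all-features setting used in this study, and X_train and y_train are placeholder names:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [10, 50, 100, 250, 300],  # searched within the 10-300 range of Table 1
    "max_depth": list(range(2, 11)),          # 2-10, as in Table 1
}
rf = GridSearchCV(
    RandomForestRegressor(max_features=1.0, random_state=0),  # all features at each split
    param_grid, cv=5,
)
# rf.fit(X_train, y_train)  # lagged feature matrix and target flow
```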

Input features and model settings

The data used in this study mainly consist of streamflow and meteorological data from 1 January 2010 to 31 December 2017, totaling 2,922 days. The data from 1 January 2010 to 31 December 2015 (2,191 days) serve as the training set, while the remaining data from 1 January 2016 to 31 December 2017 (731 days) serve as the test set. Short-term streamflow is related not only to current meteorological conditions but also to historical hydro-meteorological data. Therefore, when predicting streamflow, it is common to incorporate data from multiple preceding days, i.e., lagged data. For different lead times, the feature combination approach is illustrated in Figure 2. Taking 1 day ahead as an example, the input consists of data from the previous 30 days to predict the streamflow on the 31st day, and so forth. During the training process, the training-period data for the three models are randomly split into a training set and a validation set in a 4:1 ratio. The forecast results of the three models for the entire training period were used for index calculation in the final model evaluation.
Figure 2: Schematic diagram of data partitioning.
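To make the windowing concrete, the following illustrative sketch builds the lagged sample matrix just described (30 days × 7 variables = 210 features per sample) and applies the 4:1 random split; df and its column names are assumptions for illustration only:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_samples(df, lag=30, lead=1):
    """Flatten each 30-day window of the 7 daily variables into one 210-feature sample."""
    values = df[["S", "P", "Th", "Tl", "T", "W", "R"]].to_numpy()
    X, y = [], []
    for t in range(lag, len(values) - lead + 1):
        X.append(values[t - lag:t].ravel())  # days t-30 .. t-1
        y.append(values[t + lead - 1, 0])    # streamflow 'S' on the target date
    return np.asarray(X), np.asarray(y)

# X, y = make_samples(train_df, lag=30, lead=1)
# X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2)  # the 4:1 random split
```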
The meteorological data include average precipitation (P), highest temperature (Th), lowest temperature (Tl), average temperature (T), wind speed (W), and relative humidity (R). These variables are consistent with the input parameters used in some hydrological models, such as the SWAT model. With the addition of streamflow (S), there are a total of seven feature variables each day. The use of observed streamflow as a predictive factor in hydrology is not a new concept; it has been a valuable indicator for short-term forecasting over the past decade (Cloke & Pappenberger 2009; Sabzipour et al. 2023). Considering a lag period of 30 days, the machine learning models in this study have a total of 210 features. Among these numerous features, some may have a significant impact on the target variable, while others may be less relevant. Therefore, we introduced the Lasso feature selection method to enhance the efficiency and accuracy of model training and designed two sets of scenarios on this basis. In the first scenario, all features were directly input into the machine learning models for training. In the second scenario, Lasso is used beforehand to select the 1–9 most important features, which are progressively added to the machine learning models. The workflows for both scenarios are illustrated in Figure 3. In the second scenario, the models are referred to as LA-LSTM, LA-TTS, and LA-RF, representing the combination of the Lasso method with LSTM, TTS, and RF, respectively.
Figure 3: Workflow diagram of the scheme design.
For this study, the expressions for the two scenarios are represented by Equations (2) and (3), respectively. Equation (2) aligns with S1 in Figure 3, involving the integration of all lagged streamflow and meteorological elements from the preceding 30 days as input features for machine learning training. Equation (3) corresponds to S2 in Figure 3, where the Lasso method is applied to filter all elements, isolating n critical features (with n up to 9) that are subsequently utilized for training the machine learning models:
$$Q_{t+m} = f\big(Q_t, M_t, Q_{t-1}, M_{t-1}, \ldots, Q_{t-29}, M_{t-29}\big) \quad (2)$$

$$Q_{t+m} = f\big(F_1, F_2, \ldots, F_n\big) \quad (3)$$

where $Q_t$ and $M_t$ are the streamflow and meteorological elements at the current time instant $t$, respectively; $F_n$ denotes the features selected by the Lasso method, $n = 1, 2, \ldots, 9$; and $m$ is the lead time for forecasting.

The TTS and LSTM models were implemented using the PyTorch framework, the RF was implemented using algorithms provided by Scikit-learn, and all three models were based on Python 3.7. The most important hyperparameters must be set before training the network and are closely related to the performance of the neural network (Bai et al. 2021). Therefore, before the formal model runs, a manual trial-and-error method was used to adjust the model parameters and determine the initial values or ranges of the main parameters. The initial values for the hyperparameters of each model are presented in Table 1. An appropriate learning rate is effective in increasing the speed of convergence of the model and avoiding model oscillations near a local optimum (Yin et al. 2022b). Epoch is the iteration count of parameter updates, that is, the number of times the dataset is passed through in its entirety. A small epoch value does not leave sufficient time for the model to learn the most reasonable parameters, whereas a large epoch value causes the model to overlearn the training data (Ravindra 2018). Overfitting is a common failure mode in machine learning, and the dropout rate helps to avoid it by setting a fixed proportion of neurons to zero, making the model more reliable (Yin et al. 2022b). To ensure that the input features used by the three models were consistent, max_features in the RF model was set to 'auto', meaning that all input features were considered.

Table 1

Initial values for the hyperparameters of the three models

| LSTM | Value | TTS | Value | RF | Value |
|---|---|---|---|---|---|
| learning rate | 0.01 | learning rate | 0.0001 | n_estimators | 10–300 |
| epochs | 500 | epochs | 500 | max_depth | 2–10 |
| dropout | 0.1 | dropout | 0.1 | max_features | auto |
| hidden_size | – | d_model | 720 | | |
| num_layers | – | number of encoder and decoder layers | – | | |

Evaluation indexes

In order to evaluate the model results, this study utilized common hydrological statistical indicators, namely NSE (Nash–Sutcliffe efficiency), NRMSE (normalized root mean square error), PE (relative peak error), and KGE (Kling–Gupta efficiency). NSE represents the goodness of fit between predicted and observed values, with a higher value (closer to 1) indicating a better fit. The objective functions of all models were selected or modified to NSE, which is commonly used in hydrological simulations. NRMSE is the RMSE (root mean square error) normalized by the observed range, facilitating comparisons between datasets of different magnitudes; a smaller NRMSE indicates less dispersion in the samples. A drawback of NRMSE, however, is that it amplifies the influence of large errors. The PE serves as a critical indicator for evaluating the precision of models in simulating peak streamflow (Safari et al. 2020); a value approaching 0 indicates a superior ability of the model to accurately capture peak occurrences. KGE is a comprehensive metric that considers bias, variance, and correlation, assigning them equal weights to improve the balance among these factors; compared to NSE, KGE slightly reduces the weight of correlation, and a higher KGE value indicates a better model fit (Gupta et al. 2009; Schoppa et al. 2020). Additionally, the performance evaluation for the machine learning models introduced a generalization ability (GA) metric, which reflects the model's adaptability to unseen data; the closer the value is to 1, the better the generalization ability, implying that the model is more acceptable in practical applications (Chen et al. 2020; Adombi et al. 2021). The equations for these indicators are as follows:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}(O_i - S_i)^2}{\sum_{i=1}^{n}(O_i - \bar{O})^2} \quad (4)$$

$$\mathrm{NRMSE} = \frac{\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(O_i - S_i)^2}}{O_{\max} - O_{\min}} \quad (5)$$

$$\mathrm{PE} = \frac{\left|S_p - O_p\right|}{O_p} \quad (6)$$

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + \left(\frac{\bar{S}}{\bar{O}} - 1\right)^2} \quad (7)$$

with:

$$\alpha = \frac{\sigma_S}{\sigma_O} \quad (8)$$

$$r = \frac{\sum_{i=1}^{n}(O_i - \bar{O})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n}(O_i - \bar{O})^2}\,\sqrt{\sum_{i=1}^{n}(S_i - \bar{S})^2}} \quad (9)$$

$$\mathrm{GA} = \frac{E_{\mathrm{test}}}{E_{\mathrm{train}}} \quad (10)$$
where $O_i$ and $S_i$ are the observed and predicted streamflow, respectively; $\bar{O}$ and $\bar{S}$ are the averages of the observed and predicted streamflow, respectively; $O_p$ and $S_p$ are the peak values of observed and predicted streamflow, respectively; $O_{\max}$ and $O_{\min}$ are the maximum and minimum of observed streamflow; $n$ is the total number of data; $\alpha$ is a measure of relative variability in the predicted and observed values; $r$ is the linear correlation coefficient between observed and predicted streamflow; $\sigma_O$ and $\sigma_S$ are the standard deviations of the observed and predicted streamflow, respectively; and $E_{\mathrm{train}}$ and $E_{\mathrm{test}}$ are the NSE values for the training set and test set, respectively.
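For reference, hedged NumPy implementations of these indices, following the forms given in Equations (4)–(10), are sketched below; obs and sim are illustrative array names:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Equation (4)."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nrmse(obs, sim):
    """RMSE normalized by the observed range, Equation (5)."""
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    return rmse / (obs.max() - obs.min())

def pe(obs_peak, sim_peak):
    """Relative peak error, Equation (6)."""
    return abs(sim_peak - obs_peak) / obs_peak

def kge(obs, sim):
    """Kling-Gupta efficiency, Equation (7), with alpha and r from (8)-(9)."""
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # relative variability
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def generalization_ability(e_train, e_test):
    """GA, Equation (10): test-set NSE relative to training-set NSE."""
    return e_test / e_train
```

The GA form is consistent with Table 2; for instance, the LSTM row gives 0.67 / 0.85 ≈ 0.80.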

Comparison of results with and without the Lasso method

Initially, a comparative assessment was conducted to evaluate the impact of introducing the Lasso method on model performance. Taking a lead time of 1 day as an example, the NSE results for the training sets at all stations under the two approaches are illustrated in Figure 4. The machine learning results incorporating the Lasso method are averaged over nine runs, representing the average NSE values for feature counts ranging from 1 to 9. The three machine learning models achieved good results during the training period. The median NSE values for LSTM and LA-LSTM were around 0.85 and 0.90, respectively, while TTS, RF, and the Lasso-combined models all had median values around 0.95. However, during the testing period, the NSE values of the three models not combined with Lasso decreased significantly, with median values dropping by approximately 20% and the distribution of NSE values becoming wider. After applying Lasso for feature selection, the decline in NSE values for the three models was noticeably smaller, with median values decreasing by an average of only 10% and the data distribution becoming more concentrated.
Figure 4: Comparison of NSE results for the two scenarios.

Generally, due to differences in data distribution, machine learning models may exhibit slightly lower performance on the test set compared to the training set. However, if there is a significant decrease in performance on the test set, it may indicate the occurrence of overfitting. In this study, when all features were input into the machine learning models for training, the model's performance on the test set significantly deteriorated. In other words, before feature selection, too many features were input into the machine learning models, including features that were irrelevant or had little relevance to the target, resulting in the model learning too many details on the training set and performing poorly on the unseen test set.

Table 2 also presents the average values of various indicators for multi-station results. KGE and NRMSE exhibit similar trends to the NSE indicator. Overall, the Lasso method significantly enhances the model performance on the test set. The improvement is particularly pronounced for the LSTM model, with NSE and KGE increasing from 0.67 and 0.64 to 0.84 and 0.82, respectively, and NRMSE decreasing from 0.08 to 0.05. In terms of generalization ability, models combined with Lasso outperform those without feature selection. Compared to LSTM, TTS, and RF, LA-LSTM, LA-TTS, and LA-RF show an improvement in generalization ability by 21, 12, and 14%, respectively. This is because the Lasso method can extract key information from features, avoiding the use of too many irrelevant features, making the trained machine learning models more stable, and achieving better performance on the test set, i.e., higher generalization ability. In evaluating peak streamflow, we performed calculations for the PE metric by selecting 10 peak streamflow events per site over the entire study period, leading to the derived average PE values for each model as detailed in Table 2. Analogous to preceding metrics, models enhanced with the Lasso method exhibited varying degrees of advancement in peak streamflow simulation. Noteworthy is the LA-LSTM model, which saw its PE value decrease from 0.38 in the LSTM model to 0.22 on the test set. While the improvements in the TTS and RF models were not as pronounced as that of the LSTM model, their PE values nonetheless reduced by 0.05 and 0.02, respectively.

Table 2

Average of various indicators for the results of two scenarios at all stations

| Model | NSE (train) | NSE (test) | KGE (train) | KGE (test) | NRMSE (train) | NRMSE (test) | PE (train) | PE (test) | GA |
|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.85 | 0.67 | 0.79 | 0.64 | 0.04 | 0.08 | 0.28 | 0.38 | 0.80 |
| LA-LSTM | 0.88 | 0.84 | 0.91 | 0.82 | 0.03 | 0.05 | 0.23 | 0.22 | 0.96 |
| TTS | 0.95 | 0.77 | 0.92 | 0.78 | 0.02 | 0.06 | 0.17 | 0.29 | 0.82 |
| LA-TTS | 0.92 | 0.84 | 0.93 | 0.80 | 0.02 | 0.05 | 0.13 | 0.24 | 0.91 |
| RF | 0.94 | 0.73 | 0.91 | 0.67 | 0.02 | 0.06 | 0.18 | 0.33 | 0.77 |
| LA-RF | 0.92 | 0.81 | 0.91 | 0.71 | 0.02 | 0.05 | 0.16 | 0.31 | 0.89 |

Figure 5 takes the Baiyinbuluo station (Q1) as an example and illustrates the flow hydrographs and scatter plots of the different models on the test set compared to observed values. All three models can capture the variation of flow over time, but the models combined with the Lasso method exhibit a better fit to the observed values. The flow hydrograph of the LSTM model shows some fluctuations throughout the entire period, indicating suboptimal stability. In contrast, the results of the LA-LSTM model closely align with the observed values. In terms of scatter plots, the results of the LA-LSTM model are more concentrated than those of the LSTM model. The TTS model shows consistently higher values than the observed values throughout the entire month of April 2016, whereas the LA-TTS model does not exhibit similar discrepancies at any point in the period. The scatter plot also reveals that the TTS model has more points above the diagonal line, while the scatter points of the LA-TTS model are more evenly distributed on both sides of the diagonal. For the RF model, there is not much difference in the flow hydrograph compared to the LA-RF model. The main distinctions are observed in peak values and certain times when there are varying degrees of overestimation or underestimation, resulting in the LA-RF model outperforming the RF model in terms of indicators.
Figure 5: Comparison of hydrographs and scatter plots of the three models at Baiyinbuluo station.

Feature filtering with Lasso

The previous comparison indicates that the Lasso method can effectively enhance model performance. This section showcases and discusses the most important features selected by Lasso. We applied the Lasso method for feature selection on hydro-meteorological data from all stations, sequentially picking nine key features from a pool of 210 for each station. Taking a lead time of 1 day as an example, Table 3 shows the feature selection results for all stations, where t − n (n = 1, 2, … , 30) denotes data lagging n days relative to the prediction date. Examining the table, we discern commonalities in certain features across different stations in the Haihe River Basin. Specifically, for all stations, the consistently top-ranking and most influential feature is S(t − 1), signifying that, in this study's streamflow prediction, the lagged streamflow from the previous day holds paramount significance. Among the remaining features, precipitation and flow-related attributes take precedence. Although the pronounced impact of human activities on streamflow patterns in the study area has weakened natural variability, precipitation remains a pivotal driving factor. Additionally, certain temperature-related features are noteworthy, particularly at specific stations (e.g., Q2). However, the impact of relative humidity and wind speed is less pronounced, indicating their limited influence on streamflow prediction and a potential contribution to model overfitting. The table also illustrates that the selected features encompass not only data proximal to the prediction date but also, for several stations, features lagging more than 20 days. This underscores the necessity of the 30-day total lag period adopted in this study to avoid overlooking relatively important features.

Table 3

Importance ranking of input features at the 10 stations with a forecast period of 1 day

| Station | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Q1 | S(t − 1) | S(t − 6) | S(t − 13) | S(t − 7) | P(t − 3) | T(t − 8) | S(t − 4) | P(t − 1) | S(t − 3) |
| Q2 | S(t − 1) | P(t − 1) | P(t − 2) | P(t − 21) | P(t − 23) | S(t − 2) | S(t − 4) | Th(t − 1) | S(t − 20) |
| Q3 | S(t − 1) | S(t − 10) | P(t − 1) | S(t − 5) | P(t − 11) | P(t − 2) | Th(t − 11) | P(t − 24) | Th(t − 6) |
| Q4 | S(t − 1) | P(t − 3) | S(t − 7) | P(t − 2) | P(t − 1) | P(t − 23) | S(t − 30) | Th(t − 8) | S(t − 20) |
| Q5 | S(t − 1) | S(t − 2) | S(t − 3) | Tl(t − 28) | S(t − 4) | Th(t − 2) | Tl(t − 25) | Tl(t − 29) | Th(t − 22) |
| Q6 | S(t − 1) | P(t − 3) | S(t − 2) | P(t − 4) | P(t − 2) | P(t − 5) | P(t − 14) | P(t − 15) | P(t − 6) |
| Q7 | S(t − 1) | P(t − 4) | S(t − 3) | P(t − 5) | S(t − 5) | S(t − 9) | P(t − 9) | S(t − 4) | P(t − 10) |
| Q8 | S(t − 1) | S(t − 2) | P(t − 2) | P(t − 1) | P(t − 3) | P(t − 12) | S(t − 9) | P(t − 10) | P(t − 15) |
| Q9 | S(t − 1) | S(t − 2) | P(t − 6) | P(t − 13) | P(t − 5) | S(t − 3) | P(t − 15) | P(t − 2) | P(t − 3) |
| Q10 | S(t − 1) | P(t − 1) | S(t − 2) | P(t − 21) | S(t − 18) | S(t − 28) | S(t − 10) | S(t − 24) | S(t − 13) |

Q1: Baiyinbuluo station; Q2: Sandaohezi station; Q3: Sandaoying station; Q4: Xiahui station; Q5: Gu'anqiao station; Q6: Xianxian station; Q7: Linqing station; Q8: Yuancunji station; Q9: Yuechengshuiku station; Q10: Liaocheng station.

By incrementally adding the selected top nine important features to the machine learning models, nine sets of streamflow predictions were obtained for each station as the number of features increased. The average trend of NSE changes for all stations is illustrated in Figure 6, with the shaded area representing the 95% confidence interval. For the training period, as the number of features increases, both LA-TTS and LA-RF exhibit a synchronized upward trend. In contrast, LA-LSTM shows fluctuations after reaching three features, stabilizing thereafter. For the test set, LA-LSTM maintains an improving trend as the number of features increases from 1 to 3. However, similar to the training period, it then exhibits fluctuations. Although LA-TTS and LA-RF show gradual improvement in the training period, the results on the test set remain relatively stable, essentially forming a straight line.
Figure 6: Trend of NSE changes with the number of features in the three models.

In the training set, both LA-TTS and LA-RF consistently have higher NSE values than LA-LSTM. However, on the test set, the results of LA-LSTM consistently outperform the other two models, indicating relatively weaker generalization abilities for LA-TTS and LA-RF. A likely reason is that the additional input features introduced proportionally more noise, causing LA-TTS and LA-RF to overfit and thus fail to process unseen data effectively. Conversely, the LA-LSTM model excels in capturing temporal dependencies within the time-series data; despite the increment in features, its ability to generalize across different datasets remains notably superior. The performance gaps among the three models align with the results in Section 4.1. This consistency arises because the Lasso-combined results in the previous section are based on the average of the nine simulation results in this section. Averaging is employed to account for the variability in the optimal number of features across stations, ensuring that the average reflects overall performance at all stations even though individual models may have different optimal feature counts at each station.

Comparison of the results of the three models

Based on the average of the nine simulation results, we further compared the performance of the LA-LSTM, LA-TTS, and LA-RF models. Figure 7 illustrates the numerical distribution of the NSE, KGE, NRMSE, and PE indicators for these three models during the training and testing periods. The dashed lines, from bottom to top, represent the first quartile, the median (long-dashed line), and the third quartile, respectively. Overall, the picture is consistent with the previously mentioned content: during the training period, LA-TTS and LA-RF achieve higher indicator values than LA-LSTM, but their performance on the test set is noticeably inferior to LA-LSTM, indicating a certain degree of overfitting.
Figure 7: Numerical distribution of indicators for the three models: (a) NSE; (b) KGE; (c) NRMSE; (d) PE. The dashed lines, from bottom to top, represent the first quartile, the median (long-dashed line), and the third quartile.

Firstly, concerning NSE, during the training period, the NSE values of LA-LSTM are primarily concentrated within the 25–75% range, exhibiting a relatively even distribution. In contrast, the NSE distribution of LA-TTS and LA-RF is more concentrated, with a higher concentration within the 50–75% range, and their median lines are notably higher than that of LA-LSTM. However, during the test period, the situation reverses. The median NSE of LA-LSTM surpasses that of LA-TTS and LA-RF, and its distribution is more concentrated above 50%. A similar pattern emerges in terms of KGE, particularly focusing on performance during the test set. The median KGE of LA-LSTM is comparable to the 75% quartile lines of LA-TTS and LA-RF. LA-RF exhibits inferior performance compared to LA-TTS, with a lower median and an overall flatter distribution. Regarding NRMSE, LA-LSTM outperforms LA-TTS and LA-RF on the test set, displaying a lower median, indicating that the model achieves smaller errors. While LA-TTS and LA-RF show similar performance below the median, LA-RF's 25% quartile line is at a higher error level. The LA-LSTM model not only excels in reducing overall errors but also stands out in minimizing peak errors. Specifically, on the test set, the PE median value of LA-LSTM is notably closer to 0, with its distribution more tightly clustered around values near zero.

In summary, the LA-LSTM model not only ensures satisfactory results during the training period but also demonstrates higher applicability and greater generalization ability on the test set. The performance of the LA-TTS model, while not as outstanding as LA-LSTM, is slightly better than that of the LA-RF model.

Results of the 1–7-day forecast period

The preceding results primarily focused on a lead time of 1 day. To assess the short-term forecasting capabilities of the three machine learning models, streamflow predictions were extended to lead times of 1–7 days, as depicted in Figure 8. Over the entire period, as the forecast horizon lengthens, the performance of the models declines and the confidence intervals widen. This is because the features closest to the target streamflow date carry the greatest predictive importance; as the lead time increases, the interval between the target streamflow and the input features lengthens, reducing their correlation and making model training more challenging.
Figure 8: Indicators of streamflow prediction results for 1–7 days ahead: (a) training period; (b) test period.

Based on the current forecasting results, similar to the analysis of a 1-day lead time discussed earlier, when the lead time extends to 7 days, LA-LSTM's indicators during the training period are slightly lower than those of LA-TTS and LA-RF, but it exhibits superior performance during the test period. Specifically, at a lead time of 4 days, LA-LSTM achieves an NSE of approximately 0.60 on the test set, surpassing the values of 0.57 for LA-TTS and 0.52 for LA-RF. Even with a lead time of 7 days, the LA-LSTM model still attains an NSE value of 0.43. The performance of LA-TTS remains intermediate between the other two models, closely resembling LA-LSTM's results for lead times less than 5 days and experiencing a more pronounced decline afterward, approaching the performance of the LA-RF model.

The Lasso method demonstrated a significant impact in this study. By contrasting the results with machine learning models that did not utilize the Lasso method, we observed a noteworthy reduction in the dimensionality of the feature space. This reduction addressed the overfitting issue arising from a large number of features, ultimately enhancing the models' generalization ability. Through regularization, the Lasso method effectively shrinks unimportant feature coefficients to zero, identifying and selecting only the few features with the highest predictive power for the target variable before the machine learning models are trained. For all stations in this study, lagged streamflow and precipitation play a predominant role in the Lasso selection, and the lagged streamflow closest to the prediction date is most often identified as the most crucial feature. In comparison, RFE typically requires gradually eliminating unimportant features based on model performance, which demands more computational resources and time (Ijadi Maghsoodi et al. 2023), and PCA obtains new feature vectors through linear transformation, which may compromise the interpretability of the model (Palit et al. 2022).

Comparing the performance of the LA-LSTM, LA-TTS, and LA-RF machine learning models, we observed that the LA-LSTM model exhibits notable stability and generalization ability in streamflow prediction. The LA-LSTM model, as a variant of recurrent neural networks suited to sequential data, possesses excellent capabilities in modeling time series. In streamflow prediction, LA-LSTM effectively captures both long-term and short-term dependencies in the time series, enabling more accurate predictions of future streamflow conditions while maintaining stability between the training and testing periods. Although some literature suggests that the TTS model outperforms the LSTM model in streamflow prediction (Yin et al. 2022a), this is not always the case in practical applications. After we introduced the Lasso method, LA-TTS performed well in some cases, but its generalization ability for streamflow prediction at sites within the Haihe River Basin was not as good as that of LA-LSTM. In addition, RF has advantages in dealing with high-dimensional data and complex relationships (Tyralis et al. 2019), but LA-LSTM outperformed LA-RF in the particular task of the present study. Certainly, the choice of the most suitable prediction method should still rest on careful consideration of different models' strengths and weaknesses in light of the specific characteristics of the problem in different scenarios.

While our study has yielded meaningful insights into the application of Lasso in conjunction with machine learning models for streamflow prediction, we acknowledge certain limitations in our research. Firstly, limitations in the quality and availability of hydrological data constrained us to use hydro-meteorological data from only a subset of stations within the Haihe River Basin. Despite our awareness of significant human impacts in the study area, we were unable to incorporate effective daily-scale data related to human water consumption, reservoir operation, and other anthropogenic activities, preventing a detailed discussion of their specific influences. This limitation contributes to the observed decrease in model predictive performance with an increasing lead time. Therefore, building upon more comprehensive data collection efforts, our future work will emphasize enhancing the interpretability of machine learning models in streamflow prediction.

In this study, we integrated the Lasso method with three leading machine learning models (LSTM, TTS, and RF) to simulate and predict daily streamflow in the Haihe River Basin. Our investigation underscored the Lasso method's crucial role in enhancing model accuracy by selectively pinpointing key predictive features, thereby elevating the precision of forecasts in environments where hydrological behavior departs from traditional patterns. Our major conclusions are as follows:

  • (1) The Lasso method significantly improved the generalization capabilities of the machine learning models by identifying critical predictors, such as lagged streamflow and precipitation. Among these, streamflow data nearest to the forecast date were paramount in achieving accurate predictions.

  • (2) Among the integrated models, LA-LSTM stood out for its exceptional stability and ability to generalize, particularly excelling over LA-RF and LA-TTS in terms of performance and reliability as the forecast horizon extended.

For hydrologists and river engineers, the insights derived from our study stand to substantially improve flood risk management and water resource planning by providing tools that are not just more accurate but also exceptionally adaptable to the dynamic challenges of changing environments. Moving forward, we plan to enrich our predictive models with a broader range of geographical data and detailed observations of human activities, aiming to elevate the precision and applicability of our forecasts even further.

This study was supported by the National Key R&D Program of China (2017YFC0406004) and National Natural Science Foundation of China (41271004).

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

References

Adombi, A. V. D. P., Chesnaux, R. & Boucher, M. A. 2021 Review: Theory-guided machine learning applied to hydrogeology - state of the art, opportunities and future challenges. Hydrogeology Journal 29 (8), 2671–2683. doi:10.1007/s10040-021-02403-2.

Alizadeh, A., Rajabi, A., Shabanlou, S., Yaghoubi, B. & Yosefvand, F. 2021 Modeling long-term rainfall-runoff time series through wavelet-weighted regularization extreme learning machine. Earth Science Informatics 14 (2), 1047–1063. doi:10.1007/s12145-021-00603-8.

Arsenault, R., Martel, J. L., Brunet, F., Brissette, F. & Mai, J. 2023 Continuous streamflow prediction in ungauged basins: Long short-term memory neural networks clearly outperform traditional hydrological models. Hydrology and Earth System Sciences 27 (1), 139–157. doi:10.5194/hess-27-139-2023.

Bai, T. & Tahmasebi, P. 2022 Characterization of groundwater contamination: A transformer-based deep learning model. Advances in Water Resources 164, 104217. doi:10.1016/j.advwatres.2022.104217.

Bin, L., Xu, K., Xu, X., Lian, J. & Ma, C. 2018 Development of a landscape indicator to evaluate the effect of landscape pattern on surface runoff in the Haihe River Basin. Journal of Hydrology 566, 546–557. doi:10.1016/j.jhydrol.2018.09.045.

Boulesteix, A. L., Janitza, S., Kruppa, J. & König, I. R. 2012 Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2 (6), 493–507. doi:10.1002/widm.1072.

Cai, J., Luo, J., Wang, S. & Yang, S. 2018 Feature selection in machine learning: A new perspective. Neurocomputing 300, 70–79. doi:10.1016/j.neucom.2017.11.077.

Chen, C., He, W., Zhou, H., Xue, Y. & Zhu, M. 2020 A comparative study among machine learning and numerical models for simulating groundwater dynamics in the Heihe River Basin, northwestern China. Scientific Reports 10 (1). doi:10.1038/s41598-020-60698-9.

Chu, H., Wei, J. & Wu, W. 2020 Streamflow prediction using LASSO-FCM-DBN approach based on hydro-meteorological condition classification. Journal of Hydrology 580, 124253. doi:10.1016/j.jhydrol.2019.124253.

Cloke, H. L. & Pappenberger, F. 2009 Ensemble flood forecasting: A review. Journal of Hydrology 375 (3–4), 613–626. doi:10.1016/j.jhydrol.2009.06.005.

Dtissibe, F. Y., Ari, A. A. A., Abboubakar, H., Njoya, A. N., Mohamadou, A. & Thiare, O. 2024 A comparative study of machine learning and deep learning methods for flood forecasting in the Far-North region, Cameroon. Scientific African 23, e02053. doi:10.1016/j.sciaf.2023.e02053.

Gupta, H. V., Kling, H., Yilmaz, K. K. & Martinez, G. F. 2009 Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. Journal of Hydrology 377 (1–2), 80–91. doi:10.1016/j.jhydrol.2009.08.003.

Hindersland, J. H. 2023 Analyzing the Performance of Transformers for Streamflow Prediction. Master Thesis, University of Agder, Kristiansand, Norway.

Hochreiter, S. & Schmidhuber, J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780. doi:10.1162/neco.1997.9.8.1735.

Ijadi Maghsoodi, A., Torkayesh, A. E., Wood, L. C., Herrera-Viedma, E. & Govindan, K. 2023 A machine learning driven multiple criteria decision analysis using LS-SVM feature elimination: Sustainability performance assessment with incomplete data. Engineering Applications of Artificial Intelligence 119, 105785. doi:10.1016/j.engappai.2022.105785.

Jing, H., He, X., Tian, Y., Lancia, M., Cao, G., Crivellari, A., Guo, Z. & Zheng, C. 2023 Comparison and interpretation of data-driven models for simulating site-specific human-impacted groundwater dynamics in the North China Plain. Journal of Hydrology 616, 128751. doi:10.1016/j.jhydrol.2022.128751.

Khaledi, J., Nitschke, C., Lane, P. N. J., Penman, T. & Nyman, P. 2022 The influence of atmosphere-ocean phenomenon on water availability across temperate Australia. Water Resources Research 58 (1), e2020WR029409. doi:10.1029/2020WR029409.

Khalid, S., Khalil, T. & Nasreen, S. 2014 A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and Information Conference, pp. 372–378. doi:10.1109/SAI.2014.6918213.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K. & Herrnegger, M. 2018 Rainfall-runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrology and Earth System Sciences 22 (11), 6005–6022. doi:10.5194/hess-22-6005-2018.

Lee, J., Abbas, A., McCarty, G. W., Zhang, X., Lee, S. & Hwa Cho, K. 2023 Estimation of base and surface flow using deep neural networks and a hydrologic model in two watersheds of the Chesapeake Bay. Journal of Hydrology 617, 128916. doi:10.1016/j.jhydrol.2022.128916.

Li, K. & Lu, Y. 2023 A transformer-based framework for parameter learning of a land surface hydrological process model. Remote Sensing 15 (14), 3536. doi:10.3390/rs15143536.

Li, Y., Liang, Z., Hu, Y., Li, B., Xu, B. & Wang, D. 2020 A multi-model integration method for monthly streamflow prediction: Modified stacking ensemble strategy. Journal of Hydroinformatics 22 (2), 310–326. doi:10.2166/hydro.2019.066.

Mangukiya, N. K. & Sharma, A. 2024 Alternate pathway for regional flood frequency analysis in data-sparse region. Journal of Hydrology 629, 130635. doi:10.1016/j.jhydrol.2024.130635.

Muthukrishnan, R. & Rohini, R. 2016 LASSO: A feature selection technique in predictive modeling for machine learning. In: 2016 IEEE International Conference on Advances in Computer Applications (ICACA), pp. 18–20. doi:10.1109/ICACA.2016.7887916.

Probst, P. & Boulesteix, A. L. 2018 To tune or not to tune the number of trees in random forest. Journal of Machine Learning Research 18, 1–18. doi:10.48550/arXiv.1705.05654.

Ravindra, B. 2018 Forecasting solar radiation during dust storms using deep learning. arXiv:1808.10854.

Roberts, S. & Nowak, G. 2014 Stabilizing the lasso against cross-validation variability. Computational Statistics & Data Analysis 70, 198–211. doi:10.1016/j.csda.2013.09.008.

Sabzipour, B., Arsenault, R., Troin, M., Martel, J. L., Brissette, F., Brunet, F. & Mai, J. 2023 Comparing a long short-term memory (LSTM) neural network with a physically-based hydrological model for streamflow forecasting over a Canadian catchment. Journal of Hydrology, 130380. doi:10.1016/j.jhydrol.2023.130380.

Safari, M. J. S., Rahimzadeh Arashloo, S. & Danandeh Mehr, A. 2020 Rainfall-runoff modeling through regression in the reproducing kernel Hilbert space algorithm. Journal of Hydrology 587, 125014. doi:10.1016/j.jhydrol.2020.125014.

Schoppa, L., Disse, M. & Bachmair, S. 2020 Evaluating the performance of random forest for large-scale flood discharge simulation. Journal of Hydrology 590, 125531. doi:10.1016/j.jhydrol.2020.125531.

Smith, J. D., Lamontagne, J. R. & Jasek, M. 2024 Considering uncertainty of historical ice jam flood records in a Bayesian frequency analysis for the Peace-Athabasca Delta. Water Resources Research 60, e2022WR034377. doi:10.1029/2022WR034377.

Tibshirani, R. 1996 Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), 267–288. doi:10.1111/j.2517-6161.1996.tb02080.x.

Tounsi, A., Abdelkader, M. & Temimi, M. 2023 Assessing the simulation of streamflow with the LSTM model across the continental United States using the MOPEX dataset. Neural Computing and Applications 35 (30), 22469–22486. doi:10.1007/s00521-023-08922-1.

Tripathy, K. P. & Mishra, A. K. 2023 Deep learning in hydrology and water resources disciplines: Concepts, methods, applications, and research directions. Journal of Hydrology, 130458. doi:10.1016/j.jhydrol.2023.130458.

Tursun, A., Xie, X., Wang, Y., Liu, Y., Peng, D., Rusuli, Y. & Zheng, B. 2024 Reconstruction of missing streamflow series in human-regulated catchments using a data integration LSTM model. Journal of Hydrology: Regional Studies 52, 101744. doi:10.1016/j.ejrh.2024.101744.

Tyralis, H., Papacharalampous, G. & Langousis, A. 2019 A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11 (5), 910. doi:10.3390/w11050910.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. & Polosukhin, I. 2017 Attention Is All You Need. arXiv:1706.03762.

Wei, X., Wang, G., Schmalz, B., Hagan, D. F. T. & Duan, Z. 2023 Evaluation of transformer model and self-attention mechanism in the Yangtze River basin runoff prediction. Journal of Hydrology: Regional Studies 47, 101438. doi:10.1016/j.ejrh.2023.101438.

Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J. & Sun, L. 2022 Transformers in Time Series: A Survey. arXiv:2202.07125.

Willard, J., Varadharajan, C., Jia, X. & Kumar, V. 2023 Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water Resources.

Wu, H., Xu, J., Wang, J. & Long, M. 2021 Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. arXiv:2106.13008.

Xu, Y., Hu, C., Wu, Q., Jian, S., Li, Z., Chen, Y., Zhang, G., Zhang, Z. & Wang, S. 2022 Research on particle swarm optimization in LSTM neural networks for rainfall-runoff simulation. Journal of Hydrology 608, 127553. doi:10.1016/j.jhydrol.2022.127553.

Yang, Y. & Yang, Y. 2020 Hybrid method for short-term time series forecasting based on EEMD. IEEE Access 8, 61915–61928. doi:10.1109/ACCESS.2020.2983588.

Yin, H., Guo, Z., Zhang, X., Chen, J. & Zhang, Y. 2022a RR-Former: Rainfall-runoff modeling based on transformer. Journal of Hydrology 609, 127781. doi:10.1016/j.jhydrol.2022.127781.

Yin, H., Wang, F., Zhang, X., Zhang, Y., Chen, J., Xia, R. & Jin, J. 2022b Rainfall-runoff modeling using long short-term memory based step-sequence framework. Journal of Hydrology 610, 127901. doi:10.1016/j.jhydrol.2022.127901.

Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H. & Zhang, W. 2020 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv:2012.07436.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).