Accurate streamflow prediction is vital for hydropower operations, agricultural planning, and water resource management. This study assesses the effectiveness of Long Short-Term Memory (LSTM) networks in daily streamflow prediction at the Kratie station, investigating different network structures and hyperparameters to optimize predictive accuracy while considering computational efficiency. Our findings underscore the significance of LSTM models in addressing streamflow prediction challenges. Training LSTM on historical streamflow data reveals the significance of the training dataset size; a dataset spanning 2013–2022 yields optimal results. Incorporating a hidden layer with a nonlinear activation function and adding a fully connected layer improve prediction ability. However, increasing the number of neurons and layers introduces complexity and computational overhead. Careful parameter tuning, including epochs, dropout, and the number of LSTM units, is crucial for optimal performance without sacrificing efficiency. The stacked LSTM with sigmoid activation demonstrates exceptional performance, with a high Nash–Sutcliffe Efficiency of 0.95 and a low relative root mean square error (rRMSE) of approximately 0.002%. Moreover, the model excels in forecasting streamflow for 5–15 antecedent days, with 5 days exhibiting particularly high accuracy. These findings offer valuable insights into LSTM networks for streamflow prediction for water management in the Vietnam Mekong Delta.

  • LSTM networks effectively predict daily streamflow, achieving high accuracy in the Mekong Basin.

  • Historical streamflow data spanning 2013 to 2022 are optimal for LSTM training in the Mekong.

  • Incorporating non-linear activation functions and a fully connected layer enhances learning efficiency and prediction ability.

  • More neurons and layers introduce complexity and computational cost, requiring careful tuning.

  • The stacked LSTM model with sigmoid activation achieves an NSE of 0.95 and a low rRMSE (∼0.002%), with optimal antecedent input ranging from 5 to 15 days, peaking at five days.

Precise streamflow forecasts are vital for optimizing water resources management, enhancing hydroelectric power generation efficiency, planning irrigation strategically, monitoring floods effectively, and supporting other hydrological functions (Kisi & Cimen 2011). However, the nonlinear nature of hydrology, resulting from complex weather patterns and infiltration mechanisms, poses a persistent challenge in achieving reliable streamflow forecasting (Wang et al. 2009; Ghimire et al. 2021; Zhang et al. 2022).

The application of machine learning (ML) algorithms in streamflow prediction has gained prominence due to their ability to address the non-stationarity and non-linearity of hydrological phenomena (Fu et al. 2020; Zhu et al. 2020b; Naganna et al. 2023; Jamei et al. 2024). Consequently, ML methods are increasingly used, including Long Short-Term Memory (LSTM) networks, which are particularly well-suited for sequential data and have shown promising results in hydrological time-series forecasting.

Recent studies, such as that by Mehraein et al. (2022), investigate the efficacy of metaheuristic regression approaches – including CatBoost (CB), Random Forest (RF), and Extreme Gradient Tree Boosting (XGBoost) – for forecasting monthly streamflow using satellite precipitation data. Their findings suggest that XGBoost and CB offer superior accuracy when incorporating TRMM rainfall data and periodicity components compared to other methods such as RF and ANN. While effective in certain situations, these methods have difficulty capturing long-term temporal dependencies for streamflow forecasting.

In contrast, although models like support vector machines (SVM) and decision trees are powerful for regression tasks, they require significant feature engineering to handle time-series data effectively. SVM, for instance, performs well with lagged values and rolling windows but fails to capture the sequential dependencies vital for streamflow prediction (Cheng et al. 2020; Duy Nguyen et al. 2022). Similarly, decision trees, RFs, and Gradient Boosting Machines are easy to interpret and perform well with nonlinear data, but they struggle to model sequential dependencies without manually added temporal features (Juneau et al. 2021; Al-Bossly et al. 2023; Zhang & Wang 2020). This limitation makes them less effective for modeling streamflow.

Deep learning has achieved impressive methodological advancements in the last few years and yielded promising outcomes in various practical data science applications, including hydrology. LSTM networks effectively manage the nonlinear, non-stationary, and seasonal characteristics of streamflow data, enabling them to capture complex temporal patterns (Zhu et al. 2020a; Tao et al. 2021; Yaseen 2021; Naganna et al. 2023). Unlike convolutional neural networks (CNNs), which are effective in extracting local temporal features but may not capture long-range dependencies, and gated recurrent units (GRUs), which have fewer parameters than LSTM and are efficient in training but may perform less effectively in long-term dependency modeling, LSTMs are effective due to their memory block architecture that enables them to retain and learn from long sequences of data (Yao et al. 2023; Bounoua et al. 2024; Zhao et al. 2024).

Studies have shown that the LSTM model consistently outperforms other ML models such as SVM and Decision Trees and even addresses a fundamental limitation of the conventional recurrent neural network (RNN) in processing data containing temporal sequences. By incorporating a specialized unit, referred to as a memory block, the LSTM can retain information over extended periods, allowing for the effective learning and modeling of long-term dependencies and facilitating the capture of intricate relationships within the data (Hochreiter & Schmidhuber 1997; Gers et al. 2000; Shen 2018; Kratzert et al. 2019a; Fu et al. 2020; Hunt et al. 2022). A series of publications by Kratzert et al. (2018, 2019a, 2019b) applied the LSTM model in simulating daily streamflows of 241 basins throughout the United States. They showed the remarkable power of LSTM in flow modeling, even when the models were trained on multiple basins simultaneously, and introduced a novel model called Entity-Aware LSTM (EA-LSTM), demonstrating significant improvement in regional-level performance for 531 basins.

In the research of Zhang et al. (2018b), the authors utilized LSTM for predicting water depth. They observed a substantial improvement in prediction accuracy compared to the feedforward neural network model. This outcome demonstrated the LSTM's ability to effectively learn from previous information, thus enhancing its predictive capabilities in hydrological applications. Zhang et al. (2018a) conducted a comparative analysis of four neural network models for daily river flow forecasting for sewer overflow structures. The models examined were the LSTM, multilayer perceptron, GRU, and wavelet neural network. Their findings demonstrated that LSTM and GRU exhibited remarkable proficiency in prediction for different lead times. Using discharge data at the upper Yangtze River and the Hun River, Xu et al. (2020) created an LSTM model for simulating daily and 10-day average streamflow and found that the LSTM outperformed the Soil and Water Assessment Tool model and other hydrological models.

In the study of Yuan et al. (2018), the accuracy of a combined approach utilizing deep learning and a parameter optimizer for predicting monthly runoff was evaluated, and the results showed that the hybrid LSTM outperformed other hybrid models. Fu et al. (2020) highlighted the superior performance of LSTM models in processing both steady and fluctuating streamflow data, demonstrating a good ability to capture data features across seasons, and further evaluated the influence of training set size, the time interval between training and testing sets, and the time span of prediction data on LSTM model performance.

In recent developments, advancements in LSTM models have further improved our understanding of streamflow dynamics and forecasting capabilities (Sahoo et al. 2019, 2023a, b; Fu et al. 2020; Zhu et al. 2020a; Slater et al. 2021). Among these studies, the simplest approaches have focused on river discharge with a prominent annual cycle and substantial lagged autocorrelation. These studies utilize previous discharge values at the same location to make predictions using LSTMs. Sahoo et al. (2019) investigated the application of LSTM networks for forecasting low-flow streamflow in hydrological systems. The study demonstrates that LSTM surpasses traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) and other ML techniques in terms of accuracy, especially in low-flow forecasting. However, it does not consider how varying hyperparameters impact performance. Such analysis could provide valuable insights into the features that drive predictions within hydrological contexts. Sahoo et al. (2023a) introduced a new approach for accurately predicting daily suspended sediment loads. Their approach combines smoothing techniques with LSTM deep learning models to improve accuracy, especially in handling variable and noisy data. The authors improve prediction performance by smoothing the data before inputting it into the model. Sahoo et al. (2023b) explored the use of LSTM for predicting future water demand in urban settings across multiple steps. The study shows LSTM's superior performance in multi-step ahead predictions. Since urban systems can vary widely, further validation on different datasets would be helpful.

In addition, more advanced models have been developed that incorporate upstream data to enhance streamflow prediction. These models may either merely forecast river flow from different study areas (Costa Silva et al. 2021) or can also predict flow while including precipitation (Ding et al. 2019; Le et al. 2019; Yan et al. 2019; Sonali et al. 2023). In a case study conducted by Costa Silva et al. (2021) in Brazil, promising results were obtained for a 5-day ahead prediction. Sonali et al. (2023) introduce a hybrid deep learning model combining LSTM with other ML techniques for forecasting monthly runoff time series of the Brahmani River in India. The hybrid model shows improvements in prediction accuracy. However, the complexity of the hybrid model might pose challenges in real-time applications. More extensive sensitivity analysis could enhance understanding of how different model components (LSTM, CNN, and Attention) contribute to performance improvements.

Similarly, Le et al. (2019) obtained a positive outcome in modeling flood events in the Da River basin of Vietnam at a lead time of 3 days. Yan et al. (2019) designed an LSTM network specifically for flood prediction based on historical discharge and meteorological forecast data. Their findings highlight the LSTM's effectiveness in accurately predicting flood events, especially in capturing the magnitude of peak flows. To predict runoff for the Lech and Danube Rivers confluence, Ding et al. (2019) developed an advanced LSTM-based hydrological forecast model, which incorporated European Centre for Medium-Range Weather Forecasts (ECMWF) projections of multiple variables (precipitation, soil moisture, etc.). The research evaluated the prediction accuracy up to 9-h lead times and observed a Nash–Sutcliffe Efficiency (NSE) of 0.71, which further improved to 0.77 after incorporating an attention layer (Hunt et al. 2022). In summary, antecedent research has underscored the efficacy of employing LSTM models with streamflow and rainfall data for streamflow prediction. Despite these advancements, significant challenges remain, particularly in regions with sparse data and in understanding the influence of historical streamflow patterns on present-day dynamics.

In the expansive and globally significant Mekong River Basin (MRB), where reservoir releases significantly influence streamflow dynamics, the main river trunk holds a crucial position in shaping the streamflow dynamics of the primary river and confronts challenges related to sparse data regions. Understanding the role of historical streamflow patterns is crucial, particularly in regions with sparse data coverage. Conventionally, precipitation has been acknowledged as the principal driver of streamflow patterns, evidenced by hydrographs displaying consistent changes corresponding to precipitation fluctuations over time. Based on these works, this study aims to assess the effectiveness of LSTM models in capturing the complex dynamics of streamflow in the Mekong River Basin – a region where reservoir operations and seasonal variability play a crucial role in shaping the streamflow. This study contributes to the growing body of research by providing a detailed assessment of LSTM models applied to streamflow forecasting, with a unique focus on the role of historical streamflow patterns – especially in the Mekong River Basin, a region that faces significant data gaps. While previous models have mainly relied on precipitation data as a primary driver of streamflow, this study introduces a novel approach by emphasizing the importance of past streamflow behavior, which reflects responses to climatic, land use, and operational changes over time. While precipitation undeniably influences streamflow variations, the historical streamflow behavior, reflecting past hydrological responses to factors such as land use changes and climate variability, offers valuable insights into the current river conditions. In this context, this study aimed to investigate the effectiveness of LSTM models in capturing the complex dynamics of streamflow, which is highly influenced by reservoir operation, a unique challenge in the Mekong region. 
Furthermore, this research makes a significant contribution to optimizing LSTM structures and hyperparameters for streamflow prediction. By carefully tuning parameters such as epochs, dropout rates, and the number of LSTM units, this study enhances the predictive accuracy of the model while improving computational efficiency. This optimization process is critical in applying deep learning models to large-scale, real-world hydrological systems, particularly in regions with limited data availability. The study is conducted at the Kratie station, a key hydrological control site located upstream in the Mekong River, which regulates inflows into the Mekong Delta. This station is within the boundaries of the Vietnam Mekong Delta as defined by Minderhoud et al. (2019). This location serves as an ideal case study for investigating the influence of reservoir operations on streamflow and highlights the broader applicability of LSTM models for water resource management in the Vietnam Mekong Delta. The findings of this study not only advance our understanding of LSTM capabilities in streamflow forecasting but also offer practical insights for improving water resource management and hydropower operations in regions with sparse data.

LSTM cell structure

An LSTM network is an enhanced form of RNN, which tackles the vanishing gradient problem during parameter backpropagation. This is achieved by incorporating repeating memory cells within LSTM layers (Hochreiter & Schmidhuber 1997; Sainath et al. 2015). Each LSTM unit comprises three primary components, the input, the forget, and the output gate, which are crucial in regulating the information within the LSTM cell (Chen et al. 2018; Wei & Jing 2020). These components work together to regulate the flow of information within the model, determining which information to retain or discard at each time step. The input gate controls the accumulation of new information within the LSTM cell. It takes input from the current input vector and the previous hidden state and decides which information is relevant to update the cell state. The forget gate controls how much of the previous internal memory state is discarded; it takes input from the current input vector and the previous hidden state and outputs a forget gate vector that modulates the previous cell state. The output gate regulates the transmission of the latest output of the LSTM cell to the final state. The cell state stores information over long sequences and is updated based on the input gate, forget gate, and input modulation (Sainath et al. 2015). The LSTM architecture, as depicted in Figure 1, illustrates these components. By working together, these elements govern the information flow within the network, enabling it to learn and process sequential data with long-term dependencies effectively.
Figure 1

LSTM network architecture, showing the expected discharge, the historical input (the input to the network), the forget gate, the update gate (input gate), the output gate, the weight matrices associated with the input variables in each respective gate, the output from the previous time step, the output of the LSTM unit, and the state of the memory cell.

The LSTM network incorporates a cell state as long-term memory, allowing it to selectively retain or forget information as required across time steps. This mechanism is facilitated by three gates (the forget, input, and output gates, as depicted in Figure 1), which precisely regulate the information from the memory cell. The hidden state represents the output of the LSTM cell at each time step, reflecting the regulated information conveyed by three gates.
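
The gating mechanism described above can be sketched for a single LSTM unit with scalar weights. This is a minimal illustration of the standard LSTM update, not the network used in this study; the function name `lstm_step`, the parameter layout, and all weight values are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each gate name ("f", "i", "o", "g") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    All values are illustrative assumptions.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("f"))      # forget gate: share of old memory kept
    i_t = sigmoid(pre("i"))      # input gate: share of new candidate stored
    o_t = sigmoid(pre("o"))      # output gate: share of memory exposed
    g_t = math.tanh(pre("g"))    # candidate memory (input modulation)

    c_t = f_t * c_prev + i_t * g_t   # updated cell state (long-term memory)
    h_t = o_t * math.tanh(c_t)       # hidden state (output at this step)
    return h_t, c_t


# Hypothetical weights and a short, already scaled input sequence.
params = {"f": (0.5, 0.1, 0.0), "i": (0.6, 0.2, 0.0),
          "o": (0.4, 0.3, 0.0), "g": (0.9, 0.5, 0.0)}
h, c = 0.0, 0.0
for x in [0.10, 0.25, 0.40]:
    h, c = lstm_step(x, h, c, params)
```

Pushing the forget gate towards 1 and the input gate towards 0 (for example through large biases) makes the cell carry its previous state forward almost unchanged, which is the behavior that lets the cell state act as long-term memory.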

LSTM fully connected layers

In addition to the LSTM cells, fully connected layers (FCLs) can be added to the LSTM network architecture to perform additional computations and transformations on the output of the memory cells. The LSTM structure, as illustrated in Figure 1(a), comprises four unique layer components:

  • (i) The input layer receives the input sequence data and acts as the network's entry point for information.

  • (ii) The FCL (dense) layer serves as an intermediary between the input and the LSTM cell, facilitating the alignment of input dimensions with those of the cells. By adjusting the dimensionality of the input, the FCL effectively acts as a bridge, enabling communication between the input and the subsequent LSTM cell layer. For instance, the FCL transfers the streamflow vector (Xt) with m dimensions into n dimensions, with n being the LSTM cell number.

  • (iii) The LSTM cell layer (LSTM(n)) comprises a collection of n cells, representing the core component that provides diverse memory capabilities.

  • (iv) The output layer is responsible for generating the final output based on the information processed by the LSTM cells.

The LSTM cells within the network analyze and process the input sequence, generating an output that is passed through the FCL to produce the final output for the specific task. Each additional FCL introduces nonlinear transformations to the LSTM cell output, so increasing the number of FCLs allows the model to learn more intricate representations and capture more complex patterns and relationships in the data.

Loss function

The loss function quantifies the agreement between the predictions generated by the model and the actual target values, and it is used to guide the learning process during training. The choice of the loss function should align with the specific requirements and the nature of the task at hand. Mean Square Error (MSE) is often used for regression tasks where the goal is to predict a continuous value. The loss function estimates the average squared difference between the observed and simulated discharges. The Adam algorithm (Kingma & Ba 2015) is employed to adjust the weights of the LSTM networks through optimization during the training process. The discharge from the hydrological gauge serves as actual values against which the simulated discharge generated by the LSTM is evaluated. The loss function is evaluated below by comparing the discrepancy between model outputs and observation (Equation (1)).
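$$\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{Q}_t - Q_t\right)^2 \qquad (1)$$
where $\hat{Q}_t$ and $Q_t$ are the predicted and observed values at time step t, and T represents the number of training samples in a batch used for each training iteration.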
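
The batch loss described above can be written in a few lines of Python. This is a minimal sketch of the mean squared error only; the function name `mse_loss` and the sample values are hypothetical, and the weight update by the Adam algorithm is not shown.

```python
def mse_loss(predicted, observed):
    """Mean squared error over one batch of T samples."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    T = len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / T


# Hypothetical scaled discharges for a batch of four time steps.
batch_pred = [0.20, 0.35, 0.50, 0.65]
batch_obs = [0.25, 0.30, 0.50, 0.70]
loss = mse_loss(batch_pred, batch_obs)
```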

Study site and dataset

The Mekong River is a vital transboundary river spanning 4,909 km, and its catchment covers an area of 795,000 km². The river is crucial to the socio-economic and ecological advancement (such as agriculture, sanitation, power generation, and industry) of six nations: China, Myanmar, Laos, Thailand, Cambodia, and Vietnam (Dinh et al. 2020). The study area is classified under the Köppen Climate Classification as a tropical monsoon climate characterized by distinct wet and dry seasons. The wet season brings substantial rainfall due to the monsoon, while the dry season typically experiences lower precipitation. The delta's hydrology brings both vital freshwater replenishment and the risk of flooding. Understanding and accurately predicting the hydrological dynamics of the Mekong Delta are paramount for sustainable water management and the preservation of its unique ecosystems. As depicted in Figure 2, the Mekong River exhibits complex hydrological dynamics and climatic conditions, which are influenced by its primary tributaries and various environmental factors (Ruiz-Barradas & Nigam 2018; Binh et al. 2020).
Figure 2

Location of the Kratie station in the Mekong River Basin, where testing is conducted to evaluate the accuracy of the LSTM network for the daily streamflow prediction.

Mekong's hydrology is influenced by many factors, including precipitation patterns, sediment deposition, and human interventions such as dam construction and land use changes. Additionally, the delta's low-lying topography and vulnerability to sea level rise exacerbate the risk of saltwater intrusion, especially during the dry season (Räsänen et al. 2017; Phung et al. 2021).

The Kratie station serves as a critical hydrological control site within the Vietnam Mekong Delta, positioned at the entry point of the flat and low-lying delta region. With an average annual discharge of approximately 437 billion m³ per year (MRC 2010), Kratie plays a pivotal role in regulating inflows into the delta and is essential for managing water resources, agriculture, and other socio-economic activities in the region. Figure 2 depicts a geographic map showcasing the Mekong River, its primary tributaries, and the Kratie station.

The continuous daily streamflow data used in our study, spanning from 2013/1/1 to 2022/12/1, were acquired from the Mekong River Commission's monitoring system at the Kratie station (https://portal.mrcmekong.org/monitoring/river-monitoring-telemetry). Figure 3 illustrates the daily discharge pattern observed at Kratie, highlighting the consistent annual and interannual trends in streamflow. Notably, the streamflow exhibits distinct seasonal variations, with minimum discharge typically occurring in March and peak discharge in October, reflecting the influence of the subtropical monsoon climate.
Figure 3

Daily discharge recorded at the Kratie station.

The streamflow data at Kratie exhibit substantial variability (high standard deviation) (Table 1). The distribution is right-skewed, suggesting frequent lower flows but occasional extreme high flows, and there are relatively fewer extreme outliers, demonstrating platykurtic behavior (Figure 4). The uneven distribution and significant fluctuations in streamflow at Kratie present challenges for accurate runoff forecasting, necessitating the development of robust predictive models. By focusing on this critical location within the Mekong River Basin, our study aims to contribute to the improved understanding and prediction of streamflow dynamics in the region. Given the high variability and seasonal changes in streamflow, LSTM models or other time-series forecasting methods should be carefully tuned to handle this high variability and capture the long-term temporal dependencies in the data. Hyperparameter optimization, especially for input sequence length and model complexity, will be crucial for accurately forecasting low-flow and high-flow events.
Table 1

The descriptive statistics for the streamflow data at the Kratie station

| Statistic | Count | Mean | Std | Minimum | 25% | 50% | 75% | Maximum |
|---|---|---|---|---|---|---|---|---|
| Value | 8,846 | 12,285.51 | 11,905.93 | 1,377 | 3,439 | 6,141 | 18,422 | 54,012 |

Figure 4

Distribution of the streamflow data at the Kratie station.

LSTM model construction for the Kratie station

Data preparation

The sole input to the model was the historical streamflow data from the Kratie station, with no additional climate or hydrological variables considered. These data were used to capture the temporal patterns and dependencies of streamflow over time, which are crucial for accurate forecasting. The recorded discharge at the Kratie station serves as the target values for training purposes, enabling a comparison with the predicted discharges generated by the LSTM model. The input data consist of historical streamflow values from previous time steps, excluding the most recent 5–15 days before the target day. For example, with a time shift of five days, the historical streamflow series up to day t − 5 (i.e., Q1, Q2, …, Qt−5) is used to forecast the streamflow on day t (i.e., Qt, 5 days ahead). The model learns temporal dependencies from historical streamflow data over the past days to predict streamflow at the future time step. This approach aligns with the standard practice of leveraging past values to predict future outcomes in time-series forecasting.
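
The pairing of past discharges with a shifted target can be sketched as a sliding window. This is a minimal illustration under stated assumptions: the function name `make_samples` and the fixed input length `window` are hypothetical, since the text specifies only the time shift between the last input day and the forecast day.

```python
def make_samples(series, window, time_shift):
    """Pair each target Q_t with the `window` values ending at Q_{t - time_shift}."""
    if window < 1 or time_shift < 1:
        raise ValueError("window and time_shift must be positive")
    inputs, targets = [], []
    first_target = window - 1 + time_shift
    for t in range(first_target, len(series)):
        last_input = t - time_shift            # index of Q_{t - time_shift}
        inputs.append(series[last_input - window + 1:last_input + 1])
        targets.append(series[t])
    return inputs, targets


# Hypothetical daily discharges (any consistent unit), days 0..9.
q = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
X, y = make_samples(q, window=3, time_shift=5)
# X[0] holds days 0-2 and y[0] is day 7, five days after the last input.
```

Increasing `time_shift` from 5 to 15 lengthens the forecast horizon while leaving the windowing logic unchanged.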

Data testing

Assessing the discharge data's stationarity (consistent statistical properties over time) is the first step in the data processing phase. The study employed the Dickey–Fuller (DF) test to achieve this. By utilizing the DF test, it becomes possible to evaluate and reject the null hypothesis, which proposes that the streamflow (Q) is not stationary. With an ADF statistic of −10.24 and a very small p-value of approximately 4.76 × 10⁻¹⁸, we have strong evidence against the null hypothesis. Therefore, we reject the null hypothesis in favor of the alternative hypothesis. This implies that the streamflow time-series data at the Kratie station are stationary, providing a solid foundation for subsequent modeling and analysis tasks.
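
The idea behind the test can be sketched with the simplest Dickey–Fuller regression, in which the daily change is regressed on the previous value and a constant. This is an illustrative sketch only: it omits the lagged difference terms of the augmented test whose statistic is reported above, it is run on a synthetic series rather than the Kratie record, and the function name `dickey_fuller_stat` is hypothetical. The threshold of −2.86 is the approximate 5% critical value for the regression with a constant and no trend.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + error."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diffs)
    if n < 3:
        raise ValueError("need at least four observations")
    mean_x = sum(lagged) / n
    mean_z = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(lagged, diffs))
    gamma = sxz / sxx                      # slope on the lagged level
    alpha = mean_z - gamma * mean_x        # intercept
    ssr = sum((z - alpha - gamma * x) ** 2 for x, z in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err


# Synthetic stationary AR(1) series, used only to exercise the function.
rng = random.Random(0)
y = [0.0]
for _ in range(499):
    y.append(0.5 * y[-1] + rng.gauss(0.0, 1.0))

stat = dickey_fuller_stat(y)
is_stationary = stat < -2.86   # approximate 5% critical value (constant, no trend)
```

A statistic well below the critical value, as with the −10.24 reported for Kratie, leads to rejecting the unit-root null hypothesis. Obtaining a p-value requires the tabulated Dickey–Fuller distribution, which a statistics library would normally supply.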

Data normalization

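Before training, the streamflow data are scaled using the minimum and maximum of the original dataset (Equation (2)). During the simulation phase, the scaled data are utilized. However, when the simulation is completed, the data need to be restored to their original scale for further analysis. This restoration is achieved by applying the inverse transform, as described in Equation (3). In Equation (3), $Q_{\mathrm{norm}}$ represents the simulated output value, while $Q_{\min}$ and $Q_{\max}$ represent the minimum and maximum observed in the original dataset, respectively (Ghimire et al. 2021).
$$Q_{\mathrm{norm}} = \frac{Q - Q_{\min}}{Q_{\max} - Q_{\min}} \qquad (2)$$
$$Q = Q_{\mathrm{norm}}\left(Q_{\max} - Q_{\min}\right) + Q_{\min} \qquad (3)$$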

The dataset is subsequently divided into separate training and testing periods after the normalization process. Specifically, the dataset spanning 10 years is split into an 8-year training period (from 2013 to 2020) and a 2-year testing period (from 2021 to 2022). This division ensures that the model is trained on a substantial portion of the data while also allowing for independent evaluation of its performance on unseen data during the testing phase.
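
The scaling, restoration, and chronological split can be sketched as follows. This is a minimal illustration that assumes min–max scaling to the range [0, 1], which is consistent with the use of the dataset minimum and maximum described above but is not confirmed in detail by the text; the function names and discharge values are hypothetical.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) used for scaling."""
    return min(values), max(values)


def minmax_scale(values, q_min, q_max):
    """Map discharges to [0, 1] (forward transform)."""
    span = q_max - q_min
    if span == 0:
        raise ValueError("all values are identical; cannot scale")
    return [(v - q_min) / span for v in values]


def minmax_restore(scaled, q_min, q_max):
    """Map scaled values back to the original units (inverse transform)."""
    span = q_max - q_min
    return [s * span + q_min for s in scaled]


def chronological_split(values, train_fraction):
    """Split without shuffling so the test period follows the training period."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Hypothetical discharges; 0.8 mirrors the 8-year / 2-year split.
q = [1500.0, 3000.0, 9000.0, 27000.0, 54000.0,
     20000.0, 6000.0, 2500.0, 1400.0, 1800.0]
q_min, q_max = minmax_fit(q)
scaled = minmax_scale(q, q_min, q_max)
train, test = chronological_split(scaled, train_fraction=0.8)
restored = minmax_restore(scaled, q_min, q_max)
```

The split keeps the time order intact, so the model is never evaluated on days that precede its training data.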

Main model development

The study focuses on utilizing a six-stacked (hidden) LSTM layer model to predict discharge using one input variable. Table 2 displays the hyperparameters of the benchmark model, which have been chosen using a trial-and-error approach.

Table 2

The architecture of LSTM used as the benchmark model for Kratie discharge prediction

| Model hyperparameters | Hyperparameter selection |
|---|---|
| LSTM cell 1 | [64, sigmoid activation function] |
| LSTM cell 2 | [256] |
| LSTM cell 3 | [128] |
| LSTM cell 4 | [128] |
| Epochs | 1,000 |
| Drop rate | 0.4 |

Common hyperparameters

  • Activation function: The neural network architecture employs three different activation functions, namely linear, ReLU, and sigmoid, across various layers.

  • Dropout: Dropout is a regularization technique to address potential overfitting issues and enhance training performance. It is implemented by temporarily removing a certain fraction of neurons from contributing to the model's training process, which encourages the remaining neurons to learn more robustly and improves generalization. In this study, four dropout rates are tested: 0.0, 0.2, 0.4, and 0.6. This experiment investigates the optimal dropout rate for improving the model's performance.
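
The dropout step described in the list above can be sketched on a plain list of activations. This is an illustrative sketch of the common "inverted" variant, in which surviving units are rescaled during training; whether the model in this study rescales in the same way is an assumption, and the function name `dropout` and the activation values are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors
    so that the expected sum of the activations is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(42)
hidden = [0.2, 0.8, 0.5, 0.9, 0.1, 0.6]
for rate in (0.0, 0.2, 0.4, 0.6):      # the four rates tested in this study
    dropped = dropout(hidden, rate, rng)
```

With a rate of 0.0 the activations pass through unchanged, which corresponds to training without dropout.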

Model performance evaluation

This section reports the statistical metrics utilized to assess the prediction efficiency of the model. The NSE and root mean square error (RMSE) are employed to evaluate the effectiveness of the different structure and hyperparameter combinations. NSE and RMSE are widely recognized as reliable metrics for prediction problems (Tiyasha et al. 2020). NSE is a metric that measures the efficiency of the model's predictions by evaluating the covariance between the predicted and observed flows (Equation (4)). It quantitatively describes how well the model replicates the observed data (Nash & Sutcliffe 1970). RMSE is a commonly used metric that calculates the square root of the average squared discrepancy between the predicted and observed values (Equation (5)). It indicates the model's overall prediction error. Relative RMSE (rRMSE) is a relative version of RMSE that is typically utilized to measure the error ratio in comparison to the actual values (Equation (6)). The NSE ranges from negative infinity to 1 (perfect match), while an rRMSE close to zero indicates better model performance; because the errors are squared, both RMSE and rRMSE penalize large errors more heavily.
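$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Q_i - \hat{Q}_i\right)^2}{\sum_{i=1}^{N}\left(Q_i - \bar{Q}\right)^2} \qquad (4)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{Q}_i - Q_i\right)^2} \qquad (5)$$
$$\mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\bar{Q}} \times 100\% \qquad (6)$$

In the provided equations, N is the number of time steps in the dataset, $Q_i$ represents the observed value at a specific time step i, $\hat{Q}_i$ represents the predicted value at the same time step i, and $\bar{Q}$ denotes the mean observation.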
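
The three metrics can be computed directly from paired observed and predicted series. This is a minimal sketch using the standard NSE and RMSE definitions; expressing rRMSE as a percentage of the mean observation is an assumption, since other normalizations are also in use, and the function names and sample values are hypothetical.

```python
import math


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 is a perfect match, values can be negative."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def rmse(observed, predicted):
    """Root mean square error, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """RMSE as a percentage of the mean observation (assumed normalization)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


# Hypothetical observed and simulated discharges.
obs = [2.0, 4.0, 6.0, 8.0]
sim = [2.5, 3.5, 6.5, 7.5]
scores = (nse(obs, sim), rmse(obs, sim), rrmse(obs, sim))
```

NSE compares the squared errors with the variance of the observations, so a model that always predicts the observed mean scores exactly 0.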

Model architecture scenarios

Tables 3 and 4 provide multiple scenarios involving various LSTM architectures and hyperparameters to evaluate the efficiency of the LSTM model. Among the various scenarios, the four-layer stacked LSTM architecture is selected as the benchmark against which other configurations are compared. The two- and three-layer LSTMs are evaluated accordingly. Building upon the stacked LSTM, scenario f1 introduces an additional FCL between the input layer and the first hidden LSTM using a sigmoid activation function. By adding one more FCL between the fifth hidden layer and the output layer, scenario f1 becomes f2. Furthermore, the performance of the vanilla LSTM, which represents the basic architecture with a single LSTM layer, is also evaluated in the study.

Table 3

The LSTM structure scenarios

| Scenarios | FCL | Layer 1 cells | Layer 2 cells | Layer 3 cells | Layer 4 cells | FC |
|---|---|---|---|---|---|---|
| Stacked LSTM | | 64 | 256 | 128 | 128 | |
| Vanilla LSTM | | 64 | | | | |
| S0 | | 64 | N_feats | | | |
| S1 | | 64 | 64 | N_feats | | |
| S2 | | 64 | 128 | N_feats | | |
| S3 | | 64 | 256 | N_feats | | |
| S4 | | 64 | 256 | 128 | N_feats | |
| S5 | | 64 | 256 | 128 | 64 | |
| S6 | | 64 | 256 | 256 | 128 | |
| S7 | | 128 | 256 | 256 | 128 | |
| f1 | FCLa (64) | 64 | 256 | 128 | 128 | |
| f2 | FCLa (64) | 64 | 256 | 128 | 128 | FCLa (64) |
| f3 | FCLa (64) | 64 | 256 | 128 | 128 | FCLa (128) |

FCL: fully connected layer; FCLa: FCL with sigmoid activation function.
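
To give a feel for how the architectures in Table 3 differ in size, the following minimal sketch counts the trainable weights in a stack of LSTM layers. It assumes the standard LSTM cell (four gates, each with input weights, recurrent weights, and one bias vector) and a single input feature (univariate streamflow). The names `lstm_params`, `stack_params`, and `N_FEATURES` are illustrative and not taken from the study, and the final output layer is not counted.

```python
def lstm_params(input_dim, units):
    """Weights in one standard LSTM layer: 4 gates x (input + recurrent + bias)."""
    return 4 * (units * (input_dim + units) + units)


def stack_params(n_features, layer_units):
    """Total LSTM weights when each layer feeds the next one."""
    total, input_dim = 0, n_features
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the next layer receives this layer's outputs
    return total


N_FEATURES = 1  # assumption: discharge is the only input feature

print("Vanilla LSTM (64):            ", stack_params(N_FEATURES, [64]))
print("Stacked LSTM (64-256-128-128):", stack_params(N_FEATURES, [64, 256, 128, 128]))
print("S7 (128-256-256-128):         ", stack_params(N_FEATURES, [128, 256, 256, 128]))
```

Under these assumptions the four-layer benchmark carries roughly forty times as many LSTM weights as the vanilla model, which gives a sense of the added computational cost of deeper and wider configurations.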

Table 4

The scenarios of the LSTM hyperparameters

| Scenarios | Model hyperparameters | Hyperparameter selection |
|---|---|---|
| a(1–3) | Activation function | [linear, ReLU, sigmoid] |
| c(1–6) | Number of LSTM cells | [20, 60, 100, 200] |
| e(1–5) | Epochs | [300, 700, 1,000, 1,500] |
| d(1–4) | Drop rate | [0.0, 0.2, 0.4, 0.6] |
| t(1–3) | Batch size | [1, 64, 128] |

The study explores the influence of activation functions on stacked LSTM models, as these functions introduce nonlinear transformations to the LSTM cells. Their influence is assessed by examining how well the models in scenarios a(1–3) capture complex patterns in the data (Table 4). Additionally, scenarios c1–c5 are created to examine the influence of the cell number on the learning effectiveness of the model.

A batch of data is utilized for each learning iteration during the training process. The batch size (T), drop rate, and number of epochs are tested to determine whether increasing them provides a significant performance gain. The values tested for these parameters in this study are presented in Table 4.

The hyperparameter ranges explored in Table 4 are informed by best practices in hydrological forecasting, empirical tuning, and prior studies. In streamflow prediction, LSTM units typically range from 50 to 200, balancing model complexity with the risk of overfitting (Li et al. 2023). This study adopts a similar range, consistent with previous hydrological applications. Hyperparameters were then selected through empirical testing to evaluate performance metrics such as loss and accuracy. By combining established guidelines with systematic experimentation, the hyperparameter ranges are selected for the streamflow prediction task.
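
The scenario set in Table 4 can be enumerated programmatically. The sketch below is illustrative only: it assumes a one-at-a-time design in which each hyperparameter is varied while the others stay at a baseline, and the `BASELINE` values are hypothetical placeholders rather than the study's documented defaults.

```python
import math

# Candidate values copied from Table 4.
GRID = {
    "activation": ["linear", "relu", "sigmoid"],
    "lstm_cells": [20, 60, 100, 200],
    "epochs": [300, 700, 1000, 1500],
    "dropout": [0.0, 0.2, 0.4, 0.6],
    "batch_size": [1, 64, 128],
}

# Hypothetical baseline configuration (an assumption of this sketch).
BASELINE = {
    "activation": "sigmoid",
    "lstm_cells": 100,
    "epochs": 1000,
    "dropout": 0.4,
    "batch_size": 64,
}


def one_at_a_time(grid, baseline):
    """Vary one hyperparameter at a time, keeping the rest at the baseline."""
    scenarios = []
    for name, values in grid.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            scenarios.append((name, config))
    return scenarios


scenarios = one_at_a_time(GRID, BASELINE)
full_grid_size = math.prod(len(values) for values in GRID.values())

print("one-at-a-time runs:", len(scenarios))
print("full grid runs:    ", full_grid_size)
```

Varying one hyperparameter at a time keeps the number of training runs small compared with an exhaustive grid, at the cost of not exploring interactions between hyperparameters.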

Learning efficiency with different structures

Table 5 and Figure 5(a)–5(f) display the test results obtained from LSTMs with two, three, and four hidden layers (S0, S2, and stacked LSTM), the vanilla LSTM (one layer), and two FCL scenarios (f1 and f2). Each LSTM architecture is trained for 1,000 epochs. The FCL scenarios simply incorporate an FCL before and/or after the hidden layers. The results are slightly different from those obtained without using an FCL. Adding a front-end FCL with 128 neurons reduces computational efficiency (scenario f1). Similarly, appending an FCL (128 neurons) at the end fails to capture the oscillations satisfactorily (scenario f2). These points are further supported by the variation in the loss values (Figure 5(g) and 5(h)). The loss value in the stacked LSTM scenario decreases rapidly, showing a quick learning process, whereas the loss in the other LSTM structures remains essentially unchanged after 50 epochs.
Table 5

Performances in the daily discharge prediction of different LSTM architectures: stacked LSTM, LSTM with two, three hidden layers (S0 and S2) and FCLs (f1, f2, and f3), and vanilla LSTM (one single LSTM cell)

| Metrics | Prediction periods | Vanilla LSTM | S0 | S2 | Stacked LSTM | f1 | f2 |
|---|---|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 18,916 | 3,487 | 4,776 | 2,217 | 2,241 | >10,000 |
| | Test | 19,446 | 3,841 | 5,566 | 2,754 | 2,803 | >10,000 |
| NSE | Train | −2.2 | 0.89 | 0.80 | 0.96 | 0.95 | |
| | Test | −2.18 | 0.88 | 0.74 | 0.94 | 0.93 | |

Figure 5

Comparison of simulated and observed discharges from (a) vanilla LSTM, (b) S0 scenario, (c) S2 scenario, (d) stacked LSTM, (e) f1 scenario, (f) f2 scenario for the Kratie station, (g) loss variation in the stacked scenario, and (h) loss variation in the f2 scenario. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.

Using three layers (S2 scenario) produces inferior results compared to two layers (S0 scenario) (Table 5). Employing four layers (stacked LSTM) yields favorable outcomes, but introducing a further hidden layer degrades performance. The model outputs in S2 do not align well with the observations, indicating that this scenario struggles to capture the underlying patterns and relationships in the data (Figure 5).

When using a vanilla LSTM model, the predictions often deviate significantly from the target, resulting in flat predictions. However, when the cell number in the LSTM is set to 150, the predictions show some improvement and are relatively closer to the expected values. Deviating from this number of neurons in either direction leads to poorer results. By increasing the training epochs to 1,000, the network exhibits signs of learning and shows better performance.

These findings indicate that the LSTM model does not require excessive FCLs to learn efficiently. As a result, the stacked LSTM architecture is considered optimal for predicting the streamflow at the Kratie station.

The recommendation to utilize a dataset spanning from 2013 to 2022 for our analysis stems from the recognition that this period encompasses significant hydrological changes influenced by reservoir operations within the Mekong River Basin. During this timeframe, there have been notable alterations in water release patterns from upstream reservoirs, which directly impact streamflow dynamics downstream. By focusing on this specific period, we can capture the effects of reservoir regulation on streamflow behavior and calibrate our LSTM models to accurately simulate these fluctuations. Additionally, limiting the dataset to the period from 2013 to 2022 ensures that our analysis reflects current hydrological conditions, providing more relevant and reliable predictions for water resource management and decision-making in the Mekong Delta (Table 6). Furthermore, extending the analysis to include data prior to 2013 may introduce complexities and uncertainties related to historical hydrological conditions that are less representative of the present-day scenario (Figure 6). Therefore, by concentrating on the 2013–2022 timeframe, we can effectively capture the influence of reservoir operations on streamflow dynamics, leading to more accurate and actionable predictions for stakeholders in the Mekong River Basin.
Table 6

Performances in daily discharge prediction for stacked LSTM with different datasets

| Metrics | Prediction periods | 1998–2022 dataset | 2013–2022 dataset |
|---|---|---|---|
| RMSE (m³/s) | Train | 2,998.83 | 2,717.24 |
| | Test | 3,683.35 | 2,754.61 |
| NSE | Train | 0.94 | 0.96 |
| | Test | 0.88 | 0.94 |

Figure 6

Hydrograph of predicted and observed discharges for stacked LSTM with (a) 1998–2022 datasets and (b) 2013–2022 datasets.

Results in Figure 6 show that LSTM effectively captures seasonal cycles in streamflow data (annual changes). Its ability to retain long-term dependencies helps differentiate between seasonal trends and short-term fluctuations, enabling accurate streamflow forecasts during wet and dry seasons.

Figure 6 also captures the interannual variability from reservoir operations or other factors. Using sufficient prior time steps, LSTM can incorporate this interannual variation to simulate the changes in hydrological regimes under climatic conditions or reservoir operation.

Learning efficiency with different parameters

Effects of epochs

Figure 7(a)–7(f) illustrates the predicted results of the LSTM scenarios for epochs ranging from 10 to 1,500. Increasing the number of epochs provides a significant advantage. When the number of epochs is set to 10, the predictions flatten out. For the Kratie station, the LSTM model demonstrates good learning efficiency once the number of epochs reaches 1,000 (Figure 7(d); Table 7). Beyond 1,000 epochs, however, no further improvement is obtained: at 1,500 epochs the test RMSE rises and the test NSE falls slightly (Table 7; Figure 7(d)–7(f)).
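
The trend across epoch settings can be checked with a few lines of arithmetic. The sketch below copies the test-period RMSE values from Table 7 and reports the relative change between consecutive settings; the variable names are illustrative.

```python
# Test-period RMSE (m3/s) copied from Table 7 (stacked LSTM, Kratie station).
test_rmse = {10: 11052.0, 300: 3714.11, 700: 3059.22, 1000: 2754.61, 1500: 3139.77}

# Epoch setting with the lowest test RMSE.
best_epochs = min(test_rmse, key=test_rmse.get)

# Relative change in test RMSE between consecutive epoch settings.
epochs = sorted(test_rmse)
changes = {}
for prev, curr in zip(epochs, epochs[1:]):
    changes[curr] = 100.0 * (test_rmse[curr] - test_rmse[prev]) / test_rmse[prev]
    print(f"{prev:>5} -> {curr:<5} epochs: {changes[curr]:+.1f}% test RMSE")

print("lowest test RMSE at", best_epochs, "epochs")
```

The gains shrink with each step up to 1,000 epochs, and the sign of the change flips at 1,500 epochs, where the test error increases.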
Table 7

Performances in daily discharge prediction for stacked LSTM with different epochs

| Metrics | Prediction periods | 10 epochs | 300 epochs | 700 epochs | 1,000 epochs | 1,500 epochs |
|---|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 10,829 | 3,406.78 | 2,817.24 | 2,217.24 | 2,375.51 |
| | Test | 11,052 | 3,714.11 | 3,059.22 | 2,754.61 | 3,139.77 |
| NSE | Train | 0.0 | 0.9 | 0.93 | 0.96 | 0.95 |
| | Test | 0.0 | 0.88 | 0.92 | 0.94 | 0.92 |

Figure 7

Hydrograph of predicted and observed discharges for stacked LSTM with different epochs and loss variation with epochs. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.

Effects of dropout

Table 8 and Figure 8(a)–8(d) illustrate the prediction accuracy of the stacked LSTM for different dropout rates. Across all rates, NSE ranges from 0.78 to 0.96 and RMSE from about 2,200 to 5,200 m³/s.
Table 8

LSTM prediction capability for daily discharge with different dropout rates

| Metrics | Prediction periods | D = 0 | D = 0.2 | D = 0.4 | D = 0.6 |
|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 3,417.91 | 2,829.56 | 2,217.27 | 2,644.32 |
| | Test | 5,159.26 | 3,276.32 | 2,754.61 | 3,195.35 |
| NSE | Train | 0.9 | 0.93 | 0.96 | 0.94 |
| | Test | 0.78 | 0.91 | 0.94 | 0.91 |

Figure 8

Simulated and observed discharges from (a) d1 scenario (dropout = 0.0), (b) d2 scenario (dropout = 0.2), (c) d3 scenario (dropout = 0.4), and (d) d4 scenario (dropout = 0.6) using four-layer stacked LSTM with sigmoid activation function for the Kratie station. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.

For the four-layer model, all dropout rates from 0 to 0.6 yield acceptable results. Dropout consistently proved beneficial, especially in the testing phase, where every nonzero rate outperformed the model without dropout (Table 8). Among the tested dropout rates, 0.4 worked best for the Kratie station. However, this study did not investigate the impact of varying dropout rates on prediction performance at other locations (Figure 8(a)–8(d)).

Interestingly, with a dropout rate of 0.4, the ability to capture peaks may not be as good, but overall, the prediction yields better results (Figure 8(c)).

Activation function influence

Three scenarios, a1, a2, and a3, are tested for the Kratie station. Each network structure is trained over 1,000 epochs. Figure 9(a)–9(c) presents the hydrographs depicting the observed and simulated discharges from the three scenarios during training and testing. The metrics in Table 9 correspond to the discharges depicted in Figure 9.
Table 9

LSTM prediction efficiency with different activation function (a1, a2, and a3) scenarios

| Metrics | Prediction periods | a1 | a2 | a3 |
|---|---|---|---|---|
| RMSE (m³/s) | Train | 2,853.5 | 2,797.02 | 2,217.27 |
| | Test | 4,279.21 | 3,958.37 | 2,754.61 |
| NSE | Train | 0.93 | 0.93 | 0.96 |
| | Test | 0.85 | 0.87 | 0.94 |

Figure 9

Simulated and observed discharges from the (a) a1 scenario using linear activation function during the train/test period (3,500/1,400 points) compared with the (b) a2 scenario using ReLU activation function and (c) the a3 scenario using sigmoid activation function for the Kratie station.

The results presented in Figure 9 reveal that in scenario a1, which uses a linear activation function in the LSTM, learning becomes more challenging, resulting in the highest RMSE and lowest NSE values of the three scenarios (Table 9). In scenarios a2 and a3, the LSTM cells undergo a nonlinear transformation through the activation functions, enabling them to better learn the complex patterns and relationships present in the discharge data. The hydrographs indicate that the prediction from scenario a3 matches the observed discharge more closely than the other scenarios, especially for the peak flow and the testing phase. These findings highlight that models using a nonlinear transformation, as in the a2 and a3 scenarios, exhibit promising performance in streamflow prediction, offering accurate and reliable daily streamflow predictions.
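
For reference, the three activation functions compared in scenarios a1–a3 can be written in a few lines. The sketch below also illustrates one general reason a linear activation limits learning: for plain dense units, two stacked linear transformations collapse into a single linear map, so depth adds no expressive power. The `dense` helper and the weights are hypothetical and for illustration only; they are not the study's layers.

```python
import math


def linear(x):
    return x


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, w, b, activation):
    """Single-input, single-output dense unit (illustrative helper)."""
    return activation(w * x + b)


# Arbitrary illustrative weights.
w1, b1, w2, b2 = 0.5, 1.0, -2.0, 0.3

for x in (-2.0, 0.0, 2.0):
    # Two stacked linear units ...
    stacked = dense(dense(x, w1, b1, linear), w2, b2, linear)
    # ... equal one linear unit with merged weight and bias.
    single = (w2 * w1) * x + (w2 * b1 + b2)
    print(f"x={x:+.1f}  stacked={stacked:+.3f}  single={single:+.3f}  "
          f"relu={relu(x):.1f}  sigmoid={sigmoid(x):.3f}")
```

With ReLU or sigmoid in place of `linear`, the stacked units no longer reduce to a single linear map, which is what allows deeper models to represent nonlinear relationships.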

Effects of batch size T

For a model to identify common patterns and features across the input training samples, the samples are presented in batches. By receiving a batch of samples at a time, the model can examine multiple samples together and distinguish their shared characteristics. The batch size (T) therefore affects the learning efficiency of the LSTM. Three batch sizes T (1, 64, and 128) are examined for the Kratie station (Figure 10(a) and 10(b)).
Figure 10

Hydrograph of predicted and observed discharges for stacked LSTM with different batch sizes. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.

Figure 10 shows the LSTM results from 1,000 epochs. With a batch size T of 64, the model achieves a low RMSE and a relatively good NSE, with training and testing NSE values of 0.95 and 0.92, respectively (Table 10). For the Kratie station, batch size does not have a significant impact on the model's simulation results. A small batch size, with its frequent updates of the network weights, leads to the best simulation outcomes, although the difference compared to a larger batch size is not substantial. Increasing the batch size reduces the oscillation of the loss value, but this smoother training does not translate into superior performance for the larger T in this case.
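
The link between batch size and the frequency of weight updates is simple arithmetic: with mini-batch training, the weights are updated once per batch. The sketch below assumes 3,500 training points (the figure quoted in the caption of Figure 9, assumed here to apply to the batch-size experiments as well) and 1,000 epochs; the function name is illustrative.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the training data."""
    return math.ceil(n_samples / batch_size)


N_TRAIN = 3500  # assumption: training points, as quoted in the Figure 9 caption
EPOCHS = 1000

for batch_size in (1, 64, 128):
    per_epoch = updates_per_epoch(N_TRAIN, batch_size)
    print(f"T={batch_size:>3}: {per_epoch:>4} updates/epoch, "
          f"{per_epoch * EPOCHS:,} updates in total")
```

A batch size of 1 therefore performs over a hundred times more updates than a batch size of 128 for the same number of epochs, with each update based on a noisier gradient estimate.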

Table 10

LSTM prediction efficiency with the different batch sizes

| Metrics | Prediction periods | Batch size 64 | Batch size 128 |
|---|---|---|---|
| RMSE (m³/s) | Train | 2,393.56 | 3,280.67 |
| | Test | 3,034.07 | 3,363.12 |
| NSE | Train | 0.95 | 0.9 |
| | Test | 0.92 | 0.9 |

Cell number influence

The cell is the fundamental unit of the LSTM network, and the efficiency of the network is influenced by the number of cells in the architecture. For simplicity of evaluation, the study uses an equal number of cells across all layers. When the number of cells reaches 60 or more, the LSTM network demonstrates a favorable learning efficiency at the Kratie station (Figure 11(a)–11(d)). However, beyond 100 cells, the improvement in efficiency becomes gradual. The learning process is hampered by significant fluctuation and a slow decrease in loss values, followed by an increase after 200 epochs (Table 11).
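
To make the notion of a cell concrete, the following minimal sketch implements one time step of a single standard LSTM cell with scalar input and state (input, forget, and output gates plus a candidate state). The weights are arbitrary illustrative values, not fitted parameters, and the parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, p):
    """One time step of a single LSTM cell with scalar input and state."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate state
    c = f * c_prev + i * g   # new cell state: keep part of the old, add new
    h = o * math.tanh(c)     # new hidden state (the cell's output)
    return h, c


# Arbitrary illustrative weights (all 0.5), not fitted values.
names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in names}

h, c = 0.0, 0.0
for q in (0.2, 0.4, 0.6):  # made-up, scaled discharge values
    h, c = lstm_cell_step(q, h, c, params)
    print(f"input={q:.1f}  hidden={h:.4f}  cell={c:.4f}")
```

A layer with 64 cells runs 64 such units in parallel, each with its own weights and with recurrent connections to every hidden state in the layer.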
Table 11

LSTM prediction efficiency with the different cell numbers

| Metrics | Prediction periods | C1 (20 cells) | C2 (60 cells) | C3 (100 cells) | C4 (200 cells) |
|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 4,883.03 | 2,885.65 | 2,834.75 | 2,598.98 |
| | Test | 5,627.87 | 3,187.95 | 3,592.79 | 3,478.63 |
| NSE | Train | 0.79 | 0.93 | 0.93 | 0.94 |
| | Test | 0.73 | 0.91 | 0.89 | 0.9 |

Figure 11

Simulated and observed discharges from different cell numbers using stacked LSTM with sigmoid activation function for the Kratie station.

When adjusting the number of cells per layer, the optimal cell count depends on the number of layers (Table 12). In scenarios S1 and S2, a marginal improvement in the model's effectiveness is observed when increasing the number of cells in the second layer. For the four-layer model, the model's effectiveness is enhanced by increasing the number of cells in the third hidden layer to 128. However, a further increase to 256 neurons diminishes the model's simulation ability (compare RMSE and NSE for the stacked LSTM and S6 scenarios, Table 12), and the same applies to the fourth hidden layer. In contrast, increasing the number of cells to 128 in the first hidden layer decreases the model's simulation ability (stacked LSTM and S7 scenarios). Hence, increasing the number of cells enhances the model's effectiveness only up to an optimal value. Deviating from this optimum in either direction leads to poorer results, and the optimal value depends on the specific hidden layer.

Table 12

LSTM prediction capability for daily discharge with the different cell numbers

| Metrics | Prediction periods | S1 | S2 | S3 | S4 | S5 | Stacked LSTM | S6 | S7 |
|---|---|---|---|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 4,776 | 4,732 | 4,555 | 2,543 | 2,546 | 2,217 | 2,651 | 2,553 |
| | Test | 5,566 | 5,575 | 5,447 | 3,360 | 2,961 | 2,754 | 3,434 | 3,069 |
| NSE | Train | 0.80 | 0.80 | 0.81 | 0.94 | 0.94 | 0.96 | 0.94 | 0.94 |
| | Test | 0.74 | 0.74 | 0.75 | 0.9 | 0.93 | 0.94 | 0.9 | 0.92 |

Learning efficiency with antecedent days for prediction

This section employs antecedent streamflow data from the same site to make predictions for the following day using LSTMs. Specifically, the goal is to predict the next item in the sequence. For example, given a sequence of 3 days of data, the model predicts the value for the fourth day: the input vector (Q(t − 3), Q(t − 2), Q(t − 1)) is used to predict Q(t), where Q(t) denotes the discharge at time step t.
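
Framing next-day prediction as a supervised learning problem amounts to sliding a fixed-length window over the discharge series. The following minimal sketch shows one way to build the input windows and targets; `make_windows` and the sample values are illustrative and not taken from the study.

```python
def make_windows(series, n_lags):
    """Split a series into (inputs, target) pairs for next-day prediction."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # the n_lags antecedent days
        targets.append(series[t])            # the day to be predicted
    return inputs, targets


# Made-up daily discharges in m3/s, for illustration only.
discharge = [3100, 3300, 3600, 4200, 5100, 6400, 7900]

X, y = make_windows(discharge, n_lags=3)
print(X[0], "->", y[0])  # [3100, 3300, 3600] -> 4200
print(len(X), "samples from", len(discharge), "days")
```

Changing `n_lags` to 5, 9, or 14 produces the longer antecedent windows, at the cost of losing `n_lags` samples at the start of the record.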

The study conducted predictions based on different antecedent periods, namely 3 days (3Ds), 5 days (5Ds), 9 days (9Ds), and 14 days (14Ds). The actual and predicted discharges from 2013 to 2022 are shown in Figure 12, and Table 13 presents the metrics used to evaluate model performance. The best-performing model for each antecedent period is highlighted in bold. Notably, the predicted discharge in Figure 12(b) closely resembles the observations. The LSTM achieves a commendable NSE value of 0.96 during the training phase. Table 13 shows that the LSTM's prediction accuracy during the verification phase is only slightly lower than that during training. Although occasional deviations exist between the simulated and observed discharges, the LSTM accurately predicts most peak flows.
Table 13

LSTM prediction efficiency for daily discharge for various antecedent days

| Metrics | Prediction periods | 3D | 5D | 9D | 14D |
|---|---|---|---|---|---|
| RMSE (m³/s) | Train | 2,251.12 | 2,217.27 | 3,307.39 | 3,939.42 |
| | Test | 2,648.1 | 2,754.61 | 4,229.09 | 5,390.48 |
| NSE | Train | 0.95 | 0.96 | 0.9 | 0.86 |
| | Test | 0.94 | 0.94 | 0.85 | 0.76 |

Figure 12

Predicted and observed discharges from (a) 3Ds, (b) 5Ds, (c) 9Ds, and (d) 14Ds for the Kratie station. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.

The model's performance decreases as the number of antecedent days increases beyond 5 (Table 13); a longer input sequence does not guarantee better results. It is therefore crucial to select the model framework carefully. This result might indicate that there is not enough information to train the model on longer sequences. At least 5 days of data are needed to provide sufficient information for model training, while using only 3 days of data gives a slightly lower training NSE (0.95 versus 0.96). An excessively long time span leads to a flattening of the flood peak. The poorer results for long sequences can also be attributed to overfitting.
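
One simple indicator of overfitting is the gap between training and testing skill. The sketch below copies the NSE values from Table 13 and computes the train-minus-test gap for each antecedent window; the variable names are illustrative.

```python
# (train NSE, test NSE) copied from Table 13 for each antecedent window in days.
nse_by_window = {3: (0.95, 0.94), 5: (0.96, 0.94), 9: (0.90, 0.85), 14: (0.86, 0.76)}

# Train-minus-test gap; a larger gap suggests weaker generalization.
gaps = {days: round(train - test, 2) for days, (train, test) in nse_by_window.items()}

for days, gap in gaps.items():
    print(f"{days:>2} antecedent days: train-test NSE gap = {gap:.2f}")
```

The gap grows steadily with window length, from 0.01 at 3 days to 0.10 at 14 days, which is consistent with the longer sequences generalizing less well.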

The recommendation of utilizing a sequence of 5 antecedent days for prediction is grounded in hydrological principles and empirical evidence from our study area. Hydrological processes in the Mekong Delta exhibit inherent temporal dependencies, whereby current streamflow conditions are influenced by antecedent rainfall, soil moisture, and river discharges. By considering a lagged sequence of antecedent days, our LSTM models can capture these temporal relationships and leverage them to make more accurate predictions of future streamflow. Additionally, empirical analysis of our dataset revealed that a 5-day antecedent window provided the optimal balance between capturing relevant hydrological dynamics and avoiding model complexity. This finding is consistent with hydrological theory, which suggests that the influence of antecedent conditions on streamflow diminishes over time, with shorter sequences potentially neglecting important hydrological drivers while longer sequences may introduce unnecessary noise and complexity. Thus, by integrating a 5-day antecedent window into our LSTM models, we were able to achieve particularly high accuracy in our predictions, enhancing the reliability and applicability of our findings for water resource management and decision-making in the Mekong Delta.

The analysis of the LSTM network's architecture and hyperparameters reveals that the stacked LSTM with four layers, consisting of 64, 256, 128, and 128 neurons, respectively, performs the best for next-day forecasting. Therefore, this configuration, which utilized the MSE loss function, the Adam optimizer, and the sigmoid activation function (Table 2), is selected for learning and predicting the discharge at the Kratie station, providing the most reliable forecasts and a short training time.

Our study explores the application of LSTM networks for streamflow prediction at the Kratie station in the Vietnam Mekong Delta, building upon the advancements and methodologies outlined in the existing literature on streamflow forecasting using deep learning techniques.

This study emphasizes the significance of hyperparameters on streamflow predictions specific to the Mekong region. These hyperparameters are vital to the model's accuracy, particularly within complex hydrological applications.

The number of LSTM cells directly affects the model's capacity to capture temporal dependencies. This research tested configurations ranging from 20 to 200 units (Table 4), with 100 units demonstrating the best balance between performance and computational efficiency. Similar results have been found in other studies, such as those by Kratzert et al. (2019b) and Fu et al. (2020), which indicated that 100–150 units were optimal for streamflow forecasting. Smaller configurations (50–100 units) encountered difficulties with complex datasets, while larger networks (200+ units) tended to overfit.

The epoch number decides the convergence and generalization of the LSTM structure. This study explored 300, 700, 1,000, and 1,500 epochs, and our results indicate that 1,000 epochs provided the best results in minimizing validation loss and ensuring model stability. Shen (2018) used a 500–1,000 epoch range for hydrological models, reporting that training beyond 1,000 epochs led to overfitting with minimal gains in model accuracy. Similarly, Kratzert et al. (2019b) noted that after 1,000 epochs, the model's performance plateaued, with little benefit from extended training. Our results also show that training beyond 1,000 epochs led to diminishing improvements in model performance, suggesting that a longer training time does not necessarily translate into better predictive capability in streamflow forecasting.

Dropout is a technique that reduces overfitting by randomly excluding a percentage of neurons during training. We tested dropout rates of 0.0, 0.2, 0.4, and 0.6, finding the best performance at rates between 0.2 and 0.4, which maintained model capacity while minimizing overfitting. Fu et al. (2020) noted that a dropout rate of 0.3–0.4 is effective in streamflow prediction. In contrast, higher rates like 0.6 led to underfitting by limiting the model's ability to learn complex patterns, consistent with our results showing decreased performance at rates above 0.4.
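
The mechanism can be sketched in a few lines. The example below uses the common "inverted" variant of dropout, in which surviving activations are rescaled by 1/(1 − rate) so that their expected value is unchanged; this is an assumption of the sketch, not a detail reported by the study, and the function and variable names are illustrative.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the run is repeatable
hidden = [1.0] * 10      # made-up activations of ten units

for rate in (0.0, 0.2, 0.4, 0.6):
    out = dropout(hidden, rate, rng)
    dropped = sum(1 for v in out if v == 0.0)
    print(f"rate={rate:.1f}: {dropped} of {len(out)} units dropped")
```

Dropout is applied only during training; at prediction time all units are kept, which is why the rescaling during training matters.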

The batch size refers to the number of samples used in each training iteration. This study tested batch sizes of 1, 64, and 128, finding that 64 provided the best balance between computational efficiency and model performance. Although a batch size of 128 allowed for faster convergence, it resulted in slightly lower predictive accuracy. In hydrological forecasting, Jamei et al. (2024) found that smaller batch sizes (1–32) excelled in short-term streamflow predictions, while larger sizes (64–128) were better for long-term forecasts. Our findings support the idea that smaller batch sizes enhance convergence and generalization. Still, a batch size of 64 is optimal for practical applications, as noted by Shen (2018), effectively utilizing modern hardware without compromising learning dynamics.

In streamflow forecasting, activation function selection and LSTM architecture design are crucial for learning temporal dependencies and generalizing to unseen data. Various activation functions have been examined to improve model performance. Kratzert et al. (2019b) found that while ReLU often led to faster convergence in large-scale models, the sigmoid function produced smoother, more stable outputs, which are vital in hydrological modeling. Our study determined that the sigmoid function was better for streamflow data, offering smoother predictions, which aligns with Fu et al. (2020).

The design of LSTM models, especially regarding network depth and FCLs, has been widely studied for streamflow forecasting. Kratzert et al. (2019b) found that stacked LSTM models consistently outperformed single-layer structures and that FCLs improved the integration of temporal features. Variations such as the EA-LSTM incorporated external features, enhancing performance in multi-basin forecasting. This study focused on historical streamflow data and found that a simple stacked LSTM with FCLs was sufficient for accurate predictions, suggesting that additional features may not be necessary when sufficient historical data are available. Zhang et al. (2018b) reported the role of dense layers in capturing nonlinear interactions between past flow and environmental variables, improving model performance for complex time-series data.

Notably, greater LSTM complexity can better capture unique variations in the data series but also requires more computational resources and longer training and forecasting times.

Additionally, overfitting can become a challenge as model complexity increases, leading to good performance on training data but poor performance on forecasting data.

In real-time applications, time is a significant factor, so processing time must be considered and an architecture chosen that balances complexity and efficiency. Employing a hybrid approach can be very beneficial: using simpler models for rapid predictions (from 1 to 3 days) and reserving more sophisticated models for in-depth post-event analysis can achieve real-time efficiency and, when resources permit, enhanced accuracy. Research on balancing computational speed with accuracy in real-time forecasting has yielded encouraging insights. For example, Kratzert et al. (2019b) prioritized simpler models for short-term forecasts and considered more complicated models when time allowed. Similarly, the findings of Fu et al. (2020) emphasize the importance of a flexible model that adapts to time constraints and available resources. This strategy successfully balances speed and accuracy in real-time applications.

By applying LSTM models and optimizing their parameters, the study enhanced our understanding of streamflow patterns and improved predictive capabilities in this complex hydrological environment. Our findings demonstrate promising results, with the LSTM models effectively capturing the seasonal variations and interannual trends in streamflow at Kratie.

The LSTM models exhibited high levels of accuracy in replicating observed streamflow patterns, as indicated by high NSE values and low RMSE values. The predicted streamflow closely matched the observed values, particularly in capturing seasonal variations and interannual trends. Notable patterns, such as the seasonal peaks and troughs in streamflow, were accurately reproduced by the LSTM models, indicating their robustness in capturing hydrological dynamics.
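For reference, the two metrics mentioned above can be computed as in the following minimal sketch. NSE follows the standard Nash and Sutcliffe (1970) definition and RMSE is the usual root mean square error. The function names are illustrative, and the relative RMSE reported in this study is not reproduced here because its normalization is not stated in this section.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared model error relative to the variance of the observations.
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - error / variance


def rmse(observed, simulated):
    """Root mean square error, in the same units as the streamflow series."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared / len(observed))
```

A forecast that always returns the observed mean scores an NSE of exactly 0, which is why values close to 1 indicate that the model adds substantial skill over that baseline.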

However, certain limitations were observed, particularly in predicting extreme events or sudden changes in streamflow, which may require further refinement of model parameters or the integration of additional environmental variables.

Our study aligns with the existing literature (Fu et al. 2020) in its utilization of deep learning techniques, particularly LSTM networks, for streamflow prediction. By evaluating the performance of LSTM models, the study contributes to the ongoing discourse on the efficacy of deep learning techniques in improving streamflow prediction accuracy.

Unlike studies focused on rivers in Malaysia, India, and Canada (Fu et al. 2020; Naganna et al. 2023; Jamei et al. 2024), our study examines streamflow prediction in the Vietnam Mekong Delta, introducing variations in hydrological conditions and dataset properties. This geographic divergence underscores the importance of developing and evaluating models within their specific regional contexts.

Furthermore, this study offers unique contributions to optimizing LSTM structures and hyperparameters for streamflow prediction at the Kratie station. By systematically exploring the impact of different network architectures, activation functions, and input configurations, we provide valuable insights for practitioners seeking to enhance the accuracy of streamflow forecasts in similar hydrological settings. Additionally, our study emphasizes the practical implications of streamflow prediction for water resource management, hydropower operations, and agricultural planning in the Vietnam Mekong Delta, thereby bridging the gap between scientific research and real-world applications.
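The input configurations discussed here rest on turning a daily streamflow series into supervised samples, where a fixed number of antecedent days predicts a later value. The sketch below shows one common way to build such samples. The function name, the `lookback` and `horizon` parameters, and the sample values are assumptions for illustration and do not reproduce the preprocessing used in this study.

```python
def make_windows(series, lookback, horizon=1):
    """Build (inputs, targets) from a univariate daily series.

    Each input holds `lookback` consecutive antecedent values; the matching
    target is the value `horizon` days after the last day of that window.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    inputs, targets = [], []
    # `end` is the index just past the final day of the input window.
    for end in range(lookback, len(series) - horizon + 1):
        inputs.append(list(series[end - lookback:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Hypothetical discharge values, used only to show the shapes involved.
flows = [10, 20, 30, 40, 50, 60, 70]
x, y = make_windows(flows, lookback=5)
```

With seven values and a five-day window, this yields two samples. Keeping the windows in chronological order also makes it straightforward to split training and testing periods by date rather than at random.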

LSTM has shown effectiveness in streamflow forecasting at the Kratie station, but its generalizability to other areas may be constrained by several factors. First, LSTM requires large, high-quality datasets for training. In regions with limited or inconsistent streamflow data, the model's ability to learn accurate patterns will be reduced. The Kratie station provides continuous daily data, but this advantage may not extend to other areas. Second, streamflow dynamics vary greatly depending on local geography, climate, and human activities. For example, regions influenced by snowmelt (upstream MRB) or monsoons (downstream MRB) might exhibit different patterns. Furthermore, climate change, land use changes, or human activities can alter flow regimes. If the LSTM model is trained on past data that does not account for these changes, its predictions for future conditions could be inaccurate. Moreover, complex LSTM models may also overfit the training data, capturing noise instead of actual patterns and reducing generalization to new data.

While the study has demonstrated the effectiveness of the LSTM model in streamflow forecasting at the Kratie station, its applicability to other sites with different hydrological conditions still needs to be established. Future research will apply the LSTM model to different hydrological conditions and to other parts of the Mekong Delta, strengthening the model's transferability and making flow predictions more reliable across various conditions. Future studies could also build on these findings by investigating the generalizability of LSTM models across diverse hydrological contexts and by exploring additional factors that influence streamflow dynamics in the Mekong Delta, for example by integrating environmental variables such as soil moisture and vegetation cover to improve prediction accuracy. Additionally, efforts to integrate multivariable data and incorporate insights from socio-economic and environmental factors could enhance the robustness and applicability of streamflow forecasting models in complex hydrological systems and support evidence-based decision-making in the region.

Although this study focuses solely on LSTM for streamflow forecasting because of its advantages, it is limited by the absence of a comparison with other forecasting models. Future work will include such comparisons to better demonstrate LSTM's performance, validate its effectiveness in hydrological applications, and identify areas for improvement.

This study investigates the effectiveness of LSTM in streamflow forecasting at the Kratie station, Mekong River Basin. Results show that the efficacy of the LSTM model depends on basin characteristics and flow regime rather than on dataset size, emphasizing the importance of model input over data length. Among the LSTM architectures tested, stacked LSTMs exhibited superior predictive capabilities, and a dropout rate of 0.4 was the most effective in mitigating overfitting.

Integrating FCLs slightly enhanced model accuracy but also extended training duration, indicating a trade-off between performance and efficiency. The sigmoid activation function was identified as the most effective for discharge predictions, while a larger batch size did not lead to considerable gains in efficiency. Additionally, increasing the number of cells beyond 100 led to instability in the loss values.

The study provides additional meaningful information to strengthen water management measures in the Mekong River Basin, especially the Vietnam Mekong Delta, and a deeper understanding of the influence of different hyperparameters on flow forecasting based on LSTM. Future research should aim to refine parameter selection and optimization to achieve more reliable forecasts and explore integrating LSTM networks with other neural models to improve predictive accuracy across diverse hydro-climatological contexts.

N.Y.N. and T.N.A. conceptualized the study and wrote, reviewed, and edited the article. N.Y.N., D.D.K., L.V.N. and V.T.A. analyzed and wrote, reviewed, and edited the article.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Al-Bossly A., Anja M. I., Abaker A. O., Ali H. E. & Alkhalaf S. (2023) A comparative study on market index prediction: long short-term memory (LSTM) vs. decision tree model, Journal of Statistics Applications & Probability, 12 (4), Article 18. http://dx.doi.org/10.18576/jsap/12S118.

Binh D. V., Kantoush S. & Sumi T. (2020) Changes to long-term discharge and sediment loads in the Vietnamese Mekong delta caused by upstream dams, Geomorphology, 353, 107011. https://doi.org/10.1016/j.geomorph.2019.107011.

Chen J., Zeng G.-Q., Zhou W., Du W. & Lu K.-D. (2018) Wind speed forecasting using nonlinear-learning ensemble of deep learning time series prediction and extremal optimization, Energy Conversion and Management, 165, 681–695. https://doi.org/10.1016/j.enconman.2018.03.098.

Cheng M., Fang F., Kinouchi T., Navon I. M. & Pain C. C. (2020) Long lead-time daily and monthly streamflow forecasting using machine learning methods, Journal of Hydrology, 590, 125376. https://doi.org/10.1016/j.jhydrol.2020.125376.

Costa Silva D. F., Galvão Filho A. R., Carvalho R. V., de Souza L., Ribeiro F. & Coelho C. J. (2021) Water flow forecasting based on river tributaries using long short-term memory ensemble model, Energies, 14, 7707. https://doi.org/10.3390/en14227707.

Ding Y., Zhu Y., Wu Y., Jun F. & Cheng Z. (2019) 'Spatio-temporal attention LSTM model for flood forecasting', 2019 International Conference on Internet of Things (IThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). IEEE, pp. 458–465. https://doi.org/10.1109/iThings/GreenCom/CPSCom/SmartData.2019.00095.

Dinh K. D., Anh T. N., Nguyen N. Y., Bui D. D. & Srinivasan R. (2020) Evaluation of grid-based rainfall products and water balances over the Mekong River Basin, Remote Sensing, 12, 1858. https://doi.org/10.3390/rs12111858.

Duy Nguyen H. (2022) Daily streamflow forecasting by machine learning in Tra Khuc river in Vietnam, Vietnam Journal of Earth Sciences, 45 (1), 82–97. https://doi.org/10.15625/2615-9783/17914.

Fu M., Fan T., Ding Z., Salih S. Q., Al-Ansari N. & Yaseen Z. M. (2020) Deep learning data-intelligence model based on adjusted forecasting window scale: application in daily streamflow simulation, IEEE Access, 8, 32632–32651. https://doi.org/10.1109/ACCESS.2020.2974406.

Gers F. A., Schmidhuber J. & Cummins F. (2000) Learning to forget: continual prediction with LSTM, Neural Computation, 12, 2451–2471. https://doi.org/10.1162/089976600300015015.

Ghimire S., Yaseen Z. M., Farooque A. A., Deo R. C., Zhang J. & Tao X. (2021) Streamflow prediction using an integrated methodology based on convolutional neural network and long short-term memory networks, Scientific Reports, 11, 1–26. https://doi.org/10.1038/s41598-021-96751-4.

Hochreiter S. & Schmidhuber J. (1997) Long short-term memory, Neural Computation, 9, 1735–1780.

Hunt K. M. R., Matthews G. R., Pappenberger F. & Prudhomme C. (2022) Using a long short-term memory (LSTM) neural network to boost river streamflow forecasts over the western United States, Hydrology and Earth System Sciences, 26, 5449–5472. https://doi.org/10.5194/hess-26-5449-2022.

Jamei M., Jamei M., Ali M., Karbasi M., Farooque A. A., Malik A., Cheema S. J., Esau T. J. & Yaseen Z. M. (2024) Quantitative improvement of streamflow forecasting accuracy in the Atlantic zones of Canada based on hydro-meteorological signals: a multi-level advanced intelligent expert framework, Ecological Informatics, 80, 102455. https://doi.org/10.1016/j.ecoinf.2023.102455.

Juneau P., Baddour N., Burger H., Bavec A. & Lemaire E. D. (2021) Comparison of decision tree and long short-term memory approaches for automated foot strike detection in lower extremity amputee populations, Sensors, 21, 6974. https://doi.org/10.3390/s21216974.

Kingma D. P. & Ba J. (2015) Adam: a method for stochastic optimization, The 3rd International Conference for Learning Representations, San Diego, 2015.

Kisi O. & Cimen M. (2011) A wavelet-support vector machine conjunction model for monthly streamflow forecasting, Journal of Hydrology, 399, 132–140. https://doi.org/10.1016/j.jhydrol.2010.12.041.

Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. (2018) Rainfall–runoff modelling using long short-term memory (LSTM) networks, Hydrology and Earth System Sciences, 22, 6005–6022. https://doi.org/10.5194/hess-22-6005-2018.

Kratzert F., Klotz D., Herrnegger M., Sampson A. K., Hochreiter S. & Nearing G. S. (2019a) Toward improved predictions in ungauged basins: exploiting the power of machine learning, Water Resources Research, 55, 11344–11354. https://doi.org/10.1029/2019WR026065.

Kratzert F., Klotz D., Shalev G., Klambauer G., Hochreiter S. & Nearing G. (2019b) Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrology and Earth System Sciences, 23, 5089–5110. https://doi.org/10.5194/hess-23-5089-2019.

Le X. H., Ho H. V., Lee G. & Jung S. (2019) Application of long short-term memory (LSTM) neural network for flood forecasting, Water, 11, 1387. https://doi.org/10.3390/w11071387.

Li J., Yuan X. & Ji P. (2023) Long-lead daily streamflow forecasting using Long Short-Term Memory model with different predictors, Journal of Hydrology: Regional Studies, 48, 101471. https://doi.org/10.1016/j.ejrh.2023.101471.

Mehraein M., Mohanavelu A., Naganna S. R., Kulls C. & Kisi O. (2022) Comparing metaheuristic regression models for streamflow forecasting, Water, 14, 3636. https://doi.org/10.3390/w14223636.

Mekong River Commission (MRC) (2010) State of the Basin Report 2010. Vientiane: Mekong River Commission.

Minderhoud P. S. J., Coumou L., Erkens G., Middelkoop H. & Stouthamer E. (2019) Mekong delta much lower than previously assumed in sea-level rise impact assessments, Nature Communications, 10, 3847. https://doi.org/10.1038/s41467-019-11602-1.

Naganna S. R., Marulasiddappa S. B., Balreddy M. S. & Yaseen Z. M. (2023) Daily scale streamflow forecasting in multiple stream orders of Cauvery River, India: application of advanced ensemble and deep learning models, Journal of Hydrology, 626 (Part B), 130320. https://doi.org/10.1016/j.jhydrol.2023.130320.

Nash J. E. & Sutcliffe J. V. (1970) River flow forecasting through conceptual models part I – A discussion of principles, Journal of Hydrology, 10 (3), 282–290. https://doi.org/10.1016/0022-1694(70)90255-6.

Phung D., Thong N. H., Ngoc N. T., Dang N. T., Van Q. D., Son N., Nga H. N., Trung H. N. & Trude B. (2021) Hydropower dams, river drought and health effects: a detection and attribution study in the lower Mekong Delta Region, Climate Risk Management, 32, 100280. https://doi.org/10.1016/j.crm.2021.100280.

Räsänen T. A., Someth P., Lauri H., Koponen J., Sarkkula J. & Kummu M. (2017) Observed river discharge changes due to hydropower operations in the Upper Mekong Basin, Journal of Hydrology, 545, 28–41. https://doi.org/10.1016/j.jhydrol.2016.12.023.

Ruiz-Barradas A. & Nigam S. (2018) Hydroclimate variability and change over the Mekong River basin: modeling and predictability and policy implications, Journal of Hydrometeorology, 19, 849–869. https://doi.org/10.1175/JHM-D-17-0195.1.

Sahoo B. B., Jha R., Singh A. & Kumar D. (2019) Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting, Acta Geophysica, 67, 1471–1481. https://doi.org/10.1007/s11600-019-00330-1.

Sahoo B. B., Sankalp S. & Kisi O. (2023a) A novel smoothing-based deep learning time-series approach for daily suspended sediment load prediction, Water Resources Management, 37 (11), 4271–4292.

Sahoo B. B., Panigrahi B., Nanda T., Tiwari M. K. & Sankalp S. (2023b) Multi-step ahead urban water demand forecasting using deep learning models, SN Computer Science, 4, 752. https://doi.org/10.1007/s42979-023-02246-6.

Sainath T. N., Vinyals O., Senior A. & Sak H. (2015) 'Convolutional, long short-term memory, fully connected deep neural networks', 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 4580–4584. https://doi.org/10.1109/ICASSP.2015.7178838.

Shen C. (2018) A transdisciplinary review of deep learning research and its relevance for water resources scientists, Water Resources Research, 54, 8558–8593. https://doi.org/10.1029/2018WR022643.

Slater L. J., Anderson B., Buechel M., Dadson S., Han S., Harrigan S., Kelder T., Kowal K., Lees T., Matthews T., Murphy C. & Wilby R. L. (2021) Nonstationary weather and water extremes: a review of methods for their detection, attribution, and management, Hydrology and Earth System Sciences, 25, 3897–3935. https://doi.org/10.5194/hess-25-3897-2021.

Sonali S., Paul J. C., Sahoo B. B., Gupta S. K. & Singh P. K. (2023) Improving forecasting accuracy of monthly runoff time series of Brahmani River, India using a hybrid deep learning model, Journal of Water and Climate Change (2024), 15 (1), 139–156. https://doi.org/10.2166/wcc.2023.487.

Tao H., Habib M., Aljarah I., Faris H., Afan H. A. & Yaseen Z. M. (2021) An intelligent evolutionary extreme gradient boosting algorithm development for modeling scour depths under submerged weir, Information Sciences, 570, 172–184. https://doi.org/10.1016/j.ins.2021.04.063.

Tiyasha S., Tung T. M. & Yaseen Z. M. (2020) A survey on river water quality modelling using artificial intelligence models: 2000–2020, Journal of Hydrology, 585, 124670. https://doi.org/10.1016/j.jhydrol.2020.124670.

Wang W., Jin J. & Li Y. (2009) Prediction of inflow at Three Gorges Dam in Yangtze River with wavelet network model, Water Resources Management, 23, 2791–2803. https://doi.org/10.1007/s11269-009-9409-2.

Wei W. & Jing H. (2020) Short-term load forecasting based on LSTM-RF-SVM combined model, Journal of Physics: Conference Series, 1651. https://doi.org/10.1088/1742-6596/1651/1/012028.

Xu W., Jiang Y., Zhang X., Li Y., Zhang R. & Fu G. (2020) Using long short-term memory networks for river flow prediction, Hydrology Research, 51, 1358–1376. https://doi.org/10.2166/nh.2020.026.

Yan L., Feng J. & Hang T. (2019) 'Small watershed stream-flow forecasting based on LSTM', Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM), pp. 1006–1014. https://doi.org/10.1007/978-3-030-19063-7_79.

Yuan X., Chen C., Lei X., Yuan Y. & Muhammad Adnan R. (2018) Monthly runoff forecasting based on LSTM–ALO model, Stochastic Environmental Research and Risk Assessment, 32, 2199–2212. https://doi.org/10.1007/s00477-018-1560-y.

Zhang L. & Wang Y. (2020) A comparative study of LSTM and decision tree models in hydrological time-series prediction, Environmental Modelling & Software, 132, 104750. https://doi.org/10.1016/j.envsoft.2020.104750.

Zhang D., Lin J., Peng Q., Wang D., Yang T., Sorooshian S., Liu X. & Zhuang J. (2018a) Modeling and simulating of reservoir operation using the artificial neural network, support vector regression, deep learning algorithm, Journal of Hydrology, 565, 720–736. https://doi.org/10.1016/j.jhydrol.2018.08.050.

Zhang J., Zhu Y., Zhang X., Ye M. & Yang J. (2018b) Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas, Journal of Hydrology, 561, 918–929. https://doi.org/10.1016/j.jhydrol.2018.04.065.

Zhang K., Li Y., Yu Z., Yang T., Xu J., Chao L., Ni J., Wang L., Gao Y., Hu Y. & Lin Z. (2022) Xin'anjiang nested experimental watershed (XAJ-NEW) for understanding multiscale water cycle scientific objectives and experimental design, Engineering, 18, 207–217. https://doi.org/10.1016/j.eng.2021.08.026.

Zhao X., Wang H., Bai M., Xu Y., Dong S., Rao H. & Ming W. (2024) A comprehensive review of methods for hydrological forecasting based on deep learning, Water, 16, 1407. https://doi.org/10.3390/w16101407.

Zhu S., Hrnjica B., Ptak M., Choiński A. & Sivakumar B. (2020a) Forecasting of water level in multiple temperate lakes using machine learning models, Journal of Hydrology, 585, 124819. https://doi.org/10.1016/j.jhydrol.2020.124819.

Zhu S., Luo X., Yuan X. & Xu Z. (2020b) An improved long short-term memory network for streamflow forecasting in the upper Yangtze River, Stochastic Environmental Research and Risk Assessment, 34, 1313–1329. https://doi.org/10.1007/s00477-020-01766-4.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).