ABSTRACT
Accurate streamflow prediction is vital for hydropower operations, agricultural planning, and water resource management. This study assesses the effectiveness of Long Short-Term Memory (LSTM) networks in daily streamflow prediction at the Kratie station, investigating different network structures and hyperparameters to optimize predictive accuracy while considering computational efficiency. Our findings underscore the significance of LSTM models in addressing streamflow prediction challenges. Training the LSTM on historical streamflow data reveals the significance of the training dataset size; data spanning 2013–2022 yield optimal results. Incorporating a hidden layer with a nonlinear activation function and adding a fully connected layer improve prediction ability. However, increasing the number of neurons and layers introduces complexity and computational overhead. Careful parameter tuning, including epochs, dropout, and the number of LSTM units, is crucial for optimal performance without sacrificing efficiency. The stacked LSTM with sigmoid activation demonstrates exceptional performance, with a high Nash–Sutcliffe Efficiency (NSE) of 0.95 and a low relative root mean square error (rRMSE) of approximately 0.002%. Moreover, the model excels in forecasting streamflow with 5–15 antecedent days of input, with 5 days exhibiting particularly high accuracy. These findings offer valuable insights into LSTM networks for streamflow prediction for water management in the Vietnam Mekong Delta.
HIGHLIGHTS
LSTM networks effectively predict daily streamflow, achieving high accuracy in the Mekong Basin.
Historical streamflow data spanning 2013 to 2022 are optimal for LSTM training in the Mekong.
Incorporating non-linear activation functions and a fully connected layer enhances learning efficiency and prediction ability.
More neurons and layers introduce complexity and computational cost, requiring careful tuning.
The stacked LSTM model with sigmoid activation achieves an NSE of 0.95 and a low rRMSE (∼0.002%), with optimal antecedent input ranging from 5 to 15 days, peaking at five days.
INTRODUCTION
Precise streamflow forecasts are vital for optimizing water resources management, enhancing hydroelectric power generation efficiency, planning irrigation strategically, monitoring floods effectively, and supporting other hydrological functions (Kisi & Cimen 2011). However, the nonlinear nature of hydrology, resulting from complex weather patterns and infiltration mechanisms, poses a persistent challenge in achieving reliable streamflow forecasting (Wang et al. 2009; Ghimire et al. 2021; Zhang et al. 2022).
The application of machine learning (ML) algorithms in streamflow prediction has gained prominence due to their ability to address the non-stationarity and non-linearity of hydrological phenomena (Fu et al. 2020; Zhu et al. 2020b; Naganna et al. 2023; Jamei et al. 2024). Consequently, the use of ML methods has grown, including Long Short-Term Memory (LSTM) networks, which are particularly well-suited for sequential data and have shown promising results in hydrological time-series forecasting.
Recent studies, such as that by Mehraein et al. (2022), investigate the efficacy of metaheuristic regression approaches (including CatBoost (CB), Random Forest (RF), and Extreme Gradient Tree Boosting (XGBoost)) for forecasting monthly streamflow using satellite precipitation data. Their findings suggest that XGBoost and CB offer superior accuracy when incorporating TRMM rainfall data and periodicity components compared to other methods such as RF and ANN. While effective in certain situations, these methods have difficulty capturing long-term temporal dependencies for streamflow forecasting.
In contrast, although models like support vector machines (SVM) and decision trees are powerful for regression tasks, they require significant feature engineering to handle time-series data effectively. SVM, for instance, performs well with lagged values and rolling windows but fails to capture the sequential dependencies vital for streamflow prediction (Cheng et al. 2020; Duy Nguyen et al. 2022). Similarly, decision trees, RFs, and Gradient Boosting Machines are easy to interpret and perform well with nonlinear data, but they struggle to model sequential dependencies without manually added temporal features (Juneau et al. 2021; Al-Bossly et al. 2023; Zhang & Wang 2020). This limitation makes them less effective for modeling streamflow.
Deep learning has achieved impressive methodological advancements in the last few years and yielded promising outcomes in various practical data science applications, including hydrology. LSTM networks effectively manage the nonlinear, non-stationary, and seasonal characteristics of streamflow data, enabling them to capture complex temporal patterns (Zhu et al. 2020a; Tao et al. 2021; Yaseen 2021; Naganna et al. 2023). Unlike convolutional neural networks (CNNs), which are effective in extracting local temporal features but may not capture long-range dependencies, and gated recurrent units (GRUs), which have fewer parameters than LSTM and are efficient in training but may perform less effectively in long-term dependency modeling, LSTMs are effective due to their memory block architecture that enables them to retain and learn from long sequences of data (Yao et al. 2023; Bounoua et al. 2024; Zhao et al. 2024).
Studies have shown that the LSTM model consistently outperforms other ML models such as SVM and decision trees and even addresses a fundamental limitation of the conventional recurrent neural network (RNN) in processing data containing temporal sequences. By incorporating a specialized unit, referred to as a memory block, the LSTM can retain information over extended periods, allowing for the effective learning and modeling of long-term dependencies and facilitating the capture of intricate relationships within the data (Hochreiter & Schmidhuber 1997; Gers et al. 2000; Shen 2018; Kratzert et al. 2019a; Fu et al. 2020; Hunt et al. 2022). A series of publications by Kratzert et al. (2018, 2019a, 2019b) applied the LSTM model in simulating daily streamflows of 241 basins throughout the United States. They showed the remarkable power of LSTM in flow modeling, even when the models were trained on multiple basins simultaneously, and introduced a novel model called Entity-Aware LSTM (EA-LSTM), demonstrating significant improvement in regional-level performance for 531 basins. Zhang et al. (2018b) utilized LSTM for predicting water depth and observed a substantial improvement in prediction accuracy compared to the feedforward neural network model. This outcome demonstrated the LSTM's ability to effectively learn from previous information, thus enhancing its predictive capabilities in hydrological applications. Zhang et al. (2018a) conducted a comparative analysis of four neural network models for daily river flow forecasting for sewer overflow structures. The models examined were the LSTM, multilayer perceptron, GRU, and wavelet neural network. Their findings demonstrated that LSTM and GRU exhibited remarkable proficiency in prediction for different lead times. Using discharge data at the upper Yangtze River and the Hun River, Xu et al. (2020) created an LSTM model for simulating daily and 10-day average streamflow and found that the LSTM outperformed the Soil and Water Assessment Tool model and other hydrological models. In the study of Yuan et al. (2018), the accuracy of a combined approach utilizing deep learning and a parameter optimizer for predicting monthly runoff was evaluated, and the results showed that the hybrid LSTM outperformed other hybrid models. Fu et al. (2020) highlighted the superior performance of LSTM models in processing both steady and fluctuating streamflow data, demonstrating a good ability to capture data features across seasons, and further evaluated the influence of training set size, the time interval between training and testing sets, and the time span of prediction data on LSTM model performance.
In recent developments, advancements in LSTM models have further improved our understanding of streamflow dynamics and forecasting capabilities (Sahoo et al. 2019, 2023a, b; Fu et al. 2020; Zhu et al. 2020a; Slater et al. 2021). Among these studies, the simplest approaches have focused on river discharge with a prominent annual cycle and substantial lagged autocorrelation. These studies utilize previous discharge values at the same location to make predictions using LSTMs. Sahoo et al. (2019) investigated the application of LSTM networks for forecasting low-flow streamflow in hydrological systems. The study demonstrates that LSTM surpasses traditional statistical models such as Autoregressive Integrated Moving Average (ARIMA) and other ML techniques in terms of accuracy, especially in low-flow forecasting. However, it does not consider how varying hyperparameters impact performance. Such analysis could provide valuable insights into the features that drive predictions within hydrological contexts. Sahoo et al. (2023a) introduced a new approach for accurately predicting daily suspended sediment loads. Their approach combines smoothing techniques with LSTM deep learning models to improve accuracy, especially in handling variable and noisy data. The authors improve prediction performance by smoothing the data before inputting it into the model. Sahoo et al. (2023b) explored the use of LSTM for predicting future water demand in urban settings across multiple steps. The study shows LSTM's superior performance in multi-step ahead predictions. Since urban systems can vary widely, further validation on different datasets would be helpful. In addition, more advanced models have been developed that incorporate upstream data to enhance streamflow prediction. These models may either merely forecast river flow from different study areas (Costa Silva et al. 2021) or can also predict flow while including precipitation (Ding et al. 2019; Le et al. 2019; Yan et al. 2019; Sonali et al. 2023). In a case study conducted by Costa Silva et al. (2021) in Brazil, promising results were obtained for a 5-day ahead prediction. Sonali et al. (2023) introduce a hybrid deep learning model combining LSTM with other ML techniques for forecasting monthly runoff time series of the Brahmani River in India. The hybrid model shows improvements in prediction accuracy. However, the complexity of the hybrid model might pose challenges in real-time applications. More extensive sensitivity analysis could enhance understanding of how different model components (LSTM, CNN, and Attention) contribute to performance improvements.
Similarly, Le et al. (2019) obtained a positive outcome in modeling flood events in the Da River basin of Vietnam at a lead time of 3 days. Yan et al. (2019) designed an LSTM network specifically for flood prediction based on historical discharge and meteorological forecast data. Their results highlight the LSTM's effectiveness in accurately predicting flood events, especially in capturing the magnitude of peak flows. To predict runoff for the Lech and Danube Rivers confluence, Ding et al. (2019) developed an advanced LSTM-based hydrological forecast model, which incorporated European Centre for Medium-Range Weather Forecasts (ECMWF) projections of multiple variables (precipitation, soil moisture, etc.). The research evaluated the prediction accuracy up to 9-h lead times and observed a Nash–Sutcliffe Efficiency (NSE) of 0.71, which further improved to 0.77 after incorporating an attention layer (Hunt et al. 2022). In summary, antecedent research has underscored the efficacy of employing LSTM models with streamflow and rainfall data for streamflow prediction. Despite these advancements, significant challenges remain, particularly in regions with sparse data and in understanding the influence of historical streamflow patterns on present-day dynamics.
In the expansive and globally significant Mekong River Basin (MRB), where reservoir releases significantly influence streamflow dynamics, the main river trunk holds a crucial position in shaping the streamflow dynamics of the primary river and confronts challenges related to sparse data regions. Understanding the role of historical streamflow patterns is crucial, particularly in regions with sparse data coverage. Conventionally, precipitation has been acknowledged as the principal driver of streamflow patterns, evidenced by hydrographs displaying consistent changes corresponding to precipitation fluctuations over time. Based on these works, this study aims to assess the effectiveness of LSTM models in capturing the complex dynamics of streamflow in the Mekong River Basin – a region where reservoir operations and seasonal variability play a crucial role in shaping the streamflow. This study contributes to the growing body of research by providing a detailed assessment of LSTM models applied to streamflow forecasting, with a unique focus on the role of historical streamflow patterns – especially in the Mekong River Basin, a region that faces significant data gaps. While previous models have mainly relied on precipitation data as a primary driver of streamflow, this study introduces a novel approach by emphasizing the importance of past streamflow behavior, which reflects responses to climatic, land use, and operational changes over time. While precipitation undeniably influences streamflow variations, the historical streamflow behavior, reflecting past hydrological responses to factors such as land use changes and climate variability, offers valuable insights into the current river conditions. In this context, this study aimed to investigate the effectiveness of LSTM models in capturing the complex dynamics of streamflow, which is highly influenced by reservoir operation, a unique challenge in the Mekong region. 
Furthermore, this research makes a significant contribution to optimizing LSTM structures and hyperparameters for streamflow prediction. By carefully tuning parameters such as epochs, dropout rates, and the number of LSTM units, this study enhances the predictive accuracy of the model while improving computational efficiency. This optimization process is critical in applying deep learning models to large-scale, real-world hydrological systems, particularly in regions with limited data availability. The study is conducted at the Kratie station, a key hydrological control site located upstream in the Mekong River, which regulates inflows into the Mekong Delta. This station is within the boundaries of the Vietnam Mekong Delta as defined by Minderhoud et al. (2019). This location serves as an ideal case study for investigating the influence of reservoir operations on streamflow and highlights the broader applicability of LSTM models for water resource management in the Vietnam Mekong Delta. The findings of this study not only advance our understanding of LSTM capabilities in streamflow forecasting but also offer practical insights for improving water resource management and hydropower operations in regions with sparse data.
METHODOLOGY
LSTM cell structure
LSTM network architecture, showing the expected discharge, the historical input to the network, the forget gate, the update (input) gate, the output gate, the weight matrices associated with the input variables in each gate, the output from the previous time step, the output of the LSTM unit, and the state of the memory cell.
The LSTM network incorporates a cell state as long-term memory, allowing it to selectively retain or forget information as required across time steps. This mechanism is facilitated by three gates (the forget, input, and output gates, as depicted in Figure 1), which precisely regulate the information from the memory cell. The hidden state represents the output of the LSTM cell at each time step, reflecting the regulated information conveyed by three gates.
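The gating mechanism described above can be illustrated with a minimal sketch of one time step of a single LSTM unit. This is not the study's implementation: the scalar input and state, the `params` layout, and all function names are assumptions chosen for readability, and the equations follow the commonly used LSTM formulation rather than notation confirmed by the text.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM with scalar input and state.

    params maps each block name to a hypothetical triple
    (input weight, recurrent weight, bias).
    """
    def block(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x_t + w_h * h_prev + b)

    f_t = block("forget", sigmoid)         # how much old cell state to keep
    i_t = block("input", sigmoid)          # how much new information to write
    o_t = block("output", sigmoid)         # how much cell state to expose
    c_hat = block("candidate", math.tanh)  # proposed new information

    c_t = f_t * c_prev + i_t * c_hat       # long-term memory (cell state)
    h_t = o_t * math.tanh(c_t)             # hidden state, i.e., the cell output
    return h_t, c_t


# With the forget gate saturated open and the input gate closed,
# the cell state is carried forward essentially unchanged.
keep_memory = {
    "forget": (0.0, 0.0, 50.0),
    "input": (0.0, 0.0, -50.0),
    "output": (0.0, 0.0, 50.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = lstm_step(0.3, 0.0, 0.7, keep_memory)
```

In this toy setting the cell state stays close to its previous value of 0.7, which is the behavior that lets an LSTM retain information across many time steps.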
LSTM fully connected layers
In addition to the LSTM cells, fully connected layers (FCLs) can be added to the LSTM network architecture to perform additional computations and transformations on the output of the memory cells. The LSTM structure, as illustrated in Figure 1(a), comprises four unique layer components:
(i) The input layer receives the input sequence data and acts as the initial entry point for information into the network.
(ii) The FCL (dense) layer serves as an intermediary between the input and the LSTM cell, facilitating the alignment of input dimensions with those of the cells. By adjusting the dimensionality of the input, the FCL effectively acts as a bridge, enabling communication between the input and the subsequent LSTM cell layer. For instance, the FCL transfers the streamflow vector (Xt) with m dimensions into n dimensions, with n being the LSTM cell number.
(iii) The LSTM cell layer (LSTM(n)) comprises a collection of n cells, representing the core component that provides diverse memory capabilities.
(iv) The output layer is responsible for generating the final output based on the information processed by the LSTM cells.
The LSTM cells within the network analyze and process the input sequence, generating an output that is passed through the FCL to produce the final output for the specific task. Each additional FCL introduces nonlinear transformations to the LSTM cell output, so increasing the number of FCLs allows the model to learn more intricate representations and capture more complex patterns and relationships in the data.
Loss function


STUDY SITE AND LSTM CONSTRUCTION
Study site and dataset
Location of the Kratie station in the Mekong River Basin, where testing is conducted to evaluate the accuracy of the LSTM network for the daily streamflow prediction.
Mekong's hydrology is influenced by many factors, including precipitation patterns, sediment deposition, and human interventions such as dam construction and land use changes. Additionally, the delta's low-lying topography and vulnerability to sea level rise exacerbate the risk of saltwater intrusion, especially during the dry season (Räsänen et al. 2017; Phung et al. 2021).
The Kratie station serves as a critical hydrological control site within the Vietnam Mekong Delta, positioned at the entry point of the flat and low-lying delta region. With an average annual discharge volume of approximately 437 billion m3 (MRC 2010), Kratie plays a pivotal role in regulating inflows into the delta and is essential for managing water resources, agriculture, and other socio-economic activities in the region. Figure 2 depicts a geographic map showcasing the Mekong River, its primary tributaries, and the Kratie station.
The descriptive statistics for the streamflow data at the Kratie station
Statistic | Count | Mean | Std | Minimum | 25% | 50% | 75% | Maximum |
---|---|---|---|---|---|---|---|---|
Value | 8,846 | 12,285.51 | 11,905.93 | 1,377 | 3,439 | 6,141 | 18,422 | 54,012 |
LSTM model construction for the Kratie station
Data preparation
The sole input to the model was the historical streamflow data from the Kratie station, with no additional climate or hydrological variables considered. These data were used to capture the temporal patterns and dependencies of streamflow over time, which are crucial for accurate forecasting. The recorded discharge at the Kratie station serves as the target value for training, enabling a comparison with the discharges predicted by the LSTM model. The input data consist of historical streamflow values from previous time steps, excluding the most recent 5–15 days before the prediction date. For example, with a time shift of five days, the historical streamflow up to day t − 5 (i.e., Q1, Q2, …, Qt−5) is used to forecast the streamflow on day t (i.e., Qt, 5 days ahead). The model thus learns temporal dependencies from past streamflow to predict streamflow at a future time step, in line with the standard practice of leveraging past values to predict future outcomes in time-series forecasting.
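The pairing of past discharge with a shifted target can be sketched as follows. This is an illustrative sketch only: the function name and the fixed window length `n_lags` are assumptions (the text does not state how many past values enter each sample), while `time_shift` corresponds to the 5–15 day gap described above.

```python
def make_samples(flow, n_lags, time_shift):
    """Pair each window of past discharge with the value `time_shift` days later.

    flow       : list of daily discharge values in chronological order
    n_lags     : number of past values per input window (assumed, not from the study)
    time_shift : days between the last input value and the predicted day
    """
    inputs, targets = [], []
    for t in range(n_lags - 1 + time_shift, len(flow)):
        last = t - time_shift  # most recent day the model is allowed to see
        inputs.append(flow[last - n_lags + 1 : last + 1])
        targets.append(flow[t])
    return inputs, targets


# Toy series 0..9 standing in for daily discharge.
X, y = make_samples(list(range(10)), n_lags=3, time_shift=5)
```

With this toy series, the first sample uses days 0–2 to predict day 7, so no value from the five days preceding the target enters the input.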
Data testing
Assessing the stationarity of the discharge data (consistent statistical properties over time) is the first step in the data processing phase. The study employed the augmented Dickey–Fuller (ADF) test for this purpose. The test evaluates the null hypothesis that the streamflow (Q) series is not stationary. With an ADF statistic of −10.24 and a very small p-value of approximately 4.76 × 10^−18, there is strong evidence against the null hypothesis, which is therefore rejected in favor of the alternative hypothesis. This implies that the streamflow time-series data at the Kratie station are stationary, providing a solid foundation for subsequent modeling and analysis tasks.
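To make the logic of the test concrete, the sketch below computes the statistic of the simplest, non-augmented Dickey–Fuller regression with a constant, Δy_t = α + γ·y_{t−1} + ε_t, using ordinary least squares. It is a simplified stand-in, not the procedure used in the study: the function name is hypothetical, no lagged difference terms are included, and the synthetic white-noise series only demonstrates the mechanics. A statistic below the asymptotic 5% critical value of about −2.86 leads to rejecting the unit-root null.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + error."""
    x = series[:-1]                                   # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(series[:-1], series[1:])]  # first differences
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (di - mean_dy) for xi, di in zip(x, dy))
    gamma = sxy / sxx                                 # OLS slope
    alpha = mean_dy - gamma * mean_x                  # OLS intercept
    ssr = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(ssr / (n - 2) / sxx)         # standard error of the slope
    return gamma / se_gamma


# Synthetic stationary series (white noise), not the Kratie record.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
stat = dickey_fuller_stat(noise)
is_stationary = stat < -2.86  # compare with the 5% critical value
```

For real discharge records, an established statistical library routine that also selects the number of lagged terms would be the more appropriate choice.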
Data normalization



The dataset is subsequently divided into separate training and testing periods after the normalization process. Specifically, the dataset spanning 10 years is split into an 8-year training period (from 2013 to 2020) and a 2-year testing period (from 2021 to 2022). This division ensures that the model is trained on a substantial portion of the data while also allowing for independent evaluation of its performance on unseen data during the testing phase.
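The order of operations described above (normalize, then split chronologically) can be sketched as follows. The normalization formula itself is not recoverable from the text, so min-max scaling to [0, 1] is assumed here purely for illustration; the function names and the toy records are likewise hypothetical.

```python
def min_max_scale(values):
    """Scale values linearly to [0, 1] (assumed normalization, see lead-in)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def split_by_year(years, values, last_train_year):
    """Chronological split: years up to last_train_year train, later years test."""
    train = [v for yr, v in zip(years, values) if yr <= last_train_year]
    test = [v for yr, v in zip(years, values) if yr > last_train_year]
    return train, test


# Toy records: one hypothetical discharge value per year instead of daily data.
years = [2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
flow = [4000.0, 9000.0, 5000.0, 14000.0, 6000.0,
        7000.0, 8000.0, 10000.0, 12000.0, 11000.0]

scaled = min_max_scale(flow)
train, test = split_by_year(years, scaled, last_train_year=2020)
```

Because the split is by date rather than random, the test period contains only values that come after everything the model saw in training.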
Main model development
The study focuses on utilizing a six-stacked (hidden) LSTM layer model to predict discharge using one input variable. Table 2 displays the hyperparameters of the benchmark model, which have been chosen using a trial-and-error approach.
The architecture of LSTM used as the benchmark model for Kratie discharge prediction
Model hyperparameters | Hyperparameter selection |
---|---|
LSTM cell 1 | [64, sigmoid activation function] |
LSTM cell 2 | [256] |
LSTM cell 3 | [128] |
LSTM cell 4 | [128] |
Epochs | 1,000 |
Drop rate | 0.4 |
Common hyperparameters
Activation function: The neural network architecture employs three different activation functions, namely linear, ReLU, and sigmoid, across various layers.
Dropout: Dropout is a regularization technique used to address potential overfitting and enhance training performance. It temporarily removes a certain fraction of neurons from contributing to the model's training process, which encourages the remaining neurons to learn more robustly and improves generalization. In this study, four dropout rates are tested (0.0, 0.2, 0.4, and 0.6) to identify the rate that gives the best model performance.
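As a rough illustration of the dropout mechanism, the sketch below applies so-called inverted dropout to a list of activations during training. Whether the study's software rescales the surviving activations in this way is an assumption, and the function name and random generator argument are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so no adjustment is needed at prediction time (inverted dropout).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(1)
dropped = dropout([1.0] * 1000, rate=0.4, rng=rng)  # benchmark drop rate of 0.4
```

At a rate of 0.4, roughly 40% of the entries become zero on each training pass, and a different random subset is removed every time.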
Model performance evaluation
In these metrics, N is the number of time steps in the dataset, and the remaining terms are the observed value at time step i, the predicted value at the same time step i, and the mean of the observations.
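For reference, the two metrics reported in the result tables, RMSE and NSE, can be computed as in the sketch below. It uses the standard textbook definitions built from the quantities just described (observed values, predicted values, and the mean observation); the function names are hypothetical, and the relative RMSE is omitted because its exact normalization is not stated in the text.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the data (m3/s for discharge)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches predicting the mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


observed = [1.0, 2.0, 3.0]
flat_prediction = [2.0, 2.0, 2.0]   # always predicts the observed mean
score = nse(observed, flat_prediction)
```

A flat prediction equal to the observed mean gives an NSE of 0, and negative values indicate a model that performs worse than that baseline.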
Model architecture scenarios
Tables 3 and 4 provide multiple scenarios involving various LSTM architectures and hyperparameters to evaluate the efficiency of the LSTM model. Among the various scenarios, the four-layer stacked LSTM architecture is selected as the benchmark against which other configurations are compared. The two- and three-layer LSTMs are evaluated accordingly. Building upon the stacked LSTM, scenario f1 introduces an additional FCL between the input layer and the first hidden LSTM using a sigmoid activation function. By adding one more FCL between the fifth hidden layer and the output layer, scenario f1 becomes f2. Furthermore, the performance of the vanilla LSTM, which represents the basic architecture with a single LSTM layer, is also evaluated in the study.
The LSTM structure scenarios
Scenarios | FCL | Cell number, layer 1 | Cell number, layer 2 | Cell number, layer 3 | Cell number, layer 4 | FC |
---|---|---|---|---|---|---|
Stacked LSTM | | 64 | 256 | 128 | 128 | |
Vanilla LSTM | | 64 | | | | |
S0 | | 64 | N_feats | | | |
S1 | | 64 | 64 | N_feats | | |
S2 | | 64 | 128 | N_feats | | |
S3 | | 64 | 256 | N_feats | | |
S4 | | 64 | 256 | 128 | N_feats | |
S5 | | 64 | 256 | 128 | 64 | |
S6 | | 64 | 256 | 256 | 128 | |
S7 | | 128 | 256 | 256 | 128 | |
f1 | FCLa (64) | 64 | 256 | 128 | 128 | |
f2 | FCLa (64) | 64 | 256 | 128 | 128 | FCLa (64) |
f3 | FCLa (64) | 64 | 256 | 128 | 128 | FCLa (128) |
FCL: fully connected layer; FCLa: FCL with sigmoid activation function.
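To give a sense of how the layer sizes in these scenarios translate into model complexity, the sketch below counts trainable parameters for a stack of LSTM layers. It assumes the standard LSTM formulation (four weight blocks per layer, each with input weights, recurrent weights, and a bias, no peephole connections), a single input feature, and it ignores any fully connected or output layers; the function names are hypothetical.

```python
def lstm_param_count(n_inputs, n_units):
    """Parameters of one LSTM layer: 4 blocks x (input + recurrent weights + bias)."""
    return 4 * (n_units * (n_inputs + n_units) + n_units)


def stacked_param_count(n_features, layer_units):
    """Total LSTM parameters when each layer feeds the next."""
    total, n_in = 0, n_features
    for units in layer_units:
        total += lstm_param_count(n_in, units)
        n_in = units  # the next layer receives this layer's outputs
    return total


vanilla = stacked_param_count(1, [64])                  # 16,896 parameters
stacked = stacked_param_count(1, [64, 256, 128, 128])   # 674,304 parameters
```

Under these assumptions the four-layer stacked configuration carries about forty times as many LSTM parameters as the single-layer vanilla model, which illustrates why adding cells and layers raises computational cost.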
The scenarios of the LSTM hyperparameters
Scenarios | Model hyperparameters | Hyperparameter selection |
---|---|---|
a(1–3) | Activation function | [linear, ReLU, sigmoid] |
c(1–6) | Number of LSTM cells | [20, 60, 100, 200] |
e(1–5) | Epochs | [300, 700, 1,000, 1,500] |
d(1–4) | Drop rate | [0.0, 0.2, 0.4, 0.6] |
t(1–3) | Batch size | [1, 64, 128] |
The study explores the influence of activation functions on stacked LSTM models, as these functions introduce nonlinear transformations to the LSTM cells. This influence is assessed by examining the ability of the LSTM models in scenarios a(1–3) (Table 4) to capture complex patterns in the data. Additionally, scenarios c1–c5 are created to examine the influence of the cell number on the learning effectiveness of the model.
A batch of data is utilized for each learning iteration during the training process. The batch size (T), drop rate, and number of epochs are tested to determine whether increasing them provides a significant performance gain. The values tested for these parameters in this study are presented in Table 4.
The hyperparameter ranges explored in Table 4 are informed by best practices in hydrological forecasting, empirical tuning, and prior studies. In streamflow prediction, LSTM units typically range from 50 to 200, balancing model complexity with the risk of overfitting (Li et al. 2023). This study adopts a similar range, consistent with previous hydrological applications. Hyperparameters were then selected through empirical testing to evaluate performance metrics such as loss and accuracy. By combining established guidelines with systematic experimentation, the hyperparameter ranges are selected for the streamflow prediction task.
RESULTS
Learning efficiency with different structures
Performances in the daily discharge prediction of different LSTM architectures: stacked LSTM, LSTM with two, three hidden layers (S0 and S2) and FCLs (f1, f2, and f3), and vanilla LSTM (one single LSTM cell)
Metrics | Prediction periods | Vanilla LSTM | S0 | S2 | Stacked LSTM | f1 | f2 |
---|---|---|---|---|---|---|---|
RMSE (m3/s) | Train | 18,916 | 3,487 | 4,776 | 2,217 | 2,241 | >10,000 |
RMSE (m3/s) | Test | 19,446 | 3,841 | 5,566 | 2,754 | 2,803 | >10,000 |
NSE | Train | −2.2 | 0.89 | 0.80 | 0.96 | 0.95 | 0 |
NSE | Test | −2.18 | 0.88 | 0.74 | 0.94 | 0.93 | 0 |
Comparison of simulated and observed discharges from (a) vanilla LSTM, (b) S0 scenario, (c) S2 scenario, (d) stacked LSTM, (e) f1 scenario, (f) f2 scenario for the Kratie station, (g) loss variation in the stacked scenario, and (h) loss variation in the f2 scenario. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.
Using three layers (S2 scenario) produces inferior results compared with two layers (S0 scenario) (Table 5). Employing four layers (stacked LSTM) yields favorable outcomes, whereas introducing an additional hidden layer degrades performance. The model outputs in S2 do not align well with the observations, indicating that this scenario struggles to capture the underlying patterns and relationships in the data (Figure 5).
With a vanilla LSTM model, the predictions often deviate significantly from the target and are nearly flat. However, when the cell number in the LSTM is set to 150, the predictions improve and move closer to the expected values. Increasing or decreasing the number of neurons away from this value leads to poorer results. Increasing the training epochs to 1,000 allows the network to show signs of learning and better performance.
These findings indicate that the LSTM model does not require additional FCLs to learn efficiently. As a result, the stacked LSTM architecture is considered optimal for predicting the streamflow at the Kratie station.
Performances in daily discharge prediction for stacked LSTM with different datasets
Metrics | Prediction periods | Dataset 1998–2022 | Dataset 2013–2022 |
---|---|---|---|
RMSE (m3/s) | Train | 2,998.83 | 2,717.24 |
RMSE (m3/s) | Test | 3,683.35 | 2,754.61 |
NSE | Train | 0.94 | 0.96 |
NSE | Test | 0.88 | 0.94 |
Hydrograph of predicted and observed discharges for stacked LSTM with (a) 1998–2022 datasets and (b) 2013–2022 datasets.
Results in Figure 6 show that LSTM effectively captures seasonal cycles in streamflow data (annual changes). Its ability to retain long-term dependencies helps differentiate between seasonal trends and short-term fluctuations, enabling accurate streamflow forecasts during wet and dry seasons.
Figure 6 also captures the interannual variability from reservoir operations or other factors. Using sufficient prior time steps, LSTM can incorporate this interannual variation to simulate the changes in hydrological regimes under climatic conditions or reservoir operation.
Learning efficiency with different parameters
Effects of epochs
Performances in daily discharge prediction for stacked LSTM with different epochs
Metrics | Prediction periods | 10 epochs | 300 epochs | 700 epochs | 1,000 epochs | 1,500 epochs
---|---|---|---|---|---|---
RMSE (m3/s) | Train | 10,829 | 3,406.78 | 2,817.24 | 2,217.24 | 2,375.51
RMSE (m3/s) | Test | 11,052 | 3,714.11 | 3,059.22 | 2,754.61 | 3,139.77
NSE | Train | 0.0 | 0.9 | 0.93 | 0.96 | 0.95
NSE | Test | 0.0 | 0.88 | 0.92 | 0.94 | 0.92
Hydrograph of predicted and observed discharges for stacked LSTM with different epochs and loss variation with epochs. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.
Effects of dropout
LSTM prediction capability for daily discharge with different dropout rates
Metrics | Prediction periods | D = 0 | D = 0.2 | D = 0.4 | D = 0.6
---|---|---|---|---|---
RMSE (m3/s) | Train | 3,417.91 | 2,829.56 | 2,217.27 | 2,644.32
RMSE (m3/s) | Test | 5,159.26 | 3,276.32 | 2,754.61 | 3,195.35
NSE | Train | 0.9 | 0.93 | 0.96 | 0.94
NSE | Test | 0.78 | 0.91 | 0.94 | 0.91
Simulated and observed discharges from (a) d1 scenario (dropout = 0.0), (b) d2 scenario (dropout = 0.2), (c) d3 scenario (dropout = 0.4), and (d) d4 scenario (dropout = 0.6) using four-layer stacked LSTM with sigmoid activation function for the Kratie station. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.
For the four-layer model, dropout rates from 0 to 0.6 were tested (Figure 8(a)–8(d)). Every nonzero rate improved on the no-dropout case, especially in the testing phase, where NSE rose from 0.78 at D = 0 to 0.91–0.94. Among the tested dropout rates, 0.4 worked best for the Kratie station. However, this study did not investigate how varying dropout rates affect prediction performance at other locations.
Interestingly, with a dropout rate of 0.4, the ability to capture peaks may not be as good, but overall, the prediction yields better results (Figure 8(c)).
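To make the mechanism behind these dropout scenarios concrete, the sketch below applies "inverted" dropout to a list of activations. It assumes the common formulation in which surviving units are rescaled by 1/(1 − rate); the study does not state which implementation was used, and the function name and the input values are hypothetical.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and rescale
    the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the illustration is repeatable
activations = [1.0] * 10  # hypothetical layer outputs
for rate in (0.0, 0.2, 0.4, 0.6):  # the rates compared in scenarios d1 to d4
    dropped = dropout(activations, rate, rng)
    zeroed = sum(1 for v in dropped if v == 0.0)
    print(rate, zeroed, "of 10 units zeroed")
```

Dropout of this kind is applied only during training; at prediction time all units are kept.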
Activation function influence
LSTM prediction efficiency with different activation function (a1, a2, and a3) scenarios
Metrics | Prediction periods | a1 | a2 | a3
---|---|---|---|---
RMSE (m3/s) | Train | 2,853.5 | 2,797.02 | 2,217.27
RMSE (m3/s) | Test | 4,279.21 | 3,958.37 | 2,754.61
NSE | Train | 0.93 | 0.93 | 0.96
NSE | Test | 0.85 | 0.87 | 0.94
Simulated and observed discharges from the (a) a1 scenario using linear activation function during the train/test period (3,500/1,400 points) compared with the (b) a2 scenario using ReLU activation function and (c) the a3 scenario using sigmoid activation function for the Kratie station.
The results presented in Figure 9 reveal that in scenario a1, which uses a linear activation function in the LSTM, learning is more difficult, resulting in high RMSE and low NSE values. In scenarios a2 and a3, the LSTM cells undergo a nonlinear transformation through activation functions, enabling them to better learn the complex patterns and relationships present in the discharge data. The hydrographs indicate that the prediction from scenario a3 matches the observed discharge more closely than the other scenarios, especially for peak flows and in the testing phase. These findings highlight that models using a nonlinear transformation, as in the a2 and a3 scenarios, perform well in daily streamflow prediction, with the sigmoid function (a3) giving the most accurate results (test NSE of 0.94).
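For reference, the three activation functions compared in scenarios a1 to a3 can be written in a few lines. This is an illustrative sketch using the textbook definitions, not code from the study; the weights `w1` and `w2` are hypothetical values chosen to show why a purely linear activation limits what stacked layers can represent.

```python
import math

def linear(x):
    """Identity activation (scenario a1)."""
    return x

def relu(x):
    """Rectified linear unit (scenario a2)."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, bounded in (0, 1) (scenario a3)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), relu(x), round(sigmoid(x), 4))

# Two stacked linear units collapse into a single linear map, so depth adds
# no expressive power without a nonlinearity.
w1, w2 = 0.5, 3.0  # hypothetical weights
x = 4.0
stacked = linear(w2 * linear(w1 * x))
collapsed = (w2 * w1) * x
print(stacked == collapsed)
```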
Effects of batch size T
Hydrograph of predicted and observed discharges for stacked LSTM with different batch sizes. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.
Figure 10 shows the LSTM results from 1,000 epochs and a batch size T of 64, with low RMSE and relatively good NSE. The training and testing NSE values of the LSTM model are 0.95 and 0.92, respectively (Table 10). For the Kratie station, batch size does not have a significant impact on the model's simulation results. The smaller batch size, with its more frequent updates of the network weights, gives the best simulation outcomes, although the difference from the larger batch size is not substantial. Increasing the batch size reduces the oscillation of the loss value, but this smoother loss does not translate into better performance in this case.
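The link between batch size and the frequency of weight updates can be illustrated with simple arithmetic. The sketch below assumes roughly 3,500 training points (the figure quoted in the Figure 9 caption, assumed here to apply to the batch-size experiment as well) and assumes that the final, partially filled batch still triggers an update; the function name is hypothetical.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the training set,
    counting the final partial batch."""
    return math.ceil(n_samples / batch_size)

n_train = 3500  # approximate training points (assumption, see lead-in)
epochs = 1000
for batch_size in (64, 128):
    per_epoch = updates_per_epoch(n_train, batch_size)
    print(batch_size, per_epoch, per_epoch * epochs)
# 64 55 55000
# 128 28 28000
```

Under these assumptions, halving the batch size roughly doubles the number of weight updates for the same number of epochs.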
LSTM prediction efficiency with the different batch sizes
Metrics | Prediction periods | Batch size 64 | Batch size 128
---|---|---|---
RMSE (m3/s) | Train | 2,393.56 | 3,280.67
RMSE (m3/s) | Test | 3,034.07 | 3,363.12
NSE | Train | 0.95 | 0.9
NSE | Test | 0.92 | 0.9
Cell number influence
LSTM prediction efficiency with the different cell numbers
Metrics | Prediction periods | C1 (20 cells) | C2 (60 cells) | C3 (100 cells) | C4 (200 cells)
---|---|---|---|---|---
RMSE (m3/s) | Train | 4,883.03 | 2,885.65 | 2,834.75 | 2,598.98
RMSE (m3/s) | Test | 5,627.87 | 3,187.95 | 3,592.79 | 3,478.63
NSE | Train | 0.79 | 0.93 | 0.93 | 0.94
NSE | Test | 0.73 | 0.91 | 0.89 | 0.9
Simulated and observed discharges from different cell numbers using stacked LSTM with sigmoid activation function for the Kratie station.
When adjusting the number of cells per layer, it is evident that the best cell count depends on the number of layers (Table 12). In scenarios S1 and S2, a marginal improvement in the model's effectiveness is observed when increasing the number of cells in the second layer. For the four-layer model, increasing the number of cells in the third hidden layer to 128 improves the model's effectiveness. However, a further increase to 256 neurons diminishes the model's simulation ability (compare RMSE and NSE for the stacked LSTM and S6 scenarios, Table 12), and the same applies to the fourth hidden layer. In contrast, increasing the number of cells to 128 in the first hidden layer decreases the model's simulation ability (stacked LSTM and S7 scenarios). Hence, increasing the number of cells enhances the model's effectiveness only up to an optimal value. Deviating from this optimum in either direction leads to poorer results, and the optimal value depends on the specific hidden layer.
LSTM prediction capability for daily discharge with the different cell numbers
Metrics | Prediction periods | S1 | S2 | S3 | S4 | S5 | Stacked LSTM | S6 | S7
---|---|---|---|---|---|---|---|---|---
RMSE (m3/s) | Train | 4,776 | 4,732 | 4,555 | 2,543 | 2,546 | 2,217 | 2,651 | 2,553
RMSE (m3/s) | Test | 5,566 | 5,575 | 5,447 | 3,360 | 2,961 | 2,754 | 3,434 | 3,069
NSE | Train | 0.80 | 0.80 | 0.81 | 0.94 | 0.94 | 0.96 | 0.94 | 0.94
NSE | Test | 0.74 | 0.74 | 0.75 | 0.9 | 0.93 | 0.94 | 0.9 | 0.92
Learning efficiency with antecedent days for prediction
This section employs antecedent streamflow data from the same site to make predictions for the following day using LSTMs. Specifically, the goal is to predict the next item in the sequence: given a sequence of 3 days of data, the model predicts the value on the fourth day. Each sample therefore consists of a vector (Q(t − 3), Q(t − 2), Q(t − 1), Q(t)), in which the first three values are the inputs, Q(t) is the value to be predicted, and Q denotes the discharge time series.
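A minimal sketch of how such antecedent-day samples could be built from a discharge series is given below. It assumes each sample pairs `n_lags` consecutive past values with the next day's value as the target; the function name `make_windows` and the sample discharges are hypothetical.

```python
def make_windows(series, n_lags):
    """Split a series into (inputs, target) pairs, where each input holds
    n_lags consecutive values and the target is the value that follows."""
    inputs, targets = [], []
    for i in range(len(series) - n_lags):
        inputs.append(series[i:i + n_lags])
        targets.append(series[i + n_lags])
    return inputs, targets

q = [10.0, 12.0, 15.0, 14.0, 13.0, 11.0]  # hypothetical discharges
x, y = make_windows(q, n_lags=3)
print(x[0], y[0])  # [10.0, 12.0, 15.0] 14.0
print(len(x))      # 3
```

Changing `n_lags` to 5, 9, or 14 yields the longer input sequences compared in this section; a longer window also leaves fewer samples from the same series.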
LSTM prediction efficiency for daily discharge for various antecedent days
Metrics | Prediction periods | 3D | 5D | 9D | 14D
---|---|---|---|---|---
RMSE (m3/s) | Train | 2,251.12 | 2,217.27 | 3,307.39 | 3,939.42
RMSE (m3/s) | Test | 2,648.1 | 2,754.61 | 4,229.09 | 5,390.48
NSE | Train | 0.95 | 0.96 | 0.9 | 0.86
NSE | Test | 0.94 | 0.94 | 0.85 | 0.76
Predicted and observed discharges from (a) 3Ds, (b) 5Ds, (c) 9Ds, and (d) 14Ds for the Kratie station. The red line represents the predicted values, while the blue line corresponds to the true values. Additionally, the black line delineates the boundary between the training and testing periods.
The model's performance decreases as the number of antecedent days increases beyond 5: test NSE falls from 0.94 with 3 or 5 days to 0.85 with 9 days and 0.76 with 14 days. A longer input sequence therefore does not guarantee better results, and the model framework must be selected carefully. The 5-day window gives the best training performance (NSE of 0.96), while the 3-day window performs comparably and carries less antecedent information. An excessively long input sequence leads to a flattening of the flood peak. This poor forecasting result can also be attributed to overfitting.
The recommendation of utilizing a sequence of 5 antecedent days for prediction is grounded in hydrological principles and empirical evidence from our study area. Hydrological processes in the Mekong Delta exhibit inherent temporal dependencies, whereby current streamflow conditions are influenced by antecedent rainfall, soil moisture, and river discharges. By considering a lagged sequence of antecedent days, our LSTM models can capture these temporal relationships and leverage them to make more accurate predictions of future streamflow. Additionally, empirical analysis of our dataset revealed that a 5-day antecedent window provided the optimal balance between capturing relevant hydrological dynamics and avoiding model complexity. This finding is consistent with hydrological theory, which suggests that the influence of antecedent conditions on streamflow diminishes over time, with shorter sequences potentially neglecting important hydrological drivers while longer sequences may introduce unnecessary noise and complexity. Thus, by integrating a 5-day antecedent window into our LSTM models, we were able to achieve particularly high accuracy in our predictions, enhancing the reliability and applicability of our findings for water resource management and decision-making in the Mekong Delta.
The analysis of the LSTM network's architecture and hyperparameters reveals that the stacked LSTM with four layers, consisting of 64, 256, 128, and 128 neurons, respectively, performs the best for next-day forecasting. Therefore, this configuration, which utilized the MSE loss function, the Adam optimizer, and the sigmoid activation function (Table 2), is selected for learning and predicting the discharge at the Kratie station, providing the most reliable forecasts and a short training time.
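To give a sense of the size of the selected configuration, the sketch below counts trainable parameters using the standard LSTM formulation (four gates, one bias vector per gate). It assumes that all four layers are LSTM layers, that the input is a single feature (discharge), and it excludes any output or fully connected layer; these are assumptions for illustration, not details confirmed by the study, and the function names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of one standard LSTM layer:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def stacked_lstm_params(n_features, layer_units):
    """Total parameters of LSTM layers stacked in the given order."""
    total, input_dim = 0, n_features
    for units in layer_units:
        total += lstm_params(input_dim, units)
        input_dim = units  # the next layer reads this layer's hidden state
    return total

print(lstm_params(1, 64))                           # 16896
print(stacked_lstm_params(1, [64, 256, 128, 128]))  # 674304
```

Under these assumptions, the four-layer stack has about 40 times as many parameters as a single 64-unit layer, which is consistent with the added computational cost of deeper and wider networks noted in this study.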
DISCUSSION
Our study explores the application of LSTM networks for streamflow prediction at the Kratie station in the Vietnam Mekong Delta, building upon the advancements and methodologies outlined in the existing literature on streamflow forecasting using deep learning techniques.
This study emphasizes the significance of hyperparameters on streamflow predictions specific to the Mekong region. These hyperparameters are vital to the model's accuracy, particularly within complex hydrological applications.
The number of LSTM cells directly affects the model's capacity to capture temporal dependencies. This research tested configurations ranging from 20 to 200 units, with 100 units demonstrating the best balance between performance and computational efficiency. Similar results have been found in other studies, such as those by Kratzert et al. (2019b) and Fu et al. (2020), which indicated that 100–150 units were optimal for streamflow forecasting. Smaller configurations (50–100 units) encountered difficulties with complex datasets, while larger networks (200+ units) tended to overfit.
The number of epochs governs the convergence and generalization of the LSTM model. This study explored 10, 300, 700, 1,000, and 1,500 epochs, and our results indicate that 1,000 epochs gave the best results, minimizing validation loss and ensuring model stability. Shen (2018) used a 500–1,000 epoch range for hydrological models, reporting that training beyond 1,000 epochs led to overfitting with minimal gains in model accuracy. Similarly, Kratzert et al. (2019b) noted that after 1,000 epochs, the model's performance plateaued, with little benefit from extended training. Our results also show that training beyond 1,000 epochs did not improve model performance (test NSE fell from 0.94 at 1,000 epochs to 0.92 at 1,500 epochs), suggesting that a longer training time does not necessarily translate into better predictive capability in streamflow forecasting.
Dropout is a technique that reduces overfitting by randomly excluding a percentage of neurons during training. We tested dropout rates of 0.0, 0.2, 0.4, and 0.6, finding the best performance at rates between 0.2 and 0.4, which maintained model capacity while minimizing overfitting. Fu et al. (2020) noted that a dropout rate of 0.3–0.4 is effective in streamflow prediction. In contrast, higher rates like 0.6 led to underfitting by limiting the model's ability to learn complex patterns, consistent with our results showing decreased performance at rates above 0.4.
The batch size refers to the number of samples used in each training iteration. This study tested batch sizes of 1, 64, and 128, finding that 64 provided the best balance between computational efficiency and model performance. Although a batch size of 128 allowed for faster convergence, it resulted in slightly lower predictive accuracy. In hydrological forecasting, Jamei et al. (2024) found that smaller batch sizes (1–32) excelled in short-term streamflow predictions, while larger sizes (64–128) were better for long-term forecasts. Our findings support the idea that smaller batch sizes enhance convergence and generalization. Still, a batch size of 64 is optimal for practical applications, as noted by Shen (2018), effectively utilizing modern hardware without compromising learning dynamics.
In streamflow forecasting, activation function selection and LSTM architecture design are crucial for learning temporal dependencies and generalizing to unseen data. Various activation functions have been examined to improve model performance. Kratzert et al. (2019b) found that while ReLU often led to faster convergence in large-scale models, the sigmoid function produced smoother, more stable outputs, which are vital in hydrological modeling. Our study determined that the sigmoid function was better for streamflow data, offering smoother predictions, which aligns with Fu et al. (2020).
The design of LSTM models, especially regarding network depth and FCLs, has been widely studied for streamflow forecasting. Kratzert et al. (2019b) found that stacked LSTM models consistently outperformed single-layer structures and that FCLs improved the integration of temporal features. Variations such as the EA-LSTM incorporated external features, enhancing performance in multi-basin forecasting. This study focused on historical streamflow data and found that a simple stacked LSTM with FCLs was sufficient for accurate predictions, suggesting that additional features may not be necessary when sufficient historical data are available. Zhang et al. (2018b) reported the role of dense layers in capturing nonlinear interactions between past flow and environmental variables, improving model performance for complex time-series data.
Notably, more complex LSTM models can better capture unique variations in the data series but also require more computational resources and longer training and forecasting times.
Additionally, overfitting can become a challenge as model complexity increases, leading to good performance on training data but poor performance on forecasting data.
In real-time applications, processing time matters, and an effective architecture must balance complexity and efficiency. A hybrid approach can be beneficial: simpler models are used for rapid predictions (from 1 to 3 days), and more sophisticated models are reserved for in-depth post-event analysis, so that real-time efficiency is maintained and accuracy is enhanced when resources permit. Research on balancing computational speed with accuracy in real-time forecasting has yielded encouraging insights. For example, Kratzert et al. (2019b) prioritized simpler models for short-term forecasts and considered more complicated models when resources allowed. Similarly, the findings of Fu et al. (2020) emphasize the importance of a flexible model that adapts to time constraints and available resources. This strategy balances speed and accuracy in real-time applications.
By applying LSTM models and optimizing their parameters, the study enhanced our understanding of streamflow patterns and improved predictive capabilities in this complex hydrological environment. Our findings demonstrate promising results, with the LSTM models effectively capturing the seasonal variations and interannual trends in streamflow at Kratie.
The LSTM models exhibited high levels of accuracy in replicating observed streamflow patterns, as indicated by high NSE values and low RMSE values. The predicted streamflow closely matched the observed values, particularly in capturing seasonal variations and interannual trends. Notable patterns, such as the seasonal peak and troughs in streamflow, were accurately reproduced by the LSTM models, indicating their robustness in capturing hydrological dynamics.
However, certain limitations were observed, particularly in predicting extreme events or sudden changes in streamflow, which may require further refinement of model parameters or the integration of additional environmental variables.
Our study aligns with the existing literature (Fu et al. 2020) in its utilization of deep learning techniques, particularly LSTM networks, for streamflow prediction. By evaluating the performance of LSTM models, the study contributes to the ongoing discourse on the efficacy of deep learning techniques in improving streamflow prediction accuracy.
Unlike studies focused on rivers in India, Malaysia, and Canada (Fu et al. 2020; Naganna et al. 2023; Jamei et al. 2024), our study examines streamflow prediction in the Vietnam Mekong Delta, introducing variations in hydrological conditions and dataset properties. This geographic divergence underscores the importance of contextualizing model development and evaluation within specific regional contexts.
Furthermore, this study offers unique contributions to optimizing LSTM structures and hyperparameters for streamflow prediction at the Kratie station. By systematically exploring the impact of different network architectures, activation functions, and input configurations, we provide valuable insights for practitioners seeking to enhance the accuracy of streamflow forecasts in similar hydrological settings. Additionally, our study emphasizes the practical implications of streamflow prediction for water resource management, hydropower operations, and agricultural planning in the Vietnam Mekong Delta, thereby bridging the gap between scientific research and real-world applications.
LSTM has shown effectiveness in streamflow forecasting at the Kratie station, but its generalizability to other areas may be constrained by several factors. Firstly, LSTM requires large, high-quality datasets for training. In regions with limited or inconsistent streamflow data, the model's ability to learn accurate patterns will be reduced. The Kratie station provides daily continuous data, but this advantage may not extend to other areas. Streamflow dynamics vary greatly depending on local geography, climate, and human activities. For example, regions influenced by snowmelt (upstream MRB) or monsoons (downstream MRB) might exhibit different patterns. Furthermore, climate change, land use changes, or human activities can change flow regimes. If the LSTM model is trained on past data that does not account for these changes, its predictions for future conditions could be inaccurate. Moreover, complex LSTM models may also overfit the training data, capturing noise instead of actual patterns and reducing generalization to new data.
While the study has demonstrated the effectiveness of the LSTM model in streamflow forecasting at the Kratie station, extending the applicability of this model to other sites with different hydrological conditions is necessary. Future research will apply the LSTM model to different hydrological conditions or other parts of the Mekong Delta, strengthening the model's transferability and making flow predictions more reliable across various conditions. Future work could also build upon our findings by further investigating the generalizability of LSTM models across diverse hydrological contexts and exploring additional factors influencing streamflow dynamics in the Mekong Delta. Integrating additional environmental variables, such as soil moisture and vegetation cover, could improve the accuracy of streamflow predictions. Additionally, efforts to integrate multivariable data and incorporate insights from socio-economic and environmental factors could enhance the robustness and applicability of streamflow forecasting models in complex hydrological systems and support evidence-based decision-making in the region.
This study focuses solely on LSTM for streamflow forecasting because of its advantages, and it is therefore limited by the lack of comparison with other models. Future work will include such comparisons to better demonstrate LSTM's performance, validate its effectiveness in hydrological applications, and identify areas for improvement.
CONCLUSIONS
This study investigates the effectiveness of LSTM in streamflow forecasting at the Kratie station, Mekong River Basin. Results show that the efficacy of the LSTM model depends on basin characteristics and flow regime rather than the dataset size, emphasizing the importance of model input compared to data length. Among the various LSTM architectures tested, stacked LSTMs exhibited superior predictive capabilities, with an optimal dropout rate of 0.4 being the most effective in mitigating overfitting.
Integrating FCLs slightly enhanced model accuracy. It also extended training duration, indicating a trade-off between performance and efficiency. The sigmoid activation function was identified as the most effective for discharge predictions, while a larger batch size did not lead to considerable gains in efficiency. Additionally, increasing the number of cells beyond 100 leads to instability in the loss values.
The study provides additional meaningful information to strengthen water management measures in the Mekong River Basin, especially the Vietnam Mekong Delta, and a deeper understanding of the influence of different hyperparameters on flow forecasting based on LSTM. Future research should aim to refine parameter selection and optimization to achieve more reliable forecasts and explore integrating LSTM networks with other neural models to improve predictive accuracy across diverse hydro-climatological contexts.
AUTHOR CONTRIBUTIONS
N.Y.N. and T.N.A. conceptualized the study and wrote, reviewed, and edited the article. N.Y.N., D.D.K., L.V.N. and V.T.A. analyzed and wrote, reviewed, and edited the article.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.