This paper presents the long short-term memory (LSTM), rating curve (RC), and rainfall–runoff models that can be used for estimating the daily flow at the outlet of river basins. The Hieu river basin in Vietnam is selected as an example to demonstrate the ability of these approaches. Hydro-meteorological data at the Quy Chau station were collected over a long period from 1/1/1991 to 31/12/2020. The approaches mentioned above are implemented and used to calculate the daily flow in the studied river basin. The coefficients and modeling parameters in each approach are then determined based on five statistical error indexes. The results revealed that the RC using either one or two segments and the LSTM model using water elevation as input data represented the observed daily flow very well, with dimensional errors (i.e., mean error, mean absolute error, and root mean square error) equal to only about 1% of the observed magnitude of the flow in the studied river basin, while the Nash–Sutcliffe efficiency and correlation coefficients are greater than 0.95. The impacts of different types of input datasets on the estimated daily flow are also presented for the LSTM model.

  • Long short-term memory, rating curve, and rainfall–runoff model are implemented for estimating the daily flow in river basins.

  • Long short-term memory and rating curve using either one or two segments represented the observed daily flow very well in the studied river basin.

  • When the long short-term memory model is used, the input datasets significantly affect the estimated values of the daily flow.

Graphical Abstract


Daily flow is an essential quantity in the day-to-day management of river engineering works and requires accurate analysis and day-ahead prediction. This is because daily flow relates to the use, planning, and management of water resources, as well as to the development of sustainable strategies for both the quantity and quality of water in river basins. Owing to its importance, different methods have been developed to compute the daily flow in river basins, including (i) semi-empirical formulas such as rating curves (RCs) in a standard or modified form (Le Coz et al. 2014; Kiang et al. 2018; Pham Van & Nguyen-Van 2022), (ii) physically based models such as the rainfall–runoff MIKE NAM (Hafezparast et al. 2013; Lee et al. 2018; Aredo et al. 2021; Ghosh et al. 2022), (iii) hydraulic models such as one-dimensional section-averaged and two-dimensional depth-averaged models (Pham Van et al. 2016; Lee et al. 2018) and the lateral distribution method (Darby & Thorne 1996), (iv) data-driven models such as the long short-term memory, denoted LSTM (Hochreiter & Schmidhuber 1997), and its well-known alternative and extension, the gated recurrent unit (Pham Van & Nguyen-Van 2022), and (v) machine learning (ML) models, which are reported to be robust and efficient, frequently outperform standard conceptual and physically based hydrological models, and can be used to assess the performance of hydrological models (Rozos et al. 2022). However, estimating the daily flow remains challenging because of various sources of uncertainty, e.g., the multiple features of river basins, the nonlinear interactions between these features, the multiple time scales of flow variability, and the input, parameters, process representations, and model structure inherent in each method (Dey & Mujumdar 2021).

A large number of studies use traditional approaches such as the RC, physically based models such as the rainfall–runoff MIKE NAM, and hydraulic models for estimating the flow in river basins (e.g., Darby & Thorne 1996; Hafezparast et al. 2013; Le Coz et al. 2014; Kiang et al. 2018; Lee et al. 2018; Aredo et al. 2021; Ghosh et al. 2022; Pham Van & Nguyen-Van 2022). Although the results achieved with these approaches are impressive, at least to some extent, several drawbacks remain. One of the main downsides is that their accuracy depends greatly on the quality and length of the input datasets (for the RC, MIKE NAM, and hydraulic models) and on the model type, structure, and parameters (for MIKE NAM and hydraulic models). In addition, the RC cannot fully account for unsteady flow conditions and hysteretic processes (Fread 1973, 1975). For physically based models, as pointed out by Beven (1989), practical problems remain, such as obtaining realistic estimates of the uncertainty associated with the model's predictions. Hydraulic models are difficult to apply because they require multiple input data, such as geometry, bed friction, and three-dimensional flow characteristics, that are not always available (Pham Van & Nguyen-Van 2022). Thus, significant efforts are still needed to improve the accuracy of traditional approaches as well as to appropriately represent the nonlinear time series of the daily flow in hydrological science (Kratzert et al. 2018; Yin et al. 2022).

Recently, the LSTM, which is one of the popular data-driven models, has been applied for flow estimation and prediction in different river basins. Kratzert et al. (2018) investigated the potential of the LSTM model for simulating the daily flow from meteorological observations in 241 basins of the Catchment Attributes and Meteorology for Large-Sample Studies dataset in the United States. Yin et al. (2022) used the same datasets as Kratzert et al. (2018) to test the LSTM with a step-sequence framework in the rainfall–runoff model for daily flow predictions. Pham Van & Nguyen-Van (2022) used gated recurrent units for calculating the flow in the Lo river basin, Vietnam. Frame et al. (2022) compared different types of data-driven models (e.g., LSTM, mass-conserving LSTM) and argued that data-driven models such as the LSTM can outperform catchment-scale conceptual and physically based models at predicting flow. The results of these previous studies show that the LSTM model can estimate the daily flow from available hydro-meteorological observations with accuracies comparable to those of other models. However, as pointed out by Moosavi et al. (2022), the input variables may affect the performance of data-driven models in general and LSTM models in particular.

The main purpose of this study is to describe and implement multiple approaches, including a data-driven model (the LSTM) and traditional approaches (the RC and MIKE NAM), that can be used for estimating the daily flow at the outlet of river basins. In particular, the research aims at (i) identifying the optimal values of the coefficients and modeling parameters in each approach or model, (ii) investigating the impacts of different types of input data (e.g., rainfall, water elevation, both rainfall and water elevation) when using the LSTM model for simulating the daily flow, and (iii) determining which of the approaches mentioned above are appropriate for simulating the flow at the daily time scale in the river basin. For the Hieu river basin, which is the river of interest in the present study, this is also the first study in which the data-driven model and traditional approaches are implemented and then applied to compute the daily flow. The daily time series of available hydro-meteorological observations (e.g., flow, water elevation, rainfall, and evaporation) collected in the period from 1/1/1991 to 31/12/2020 were used to determine the optimal values of the regression coefficients in the RC using one or two segments and of the modeling parameters in the MIKE NAM and LSTM models. Five statistical errors are also used to quantify the agreement between the estimated results and the observed data of the daily flow in the studied river basin.

The Hieu river basin

The Hieu river is the dominant branch of the Ca river, which is known as the biggest river system in north-central Vietnam. The river originates from the Cao Phu Hoat mountain area (in the Que Phong district, Nghe An province) and merges with the Ca river at the Cay Chanh confluence (Figure 1). The river flows over a length of about 314 km and drains an area of 5,340 km2, of which the river basin area from the source to the Quy Chau hydrological station is about 1,960 km2. The Hieu river has various tributaries such as Nam Quang, Nam Giai, Ke Coc – Khe Nha, Chang, Dinh, Khe Nghia, and Khe Da. The river has a gravel bed and is narrow and shallow in the upstream area (from the source to Nghia Dan), where the water depth varies from 0.5 to 2.0 m in the dry season (from December to July). In the downstream region, the geometry of the river varies significantly because of flow changes and the river's meandering. The river is wide in this part, with a width ranging from 150 to 280 m in the flood season (from August to November) and from 100 to 120 m in the dry season.
Figure 1

Map of the Hieu river basin at the Quy Chau location.


In terms of weather characteristics, the mean monthly temperature at the Quy Chau station varies between 17.4 and 28.5°C. The annual evapotranspiration ranges from 700 to 940 mm, while the annual mean air humidity varies between 85 and 94%. The mean monthly rainfall ranges from 13.8 to 322.0 mm, with annual rainfall between 1,170 and 2,110 mm. The mean annual flow at Quy Chau is about 77.5 m3/s. The daily flow varies over a wide range, from 6 m3/s (in the dry season) to 3,690 m3/s (in the flood season). Large flood events often occur in September and October, while the three driest months usually occur from February to April. The total flow volume in the flood season equals 55–75% of the annual flow.

The Hieu river basin plays an important role in supplying water to the Que Phong, Quy Chau, Quy Hop, Nghia Dan, and Tan Ky districts in Nghe An province. Among the tributaries mentioned above, Chang and Dinh are the two largest and supply water to different users in the Que Phong district. It is thus necessary to determine the daily flow in the river basin quantitatively, both to assess the water supply properly and to enhance biodiversity and river ecosystems.

Data collection

Daily flow, rainfall, water surface elevation, and evaporation at the Quy Chau station (see Table 1 and Figure 1) were collected for the long period from 1/1/1991 to 31/12/2020. Figure 2 shows the probability density of the hydro-meteorological data collected at the Quy Chau station. The daily flow is used as the target result for all approaches, including the rating curve, MIKE NAM, and the LSTM model. The records of the daily flow and water level are used to determine the optimal values of the regression coefficients in the RC with one or two segments. The time series of rainfall and evaporation are used as input data in MIKE NAM, for calibrating the modeling parameters as well as validating the model. When the LSTM model is applied, different input data (e.g., rainfall, water surface elevation, both rainfall and water surface elevation) are considered separately to examine the impacts of the input data on the output results. The dataset in the period from 1/1/1991 to 31/12/2011 (about 70% of the dataset) is used for calibrating the MIKE NAM parameters or training the LSTM model, while the remaining dataset in the period from 1/1/2012 to 31/12/2020 (approximately 30% of the dataset) is used for validating the MIKE NAM and LSTM models.
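The 70%/30% split quoted above can be checked directly from the calendar dates. The following is a minimal sketch using only the Python standard library; the helper name `n_days` is introduced here for illustration and is not part of the original study.

```python
from datetime import date

def n_days(start, end):
    """Number of daily records between two dates, both ends included."""
    return (end - start).days + 1

# Calibration/training and validation periods as stated in the text.
n_train = n_days(date(1991, 1, 1), date(2011, 12, 31))
n_valid = n_days(date(2012, 1, 1), date(2020, 12, 31))

# Share of the whole record that falls in the calibration/training period.
train_share = n_train / (n_train + n_valid)
print(n_train, n_valid, round(100 * train_share, 1))
```

The split is chronological rather than random, so the validation period lies entirely after the calibration (or training) period.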
Table 1

Observation period of the hydro-meteorological data

| Station | Quantity | Longitude | Latitude | Period | Mean | Standard deviation | Skewness | Kurtosis | r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Quy Chau | Evaporation | 105°06′000″ | 19°34′00″ | 1991–2020 | 0.33 | 1.30 | 5.38 | 73.61 | 0.121 |
| Quy Chau | Rainfall | | | | 4.45 | 13.80 | 6.87 | 81.16 | 0.551 |
| Quy Chau | Water level | 105°12′05″ | 19°33′50″ | | 68.11 | 0.57 | 4.37 | 41.13 | 0.919 |
| Quy Chau | Water discharge | | | | 77.50 | 114.1 | 11.93 | 253.29 | 1.00 |
Figure 2

Probability density of (a) rainfall, (b) water level, and (c) flow at the Quy Chau station.

Figure 3 shows the schematic map of the multiple approaches used for estimating the daily flow in the present study. These approaches consist of the data-driven LSTM model and the traditional approaches, namely the RC and the rainfall–runoff MIKE NAM model. To measure the performance of the different models, several statistical errors are computed in the calibration (or training) and validation steps of each model.
Figure 3

Schematic map of multiple approaches for estimating the daily flow.


Stage–discharge relationship

An RC described by a hydraulic equation is generally written as (Le Coz et al. 2014):
$$Q = a\,(H - b)^{c} \qquad (1)$$
where a and c are the scaling and exponential coefficients, respectively; b and H are, respectively, the reference level and the water level relative to a given datum; and Q is the daily flow. Equation (1) is the standard form, which uses only one segment in the RC (denoted RC1).
If two segments are used in the RC (denoted RC2), Equation (1) is rewritten as:
$$Q = \begin{cases} a_1\,(H - b_1)^{c_1}, & H \le b_r \\ a_2\,(H - b_2)^{c_2}, & H > b_r \end{cases} \qquad (2)$$
where a1 and a2 are scaling coefficients; b1 and b2 are reference levels; c1 and c2 are exponential coefficients; and br is the transition point in the RC. The transition point is introduced in the RC to take into account prior knowledge of the hydraulic controls and the information content of the uncertain gaugings, as pointed out by Le Coz et al. (2014).
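As an illustration of the one- and two-segment rating curves, the following minimal sketch assumes the standard power-law form Q = a(H − b)^c for each segment, with the first segment applied at or below the transition level br. The function names, the clipping of negative depths to zero, and the transition level used in the usage lines are assumptions made here for illustration; the study does not report a value of br.

```python
import numpy as np

def rc_one_segment(H, a, b, c):
    """Power-law rating curve Q = a * (H - b)**c (RC1).

    Water levels at or below the reference level b give zero flow
    (an assumption of this sketch, to avoid negative bases).
    """
    depth = np.clip(np.asarray(H, dtype=float) - b, 0.0, None)
    return a * depth ** c

def rc_two_segments(H, a1, b1, c1, a2, b2, c2, br):
    """Two-segment rating curve (RC2).

    Segment 1 is used where H <= br and segment 2 where H > br.
    """
    H = np.asarray(H, dtype=float)
    lower = rc_one_segment(H, a1, b1, c1)
    upper = rc_one_segment(H, a2, b2, c2)
    return np.where(H <= br, lower, upper)

# Usage with illustrative water levels (m); br = 69.0 is hypothetical.
levels = [67.0, 70.0]
q1 = rc_one_segment(levels, a=32.15, b=66.64, c=2)
q2 = rc_two_segments(levels, 6.95, 66.0, 3, 26.62, 66.05, 2, br=69.0)
```

Keeping br as an explicit argument makes it clear that the transition level is a separate modeling choice, informed by knowledge of the hydraulic controls, rather than something implied by the other six coefficients.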

Rainfall–runoff model

Among various rainfall–runoff models, the MIKE NAM model is chosen for simulating the daily flow at the outlet of the river basin of interest. This choice is motivated by the model's performance, speed, and accuracy, as well as by its wide range of practical applications (Hafezparast et al. 2013; Lee et al. 2018; Aredo et al. 2021; Ghosh et al. 2022). The model simulates the runoff output from the rainfall input of the hydrological cycle via surface, root zone, groundwater, and snow storages that represent the different physical components and characteristics of a catchment.

The MIKE NAM model produces runoff and other related moisture and groundwater components of the land phase of the hydrological cycle using meteorological input data like rainfall and evaporation. The resulting flow is conceptually divided into overland, rootzone, and groundwater flow. The parameters of the model are thus associated with these flow components, including (i) two parameters representing storage, (ii) five parameters illustrating overland and interflow (also called surface and rootzone flow), and (iii) two main parameters for groundwater flow. Additional information on the above-mentioned modeling parameters can be found in Table 4. Regarding the application of MIKE NAM to the river basin of interest, modeling parameters are calibrated and validated based on the comparison between observed data and simulated values of the daily flow at the Quy Chau station in the period between 1/1/1991 and 31/12/2020.

LSTM model

The LSTM model, which is a popular data-driven model, is applied to evaluate the daily flow in the river basin of interest. Regarding the model structure, the memory cell is the main building block of the LSTM model and essentially represents the hidden layer. A cell of the LSTM model has three main components, namely the input, forget, and output gates. The input gate receives the input data and information, e.g., rainfall, water level, or both rainfall and water level in the river basin of interest. The forget gate allows the memory cell to reset the cell state so that it does not grow indefinitely; it decides which information is allowed to pass through and which information is suppressed. Finally, the output gate provides the output of the model and decides how to update the values of the hidden unit. In detail, the output gate combines the input data at the present time, the output of the hidden unit at the previous time step, and the components of the bias vector for the hidden units in order to produce the final output. Further information on the governing equations of the different gates, cell state, and hidden units in the LSTM model can be found in previous studies (e.g., Hochreiter & Schmidhuber 1997; Kratzert et al. 2018; Frame et al. 2022).
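To make the role of the gates concrete, the sketch below implements one time step of an LSTM cell in its commonly used textbook form (input, forget, and output gates plus a candidate cell update). It is a generic illustration, not the authors' implementation; the weight layout, the gate ordering, and the function names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    x_t    : input at the present time, shape (n_in,)
    h_prev : hidden state at the previous time step, shape (n_hidden,)
    c_prev : cell state at the previous time step, shape (n_hidden,)
    W, U, b: stacked weights with shapes (4*n_hidden, n_in),
             (4*n_hidden, n_hidden), and (4*n_hidden,); the assumed
             block order is input gate, forget gate, candidate, output gate.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information enters
    f = sigmoid(z[n:2 * n])        # forget gate: how much old cell state is kept
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much cell state is exposed
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # updated hidden state (cell output)
    return h_t, c_t
```

In a flow-estimation setting, `x_t` would hold the daily inputs (for example rainfall and water level), and a final linear layer applied to `h_t` would give the estimated daily flow.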

Estimating the daily flow with the LSTM model relies on meteorological and hydrological data in an adaptive intelligent framework that can evaluate the daily flow efficiently (Mishra et al. 2022). The number of epochs, the number of hidden units, and the learning rate are three hyper-parameters of the LSTM model. The optimal values of these hyper-parameters are normally determined by the trial-and-error method in the training step, which is performed using the dataset from 1/1/1991 to 31/12/2011. The LSTM model is then validated using the dataset in the period from 1/1/2012 to 31/12/2020. In terms of optimization algorithms, the Adam optimization algorithm (Kingma & Ba 2015) is chosen for the LSTM model since this algorithm is widely applied in real applications (e.g., Hochreiter & Schmidhuber 1997; Kratzert et al. 2018; Pham Van & Nguyen-Van 2022). Different types of input data (e.g., rainfall, water level, both rainfall and water level) are tested to investigate the impacts of the input data on the output results of the LSTM model.
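For readers unfamiliar with Adam, the following sketch shows a single parameter update following Kingma & Ba (2015). The default decay rates and epsilon are those recommended in that paper; the learning rate of 0.001 matches the value reported later in this study. The function name and the toy objective in the usage lines are illustrative assumptions, and in practice the update would be handled by a deep learning library.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta given their gradient.

    m, v : running first and second moment estimates (same shape as theta)
    t    : 1-based iteration counter used for bias correction
    """
    m = beta1 * m + (1.0 - beta1) * grad           # first moment (mean of gradients)
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second moment (mean of squared gradients)
    m_hat = m / (1.0 - beta1 ** t)                 # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                 # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on a toy objective f(theta) = (theta - 3)**2, whose gradient is 2*(theta - 3).
theta, m, v = np.array(0.0), np.array(0.0), np.array(0.0)
for t in range(1, 6):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Because the step is normalized by the square root of the second moment, each early update has a size close to the learning rate regardless of the gradient magnitude, which is one reason the learning rate is treated as a key hyper-parameter.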

Performance evaluations

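To assess the model performance, both dimensional and dimensionless errors are applied in the present study. The dimensional errors include the three most popular indicators, namely the mean error (ME), mean absolute error (MAE), and root mean square error (RMSE). The dimensionless errors consist of two common indices, namely the Nash–Sutcliffe efficiency (NSE) and Pearson's correlation coefficient (r). These dimensional and dimensionless errors are computed as follows:
$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(Q_i^{est} - Q_i^{obs}\right) \qquad (3)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Q_i^{est} - Q_i^{obs}\right| \qquad (4)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_i^{est} - Q_i^{obs}\right)^{2}} \qquad (5)$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(Q_i^{obs} - Q_i^{est}\right)^{2}}{\sum_{i=1}^{N}\left(Q_i^{obs} - \bar{Q}^{obs}\right)^{2}} \qquad (6)$$
$$r = \frac{\sum_{i=1}^{N}\left(Q_i^{obs} - \bar{Q}^{obs}\right)\left(Q_i^{est} - \bar{Q}^{est}\right)}{\sqrt{\sum_{i=1}^{N}\left(Q_i^{obs} - \bar{Q}^{obs}\right)^{2}\,\sum_{i=1}^{N}\left(Q_i^{est} - \bar{Q}^{est}\right)^{2}}} \qquad (7)$$
where $Q_i^{obs}$ and $Q_i^{est}$ are the observed and estimated daily flows at point number i in a record, respectively; $\bar{Q}^{est}$ and $\bar{Q}^{obs}$ are the mean values of the estimated and observed daily flow, respectively; and N is the length of a record.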
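The five indicators can be computed in a few lines. The sketch below uses the standard definitions of ME, MAE, RMSE, NSE, and Pearson's r; the sign convention for ME (estimated minus observed, so that underestimation gives a negative value) is an assumption of this sketch, as is the function name.

```python
import numpy as np

def flow_errors(q_obs, q_est):
    """Return ME, MAE, RMSE, NSE, and Pearson's r for two flow series."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    diff = q_est - q_obs                      # assumed sign: estimated minus observed
    me = diff.mean()
    mae = np.abs(diff).mean()
    rmse = np.sqrt((diff ** 2).mean())
    # NSE compares the squared residuals with the variance of the observations.
    nse = 1.0 - (diff ** 2).sum() / ((q_obs - q_obs.mean()) ** 2).sum()
    r = np.corrcoef(q_obs, q_est)[0, 1]
    return {"ME": me, "MAE": mae, "RMSE": rmse, "NSE": nse, "r": r}

# Usage with a small illustrative record (m3/s); the values are made up.
errors = flow_errors([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this toy record the estimate is shifted by a constant 1 m3/s, so r equals 1 while NSE drops to 0.2, which illustrates why the correlation coefficient alone does not reveal a systematic bias.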

Determination of regression coefficients of the RC

As mentioned previously, both one and two segments in the RC are examined to determine the more suitable form. Using observations of the daily flow and water surface elevation in the long period between 1/1/1991 and 31/12/2020, the appropriate values of the regression coefficients of the RC are identified based on different trial-and-error tests and five statistical errors. An uncertainty analysis of the three coefficients, i.e., a, b, and c in Equation (1), was also performed in order to obtain the optimal values of the regression coefficients (see details in Table 2). Finally, the optimal values of the regression coefficients are determined as a = 32.15, b = 66.64, c = 2 for the RC using only one segment and a1 = 6.95, b1 = 66.0, c1 = 3, a2 = 26.62, b2 = 66.05, c2 = 2 for the RC using two segments. Table 3 summarizes the values of the five statistical errors (RMSE, ME, MAE, NSE, and r) of the daily flow when using the optimal values of the regression coefficients mentioned above. Figure 4 shows (i) the daily water elevation versus water discharge when using one and two segments in the RC and (ii) the time series of the estimated results and observed data of the daily flow over the whole considered period.
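A trial-and-error search of this kind can be sketched as follows. For a fixed reference level b and exponent c, the scaling coefficient a that minimizes the sum of squared residuals has a closed form, so only b and c need to be scanned. This is a minimal sketch under the assumption of a one-segment power-law RC; the function names, the grids, and the synthetic data are illustrative, and the procedure is not claimed to be the one used by the authors.

```python
import numpy as np

def fit_rc_scaling(H, Q, b, c):
    """Least-squares scaling coefficient a for fixed b and c.

    Minimizing sum((Q - a*x)**2) with x = (H - b)**c gives
    a = sum(Q*x) / sum(x*x). Returns (a, sum of squared residuals).
    """
    H = np.asarray(H, dtype=float)
    Q = np.asarray(Q, dtype=float)
    x = np.clip(H - b, 0.0, None) ** c
    a = (Q * x).sum() / (x * x).sum()
    sse = ((Q - a * x) ** 2).sum()
    return a, sse

def trial_and_error_rc(H, Q, b_grid, c_grid):
    """Scan trial values of b and c and keep the combination with the lowest SSE."""
    best = None
    for c in c_grid:
        for b in b_grid:
            a, sse = fit_rc_scaling(H, Q, b, c)
            if best is None or sse < best["sse"]:
                best = {"a": a, "b": b, "c": c, "sse": sse}
    return best

# Usage on synthetic data generated from a known curve (not the observed record).
H_demo = np.linspace(67.0, 75.0, 50)
Q_demo = 32.15 * (H_demo - 66.64) ** 2
best = trial_and_error_rc(H_demo, Q_demo, b_grid=[66.4, 66.64, 66.8], c_grid=[1.95, 2.0, 2.1])
```

With real gaugings the minimum is rarely this sharp, which is why confidence intervals on the coefficients are informative alongside the optimal values.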
Table 2

Statistical characteristics of the daily flow at optimal values of the regression coefficients

| c | a: optimal value | a: 95% confidence interval | b: optimal value | b: 95% confidence interval | RMSE (m3/s) | Sum of squared residuals ((m3/s)^2) |
| --- | --- | --- | --- | --- | --- | --- |
| 1.95 | 32.93 | 32.77–33.10 | 66.66 | 66.65–66.67 | 21.32 | 4.98 × 10^6 |
| 2.00 | 32.15 | 31.99–32.31 | 66.64 | 66.64–66.65 | 21.26 | 4.95 × 10^6 |
| 2.10 | 25.15 | 25.02–25.29 | 66.48 | 66.48–66.49 | 22.71 | 5.65 × 10^6 |
| 2.15 | 22.19 | 22.06–22.32 | 66.40 | 66.39–66.41 | 23.46 | 6.03 × 10^6 |
Table 3

Estimated errors of the daily flow when using the RC

| Rating curve using | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%) | r | NSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| One segment | 21.26 | 0.58 | 14.78 | 0.40 | −2.03 | −0.06 | 0.983 | 0.965 |
| Two segments | 19.56 | 0.53 | 14.0 | 0.38 | −0.74 | −0.02 | 0.985 | 0.971 |
Table 4

Appropriate values of parameters when using the MIKE NAM model

| Parameter | Description | Optimal value | Typical range | Unit |
| --- | --- | --- | --- | --- |
| Umax | Maximum water content in surface storage | 35 | 10–20 | mm |
| Lmax | Maximum water content in root zone storage | 388 | 50–300 | mm |
| CQOF | Overland flow runoff coefficient | 0.40 | 0–1 | – |
| CKIF | Time constant for interflow | 1,142 | 500–1,000 | hour |
| CK1,2 | Time constants for routing overland flow | 29.9 | 3–48 | hour |
| TOF | Root zone threshold value for overland flow | 0.372 | 0–0.99 | – |
| TIF | Root zone threshold value for interflow | 0.68 | | – |
| TG | Root zone threshold value for groundwater recharge | 0.15 | 0–0.99 | – |
| CKBF | Time constant for routing baseflow | 2,386 | 1,000–4,000 | hour |
Figure 4

Rating curve results: (a) water level versus flow, (b) observed versus estimated water discharge using one segment in the RC, (c) observed versus estimated water discharge using two segments in the RC and (d) daily flow series.


The results show that the RC with either one or two segments represented the observed daily flow very well. The dimensional errors of the daily flow are less than 0.6% of the observed magnitude of the flow (Table 3). The correlation coefficient between the estimated results and the observed data of the daily flow is close to unity, indicating that the rating curve reproduces the observed variation trend of the daily flow well (Figure 4). The values of NSE are greater than 0.96, demonstrating that the rating curve also reproduces the observed values of the daily flow very well. These results confirm that suitable values of the regression coefficients are obtained for the RC.
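The text does not define the "observed magnitude of the flow" used for the percentage errors. One reading that reproduces the percentages in Table 3 is to normalize each dimensional error by the observed range of the daily flow (3,690 − 6 = 3,684 m3/s, from the values quoted in the basin description). The short check below is a sketch under that assumption, which is an inference made here and not a statement from the study.

```python
# Observed extremes of the daily flow at Quy Chau (m3/s), as quoted in the text.
q_min, q_max = 6.0, 3690.0
flow_range = q_max - q_min  # assumed normalizing quantity

# Dimensional errors from Table 3 (m3/s).
table3 = {
    "one segment": {"RMSE": 21.26, "MAE": 14.78, "ME": -2.03},
    "two segments": {"RMSE": 19.56, "MAE": 14.0, "ME": -0.74},
}

def as_percent_of_range(err):
    """Express an error as a percentage of the observed flow range."""
    return round(100.0 * err / flow_range, 2)

percent = {
    name: {key: as_percent_of_range(value) for key, value in errs.items()}
    for name, errs in table3.items()
}
print(percent)
```

Under this reading, the percentages for validation periods would be computed with the flow range of the corresponding period, which may differ from the range of the whole record.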

In addition, two segments of the RC showed a slight improvement in estimating water discharge compared to the single one (Table 3), in terms of both statistical errors and variable trends. This is consistent with the results reported in previous studies (e.g., Le Coz et al. 2014; Pham Van & Nguyen-Van 2022), which showed that multiple segments of the RC are more suitable to estimate the daily flow in river basins.

Results of the MIKE NAM

When MIKE NAM was applied, the dataset in the period from 1/1/1991 to 31/12/2011 was used for calibrating the modeling parameters. Using the optimization tool of the model, the appropriate values of the modeling parameters were determined in the calibration step and are listed in Table 4. The detailed statistical errors of the daily flow are summarized in Table 5. Figure 5 depicts the time series of the estimated results versus the observed data of the daily flow.
Table 5

Estimated errors of the daily flow for calibrating and validating the MIKE NAM

| Year | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%) | r | NSE | Phase |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1991–2011 | 70.91 | 1.92 | 37.56 | 1.02 | −18.8 | −0.51 | 0.80 | 0.606 | Calibration |
| 2012–2020 | 84.65 | 3.51 | 46.0 | 1.91 | −28.9 | −1.20 | 0.732 | 0.474 | Validation |
Figure 5

Time series of estimated and measured daily flow for (a) calibrating and (b) validating the MIKE NAM model.


In the calibration step, the dimensional errors (ME, MAE, and RMSE) of the daily flow range from −18.8 to 70.9 m3/s (see Table 5). In absolute value, these errors correspond to 0.5 to 1.9% of the observed magnitude of the flow at the station. Values of r = 0.80 and NSE = 0.61 are also obtained. These results suggest that the MIKE NAM model reproduced the observed values of the daily flow acceptably.

The MIKE NAM model is validated using the remaining collected dataset (from 1/1/2012 to 31/12/2020). As shown in Table 5, the dimensional statistical errors increase slightly in the validation step. In detail, the dimensional errors range from −28.9 to 84.7 m3/s (approximately 1.2 to 3.5%, in absolute value, of the observed magnitude of the flow at the station). Values of 0.47 and 0.73 are achieved for the NSE and r coefficients, respectively.

As shown in Figure 5, the MIKE NAM model underestimated the observed values of the daily flow. The discrepancies between the estimated results and the observed data of the daily flow are still as large as 1,500 m3/s in flood seasons. Several reasons may explain this. Firstly, the meteorological data at the Quy Chau station may not be representative of the runoff generated in the upstream region of the river basin. Secondly, rainfall–runoff processes in a mountainous river basin are inherently very complex because of basin characteristics such as land use, elevation, slope, and soil moisture, which commonly change locally. However, to keep the model as simple as possible, the modeling parameters are assumed to be constant in space for all calculations. The use of such spatially constant parameter values is another reason for the discrepancy mentioned above.

Results of the LSTM model

The daily flow evaluated with the LSTM model can be affected by the input datasets, which include meteorological and hydrological quantities such as rainfall and water surface elevation. To investigate the impacts of the input on the output of the LSTM model, three different types of input data are considered: (i) rainfall, (ii) water elevation, and (iii) both rainfall and water elevation. As for the rainfall–runoff model, the collected data are divided into two datasets, i.e., 70% of the dataset for determining the hyper-parameter values by the trial-and-error method in the training step and the remaining 30% of the dataset for validating the LSTM model. Detailed results of the training and validation steps of the LSTM model for the three input datasets are presented and discussed in this section.
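The three input configurations can be assembled as feature arrays before being passed to a recurrent model. The sketch below stacks the selected series into columns and cuts them into overlapping sequences. The window length `seq_len` is hypothetical, since the study does not report a sequence length, and the function and scenario names are introduced here only for illustration.

```python
import numpy as np

def build_inputs(rain, level, scenario):
    """Return a (N, n_features) array for one of the three input scenarios."""
    columns = {
        "rainfall": [rain],
        "water_level": [level],
        "both": [rain, level],
    }[scenario]
    return np.column_stack([np.asarray(col, dtype=float) for col in columns])

def make_windows(features, target, seq_len):
    """Cut daily series into overlapping sequences for a recurrent model.

    X[k] holds the features of days k .. k+seq_len-1 and y[k] is the
    flow on the last day of that window.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = features.shape[0] - seq_len + 1
    X = np.stack([features[k:k + seq_len] for k in range(n)])
    y = target[seq_len - 1:]
    return X, y

# Usage with short made-up series; seq_len=3 is purely illustrative.
rain = np.arange(10.0)
level = np.arange(10.0) + 60.0
flow = np.arange(10.0) * 2.0
X_both, y_both = make_windows(build_inputs(rain, level, "both"), flow, seq_len=3)
```

Building the three scenarios through one function keeps the rest of the training pipeline identical, so differences in the results can be attributed to the inputs alone.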

Results of the LSTM model when using rainfall input

Figure 6(a) depicts the measured and computed values of the daily flow for the training step of the LSTM model. These results correspond to the optimal values of the model hyper-parameters determined by the trial-and-error method, i.e., a learning rate of 0.001, 128 hidden units, and 250 epochs. As shown in Table 6, the dimensional errors (i.e., ME, MAE, and RMSE) of the daily flow range in magnitude from 1.4 to 33.6 m3/s (about 0.04 to 1% of the observed magnitude of the flow in the river basin). The NSE and r coefficients are equal to 0.91 and 0.95, respectively. These results suggest that the LSTM model reproduced the observed daily flow well in the training period (from 1/1/1991 to 31/12/2011).
Table 6

Estimated errors of the daily flow for training and validating the LSTM model using the rainfall input

| Year | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%) | r | NSE | Phase |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1991–2011 | 33.65 | 0.92 | 20.10 | 0.54 | −1.42 | −0.04 | 0.955 | 0.911 | Training |
| 2012–2020 | 77.37 | 3.21 | 27.86 | 1.16 | 1.71 | 0.07 | 0.749 | 0.560 | Validation |
Figure 6

Time series of estimated and measured daily flow when using rainfall input for (a) training and (b) validating the LSTM model.


Figure 6(b) shows the results for the validation step of the LSTM model using the dataset in the period from 1/1/2012 to 31/12/2020. In comparison with the results obtained in the training step, larger dimensional errors (RMSE, MAE, and ME) are obtained in the validation step. In detail, the dimensional errors vary between 1.7 and 77.4 m3/s, i.e., from 0.07 to 3.2% of the observed magnitude of the flow recorded at the outlet of the river basin (Table 6). Smaller values of the dimensionless indices are also obtained, i.e., r = 0.75 and NSE = 0.56.

With rainfall as input data, the daily flow estimated by the LSTM model is similar to that obtained from the MIKE NAM hydrological model. These results are consistent with the correlation coefficient between flow and rainfall, which is equal to 0.55 as shown in Table 1. On the other hand, as shown in Figure 6, a discrepancy between the estimated results and the measured data of the daily flow remains when the rainfall input is used in the LSTM model. These results, together with those obtained from MIKE NAM, suggest that using only rainfall data reproduces the observed daily flow to a limited degree. This is because the flow in the river basin normally results from the integration of various processes, such as water concentration on the land surface of the river basin and flow dynamics, and not all of these processes are fully taken into account in the MIKE NAM or LSTM model.

Results of the LSTM model when using water level input

Figure 7 illustrates the measured and estimated daily flows for the training and validation steps when only the water level is used as input data in the LSTM model. Detailed statistical errors are summarized in Table 7. A learning rate of 0.001, 100 hidden units, and 250 epochs were used; these optimal values were also identified by the trial-and-error method.
Table 7

Estimated errors of the daily flow for training and validating the LSTM model using the water level input

| Year | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%) | r | NSE | Phase |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1991–2011 | 15.96 | 0.43 | 12.38 | 0.34 | 1.92 | 0.05 | 0.990 | 0.980 | Training |
| 2012–2020 | 21.57 | 0.90 | 6.98 | 0.29 | −3.26 | −0.14 | 0.990 | 0.966 | Validation |

Figure 7

Time series of estimated and measured daily flow when using water level input for (a) training and (b) validating the LSTM model.

The results show that, overall, the LSTM model represented the observed daily flow very well in the training period. The values of ME, MAE, and RMSE are no more than about 0.4% of the observed magnitude of the flow in the river basin, and the dimensionless indexes (i.e., r and NSE) are 0.98 or higher. Similar results are also achieved in the validation period. These results demonstrate that the water level has a strong relationship with the flow.

Results of the LSTM model when using water level and rainfall inputs

Figure 8 shows the estimated results and observed data of the daily flow when rainfall and water level are used as input data in the LSTM model. Note that these results were obtained with the hyper-parameter values found optimal for the case using water level as input data. Detailed values of dimensional and dimensionless errors for the training and validation steps of the LSTM model are summarized in Table 8. In comparison with the case using water level as input data, using both rainfall and water level gives only a slight improvement in the training period, while the validation errors are slightly larger (Figure 8 and Table 8). The dimensional errors are about 1% of the observed magnitude of the flow monitored at the outlet of the river basin. The values of NSE and r are about 0.95 or greater, demonstrating that the LSTM model represented the observed daily flow very well.
Table 8

Estimated errors of the daily flow for training and validating the LSTM model using both the water level and rainfall inputs

| Year | RMSE (m3/s) | RMSE (%) | MAE (m3/s) | MAE (%) | ME (m3/s) | ME (%) | r | NSE | Phase |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1991–2011 | 12.0 | 0.35 | 9.88 | 0.27 | 4.64 | 0.12 | 0.994 | 0.987 | Training |
| 2012–2020 | 27.19 | 1.13 | 12.3 | 0.51 | −5.09 | −0.21 | 0.981 | 0.946 | Validation |

Figure 8

Time series of estimated and measured daily flow when using both water level and rainfall inputs for (a) training and (b) validating the LSTM model.

Discussion

Figure 9 shows the radar plot of different statistical errors for six simulations associated with the data-driven model and traditional approaches (i.e., rating curve using a single segment – RC1, rating curve using two segments – RC2, NAM, LSTM model using rainfall input data – LSTM-X, LSTM model using water level input data – LSTM-H, and LSTM model using both rainfall and water level input data – LSTM-XH). Among these models, the rating curve and the LSTM model (using water level, or both rainfall and water level, as input data) are more suitable for estimating the daily flow in the studied river basin. Table 9 summarizes the statistical characteristics (mean, maximum, minimum, and standard deviation) of the estimated results and observed data of the daily flow from the six simulations over the whole period considered (from 1/1/1991 to 31/12/2020). As shown in the table, the statistical characteristics of the daily flow estimated by the rating curve and the LSTM model are broadly similar to those of the observed data.
Table 9

Summary of different characteristics of measured data and estimated results of the daily flow from different models for the period from 1991 to 2020

| Method | Mean (m3/s) | Min (m3/s) | Max (m3/s) | Standard deviation (m3/s) |
| --- | --- | --- | --- | --- |
| Observations | 77.51 | 6.7 | 3,690 | 114.09 |
| RC1 | 79.54 | 17.9 | 4,167.8 | 110.67 |
| RC2 | 78.26 | 18.4 | 3,817.8 | 111.90 |
| NAM | 99.33 | 18.3 | 1,710.6 | 89.49 |
| LSTM-X | 75.01 | 21.2 | 3,689.2 | 102.96 |
| LSTM-H | 77.14 | 4.9 | 3,577.4 | 108.21 |
| LSTM-XH | 75.80 | 4.1 | 3,683.0 | 108.82 |

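How close each simulation is to the observations in Table 9 can be expressed as a percent difference per statistic. The sketch below uses the values from the table; the function name and the choice of a simple relative difference are illustrative assumptions, not part of the original analysis.

```python
# Values copied from Table 9 (m3/s); "Observations" is the reference row.
TABLE_9 = {
    "Observations": {"mean": 77.51, "min": 6.7, "max": 3690.0, "std": 114.09},
    "RC1": {"mean": 79.54, "min": 17.9, "max": 4167.8, "std": 110.67},
    "RC2": {"mean": 78.26, "min": 18.4, "max": 3817.8, "std": 111.90},
    "NAM": {"mean": 99.33, "min": 18.3, "max": 1710.6, "std": 89.49},
    "LSTM-X": {"mean": 75.01, "min": 21.2, "max": 3689.2, "std": 102.96},
    "LSTM-H": {"mean": 77.14, "min": 4.9, "max": 3577.4, "std": 108.21},
    "LSTM-XH": {"mean": 75.80, "min": 4.1, "max": 3683.0, "std": 108.82},
}

def relative_difference(method, reference="Observations"):
    """Percent difference of each statistic relative to the reference row."""
    ref = TABLE_9[reference]
    return {key: 100.0 * (TABLE_9[method][key] - ref[key]) / ref[key] for key in ref}

rel = {name: relative_difference(name) for name in TABLE_9 if name != "Observations"}

for name, diffs in rel.items():
    print(name, {key: round(value, 1) for key, value in diffs.items()})
```

With the Table 9 values, the NAM mean is about 28% above the observations and the NAM maximum about 54% below them, whereas the LSTM-H mean differs from the observed mean by less than 1%.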
Figure 9

Radar plot for: (a) RMSE, MAE and abs(ME), (b) r and NSE in the period from 1991 to 2011, (c) RMSE, MAE, and abs(ME) and (d) r and NSE in the period from 2012 to 2020.

The obtained results clearly show that the RC can capture the observed values of the daily flow under different conditions. The current form of the RC model, however, does not fully account for unsteady flow and the hysteresis loop (i.e., the dynamic loop with rising and recession limbs in the stage–discharge relation caused by changing flow). A single-valued curve was used to keep the investigation of the RC as simple as possible. Dynamic model approaches (Fread 1973, 1975), which are unique to a particular flood, can be used to consider the impacts of unsteady flow conditions and to calculate the two different stages associated with each discharge on the rising and recession limbs of the RC. Unsteady flow conditions will be investigated further with dynamic model approaches in future work, once detailed measurements of flood characteristics and hysteretic processes have been collected.
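A single-valued rating curve with one or two segments can be sketched as follows. The sketch assumes the commonly used power-law form Q = a (H − H0)^b, which is not restated in this section, and all coefficients and the breakpoint are hypothetical rather than the calibrated values for the Quy Chau station. Because each stage maps to exactly one discharge, a curve of this kind cannot reproduce a hysteresis loop.

```python
def rating_curve(h, segments):
    """Single-valued power-law rating curve Q = a * (h - h0) ** b.

    segments: list of (h_upper, a, h0, b) tuples sorted by h_upper;
    the last h_upper may be float("inf").
    """
    for h_upper, a, h0, b in segments:
        if h <= h_upper:
            # Stages at or below the cease-to-flow level h0 give zero discharge.
            return a * max(h - h0, 0.0) ** b
    raise ValueError("stage is above the last segment")

# Hypothetical coefficients (not the calibrated values from the paper).
A1, H0, B1 = 30.0, 0.5, 1.8
H_BREAK, B2 = 3.0, 2.2
# Choose a2 so that both segments give the same discharge at the breakpoint.
A2 = A1 * (H_BREAK - H0) ** (B1 - B2)

one_segment = [(float("inf"), A1, H0, B1)]
two_segments = [(H_BREAK, A1, H0, B1), (float("inf"), A2, H0, B2)]

for stage in (1.0, 3.0, 5.0):
    print(stage, rating_curve(stage, one_segment), rating_curve(stage, two_segments))
```

The second segment lets the exponent change above the breakpoint, which is one way a two-segment curve can follow high flows differently from low flows.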

Even though the RC and LSTM models can capture the observed high flows at the daily scale, the question remains whether they can capture more extreme events, such as flood events at the hourly scale, that have not been observed and carry a higher risk. These events result from several uncertain factors, such as hydro-meteorological conditions (e.g., heavy rainfall, storms, tsunamis), hydro-morphological conditions, land use alterations, and demographic changes (Dimitriadis et al. 2016). In addition, an inherent uncertainty in the flow could be observed due to the presence of long-term persistent behavior. As reported by Dimitriadis et al. (2021), the LSTM can deal with the long nonlinear dependencies of sequence values and the temporal evolution of the inherent uncertainty of the daily flow based on the so-called Hurst phenomenon, because the Hurst phenomenon accounts for the variability and uncertainty of the flow across scales. Such features can also be dealt with in most cases when using the physically based model MIKE NAM. The MIKE NAM could also outperform the RC in the case of more extreme events (Mishra et al. 2022). To this end, a sensitivity analysis based on a stochastic approach (e.g., Monte Carlo or bootstrap) should be performed to tackle this problem (Jakeman & Hornberger 1993; Dimitriadis et al. 2016; Parsaie & Haghiabi 2021).
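As one possible form of the stochastic analysis suggested above, the sketch below estimates a bootstrap percentile interval for the NSE. It is a simplified illustration on synthetic data: days are resampled independently, which ignores the long-term persistence discussed in the text, so a block bootstrap would likely be a closer fit for real daily flow. The function names, the number of resamples, and the data are hypothetical.

```python
import random

def nse(obs, sim):
    """Nash-Sutcliffe efficiency of estimated flows against observed flows."""
    mean_o = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den

def bootstrap_interval(obs, sim, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval for NSE from resampling (obs, sim) pairs with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    values = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        o = [obs[k] for k in idx]
        if max(o) == min(o):
            continue  # NSE is undefined for a constant observed sample
        s = [sim[k] for k in idx]
        values.append(nse(o, s))
    values.sort()
    lower = values[int((alpha / 2) * len(values))]
    upper = values[int((1 - alpha / 2) * len(values)) - 1]
    return lower, upper

# Synthetic data for illustration only (m3/s).
noise = random.Random(1)
obs = [20.0 + 5.0 * k for k in range(60)]
sim = [o + noise.gauss(0.0, 10.0) for o in obs]
ci = bootstrap_interval(obs, sim)
print("NSE:", nse(obs, sim), "95% interval:", ci)
```

A wide interval would indicate that the reported NSE depends strongly on which days happen to be in the record, which is the kind of sensitivity the cited stochastic approaches are meant to expose.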

Multiple approaches are presented for estimating the daily flow in the river of interest, with different types of input data requirements (e.g., rainfall, water level, or both rainfall and water level). When rainfall alone was used as input data for estimating the daily flow in the river basin, both the hydrological model MIKE NAM and the LSTM model achieved only limited agreement between the estimated results and the measured data of the daily flow.

Multiple approaches consisting of LSTM, RC, and MIKE NAM are presented and applied for evaluating the daily flow in the Hieu river basin, Vietnam. The main findings of the research are as follows:

  • Using the available data in the period from 1/1/1991 to 31/12/2020, the rating curves with one and two segments were investigated, revealing that very good data-model comparisons were achieved for the daily flow at the studied location. The dimensional errors of the daily flow (consisting of RMSE, MAE, and ME) were less than 1% of the observed magnitude of the flow, while the dimensionless errors (i.e., r and NSE) were greater than 0.96.

  • For the LSTM model, optimal values of three modeling hyper-parameters were carefully calibrated based on the observed daily flow in the period from 1/1/1991 to 31/12/2011. The model was then validated using the measured flow in the period between 1/1/2012 and 31/12/2020. The LSTM model represented the measured flow very well under different conditions over the long period considered. The values of ME, MAE, and RMSE were also only about 1% of the observed magnitude of the flow at the studied location. The values of r and NSE ranged from 0.95 to 0.99 when using water elevation input data.

  • Among the three models investigated herein, the rating curves and the LSTM model are found to be more suitable approaches for estimating the daily flow from the time series of water elevation in the Hieu river basin. These models use only water level records as input data, which makes them valuable tools for calculating the daily flow in river basins.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Aredo M. R., Hatiye S. D. & Pingale S. M. 2021 Modeling the rainfall-runoff using MIKE 11 NAM model in Shaya catchment, Ethiopia. Modeling Earth Systems and Environment. https://doi.org/10.1007/s40808-020-01054-8.

Beven K. J. 1989 Changing ideas in hydrology: the case of physically based models. Journal of Hydrology 105 (1–2), 157–172. https://doi.org/10.1016/0022-1694(89)90101-7.

Darby S. E. & Thorne C. R. 1996 Predicting stage-discharge curves in channels with bank vegetation. Journal of Hydraulic Engineering 122 (10), 583–586. https://doi.org/10.1061/(ASCE)0733-9429(1996)122:10(583).

Dey P. & Mujumdar P. 2021 On the statistical complexity of streamflow. Hydrological Sciences Journal 1–14, 325. https://doi.org/10.1080/02626667.2021.2000991.

Dimitriadis P., Tegos A., Oikonomou A., Pagana V., Koukouvinos A., Mamassis N., Koutsoyiannis D. & Efstratiadis A. 2016 Comparative evaluation of 1D and quasi-2D hydraulic models based on benchmark and real-world applications for uncertainty assessment in flood mapping. Journal of Hydrology 534, 478–492. https://doi.org/10.1016/j.jhydrol.2016.01.020.

Dimitriadis P., Koutsoyiannis D., Iliopoulou T. & Papanicolaou P. 2021 A global-scale investigation of stochastic similarities in marginal distribution and dependence structure of key hydrological-cycle processes. Hydrology 8 (2), 59. https://doi.org/10.3390/hydrology8020059.

Frame J. M., Kratzert F., Gupta H. V., Ullrich P. & Nearing G. S. 2022 On strictly enforced mass conservation constraints for modeling the rainfall-runoff process. Journal of Hydrologic Processes. https://doi.org/10.31223/X5BH0P

Fread D. L. 1973 A Dynamic Model of Stage-Discharge Relations Affected by Changing Discharge. NOAA Technical Memorandum No. NWS HYDRO-16, National Weather Service, Silver Spring.

Fread D. L. 1975 Computation of stage-discharge relationships affected by the unsteady flow. Water Resources Bulletin 11 (2), 213–228. https://doi.org/10.1111/j.1752-1688.1975.tb00674.x.

Ghosh A., Roy M. B. & Roy P. K. 2022 Evaluating the performance of MIKE NAM model on rainfall-runoff in lower Gangetic floodplain, West Bengal, India. Modeling Earth Systems and Environment 8, 4001–4017. https://doi.org/10.1007/s40808-021-01347-6.

Hafezparast M., Araghinejad S. & Fatemi S. E. 2013 A conceptual rainfall-runoff model using the auto-calibrated NAM models in the Sarisoo river. Hydrology Current Research 4 (1), 148. https://doi.org/10.4172/2157-7587.1000148.

Hochreiter S. & Schmidhuber J. 1997 Long short–term memory. Neural Computation 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.

Jakeman A. J. & Hornberger G. M. 1993 How much complexity is warranted in a rainfall-runoff model? Water Resources Research 29 (8), 2637–2649. https://doi.org/10.1029/93WR00877.

Kiang J. E., Gazoorian C., McMillan H., Coxon G., Le Coz J., Westerberg I. K., Belleville A., Sevrez D., Sikorska A. E., Petersen-Overleir A., Reitan T., Freer J., Renard B., Mansanarez V. & Mason R. 2018 A comparison of methods for streamflow uncertainty estimation. Water Resources Research 54, 7149–7176. https://doi.org/10.1029/2018WR022708.

Kingma D. P. & Ba J. L. 2015 Adam: a method for stochastic optimization. In International Conference on Learning Representations, 7–9 May, San Diego, pp. 1–13.

Kratzert F., Klotz D., Brenner C., Schulz K. & Herrnegger M. 2018 Rainfall-runoff modeling using long short–term memory (LSTM) networks. Hydrology and Earth System Sciences 22, 6005–6022. https://doi.org/10.5194/hess-22-6005-2018.

Le Coz J., Renard B., Bonnifait L., Branger F. & Le Bouricoud R. 2014 Combining hydraulic knowledge and uncertain gaugings in the estimation of hydrometric rating curve: a Bayesian approach. Journal of Hydrology 509, 573–587. https://doi.org/10.1016/j.jhydrol.2013.11.016.

Lee S. K., Dang T. A. & Tran T. H. 2018 Combining rainfall–runoff and hydrodynamic models for simulating flow under the impact of climate change to the lower Sai Gon-Dong Nai River basin. Paddy and Water Environment 16, 457–465. https://doi.org/10.1007/s10333-018-0639-x.

Mishra A., Mukherjee S., Merz B., Singh V. P., Wright D. B., Villarini G., Paul S., Kumar D. N., Khedun C. P., Niyogi D., Schumann G. & Stedinger J. R. 2022 An overview of flood concepts, challenges, and future directions. Journal of Hydrologic Engineering 27 (6), 1819–1834. https://doi.org/10.1061/(ASCE)HE.1943-5584.0002164.

Moosavi V., Fard Z. G. & Vafakhah M. 2022 Which one is more important in daily runoff forecasting using data driven models: input data, model type, preprocessing or data length? Journal of Hydrology 606, 127429. https://doi.org/10.1016/j.jhydrol.2022.127429.

Parsaie A. & Haghiabi A. H. 2021 Uncertainty analysis of discharge coefficient of circular crested weirs. Applied Water Science 26, 11–26. https://doi.org/10.1007/s13201-020-01329-6.

Pham Van C. & Nguyen-Van G. 2022 Three different models to evaluate water discharge: an application to a river section at Vinh Tuy location in the Lo river basin, Vietnam. Journal of Hydro-Environment Research 40, 38–50. https://doi.org/10.1016/j.jher.2021.12.002.

Pham Van C., de Brye B., Deleersnijder E., Hoitink A. J. F., Sassi M., Spinewine B., Hidayat H. & Soares-Frazão S. 2016 Simulations of the flow in the Mahakam river-lake-delta system, Indonesia. Environmental Fluid Mechanics 16, 603–633. https://doi.org/10.1007/s10652-016-9445-4.

Rozos E., Dimitriadis P. & Bellos V. 2022 Machine learning in assessing the performance of hydrological models. Hydrology 9 (1), 5. https://doi.org/10.3390/hydrology9010005.

Yin H., Wang F., Zhang X., Zhang Y., Chen J., Xia R. & Jin J. 2022 Rainfall-runoff modeling using long short-term memory based step-sequence framework. Journal of Hydrology 610, 127901. https://doi.org/10.1016/j.jhydrol.2022.127901.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).