## Abstract

This paper presents long short-term memory (LSTM), rating curve (RC), and rainfall–runoff models that can be used for estimating the daily flow at the outlet of river basins. The Hieu river basin in Vietnam is selected as an example to demonstrate the ability of these approaches. Hydro-meteorological data at the Quy Chau station were collected over a long period from 1/1/1991 to 31/12/2020. The approaches are implemented and used to calculate the daily flow in the studied river basin. The coefficients and modeling parameters in each approach are determined based on five statistical error indexes. The results reveal that the RC using either one or two segments and the LSTM model using water elevation as input data represent the observed daily flow very well, with dimensional errors (i.e., mean error, mean absolute error, and root mean square error) equal to only about 1% of the observed magnitude of the flow in the studied river basin, while the Nash–Sutcliffe efficiency and correlation coefficients are greater than 0.95. The impacts of different types of input datasets on the estimated daily flow are also presented for the LSTM model.

## HIGHLIGHTS

Long short-term memory, rating curve, and rainfall–runoff models are implemented for estimating the daily flow in river basins.

Long short-term memory and the rating curve using either one or two segments represented the observed daily flow very well in the studied river basin.

When using the long short-term memory model, the input datasets significantly affected the estimated values of the daily flow.


## INTRODUCTION

Daily flow is an essential quantity in the daily management of river engineering works and requires accurate analysis and day-ahead prediction. This is because daily flow is related to various aspects of using, planning, and managing water resources, as well as to developing sustainable strategies for both the quantity and quality of water resources in river basins. Owing to this importance, different methods have been developed to compute the daily flow in river basins, including (i) semi-empirical formulas such as rating curves (RCs) in a standard or modified form (Le Coz *et al.* 2014; Kiang *et al.* 2018; Pham Van & Nguyen-Van 2022), (ii) physically based models such as the rainfall–runoff model MIKE NAM (Hafezparast *et al.* 2013; Lee *et al.* 2018; Aredo *et al.* 2021; Ghosh *et al.* 2022), (iii) hydraulic models such as one-dimensional section-averaged and two-dimensional depth-averaged models (Pham Van *et al.* 2016; Lee *et al.* 2018) and the lateral distribution method (Darby & Thorne 1996), (iv) data-driven models such as the long short-term memory (LSTM) model (Hochreiter & Schmidhuber 1997) and the gated recurrent unit, a well-known alternative to and extension of the LSTM (Pham Van & Nguyen-Van 2022), and (v) machine learning (ML) models, which are reported to be robust and efficient, frequently outperform standard hydrological models (both conceptual and physically based), and can be used to assess the performance of hydrological models (Rozos *et al.* 2022). However, estimating the daily flow remains challenging because of various sources of uncertainty, e.g., the multiple features of river basins, nonlinear interactions between these features, and multiple time scales of flow variability, as well as the inputs, parameters, process representations, and model structure inherent in each method (Dey & Mujumdar 2021).

A large number of studies use traditional approaches such as the RC, physically based models such as the rainfall–runoff model MIKE NAM, and hydraulic models for estimating the flow in river basins (e.g., Darby & Thorne 1996; Hafezparast *et al.* 2013; Le Coz *et al.* 2014; Kiang *et al.* 2018; Lee *et al.* 2018; Aredo *et al.* 2021; Ghosh *et al.* 2022; Pham Van & Nguyen-Van 2022). Although the results achieved with these approaches are impressive at least to some extent, several drawbacks remain. One of the main downsides is that their accuracy depends greatly on the quality and length of the input datasets (for the RC, MIKE NAM, and hydraulic models) and on the model type, structure, and parameters (for MIKE NAM and hydraulic models). In addition, the RC does not fully take into account unsteady flow conditions and hysteretic processes (Fread 1973, 1975). In terms of physically based models, as pointed out by Beven (1989), practical problems remain in physically based predictions, such as obtaining realistic estimates of the uncertainty associated with the model's predictions. Hydraulic models are difficult to apply because they require multiple input data such as geometry, bed friction, and three-dimensional flow characteristics that are not always available (Pham Van & Nguyen-Van 2022). Thus, significant efforts are still needed to improve the accuracy of traditional approaches as well as to appropriately represent the nonlinear time series of the daily flow in hydrological science (Kratzert *et al.* 2018; Yin *et al.* 2022).

Recently, the LSTM, one of the most popular data-driven models, has been applied to flow estimation and prediction in different river basins. Kratzert *et al.* (2018) investigated the potential of the LSTM model for simulating the daily flow from meteorological observations in 241 basins of the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset in the United States. Yin *et al.* (2022) used the same datasets as Kratzert *et al.* (2018) to test the LSTM with a step-sequence framework in a rainfall–runoff model for daily flow predictions. Pham Van & Nguyen-Van (2022) used gated recurrent units for calculating the flow in the Lo river basin, Vietnam. Frame *et al.* (2022) compared different types of data-driven models (e.g., LSTM, mass-conserving LSTM) and argued that data-driven models such as the LSTM can outperform catchment-scale conceptual and physically based models at predicting flow. The results of these studies show that the LSTM model can be applied to estimate the daily flow from available hydro-meteorological observations with accuracies comparable to those of other models. However, as pointed out by Moosavi *et al.* (2022), the input variables may affect the performance of data-driven models in general and LSTM models in particular.

The main purpose of this study is to describe and implement multiple approaches, including a data-driven model (the LSTM) and traditional approaches (the RC and MIKE NAM), that can be used for estimating the daily flow at the outlet of river basins. In particular, the research aims at (i) identifying the optimal values of the coefficients and modeling parameters in each approach or model, (ii) investigating the impacts of different types of input data (e.g., rainfall, water elevation, both rainfall and water elevation) when using the LSTM model for simulating the daily flow, and (iii) determining the most appropriate approaches (among those mentioned above) for simulating the flow at the daily time scale in the river basin. For the Hieu river basin, which is the river basin of interest in the present study, this is also the first study in which the data-driven model and traditional approaches are implemented and applied to compute the daily flow. The daily time series of available hydro-meteorological observations (e.g., flow, water elevation, rainfall, and evaporation) collected from 1/1/1991 to 31/12/2020 were used to determine the optimal values of the regression coefficients in the RC using one or two segments and of the modeling parameters in the MIKE NAM and LSTM models. Five statistical error indexes are also used to compare quantitatively the estimated results with the observed data of the daily flow in the studied river basin.

## STUDY AREA AND DATA COLLECTION

### The Hieu river basin

The river basin area from the source to the Quy Chau hydrological station is about 1,960 km^{2}. The Hieu river has various tributaries such as Nam Quang, Nam Giai, Ke Coc – Khe Nha, Chang, Dinh, Khe Nghia, and Khe Da. In the upstream area (from the source to Nghia Dan), the river is narrow and shallow with a gravel bed, and the water depth varies from 0.5 to 2.0 m in the dry season (from December to July). In the downstream region, the geometry of the river varies significantly because of flow changes and the river's meandering. The river is wide in this part, with a width ranging from 150 to 280 m in the flood season (from August to November) and from 100 to 120 m in the dry season.

In terms of weather characteristics, the mean monthly temperature at the Quy Chau station varies between 17.4 and 28.5°C. The annual evapotranspiration ranges from 700 to 940 mm, while the annual moisture varies between 85 and 94%. The mean monthly rainfall ranges from 13.8 to 322.0 mm, with annual rainfall between 1,170 and 2,110 mm. The mean annual flow at the Quy Chau station is about 77.5 m^{3}/s. The daily flow varies over a wide range, from 6 m^{3}/s (in the dry season) to 3,690 m^{3}/s (in the flood season). Large flood events often occur in September and October, while the three driest months usually occur from February to April. The total flow volume in the flood season equals 55–75% of the annual flow.

The Hieu river basin plays an important role in supplying water to the Que Phong, Quy Chau, Quy Hop, Nghia Dan, and Tan Ky districts in Nghe An province. Among the tributaries mentioned above, Chang and Dinh are the two largest, and they supply water to different users in the Que Phong district. It is thus necessary to determine the daily flow in the river basin quantitatively in order to assess the water supply properly as well as to enhance biodiversity and river ecosystems.

### Data collection

Station | Quantity | Longitude | Latitude | Period | Mean | Standard deviation | Skewness | Kurtosis | *r* |
---|---|---|---|---|---|---|---|---|---|
Quy Chau | Evaporation | 105°06′000″ | 19°34′00″ | 1991–2020 | 0.33 | 1.30 | 5.38 | 73.61 | 0.121 |
Quy Chau | Rainfall | 105°06′000″ | 19°34′00″ | 1991–2020 | 4.45 | 13.80 | 6.87 | 81.16 | 0.551 |
Quy Chau | Water level | 105°12′05″ | 19°33′50″ | 1991–2020 | 68.11 | 0.57 | 4.37 | 41.13 | 0.919 |
Quy Chau | Water discharge | 105°12′05″ | 19°33′50″ | 1991–2020 | 77.50 | 114.1 | 11.93 | 253.29 | 1.00 |


## METHODOLOGY

### Stage–discharge relationship

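The stage–discharge relationship, or RC, is expressed as follows (Le Coz *et al.* 2014):

$$Q = a\,(H - b)^{c} \tag{1}$$

where *a* and *c* are scaling and exponential coefficients, respectively; *b* and *H* are, respectively, the reference level and the water level relative to a given datum; *Q* is the daily flow. Note that Equation (1) is well known as the standard form that uses only one segment in the RC (denoted RC1).

When two segments are used in the RC (denoted RC2), the relationship is written as:

$$Q = \begin{cases} a_{1}\,(H - b_{1})^{c_{1}} & \text{if } H \le \mathit{br} \\ a_{2}\,(H - b_{2})^{c_{2}} & \text{if } H > \mathit{br} \end{cases} \tag{2}$$

where *a*_{1} and *a*_{2} are scaling coefficients; *b*_{1} and *b*_{2} are reference levels; *c*_{1} and *c*_{2} are exponential coefficients; *br* is the transition point in the RC. The transition point is considered in the RC to take into account prior knowledge of the hydraulic controls and the information content of the uncertain gaugings in the estimation, as pointed out by Le Coz *et al.* (2014).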
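As an illustration, the one- and two-segment rating curves described above can be sketched in Python. This is a minimal sketch, not the implementation used in this study: the function names are hypothetical, the power-law form Q = a(H − b)^c is assumed, the flow is assumed to be zero when the water level is at or below the reference level, the first segment is assumed to apply at or below the transition level `br`, and the coefficients in the example are illustrative values rather than the calibrated ones.

```python
import numpy as np

def rating_curve(H, a, b, c):
    """One-segment power-law rating curve: Q = a * (H - b)**c."""
    H = np.asarray(H, dtype=float)
    # Assumption: no flow when the water level is at or below the reference level b.
    head = np.clip(H - b, 0.0, None)
    return a * head ** c

def rating_curve_two_segments(H, seg_low, seg_high, br):
    """Two-segment rating curve; seg_low and seg_high are (a, b, c) tuples."""
    H = np.asarray(H, dtype=float)
    # Assumption: the first segment applies at or below the transition level br.
    return np.where(H <= br,
                    rating_curve(H, *seg_low),
                    rating_curve(H, *seg_high))

# Illustrative (hypothetical) coefficients and water levels in metres.
levels = np.array([66.0, 67.0, 68.0, 70.0, 71.0])
q_one = rating_curve(levels, a=30.0, b=66.0, c=2.0)
q_two = rating_curve_two_segments(levels, (7.0, 66.0, 3.0), (28.0, 66.0, 2.0), br=70.0)
print(q_one)
print(q_two)
```

The illustrative coefficients are chosen so that the two segments give the same flow at `br`, which keeps the curve continuous at the transition point.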

### Rainfall–runoff model

Among various rainfall–runoff models, the MIKE NAM model is chosen for simulating the daily flow at the outlet of the river basin of interest. This choice is motivated by (i) the model's performance, speed, and accuracy and (ii) its wide range of practical applications (Hafezparast *et al.* 2013; Lee *et al.* 2018; Aredo *et al.* 2021; Ghosh *et al.* 2022). The model simulates the runoff output from the rainfall input of the hydrological cycle via the surface, root zone, groundwater, and snow storages, which represent different physical components and characteristics of a catchment.

The MIKE NAM model produces runoff and other related moisture and groundwater components of the land phase of the hydrological cycle using meteorological input data like rainfall and evaporation. The resulting flow is conceptually divided into overland, rootzone, and groundwater flow. The parameters of the model are thus associated with these flow components, including (i) two parameters representing storage, (ii) five parameters illustrating overland and interflow (also called surface and rootzone flow), and (iii) two main parameters for groundwater flow. Additional information on the above-mentioned modeling parameters can be found in Table 4. Regarding the application of MIKE NAM to the river basin of interest, modeling parameters are calibrated and validated based on the comparison between observed data and simulated values of the daily flow at the Quy Chau station in the period between 1/1/1991 and 31/12/2020.

### LSTM model

The LSTM model, a popular data-driven model, is applied to evaluate the daily flow in the river basin of interest. Regarding the model structure, the memory cell is the main building block of the LSTM model and essentially represents the hidden layer. A cell of the LSTM model has three main components: the input, forget, and output gates. The input gate receives the input data and information, e.g., rainfall, water level, or both rainfall and water level in the river basin of interest. The forget gate allows the memory cell to reset the cell state so that it does not grow indefinitely; it also decides which information is allowed to pass through and which is suppressed. Finally, the output gate provides the output of the model and decides how to update the values of the hidden unit. In detail, the output gate combines the input data at the present time, the output of the hidden unit at the previous time step, and the components of the bias vector for the hidden units in the last iteration in order to produce the final output. Further information on the governing equations of the different gates, cell state, and hidden units in the LSTM model can be found in previous studies (e.g., Hochreiter & Schmidhuber 1997; Kratzert *et al.* 2018; Frame *et al.* 2022).
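To make the role of the gates concrete, the sketch below implements one forward step of a commonly used LSTM cell formulation in NumPy. It is an illustration under stated assumptions and not the model configuration used in this study: the function and variable names are hypothetical, the weights are random rather than trained, and the input sequence is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4 * n_hidden, n_inputs), U: (4 * n_hidden, n_hidden), b: (4 * n_hidden,)
    The four row blocks correspond to the input gate, forget gate,
    candidate cell update, and output gate.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information enters the cell
    f = sigmoid(z[n:2 * n])      # forget gate: how much of the old cell state is kept
    g = np.tanh(z[2 * n:3 * n])  # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the cell state is exposed
    c_t = f * c_prev + i * g     # updated cell state
    h_t = o * np.tanh(c_t)       # updated hidden state
    return h_t, c_t

# Hypothetical sizes: two inputs (e.g., rainfall and water level), eight hidden units.
rng = np.random.default_rng(0)
n_inputs, n_hidden = 2, 8
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_inputs))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
sequence = rng.random((30, n_inputs))  # synthetic 30-day input sequence
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

In a full model, the final hidden state would typically be passed through a dense layer to produce the estimated daily flow, and the weights would be learned during training.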

Estimating the daily flow with the LSTM model relies on meteorological and hydrological data in an adaptive intelligent framework that can evaluate the daily flow efficiently (Mishra *et al.* 2022). The number of epochs, the number of hidden units, and the learning rate are three hyper-parameters of the LSTM model. The optimal values of these hyper-parameters are normally determined by trial and error in the training step, which is performed using the dataset from 1/1/1991 to 31/12/2011. The LSTM model is then validated using the dataset from 1/1/2012 to 31/12/2020. In terms of optimization algorithms, the Adam optimization algorithm (Kingma & Ba 2015) is chosen in both the training and validation steps of the LSTM model since this algorithm is widely applied in real applications (e.g., Hochreiter & Schmidhuber 1997; Kratzert *et al.* 2018; Pham Van & Nguyen-Van 2022). Different types of input data (e.g., rainfall, water level, both rainfall and water level) are tested to investigate the impacts of the input data on the output results when using the LSTM model.
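The chronological split and the three input configurations described above can be sketched as follows. This is a hedged illustration only: the series are synthetic placeholders rather than the Quy Chau observations, the helper `make_windows` and the 30-day `lookback` are hypothetical choices that are not stated in this paper, and the actual preprocessing may differ.

```python
import numpy as np

# Daily time axis covering the study period 1/1/1991 to 31/12/2020.
dates = np.arange("1991-01-01", "2021-01-01", dtype="datetime64[D]")

# Synthetic placeholders standing in for the observed series (not real data).
rng = np.random.default_rng(42)
rainfall = rng.gamma(shape=0.3, scale=15.0, size=dates.size)
water_level = 68.0 + 0.5 * rng.random(dates.size)
flow = 32.0 * (water_level - 66.6) ** 2

def make_windows(features, target, lookback):
    """Stack `lookback` consecutive days of inputs; the target is the flow on the last day."""
    n = features.shape[0] - lookback + 1
    X = np.stack([features[i:i + lookback] for i in range(n)])
    y = target[lookback - 1:]
    return X, y

# Three input configurations: rainfall only, water level only, or both.
inputs = {
    "rainfall": rainfall[:, None],
    "water_level": water_level[:, None],
    "both": np.column_stack([rainfall, water_level]),
}

# Chronological split: 1991-2011 for training, 2012-2020 for validation.
train = dates < np.datetime64("2012-01-01")
lookback = 30  # hypothetical window length, not taken from the paper
X_train, y_train = make_windows(inputs["both"][train], flow[train], lookback)
X_valid, y_valid = make_windows(inputs["both"][~train], flow[~train], lookback)
print(train.sum(), (~train).sum(), X_train.shape, X_valid.shape)
```

With this split, 7,670 of the 10,958 days (about 70%) fall in the training period and the remaining 3,288 days in the validation period.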

### Performance evaluations

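Five statistical error indexes, consisting of dimensional and dimensionless errors, are used to evaluate quantitatively the agreement between the estimated results and the observed data of the daily flow. The dimensional errors consist of mean error (*ME*), mean absolute error (*MAE*), and root mean square error (*RMSE*). The dimensionless errors consist of two common indices, namely the Nash–Sutcliffe efficiency (*NSE*) and Pearson's correlation coefficient (*r*). These dimensional and dimensionless errors are computed as follows:

$$ME = \frac{1}{N}\sum_{i=1}^{N}\left(Q_{i}^{o} - Q_{i}^{e}\right) \tag{3}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|Q_{i}^{o} - Q_{i}^{e}\right| \tag{4}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Q_{i}^{o} - Q_{i}^{e}\right)^{2}} \tag{5}$$

$$NSE = 1 - \frac{\sum_{i=1}^{N}\left(Q_{i}^{o} - Q_{i}^{e}\right)^{2}}{\sum_{i=1}^{N}\left(Q_{i}^{o} - \bar{Q}^{o}\right)^{2}} \tag{6}$$

$$r = \frac{\sum_{i=1}^{N}\left(Q_{i}^{o} - \bar{Q}^{o}\right)\left(Q_{i}^{e} - \bar{Q}^{e}\right)}{\sqrt{\sum_{i=1}^{N}\left(Q_{i}^{o} - \bar{Q}^{o}\right)^{2}\,\sum_{i=1}^{N}\left(Q_{i}^{e} - \bar{Q}^{e}\right)^{2}}} \tag{7}$$

where $Q_{i}^{o}$ and $Q_{i}^{e}$ are the observed and estimated daily flows at point number *i* in a record, respectively; $\bar{Q}^{e}$ and $\bar{Q}^{o}$ are the mean values of the estimated and observed daily flow, respectively; and *N* is the length of a record.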
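The five error indexes can be computed with a few lines of NumPy. The sketch below is illustrative: the function name is hypothetical, the sample values are made up, and the mean error is assumed to be defined as observed minus estimated, which is the sign convention that appears consistent with the mean values reported in the tables of this paper.

```python
import numpy as np

def error_indexes(observed, estimated):
    """Return ME, MAE, RMSE, NSE, and Pearson's r for two flow series."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    diff = obs - est  # assumed sign convention: observed minus estimated
    me = diff.mean()
    mae = np.abs(diff).mean()
    rmse = np.sqrt((diff ** 2).mean())
    nse = 1.0 - (diff ** 2).sum() / ((obs - obs.mean()) ** 2).sum()
    r = np.corrcoef(obs, est)[0, 1]
    return {"ME": me, "MAE": mae, "RMSE": rmse, "NSE": nse, "r": r}

# Made-up daily flows (m^3/s) used only to exercise the function.
obs = np.array([10.0, 20.0, 30.0, 40.0])
est = np.array([12.0, 18.0, 33.0, 37.0])
scores = error_indexes(obs, est)
print(scores)
```

A perfect estimate gives *ME* = *MAE* = *RMSE* = 0 and *NSE* = *r* = 1, while an *NSE* below zero means that the estimate performs worse than simply using the observed mean.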

## RESULTS AND DISCUSSIONS

### Determination of regression coefficients of the RC

Calculations with different values of *a*, *b*, and *c* in Equation (1) were also performed in order to obtain the optimal values of the regression coefficients (see details in Table 2). Finally, the optimal values of the regression coefficients are determined as *a* = 32.15, *b* = 66.64, and *c* = 2 for the RC using only one segment, and *a*_{1} = 6.95, *b*_{1} = 66.0, *c*_{1} = 3, *a*_{2} = 26.62, *b*_{2} = 66.05, and *c*_{2} = 2 for the RC using two segments. Table 3 summarizes the detailed values of the five statistical errors (*RMSE*, *ME*, *MAE*, *NSE*, and *r*) of the daily flow when using the optimal values of the regression coefficients mentioned above. Figure 4 shows (i) the daily water elevation versus water discharge when using one and two segments in the RC and (ii) the time series of estimated results and observed data of the daily flow over the whole considered period.
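One possible way to obtain such regression coefficients is least-squares fitting. The sketch below is an illustration under stated assumptions and is not the fitting procedure used in this study: for a fixed exponent *c*, it scans candidate reference levels *b* and, for each candidate, computes the scaling coefficient *a* in closed form by minimizing the sum of squared residuals. The data are synthetic, and the function name and grid are hypothetical.

```python
import numpy as np

def fit_rating_curve(H, Q, c, b_grid):
    """Least-squares fit of Q = a * (H - b)**c for a fixed exponent c."""
    best = None
    for b in b_grid:
        x = np.clip(H - b, 0.0, None) ** c
        denom = np.sum(x * x)
        if denom == 0.0:
            continue
        a = np.sum(Q * x) / denom       # closed-form least-squares solution for a
        ssr = np.sum((Q - a * x) ** 2)  # sum of squared residuals
        if best is None or ssr < best[2]:
            best = (a, b, ssr)
    a, b, ssr = best
    rmse = np.sqrt(ssr / H.size)
    return a, b, rmse

# Synthetic stage-discharge pairs generated from known coefficients plus noise.
rng = np.random.default_rng(1)
H = rng.uniform(67.0, 72.0, size=500)
Q = 32.0 * (H - 66.6) ** 2 + rng.normal(scale=5.0, size=H.size)

b_grid = np.arange(66.0, 67.0, 0.01)
a_fit, b_fit, rmse = fit_rating_curve(H, Q, c=2.0, b_grid=b_grid)
print(round(a_fit, 2), round(b_fit, 2), round(rmse, 2))
```

Repeating the fit for several values of *c* and comparing the resulting *RMSE* is one way to select the exponent, in the spirit of the comparison across exponents reported in Table 2.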

*c* | *a*, optimal value | *a*, 95% confidence interval | *b*, optimal value | *b*, 95% confidence interval | RMSE (m^{3}/s) | Sum of squared residuals (m^{6}/s^{2}) |
---|---|---|---|---|---|---|
1.95 | 32.93 | 32.77–33.10 | 66.66 | 66.65–66.67 | 21.32 | 4.98 × 10^{6} |
2.00 | 32.15 | 31.99–32.31 | 66.64 | 66.64–66.65 | 21.26 | 4.95 × 10^{6} |
2.10 | 25.15 | 25.02–25.29 | 66.48 | 66.48–66.49 | 22.71 | 5.65 × 10^{6} |
2.15 | 22.19 | 22.06–22.32 | 66.40 | 66.39–66.41 | 23.46 | 6.03 × 10^{6} |


Rating curve using | RMSE (m^{3}/s) | RMSE (%) | MAE (m^{3}/s) | MAE (%) | ME (m^{3}/s) | ME (%) | *r* | NSE |
---|---|---|---|---|---|---|---|---|
One segment | 21.26 | 0.58 | 14.78 | 0.40 | −2.03 | −0.06 | 0.983 | 0.965 |
Two segments | 19.56 | 0.53 | 14.0 | 0.38 | −0.74 | −0.02 | 0.985 | 0.971 |


Parameter | Description | Optimal value | Typical range | Unit |
---|---|---|---|---|
U_{max} | Maximum water content in surface storage | 35 | 10–20 | mm |
L_{max} | Maximum water content in root zone storage | 388 | 50–300 | mm |
CQOF | Overland flow runoff coefficient | 0.40 | 0–1 | – |
CKIF | Time constant for interflow | 1,142 | 500–1,000 | hour |
CK1,2 | Time constants for routing overland flow | 29.9 | 3–48 | hour |
TOF | Root zone threshold value for overland flow | 0.372 | 0–0.99 | – |
TIF | Root zone threshold value for interflow | 0.68 | 0–0.99 | – |
TG | Root zone threshold value for groundwater recharge | 0.15 | 0–0.99 | – |
CKBF | Time constant for routing baseflow | 2,386 | 1,000–4,000 | hour |


The results show that the RC with either one or two segments represented the observed daily flow very well. The dimensional errors of the daily flow are less than 0.6% of the observed magnitude of the flow (Table 3). The correlation coefficient between the estimated results and the observed data of the daily flow is close to unity, indicating that the rating curve reproduces the observed trend of the daily flow well (Figure 4). The values of *NSE* are greater than 0.96, demonstrating that the rating curve also reproduces the observed values of the daily flow very well. These results confirm that the optimal values of the regression coefficients are obtained when using the RC.
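The percentages reported for the rating curves can be checked with a short calculation. The sketch below assumes that the "observed magnitude of the flow" is the difference between the maximum and minimum observed daily flow (3,690 and 6.7 m^{3}/s, respectively, in the summary statistics of the observations). This normalization is an inference from the reported numbers rather than a definition stated explicitly in the text.

```python
# Observed daily flow extremes at Quy Chau (m^3/s), taken from the summary statistics.
q_min, q_max = 6.7, 3690.0
magnitude = q_max - q_min  # assumed definition of the "observed magnitude"

# Dimensional errors (m^3/s) reported for the one- and two-segment rating curves.
errors = {
    "one segment": {"RMSE": 21.26, "MAE": 14.78, "ME": -2.03},
    "two segments": {"RMSE": 19.56, "MAE": 14.0, "ME": -0.74},
}

# Express each error as a percentage of the assumed observed magnitude.
relative = {
    name: {key: round(100.0 * value / magnitude, 2) for key, value in vals.items()}
    for name, vals in errors.items()
}
print(relative)
```

Under this assumption, the computed percentages match the reported values to two decimal places (for example, 0.58% for the *RMSE* of the one-segment rating curve).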

In addition, two segments of the RC showed a slight improvement in estimating water discharge compared to the single one (Table 3), in terms of both statistical errors and variable trends. This is consistent with the results reported in previous studies (e.g., Le Coz *et al.* 2014; Pham Van & Nguyen-Van 2022), which showed that multiple segments of the RC are more suitable to estimate the daily flow in river basins.

### Results of the MIKE NAM

Year | RMSE (m^{3}/s) | RMSE (%) | MAE (m^{3}/s) | MAE (%) | ME (m^{3}/s) | ME (%) | *r* | NSE | Phase |
---|---|---|---|---|---|---|---|---|---|
1991–2011 | 70.91 | 1.92 | 37.56 | 1.02 | −18.8 | −0.51 | 0.80 | 0.606 | Calibration |
2012–2020 | 84.65 | 3.51 | 46.0 | 1.91 | −28.9 | −1.20 | 0.732 | 0.474 | Validation |


In the calibration step, the values of the dimensional errors (i.e., *ME*, *MAE*, and *RMSE*) of the daily flow vary from −18.8 to 71 m^{3}/s (see Table 5). In absolute value, these errors correspond to 0.5–1.9% of the observed magnitude of the flow at the station. Values of *r* = 0.80 and *NSE* = 0.61 are also obtained. These results suggest that the MIKE NAM model reproduced the observed values of the daily flow acceptably.

The MIKE NAM model is validated using the remaining collected dataset (from 1/1/2012 to 31/12/2020). As shown in Table 5, a slight increase in the values of the dimensional statistical errors is obtained in the validation step. In detail, the values of the dimensional errors range from −29 to 84.7 m^{3}/s (in absolute value, approximately 1.2–3.5% of the observed magnitude of the flow at the station). Values of 0.47 and 0.73 are achieved for the *NSE* and *r* coefficients, respectively.

As shown in Figure 5, the MIKE NAM model underestimated the observed values of the daily flow. The discrepancies between the estimated results and the observed data of the daily flow are still large, up to 1,500 m^{3}/s in flood seasons. There are several reasons for this. Firstly, the meteorological data at the Quy Chau station may not be representative of the runoff associated with the upstream region of the river basin. Secondly, rainfall–runoff processes in a mountainous river basin are inherently very complex because of basin characteristics such as land use, elevation, slope, and soil moisture, which commonly change locally. However, to keep the model as simple as possible, the modeling parameters are assumed to be constant in space for all calculations. The use of such constant values of the modeling parameters is another reason for the discrepancy mentioned above.

### Results of the LSTM model

The daily flow evaluated with the LSTM model can thus be affected by the input datasets, which include meteorological and hydrological quantities such as rainfall and water surface elevation. To investigate the impacts of the input on the output of the LSTM model, three different types of input data are considered: (i) rainfall, (ii) water elevation, and (iii) both rainfall and water elevation. As for the rainfall–runoff model, the collected data are divided into two datasets, i.e., 70% of the data for determining the hyper-parameter values by trial and error in the training step and the remaining 30% for validating the LSTM model. Detailed results of the training and validation steps of the LSTM model for the three input datasets are presented and discussed in this section.

#### Results of the LSTM model when using rainfall input

In the training step, the values of the dimensional errors (i.e., *ME*, *MAE*, and *RMSE*) of the daily flow vary in magnitude from 1.4 to 33.6 m^{3}/s (about 0.04 to 1% of the observed magnitude of the flow in the river basin). The dimensionless errors are equal to 0.91 and 0.95 for the *NSE* and *r* coefficients, respectively. These results suggest that the LSTM model reproduced the observed daily flow well in the training period (from 1/1/1991 to 31/12/2011).

Year | RMSE (m^{3}/s) | RMSE (%) | MAE (m^{3}/s) | MAE (%) | ME (m^{3}/s) | ME (%) | *r* | NSE | Phase |
---|---|---|---|---|---|---|---|---|---|
1991–2011 | 33.65 | 0.92 | 20.10 | 0.54 | −1.42 | −0.04 | 0.955 | 0.911 | Training |
2012–2020 | 77.37 | 3.21 | 27.86 | 1.16 | 1.71 | 0.07 | 0.749 | 0.560 | Validation |


Figure 6(b) shows the results of the validation step of the LSTM model using the dataset from 1/1/2012 to 31/12/2020. In comparison with the training step, larger values of the dimensional errors (*RMSE*, *MAE*, and *ME*) are obtained in the validation step. In detail, the values of the dimensional errors vary between 1.7 and 77.4 m^{3}/s. These errors correspond to 0.07–3.2% of the observed magnitude of the flow recorded at the outlet of the river basin (Table 6). Smaller values of the dimensionless errors are also obtained, i.e., *r* = 0.75 and *NSE* = 0.56.

Using rainfall as input data, the daily flow estimated by the LSTM model is similar to that estimated by the hydrological model MIKE NAM. These results are consistent with the correlation coefficient between flow and rainfall, which is equal to 0.55 as shown in Table 1. On the other hand, as shown in Figure 6, a discrepancy between the estimated results and the measured data of the daily flow remains when using the rainfall input data in the LSTM model. These results, together with those obtained from MIKE NAM, suggest that using only rainfall data reproduces the observed daily flow to a limited degree. This is because the flow in the river basin normally results from the integration of various processes, such as water concentration on the land surface of the river basin and flow dynamics, which are not yet fully taken into account in the MIKE NAM or LSTM model.

#### Results of the LSTM model when using water level input

Year | RMSE (m^{3}/s) | RMSE (%) | MAE (m^{3}/s) | MAE (%) | ME (m^{3}/s) | ME (%) | *r* | NSE | Phase |
---|---|---|---|---|---|---|---|---|---|
1991–2011 | 15.96 | 0.43 | 12.38 | 0.34 | 1.92 | 0.05 | 0.990 | 0.980 | Training |
2012–2020 | 21.57 | 0.90 | 6.98 | 0.29 | −3.26 | −0.14 | 0.990 | 0.966 | Validation |


Overall, the results show that the LSTM model represented the observed values of the daily flow very well in the training period. The values of *ME*, *MAE*, and *RMSE* do not exceed about 0.4% of the observed magnitude of the flow in the river basin. The values of the dimensionless errors (i.e., *r* and *NSE*) are equal to or greater than 0.98. Similar results are also achieved in the validation period. These results demonstrate that the water level has a strong relationship with the flow.

#### Results of the LSTM model when using water level and rainfall inputs

The values of *NSE* and *r* are about 0.95 or greater, demonstrating that the LSTM model represented the observed daily flow very well.

Year | RMSE (m^{3}/s) | RMSE (%) | MAE (m^{3}/s) | MAE (%) | ME (m^{3}/s) | ME (%) | *r* | NSE | Phase |
---|---|---|---|---|---|---|---|---|---|
1991–2011 | 12.0 | 0.35 | 9.88 | 0.27 | 4.64 | 0.12 | 0.994 | 0.987 | Training |
2012–2020 | 27.19 | 1.13 | 12.3 | 0.51 | −5.09 | −0.21 | 0.981 | 0.946 | Validation |


#### Discussion

Method | Mean (m^{3}/s) | Min (m^{3}/s) | Max (m^{3}/s) | Standard deviation (m^{3}/s) |
---|---|---|---|---|
Observations | 77.51 | 6.7 | 3,690 | 114.09 |
RC1 | 79.54 | 17.9 | 4,167.8 | 110.67 |
RC2 | 78.26 | 18.4 | 3,817.8 | 111.90 |
NAM | 99.33 | 18.3 | 1,710.6 | 89.49 |
LSTM-X | 75.01 | 21.2 | 3,689.2 | 102.96 |
LSTM-H | 77.14 | 4.9 | 3,577.4 | 108.21 |
LSTM-XH | 75.80 | 4.1 | 3,683.0 | 108.82 |


The obtained results clearly show that the RC can capture the observed values of the daily flow under different conditions. The current form of the RC, however, does not fully take into account unsteady flow and the hysteresis loop (i.e., the dynamic loop with rising and recession limbs in the stage–discharge relation due to the changing flow). A single-valued curve was used for the RC to keep the investigation as simple as possible. Dynamic model approaches (Fread 1973, 1975), which are unique to a particular flood, can be used to consider the impacts of unsteady flow conditions as well as to calculate the two different stages for each discharge in the rising and recession limbs of the RC. Unsteady flow conditions will be investigated further with dynamic model approaches in future work, once detailed measurements of flood characteristics and hysteretic processes have been collected.

Even though the RC and LSTM models can capture the observed high flows at the daily scale, the question remains whether they can capture more extreme events, such as flood events at the hourly scale, that have not been observed and carry a higher risk. Such events result from several uncertain factors such as hydro-meteorological conditions (e.g., heavy rainfall, storms, tsunamis), hydro-morphological conditions, land use alterations, and demographic changes (Dimitriadis *et al.* 2016). In addition, an inherent uncertainty in the flow could be observed due to the presence of long-term persistent behavior. As reported by Dimitriadis *et al.* (2021), the LSTM can deal with the long nonlinear dependencies of sequence values and the temporal evolution of the inherent uncertainty of the daily flow based on the so-called Hurst phenomenon, because the Hurst phenomenon takes into account the variability and uncertainty of the flow across scales. Such features can also be dealt with in most cases when using the physically based model MIKE NAM. MIKE NAM could also outperform the RC in the case of more extreme events (Mishra *et al.* 2022). To this end, sensitivity analysis based on a stochastic approach (e.g., Monte Carlo or bootstrap) should be performed to tackle this problem (Jakeman & Hornberger 1993; Dimitriadis *et al.* 2016; Parsaie & Haghiabi 2021).

Multiple approaches are presented for estimating the daily flow in the river of interest, with different input data requirements (e.g., rainfall, water level, both rainfall and water level). When rainfall alone was used as input data for estimating the daily flow in the river basin, the hydrological model MIKE NAM and the LSTM model achieved only a limited degree of agreement between the estimated results and the measured data of the daily flow.

## CONCLUSION

Multiple approaches consisting of LSTM, RC, and MIKE NAM are presented and applied for evaluating the daily flow in the Hieu river basin, Vietnam. The main findings of the research can be pointed out as follows:

Using the available data in the period from 1/1/1991 to 31/12/2020, the rating curves with one and two segments were investigated, revealing that very good data-model comparisons were achieved for the daily flow at the studied location. The dimensional errors of the daily flow (consisting of *RMSE*, *MAE*, and *ME*) were less than 1% of the observed magnitude of the flow, while the dimensionless errors (i.e., *r* and *NSE*) were greater than 0.96.

In terms of the LSTM model, the optimal values of the three modeling hyper-parameters were carefully calibrated based on the observed daily flow in the period from 1/1/1991 to 31/12/2011. The model was then validated using the measured flow in the period from 1/1/2012 to 31/12/2020. The LSTM model represented the measured flow very well under different conditions over the long considered period. The values of *ME*, *MAE*, and *RMSE* were also only about 1% of the observed magnitude of the flow at the studied location. The values of *r* and *NSE* varied in a range from 0.95 to 0.99 when using water elevation input data.

Among the three models investigated herein, the rating curves and the LSTM model are found to be the more suitable approaches for estimating the daily flow from the time series of water elevation in the Hieu river basin. These models use only the water level records as input data, which makes them valuable tools for calculating the daily flow in river basins.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.