## Abstract

Multiple linear regression (*MLR*) and long short-term memory (*LSTM*) models are presented to evaluate water levels in irrigation and drainage systems based on the available water levels at inlet and outlet locations. The Bac Hung Hai irrigation and drainage system is chosen as an example for demonstrating the *MLR* and *LSTM* models. Six statistical metrics, namely root mean square error (*RMSE*), mean absolute error (*MAE*), mean error (*ME*), Willmott's score (*WS*), Pearson's correlation coefficient (*r*), and Nash–Sutcliffe efficiency (*NSE*), are used to quantitatively assess the agreement between estimated and observed water levels at 12 locations of interest within the system in the period from 2000 to 2021 (at a time interval of 6 hours). The results show that both the *MLR* and *LSTM* models can be used for evaluating water levels with high accuracy. The dimensional statistical errors are only about 6% of the maximum monitored water level at the locations of interest for both models. The dimensionless statistical metrics range from 0.76 to 0.99 for all 12 locations of interest in the studied system. In addition, both models are benchmarked and could be used for other agricultural systems.

## HIGHLIGHTS

Multiple linear regression and long short-term memory models are presented for evaluating water levels in irrigation and drainage systems.

Both the multiple linear regression and long short-term memory models reproduced the observed water levels well.

The LSTM model proved to be a more suitable approach than the multiple linear regression for evaluating water levels at different locations of interest.


## INTRODUCTION

The water elevation, which is one of the various flow characteristics, is an important quantity in irrigation and drainage systems. This is because water elevation affects not only the regulation and operation of the hydraulic structures and controls (e.g., culverts, sluice gates, pumps) that exist in these systems but also the management of irrigation and drainage water in an economically efficient and socially equitable manner (Pereira *et al.* 2012). However, due to climate change, socio-economic development, urbanization, and changes in irrigation and drainage areas, water demands have altered water usage and water supply priorities (Kang & Park 2014). To meet the changes in water use and the objectives of sustainable water management, water elevation should be estimated as accurately as possible in practical applications, since it affects the regulation and operation of structures in irrigation and drainage systems, water reallocation, and management policies such as water-saving practices (Malterre *et al.* 2007).

Field measurements can be used to collect water elevation data in irrigation and drainage systems. Field measurements at different locations over a long period are useful for investigating flow characteristics in detail and for proposing suitable strategies for sustainable water management and the environment in the systems. However, such data are very difficult to obtain because field campaigns are time-consuming and costly, and because measurements are difficult under extreme events such as floods. Hydraulic models are used to simulate water elevation in irrigation and drainage systems (e.g., Kang & Park 2014; Pham Van *et al.* 2018). Hydraulic models allow for a detailed representation of the river and channel networks of the systems of interest as well as the related structures and their operation rules. They, however, often require detailed bathymetry, the geometry of river cross-sections, and sediment properties of the riverbed for determining appropriate values of the bottom friction. These data are not always available in irrigation and drainage systems (Kang & Park 2014; Pham Van *et al.* 2018). Moreover, updating hydraulic models is significantly more cumbersome. Recently, data-driven models such as long short-term memory (LSTM) and regression analysis have been extensively used for forecasting quantities of interest in irrigation and drainage systems, with different levels of complexity (e.g., Zhang *et al.* 2018; Mouatadid *et al.* 2019; Truong *et al.* 2021).

Long short-term memory (LSTM), which is a popular extension of recurrent neural networks (Pham Van & Nguyen-Van 2022), has been widely applied in simulating water levels and flow in irrigation and drainage systems. For instance, Zhang *et al.* (2018) applied the LSTM model to predict water table depth in the Hetao irrigation district, China. Mouatadid *et al.* (2019) combined the LSTM model and the discrete wavelet transform for computing the water flow in the Palos de la Frontera irrigation system, Spain. Recently, Truong *et al.* (2021) used the gradient tree boosting-based model to calculate the water level in the dry season of the Bac Hung Hai irrigation and drainage system (BHHIS), Vietnam. These examples suggest that the LSTM model can be used for evaluating water levels under various flow conditions (in both dry and flood seasons) at different locations in the irrigation and drainage system of interest.

Regression analysis, including both linear and non-linear models, is also used for forecasting quantities of interest in irrigation and drainage systems. The literature shows that most studies using regression analysis focus on investigating evaporation losses (e.g., Mattar *et al.* 2022), evapotranspiration and crop coefficient (e.g., Ohana-Levi *et al.* 2022), and soil water infiltration (e.g., Mohammad *et al.* 2020) in the irrigation and drainage systems. The use of both regression analysis and the LSTM model for evaluating water elevation in irrigation and drainage systems has been limited. It is interesting to note that the present study is the first one in which both MLR and LSTM models are implemented together and applied for evaluating water levels in the BHHIS.

The main objective of the present study is to evaluate the water level in the BHHIS with the MLR and LSTM models, using the available water level data in the period from 1/1/2000 to 31/12/2021. In detail, water levels at the upstream and downstream locations of the inlet and outlet culverts (i.e., Xuan Quan, Cau Cat, Cau Xe, and An Tho) are considered as input data in both models, while water levels at the upstream and downstream locations of the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as target values. The present study also focuses on water levels associated with various flow conditions in both dry and flood seasons as well as over the long period from 2000 to 2021. Six statistical indexes, including root mean square error (*RMSE*), mean absolute error (*MAE*), mean error (*ME*), Willmott's score (Willmott 1981) denoted by *WS*, Nash–Sutcliffe efficiency (*NSE*), and Pearson's correlation coefficient (*r*) between the observed and estimated water levels, are used for the quantitative assessment of the quality of the estimated water levels. Radar charts are also used to display both dimensional and dimensionless statistical errors of water levels at the 12 locations, i.e., the upstream and downstream locations of the six culverts of interest inside the BHHIS.

## STUDY SYSTEM AND DATA COLLECTION

### The Bac Hung Hai irrigation and drainage system

*et al.* 2018). The system is bounded by four large rivers, namely the Red, Duong, Thai Binh, and Luoc rivers (Figure 1(a)). Regarding the river and channel network in the BHHIS, Kim Son and Cuu An are the two main rivers that provide water for different agricultural purposes (such as irrigation and fishery) in dry seasons and drain water to protect various areas from high flows in flood seasons. The Kim Son river is located in the northern area, extending from Xuan Quan to Cau Cat with a length of 65.7 km, while the Cuu An is the dominant river in the southern area of the system, with a length of about 50 km. These two main rivers are connected through different rivers and channels, e.g., Dinh Dao, Tay Ke Sat, Quang Lang, and Dien Bien (Figure 1(b)). Besides the above-mentioned rivers, the system also consists of different tributaries such as Luong Tai, Trang Ky, An Tao, Bac Ho, Hoa Binh, Nam Ke Sat, Cau Xe, and An Tho (Figure 1(b)). The total length of these main rivers, channels, and tributaries is about 231 km.

In dry seasons from November to March, water in the BHHIS is normally taken from the Red river through the Xuan Quan headworks. After use in the system, the water is drained into the Thai Binh river through the Cau Cat, Cau Xe, and An Tho culverts and into the Luoc river by the Mi Dong pumping station. However, due to the decrease in water elevation in the Red river, especially in recent years, the maximum observed water discharge taken through the Xuan Quan culvert equals only 60% of the design discharge (corresponding to approximately 45 m^{3}/s). Thus, water is also taken into the BHHIS from the Thai Binh river through the Cau Cat, Cau Xe, and An Tho culverts to secure enough water for the different agricultural irrigation purposes. This means that water in the BHHIS can be taken from both the upstream headworks and the downstream culverts, so that the flow can run in two directions depending on the instant and period in dry seasons. Conversely, water in the system mainly flows from upstream to downstream in flood seasons (from April to October).

In terms of structures and hydraulic controls inside the BHHIS, there are (i) 11 main hydraulic structures and controls (Figure 1(b)), (ii) 400 pumping stations as well as 800 culverts for both irrigation and drainage, and (iii) thousands of kilometers of small channels and dykes. The 11 main hydraulic structures and controls include the Xuan Quan and Bao Dap headworks, three regulation structures (the Kenh Cau, Ba Thuy, and Cong Neo culverts), two partition structures (the Luc Dien and Cong Tranh culverts), three drainage structures (the An Tho, Cau Xe, and Cau Cat culverts), and one pumping station (Mi Dong). These hydraulic structures and controls lie at different locations along the rivers and channels within the system (Figure 1). Due to the combined effects of different factors such as the complicated river and channel network, hydraulic structures and controls, human-induced changes, and inputs of pollutants, the regulation and operation of the structures still face significant difficulties in terms of both irrigation and drainage.

The regulation and operation of the hydraulic structures and controls in the BHHIS currently follow the regulation and operation rules No. 5471 (MARD 2016), under which a hydrological year is divided into seven regulation periods, with four regulation periods in the dry season and three in the wet season. Depending on the regulation period, culverts are opened or closed to maintain and control the water elevation within the system. Thus, accurate estimation of the water level upstream and downstream of the hydraulic structures and controls remains an important and challenging task, not only for operating the hydraulic structures and controls themselves but also for sustainable water management and the environment of the system under the combined impacts of (i) the decrease in water elevation in the Red river, (ii) urbanization and increasing pollution inside the system, and (iii) increasing salt intrusion in the downstream regions of the system.

### Data collection

No. | Location | Culvert | River | Period | Abbreviation | Note
---|---|---|---|---|---|---
1 | Upstream | Xuan Quan | Kim Son | 2000–2021 | UXQ | Inlet culvert
2 | Downstream | Xuan Quan | Kim Son | 2000–2021 | DXQ | Inlet culvert
3 | Upstream | Bao Dap | Kim Son | 2000–2021 | UBD | Target culvert
4 | Downstream | Bao Dap | Kim Son | 2000–2021 | DBD | Target culvert
5 | Upstream | Kenh Cau | Kim Son | 2000–2021 | UKC | Target culvert
6 | Downstream | Kenh Cau | Kim Son | 2000–2021 | DKC | Target culvert
7 | Upstream | Luc Dien | Dien Bien | 2000–2021 | ULD | Target culvert
8 | Downstream | Luc Dien | Dien Bien | 2000–2021 | DLD | Target culvert
9 | Upstream | Tranh | Tay Ke Sat | 2000–2021 | UT | Target culvert
10 | Downstream | Tranh | Tay Ke Sat | 2000–2021 | DT | Target culvert
11 | Upstream | Ba Thuy | Kim Son | 2000–2021 | UBT | Target culvert
12 | Downstream | Ba Thuy | Kim Son | 2000–2021 | DBT | Target culvert
13 | Upstream | Neo | Cuu An | 2000–2021 | UN | Target culvert
14 | Downstream | Neo | Cuu An | 2000–2021 | DN | Target culvert
15 | Upstream | Cau Cat | Kim Son | 2000–2021 | UCC | Outlet culvert
16 | Downstream | Cau Cat | Kim Son | 2000–2021 | DCC | Outlet culvert
17 | Upstream | Cau Xe | Cau Xe | 2000–2021 | UCX | Outlet culvert
18 | Downstream | Cau Xe | Cau Xe | 2000–2021 | DCX | Outlet culvert
19 | Upstream | An Tho | An Tho | 2000–2021 | UAT | Outlet culvert
20 | Downstream | An Tho | An Tho | 2000–2021 | DAT | Outlet culvert


When the MLR and LSTM approaches are applied, measured water levels at the upstream and downstream locations of the Xuan Quan, Cau Cat, Cau Xe, and An Tho culverts are used as input data, while measured water levels at the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as output data. In detail, if the MLR is used, the whole data set of the collection period is used for determining appropriate regression coefficients. If the LSTM approach is applied, about two-thirds of the collected data (corresponding to the period from 1/1/2000 to 31/12/2015) are used for the training step, while the remaining one-third (from 1/1/2016 to 31/12/2021) are used for the validation step of the model.
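The chronological split described above can be illustrated with a short sketch. It assumes, hypothetically, that the 6-hourly records fall at 00:00, 06:00, 12:00, and 18:00 on every day and that no records are missing; the function and variable names are illustrative only and are not taken from the study.

```python
from datetime import datetime, timedelta


def six_hourly_timestamps(start, end):
    """Return all timestamps from start to end (inclusive) at a 6-hour step."""
    step = timedelta(hours=6)
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


# Assumed sampling times: four records per day over the whole study period.
stamps = six_hourly_timestamps(datetime(2000, 1, 1, 0), datetime(2021, 12, 31, 18))

# Chronological split without shuffling: validation data follow the training data.
split_date = datetime(2016, 1, 1)
train = [t for t in stamps if t < split_date]
valid = [t for t in stamps if t >= split_date]

train_share = len(train) / len(stamps)
print(len(train), len(valid), round(train_share, 3))
```

Under these assumptions the training period holds roughly 73% of the records, which is consistent with the "two-thirds / one-third" description being approximate.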

## METHOD

### Multiple linear regression

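In the MLR model, the water level at each target location is written as a linear combination of the water levels at the inlet and outlet culverts:

$$H_j = a_j + \sum_{i=1}^{4} \alpha_{i,j}\, H^{\mathrm{in}}_{i} + \sum_{i=1}^{4} \beta_{i,j}\, H^{\mathrm{out}}_{i}$$

where $H_j$ is the water level at target location number $j$, $a_j$ is the coefficient, $\alpha_{i,j}$ and $\beta_{i,j}$ are the regression coefficients, $H^{\mathrm{in}}_{i}$ is the water level at the inlet and outlet culverts number $i$ inside the BHHIS, and $H^{\mathrm{out}}_{i}$ is the water level at the inlet and outlet culverts number $i$ outside the BHHIS. In detail, $H^{\mathrm{in}}_{i}$ consists of the water level at the downstream location of the Xuan Quan culvert and the water levels at the upstream locations of the Cau Cat, Cau Xe, and An Tho culverts, while $H^{\mathrm{out}}_{i}$ includes the water level at the upstream location of the Xuan Quan culvert and the water levels at the downstream locations of the Cau Cat, Cau Xe, and An Tho culverts. At each target location, the values of $a_j$, $\alpha_{i,j}$, and $\beta_{i,j}$ are determined based on the measured water level data in the long period from 1/1/2000 to 31/12/2021.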
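As an illustration of how regression coefficients of this kind can be estimated, the following minimal sketch fits an intercept and slopes by ordinary least squares through the normal equations. It is not the authors' implementation: the helper names (`solve_linear_system`, `fit_mlr`, `predict_mlr`) and the small synthetic data set with two predictors are hypothetical, whereas the study uses eight predictors (four inside and four outside water levels) per target location.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(inputs, target):
    """Least-squares fit of target = a + sum_k coef_k * inputs[k]."""
    # A leading column of ones lets the intercept be estimated with the slopes.
    design = [[1.0] + list(row) for row in inputs]
    p = len(design[0])
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefficients, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic example (hypothetical values): one inside and one outside water level.
inputs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.5), (5.0, 5.5)]
target = [0.5 + 0.3 * h_in + 0.6 * h_out for h_in, h_out in inputs]
coefficients = fit_mlr(inputs, target)
print([round(c, 6) for c in coefficients])
```

The normal equations are adequate for a handful of well-conditioned predictors; for strongly correlated water levels a QR- or SVD-based solver would be a more robust choice.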

### Long short-term memory model

The LSTM model, a well-known variant of recurrent neural networks that is capable of selectively remembering patterns over a long duration of time, has been widely applied for predicting time series of water level and water discharge in irrigation systems (e.g., Zhang *et al.* 2018; Mouatadid *et al.* 2019; Truong *et al.* 2021). The LSTM model was initially introduced by Hochreiter & Schmidhuber (1997) to overcome the vanishing gradient problem. The building block of the LSTM is a memory cell, which essentially represents the hidden layer. In each memory cell, there is a recurrent edge that has the desirable weight to overcome the vanishing and exploding gradient problems. The values associated with this recurrent edge are called the cell state. The cell state from the previous time step (denoted by *C*_{t-1}) is modified to obtain the cell state at the current time step (denoted by *C*_{t}) without being multiplied directly by any weighting factor. There are three different types of gates (named input, forget, and output) in a cell of the LSTM model.

The input gate ($i_t$) and input node ($g_t$) are responsible for updating the cell state. They are computed as follows:

$$i_t = \sigma\left(W_{xi}\, x_t + W_{hi}\, h_{t-1} + b_i\right)$$

$$g_t = \tanh\left(W_{xg}\, x_t + W_{hg}\, h_{t-1} + b_g\right)$$

where $W_{xi}$, $W_{hi}$, $W_{xg}$, $W_{hg}$ are the weight matrices in the LSTM network, $x_t$ is the input data at time $t$, $h_{t-1}$ indicates the hidden units at time $t-1$, $\sigma(\cdot)$ is the sigmoid function, $\tanh$ is the hyperbolic tangent, and $b_i$ and $b_g$ are components of the bias vector for the hidden units.

The output gate ($o_t$) decides how to update the values of the hidden unit. In detail, the output gate combines the input data at time $t$ ($x_t$), the output of the hidden unit at time $t-1$ (denoted by $h_{t-1}$), and the components of the bias vector for the hidden units in the last iteration ($b_o$), under the form:

$$o_t = \sigma\left(W_{xo}\, x_t + W_{ho}\, h_{t-1} + b_o\right)$$

The LSTM model has three main hyper-parameters, namely the number of epochs, the number of hidden units, and the learning rate. In the present study, the Adam optimization algorithm is chosen among the available optimization options because it is often used in practical applications (Zhang *et al.* 2018; Mouatadid *et al.* 2019; Truong *et al.* 2021; Pham Van & Nguyen-Van 2022), and the appropriate values of the above-mentioned hyper-parameters are determined by trial and error in the training step of the LSTM model. The latter is performed by using two-thirds of the data (from 1/1/2000 to 31/12/2015). The LSTM model is then validated by using one-third of the data (in the period from 1/1/2016 to 31/12/2021).
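To make the role of the gates concrete, a single LSTM cell update can be sketched in plain Python. This is only an illustrative sketch: the forget gate and the state updates follow the standard LSTM formulation (*C*_{t} = *f*_{t} *C*_{t-1} + *i*_{t} *g*_{t} and *h*_{t} = *o*_{t} tanh(*C*_{t})), which is assumed here rather than taken from the text, and the `params` dictionary layout and function names are hypothetical.

```python
import math


def sigmoid(v):
    """Logistic sigmoid used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(w_x, x, w_h, h, b):
    """Return W_x x + W_h h + b for list-based matrices and vectors."""
    return [
        sum(wx * xv for wx, xv in zip(w_x[k], x))
        + sum(wh * hv for wh, hv in zip(w_h[k], h))
        + b[k]
        for k in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM cell update; params maps a gate name to (W_x, W_h, b)."""

    def gate(name, activation):
        w_x, w_h, b = params[name]
        return [activation(v) for v in affine(w_x, x, w_h, h_prev, b)]

    i_t = gate("i", sigmoid)    # input gate
    g_t = gate("g", math.tanh)  # input node (candidate values)
    f_t = gate("f", sigmoid)    # forget gate (standard form, assumed here)
    o_t = gate("o", sigmoid)    # output gate
    # The cell state keeps part of the old state and adds the gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # The hidden state is the output-gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy example: two inputs, one hidden unit, all weights and biases set to zero.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in ("i", "g", "f", "o")}
h_t, c_t = lstm_step([1.2, 0.8], [0.0], [1.0], params)
print(h_t, c_t)
```

With zero weights every sigmoid gate equals 0.5 and the input node equals 0, so the cell state is simply halved; this makes the update easy to check by hand before moving to trained weights.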

### Statistical metrics

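The *RMSE*, *MAE*, *ME*, Willmott's score (*WS*), Pearson's correlation coefficient (*r*), and Nash–Sutcliffe efficiency (*NSE*) are used for quantitatively assessing the agreement between estimated and observed water levels upstream and downstream of the different culverts of interest in the BHHIS. The *RMSE*, *MAE*, *ME*, *WS*, *r*, and *NSE* are computed as follows:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(E_i - O_i\right)^2}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|E_i - O_i\right|$$

$$ME = \frac{1}{N}\sum_{i=1}^{N}\left(E_i - O_i\right)$$

$$WS = 1 - \frac{\sum_{i=1}^{N}\left(E_i - O_i\right)^2}{\sum_{i=1}^{N}\left(\left|E_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2}$$

$$r = \frac{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)\left(E_i - \bar{E}\right)}{\sqrt{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2 \sum_{i=1}^{N}\left(E_i - \bar{E}\right)^2}}$$

$$NSE = 1 - \frac{\sum_{i=1}^{N}\left(O_i - E_i\right)^2}{\sum_{i=1}^{N}\left(O_i - \bar{O}\right)^2}$$

where $O_i$ and $E_i$ are respectively the observed and estimated values of the water level at point number $i$ in a time series, $\bar{E}$ and $\bar{O}$ are the mean values of the estimated and observed water level, respectively, and $N$ is the total number of points in the considered time series.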

The *RMSE*, *MAE*, and *ME* are valuable indicators because they provide the error in the units of the quantity of interest, which is helpful in the analysis of the results. The *WS* and *r* coefficients, together with the *NSE*, which determines the relative magnitude of the residual variance (or noise) compared with the variance of the observations, provide complementary information on the comparison between observed and estimated values. Besides the statistical errors mentioned above, radar charts are also used. A radar chart displays multivariate data as a two-dimensional chart in which the quantities of interest are represented on axes starting from the same point, which makes it a convenient way to display the statistical errors at all target locations together.
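The six metrics can be computed together in a few lines. The sketch below is illustrative and makes two assumptions that are not stated in the text: errors are taken as estimated minus observed (so a positive *ME* means overestimation), and *WS* follows the usual index of agreement of Willmott (1981). The function name and sample values are hypothetical.

```python
import math


def water_level_metrics(observed, estimated):
    """Return RMSE, MAE, ME, WS, r, and NSE for two equally long series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    pairs = list(zip(observed, estimated))
    # Assumed sign convention: error = estimated - observed.
    errors = [e - o for o, e in pairs]
    sq_err = sum(d * d for d in errors)

    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(d) for d in errors) / n
    me = sum(errors) / n

    # Willmott's score: 1 minus squared error over the potential error.
    ws_den = sum((abs(e - mean_obs) + abs(o - mean_obs)) ** 2 for o, e in pairs)
    ws = 1.0 - sq_err / ws_den

    # Pearson's correlation coefficient.
    cov = sum((o - mean_obs) * (e - mean_est) for o, e in pairs)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    r = cov / math.sqrt(var_obs * var_est)

    # Nash-Sutcliffe efficiency: residual variance relative to observed variance.
    nse = 1.0 - sq_err / var_obs
    return {"RMSE": rmse, "MAE": mae, "ME": me, "WS": ws, "r": r, "NSE": nse}


# Hypothetical water levels in metres.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
metrics = water_level_metrics(observed, estimated)
print({name: round(value, 4) for name, value in metrics.items()})
```

Note that the denominators vanish for a constant observed series, so a production version would need to guard against that case.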

## RESULTS AND DISCUSSION

### Results of the multiple linear regression

Using the collected water level data mentioned previously, the MLR is fitted for both the upstream and downstream locations of the different culverts of interest in the BHHIS by minimizing the root mean square error between estimated values and measured data. Table 2 summarizes the appropriate values of the regression coefficients for all locations of interest, while detailed values of the error estimates of the water level are shown in Table 3.

Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | H_{max} Est. (m) | H_{max} Obs. (m)
---|---|---|---|---|---|---|---|---|---|---|---
UBD | 0.067 | 1.205 | 0.033 | 0.588 | 0.00 | 0.00 | 0.998 | 0.996 | 0.993 | 5.51 | 5.60
DBD | 0.236 | 6.212 | 0.180 | 4.741 | 0.00 | 0.00 | 0.929 | 0.873 | 0.763 | 3.86 | 3.80
UKC | 0.212 | 5.736 | 0.155 | 4.195 | 0.00 | 0.00 | 0.938 | 0.888 | 0.788 | 3.60 | 3.69
DKC | 0.159 | 4.466 | 0.113 | 3.187 | 0.00 | 0.00 | 0.951 | 0.909 | 0.826 | 3.35 | 3.56
ULD | 0.133 | 3.901 | 0.092 | 2.712 | 0.00 | 0.00 | 0.961 | 0.928 | 0.861 | 3.25 | 3.41
DLD | 0.142 | 4.339 | 0.095 | 2.902 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 3.23 | 3.28
UT | 0.092 | 2.697 | 0.058 | 1.715 | 0.00 | 0.00 | 0.980 | 0.962 | 0.925 | 3.20 | 3.40
DT | 0.112 | 3.625 | 0.068 | 2.209 | 0.00 | 0.00 | 0.967 | 0.939 | 0.881 | 3.09 | 3.09
UBT | 0.062 | 2.040 | 0.038 | 1.252 | 0.00 | 0.00 | 0.989 | 0.978 | 0.957 | 2.93 | 3.02
DBT | 0.193 | 6.414 | 0.135 | 4.487 | 0.00 | 0.00 | 0.927 | 0.871 | 0.758 | 2.98 | 3.01
UN | 0.117 | 4.062 | 0.081 | 2.811 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 2.76 | 2.87
DN | 0.151 | 5.271 | 0.102 | 3.570 | 0.00 | 0.00 | 0.957 | 0.919 | 0.845 | 2.91 | 2.86


For the MLR, the dimensional errors of the water level at the locations of interest range from 0.06 to 0.24 m and from 0.03 to 0.18 m for the *RMSE* and *MAE*, respectively (Table 3). These errors are only about 6% of the observed magnitude of the water level at the considered locations. The values of *ME* are close to zero at all locations, showing that the estimated values are, on average, close to the observed values. In terms of dimensionless statistical errors, the values of the *WS* and *r* coefficients at all locations vary between 0.87 and 0.99, revealing that the estimated values of the water level reproduce the observed variation of the water level very well. The *NSE* ranges from 0.79 to 0.99, except at the DBD and DBT locations where the value of the *NSE* is about 0.76. The maximum values of the estimated water level are very close to the observations at all considered locations (see Table 3). The discrepancy between the maximum values of the estimated and measured water levels ranges from −0.21 to 0.05 m. These results suggest that the MLR represents the observed water levels well at almost all locations of interest in the BHHIS.
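The percentage columns of the error table appear to express each dimensional error relative to the observed maximum water level at the same location; this reading is inferred from the tabulated values rather than stated explicitly, so the short check below should be taken as a sketch under that assumption. The function name is hypothetical.

```python
def relative_error_percent(error_m, h_max_observed_m):
    """Express a dimensional error (m) as a percentage of the observed maximum level (m)."""
    return 100.0 * error_m / h_max_observed_m


# MLR results at location DBD: RMSE = 0.236 m, observed H_max = 3.80 m.
dbd_rmse_percent = relative_error_percent(0.236, 3.80)
# MLR results at location DBT: RMSE = 0.193 m, observed H_max = 3.01 m.
dbt_rmse_percent = relative_error_percent(0.193, 3.01)
print(round(dbd_rmse_percent, 2), round(dbt_rmse_percent, 2))
```

Both values land close to the tabulated 6.212% and 6.414%; the small differences are what one would expect if the table was computed from unrounded errors.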

### Results of the LSTM model

#### Training results of the LSTM model

Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | H_{max} Est. (m) | H_{max} Obs. (m)
---|---|---|---|---|---|---|---|---|---|---|---
UBD | 0.069 | 1.236 | 0.030 | 0.532 | −0.001 | −0.01 | 0.998 | 0.997 | 0.993 | 5.55 | 5.60
DBD | 0.154 | 4.056 | 0.103 | 2.704 | 0.001 | 0.04 | 0.974 | 0.950 | 0.902 | 3.67 | 3.80
UKC | 0.153 | 4.152 | 0.107 | 2.896 | −0.001 | −0.03 | 0.971 | 0.945 | 0.894 | 3.64 | 3.69
DKC | 0.118 | 3.302 | 0.081 | 2.272 | 0.002 | 0.05 | 0.974 | 0.951 | 0.904 | 3.60 | 3.56
ULD | 0.095 | 2.774 | 0.064 | 1.883 | 0.005 | 0.16 | 0.981 | 0.964 | 0.928 | 3.36 | 3.41
DLD | 0.103 | 3.133 | 0.068 | 2.085 | 0.00 | 0.07 | 0.977 | 0.956 | 0.914 | 3.26 | 3.28
UT | 0.063 | 1.840 | 0.042 | 1.232 | 0.00 | 0.08 | 0.991 | 0.982 | 0.964 | 3.39 | 3.40
DT | 0.083 | 2.700 | 0.053 | 1.729 | 0.00 | 0.16 | 0.982 | 0.965 | 0.931 | 3.07 | 3.09
UBT | 0.051 | 1.701 | 0.031 | 1.040 | 0.01 | 0.18 | 0.992 | 0.985 | 0.970 | 2.97 | 3.02
DBT | 0.124 | 4.109 | 0.082 | 2.729 | 0.00 | −0.14 | 0.974 | 0.951 | 0.905 | 2.95 | 3.01
UN | 0.096 | 3.353 | 0.065 | 2.266 | 0.00 | 0.14 | 0.970 | 0.943 | 0.889 | 2.78 | 2.87
DN | 0.105 | 3.655 | 0.068 | 2.389 | 0.00 | −0.10 | 0.981 | 0.963 | 0.928 | 2.79 | 2.86


As summarized in Table 4, the *RMSE* of the water level at all locations of interest varies between 0.05 and 0.16 m, while the *MAE* ranges from 0.03 to 0.10 m. The values of *ME* vary from −0.001 to 0.01 m. These dimensional statistical errors are only about 4% of the observed magnitude of the water level at the considered locations. The dimensionless statistical errors (the *WS*, *r*, and *NSE*) range from 0.89 to 0.99 (Table 4), revealing that the LSTM model reproduces the observed water levels very well under different flow conditions and at different instants in the training step (Figure 7). Indeed, the discrepancy between the maximum values of the estimated and observed water levels ranges from −0.13 to 0.04 m. These results suggest that appropriate values of the hyper-parameters are obtained in the training step of the LSTM model.

#### Validation results of the LSTM model

Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | H_{max} Est. (m) | H_{max} Obs. (m)
---|---|---|---|---|---|---|---|---|---|---|---
UBD | 0.033 | 0.605 | 0.022 | 0.399 | 0.002 | 0.033 | 0.999 | 0.998 | 0.997 | 5.31 | 5.40
DBD | 0.123 | 3.476 | 0.084 | 2.376 | 0.007 | 0.208 | 0.976 | 0.955 | 0.912 | 3.31 | 3.53
UKC | 0.122 | 3.574 | 0.087 | 2.537 | 0.003 | 0.097 | 0.974 | 0.951 | 0.904 | 3.25 | 3.41
DKC | 0.115 | 3.415 | 0.081 | 2.404 | 0.006 | 0.185 | 0.977 | 0.955 | 0.911 | 3.29 | 3.36
ULD | 0.104 | 3.252 | 0.074 | 2.328 | −0.003 | −0.091 | 0.979 | 0.958 | 0.918 | 3.11 | 3.19
DLD | 0.106 | 3.315 | 0.076 | 2.384 | −0.01 | −0.162 | 0.978 | 0.958 | 0.917 | 3.22 | 3.19
UT | 0.070 | 2.333 | 0.050 | 1.667 | −0.01 | −0.298 | 0.989 | 0.979 | 0.957 | 3.05 | 3.01
DT | 0.081 | 2.742 | 0.057 | 1.925 | −0.01 | −0.371 | 0.985 | 0.972 | 0.943 | 2.90 | 2.97
UBT | 0.055 | 2.110 | 0.042 | 1.593 | −0.02 | −0.587 | 0.991 | 0.984 | 0.965 | 2.68 | 2.61
DBT | 0.120 | 4.635 | 0.082 | 3.176 | 0.01 | 0.217 | 0.970 | 0.942 | 0.887 | 2.60 | 2.59
UN | 0.100 | 4.006 | 0.073 | 2.903 | −0.01 | −0.459 | 0.967 | 0.940 | 0.882 | 2.48 | 2.50
DN | 0.106 | 4.260 | 0.072 | 2.906 | 0.01 | 0.295 | 0.977 | 0.955 | 0.912 | 2.49 | 2.49


As in the training step, the dimensional statistical errors of the water level (the *RMSE*, *MAE*, and *ME*) vary between −0.02 and 0.12 m (see Table 5). These errors are at most about 4.6% of the observed magnitude of the water level at the locations of interest. The *WS* is close to unity, while the values of the *r* coefficient between the estimated and observed water levels range from 0.94 to 0.99 (see Table 5), revealing that the LSTM model represents the temporal variation of the observed water levels very well. The values of *NSE* range between 0.88 and 0.99, showing that the LSTM model reproduces the observed water levels well at all studied locations. At all 12 locations, the computed values of the maximum water level are very close to the observations. Indeed, the discrepancy between the maximum values of the estimated and observed water levels varies between −0.22 and 0.07 m (see Table 5). These results are consistent with those obtained in the training step of the LSTM model. This demonstrates that appropriate values are achieved for the hyper-parameters of the LSTM model. Thus, the LSTM model can be used to evaluate the water level at different locations of interest in the BHHIS.
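The reported range of the maximum-level discrepancy can be checked directly from the validation table. The sketch below simply recomputes estimated minus observed *H*_{max} for the 12 locations using the tabulated values; the variable names are illustrative.

```python
# Estimated and observed maximum water levels (m) from the LSTM validation table.
h_max = {
    "UBD": (5.31, 5.40), "DBD": (3.31, 3.53), "UKC": (3.25, 3.41),
    "DKC": (3.29, 3.36), "ULD": (3.11, 3.19), "DLD": (3.22, 3.19),
    "UT": (3.05, 3.01), "DT": (2.90, 2.97), "UBT": (2.68, 2.61),
    "DBT": (2.60, 2.59), "UN": (2.48, 2.50), "DN": (2.49, 2.49),
}

# Discrepancy = estimated minus observed, rounded to the table precision.
discrepancy = {loc: round(est - obs, 2) for loc, (est, obs) in h_max.items()}

largest_underestimate = min(discrepancy, key=discrepancy.get)
largest_overestimate = max(discrepancy, key=discrepancy.get)
print(largest_underestimate, discrepancy[largest_underestimate])
print(largest_overestimate, discrepancy[largest_overestimate])
```

The extremes occur at DBD and UBT, matching the range of −0.22 to 0.07 m quoted in the text.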

### Discussion

As can be seen in Tables 3–5, the maximum values of the dimensional statistical errors of the water level in the period from 2000 to 2021 at the 12 locations of interest in the BHHIS are about 0.24 and 0.16 m for the MLR and LSTM models, respectively. The *r* coefficient and *WS* are close to unity for both methods. In addition, the *NSE* of the water level ranges from 0.76 to 0.99 when the MLR method is applied, while it varies between 0.88 and 0.99 if the LSTM model is used. The discrepancy between the maximum values of the estimated and observed water levels is at most about 0.20 m in magnitude for both methods. These results demonstrate that both the MLR and LSTM models can reproduce the observed water levels at all 12 locations of interest in the studied system.

Regarding detailed statistical errors of the water level from the MLR and LSTM models, it is also clearly observed that a considerable improvement in estimated water levels is obtained when using the LSTM model in comparison with the MLR. For example, a minimum value of 0.88 is achieved for the *NSE* when using the LSTM model, instead of 0.76 for the case of the MLR. In addition, higher values of *WS* and *r* coefficient at all twelve locations of interest within the BHHIS are also obtained if the LSTM model is used.
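The comparison above can be reproduced from the tabulated *NSE* values. The following sketch uses the MLR values and the LSTM validation values as listed in the tables; comparing a fit over the whole record (MLR) with a validation-period score (LSTM) is a simplification made here for illustration only.

```python
locations = ["UBD", "DBD", "UKC", "DKC", "ULD", "DLD",
             "UT", "DT", "UBT", "DBT", "UN", "DN"]

# NSE values taken from the MLR table and the LSTM validation table.
nse_mlr = [0.993, 0.763, 0.788, 0.826, 0.861, 0.839,
           0.925, 0.881, 0.957, 0.758, 0.839, 0.845]
nse_lstm = [0.997, 0.912, 0.904, 0.911, 0.918, 0.917,
            0.957, 0.943, 0.965, 0.887, 0.882, 0.912]

# Gain in NSE at each location when moving from MLR to LSTM.
gain = {loc: round(l - m, 3) for loc, m, l in zip(locations, nse_mlr, nse_lstm)}
improved = [loc for loc, g in gain.items() if g > 0]

print(min(nse_mlr), min(nse_lstm))
print(len(improved), max(gain, key=gain.get))
```

The minimum *NSE* rises from about 0.76 to 0.88, and the gain is positive at all 12 locations, with the largest improvement at DBD.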

Truong *et al.* (2021) used a gradient tree boosting-based (GTB) model to investigate water levels in the dry seasons from 2000 to 2020 at the upstream and downstream locations of five culverts in the BHHIS, namely Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo. The results showed that the GTB model reproduced the measured water levels acceptably, with values of the coefficient of determination ranging from 0.86 to 0.97 at the upstream and downstream locations of these five culverts. However, it must be emphasized that the previous study focused only on water levels in the dry seasons, when flow hydrodynamics and water elevations in the BHHIS are relatively stable. The present research showed that both the MLR and LSTM models reproduced the water levels very well under various flow conditions in both dry and flood seasons over the long period from 1/1/2000 to 31/12/2021. Therefore, these results are strongly believed to support not only irrigation but also drainage purposes in the studied system.

Among the 12 locations of interest, the LSTM model showed a remarkable discrepancy in the estimated water level between the upstream location of Neo culvert (denoted by UN) and the other locations (Figure 1). This can be explained by the following reasons. First, the water level at the UN is affected by several factors, such as flow dynamics, water discharge, and the operation of culverts and structures in both the upstream and downstream regions. The impacts of these factors, however, are not taken into account in the calculation when using the LSTM model. Second, the division of water discharge at the bifurcation/confluence location (see Figure 1) between the Cuu An and Nam Ke Sat rivers may be another reason for the discrepancy. Finally, accelerating climate change and extreme events (e.g., floods, droughts), changes in water demands and water users, and changes in the physical system, none of which could be taken into account in the present study, may be an additional reason for the discrepancy between the UN and the other locations.

## SUMMARY AND CONCLUSIONS

This study presents an MLR and an LSTM model that can be used for evaluating water levels in the irrigation and drainage system of interest, the BHHIS, Vietnam. The main remarks and conclusions of the study are summarized as follows:

Using the water level data in the period from 1/1/2000 to 31/12/2021, an MLR was identified for 12 locations associated with the upstream and downstream locations of the six culverts inside the BHHIS. The results showed that the dimensional statistical errors of water level (consisting of *RMSE*, *MAE*, and *ME*) are only about 6% of the maximum water level monitored at the locations of interest. The values of the *WS* and *r* coefficients at all locations varied between 0.90 and 0.99, while the *NSE* ranged from 0.76 to 0.99.

When using the LSTM model, the appropriate values of the hyper-parameters of the model were determined using the water level data in the period from 1/1/2000 to 31/12/2015. The model was then validated using the measured water levels in the other period from 1/1/2016 to 31/12/2021. The LSTM model reproduced the measured water levels very well in both the training and validation steps. The dimensional statistical errors of water level were less than 4.6% of the maximum water level measured at the locations of interest. The *r* coefficient and *WS* were close to unity, while the *NSE* ranged between 0.88 and 0.99.

The LSTM model proved to be a more suitable approach than the MLR for evaluating water levels at different locations of interest in the BHHIS. The LSTM model used only water levels as input data, which makes it a useful tool for evaluating water levels at both the upstream and downstream locations of the culverts of interest. This will help in operating (opening and closing) the hydraulic structures and controls for both irrigation and drainage purposes. Another advantage of the LSTM model over hydraulic models is that the LSTM model can self-update with new data, whereas updating hydraulic models is significantly more cumbersome.
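The chronological split between the training period (1/1/2000 to 31/12/2015) and the validation period (1/1/2016 to 31/12/2021) can be illustrated with a short sketch. It is not the authors' preprocessing code. It assumes a gap-free series at the 6-hour interval stated in the abstract and uses placeholder water levels of 0.0; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def six_hourly_timestamps(start, end):
    """Yield timestamps every 6 hours from start (inclusive) to end (exclusive)."""
    step = timedelta(hours=6)
    t = start
    while t < end:
        yield t
        t += step


def chronological_split(records, split_time):
    """Split (timestamp, value) records into training and validation parts.

    Records before split_time go to training; the rest go to validation.
    No shuffling is done, so the validation period lies strictly after
    the training period.
    """
    train = [rec for rec in records if rec[0] < split_time]
    valid = [rec for rec in records if rec[0] >= split_time]
    return train, valid


# Placeholder series: real timestamps, dummy water levels (no measured data).
records = [(t, 0.0) for t in
           six_hourly_timestamps(datetime(2000, 1, 1), datetime(2022, 1, 1))]
train, valid = chronological_split(records, datetime(2016, 1, 1))
train_share = len(train) / len(records)
```

Under the gap-free assumption, this gives 23,376 training and 8,768 validation time steps per location, so roughly 73% of the record is used for training. Keeping the split chronological rather than random avoids evaluating the model on time steps that are interleaved with its training data.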

## ACKNOWLEDGEMENTS

The authors would like to thank the Bac Hung Hai irrigation company for sharing the water elevation data used for different calculations in the present study. The authors also would like to express their thanks to the anonymous reviewers for their useful comments and suggestions with constructive criticism that have helped improve the clarity of the manuscript.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.