In this context, multiple linear regression (MLR) and long short-term memory (LSTM) models are presented to evaluate water levels in irrigation and drainage systems based on the available water levels at inlet and outlet locations. The Bac Hung Hai irrigation and drainage system is chosen as an example for demonstrating the MLR and LSTM models. Six statistical metrics, namely root mean square error (RMSE), mean absolute error (MAE), mean error (ME), Willmott's score (WS), Pearson's correlation coefficient (r), and Nash–Sutcliffe efficiency (NSE), are used to quantitatively assess the agreement between estimated and observed water levels at 12 locations of interest within the system in the period from 2000 to 2021 (at 6-hour intervals). The results show that the MLR and LSTM models can be used for evaluating water levels with high accuracy. For both models, the dimensional statistical errors equal only about 6% of the maximum monitored water level at the locations of interest. The values of dimensionless statistical errors range from 0.76 to 0.99 for all 12 locations of interest in the studied system. In addition, both models are benchmarked and could be used for other agricultural systems.

  • Multiple linear regression and long short-term memory models are presented for evaluating water levels in irrigation and drainage systems.

  • Both the multiple linear regression and long short-term memory models reproduced the observed water levels well.

  • The LSTM model proved to be a more suitable approach than the multiple linear regression for evaluating water levels at different locations of interest.

Graphical Abstract

The water elevation, one of the various flow characteristics, is an important quantity in irrigation and drainage systems. This is because water elevation affects not only the regulation and operation of the hydraulic structures and controls (e.g., culverts, sluice gates, pumps) inherent in these systems but also the management of irrigation and drainage water in an economically efficient and socially equitable manner (Pereira et al. 2012). However, due to climate change, socio-economic development, urbanization, and changes in irrigation and drainage areas, water demands have altered water usage and water supply priorities (Kang & Park 2014). To meet the changes in water use and the objectives of sustainable water management, water elevation should be estimated as accurately as possible in practical applications, since it affects the regulation and operation of structures in irrigation and drainage systems, water reallocation, and management policies such as water-saving practices (Malterre et al. 2007).

Field measurements can be used to collect water elevation in irrigation and drainage systems. Field measurements at different locations over a long period are useful data for investigating flow characteristics in detail and proposing suitable strategies for sustainable water management and the environment in the systems. However, such data are very difficult to obtain because field measurements are time-consuming and costly and are hard to carry out under extreme events such as floods. Hydraulic models are also used to simulate water elevation in irrigation and drainage systems (e.g., Kang & Park 2014; Pham Van et al. 2018). Hydraulic models allow for a detailed representation of the river and channel networks of the systems of interest as well as the related structures and their operation rules in simulations. They, however, often require detailed bathymetry, the geometry of river cross-sections, and sediment properties of the riverbed for determining appropriate values of the bottom friction. These data are not always available in irrigation and drainage systems (Kang & Park 2014; Pham Van et al. 2018). Moreover, updating hydraulic models is cumbersome. Recently, data-driven models such as long short-term memory (LSTM) and regression analysis have been extensively used for forecasting quantities of interest in irrigation and drainage systems, with different levels of complexity (e.g., Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021).

Long short-term memory (LSTM), a popular extension of recurrent neural networks (Pham Van & Nguyen-Van 2022), has been widely applied in simulating water levels and flow in irrigation and drainage systems. For instance, Zhang et al. (2018) applied the LSTM model to predict water-stable depth in the Hetao irrigation district, China. Mouatadid et al. (2019) combined the LSTM model and the discrete wavelet transform for computing the water flow in the Palos de la Frontera irrigation system, Spain. Recently, Truong et al. (2021) used the gradient tree boosting-based model to calculate the water level in the dry season of the Bac Hung Hai irrigation and drainage system (BHHIS), Vietnam. These examples suggest that the LSTM model can be used for evaluating water levels under various flow conditions (in both dry and flood seasons) at different locations in the irrigation and drainage system of interest.

Regression analysis, including both linear and non-linear models, is also used for forecasting quantities of interest in irrigation and drainage systems. The literature shows that most studies using regression analysis focus on investigating evaporation losses (e.g., Mattar et al. 2022), evapotranspiration and crop coefficient (e.g., Ohana-Levi et al. 2022), and soil water infiltration (e.g., Mohammad et al. 2020) in the irrigation and drainage systems. The use of both regression analysis and the LSTM model for evaluating water elevation in irrigation and drainage systems has been limited. It is interesting to note that the present study is the first one in which both MLR and LSTM models are implemented together and applied for evaluating water levels in the BHHIS.

The main objective of the present study is to evaluate the water level in the BHHIS using the MLR and LSTM models and the available water level data for the period from 1/1/2000 to 31/12/2021. In detail, water levels at the upstream and downstream locations of the inlet and outlet culverts (i.e., Xuan Quan, Cau Cat, Cau Xe, and An Tho) are used as input data in both models, while water levels at the upstream and downstream locations of the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as target values. The present study also focuses on water levels associated with various flow conditions in both dry and flood seasons as well as over the long period from 2000 to 2021. Six statistical indexes, namely root mean square error (RMSE), mean absolute error (MAE), mean error (ME), Willmott's score (Willmott 1981), denoted by WS, Nash–Sutcliffe efficiency (NSE), and Pearson's correlation coefficient (r) between the observed and estimated water levels, are used to quantitatively assess the quality of the estimated water levels. Radar charts are also used to display both dimensional and dimensionless statistical errors of water levels at the 12 locations corresponding to the upstream and downstream sides of the six culverts of interest inside the BHHIS.

The Bac Hung Hai irrigation and drainage system

The BHHIS is known as the largest irrigation and drainage system in the northern region of Vietnam, covering an area of 214,932 ha in four provinces (i.e., Bac Ninh, Ha Noi, Hai Duong, Hung Yen – see Figure 1(a)), with a population of about 3 million living in the system (Pham Van et al. 2018). The system is bounded by four large rivers named the Red, Duong, Thai Binh, and Luoc rivers (Figure 1(a)). Regarding the river and channel network in the BHHIS, Kim Son and Cuu An are the two main rivers, which provide water for different agricultural purposes (such as irrigation and fishery) in dry seasons and drain water to protect various areas from high flows in flood seasons. The Kim Son river is located in the northern area, extending from Xuan Quan to Cau Cat with a length of 65.7 km, while Cuu An is the dominant river in the southern area of the system, with a length of about 50 km. These two main rivers are connected through different rivers and channels, e.g., Dinh Dao, Tay Ke Sat, Quang Lang, and Dien Bien (Figure 1(b)). Besides the above-mentioned rivers, the system also includes different tributaries such as Luong Tai, Trang Ky, An Tao, Bac Ho, Hoa Binh, Nam Ke Sat, Cau Xe, and An Tho (Figure 1(b)). The total length of these main rivers, channels, and tributaries is about 231 km.
Figure 1

Map of: (a) the Bac Hung Hai irrigation and drainage system and (b) main rivers, channels together with hydraulic structures and controls in the system.

In dry seasons from November to March, water in the BHHIS is normally taken from the Red river through the Xuan Quan headworks. After use in the system, the water is drained into the Thai Binh river through the Cau Cat, Cau Xe, and An Tho culverts and into the Luoc river by the Mi Dong pumping station. However, due to the decrease in water elevation in the Red river, especially in recent years, the maximum observed water discharge taken through the Xuan Quan culvert equals only 60% of the designed water discharge (corresponding to approximately 45 m³/s). Thus, water is also taken into the BHHIS from the Thai Binh river through the Cau Cat, Cau Xe, and An Tho culverts to secure enough water for the different agricultural irrigation purposes. This means that water in the BHHIS can be taken from both the upstream headworks and the downstream culverts, resulting in two directions of flow depending on the time and period in dry seasons. Conversely, water in the system mainly flows from upstream to downstream in flood seasons (from April to October).

In terms of structures and hydraulic controls inside the BHHIS, there are (i) 11 main hydraulic structures and controls (Figure 1(b)), (ii) 400 pumping stations as well as 800 culverts for both irrigation and drainage, and (iii) thousands of kilometers of small channels as well as dykes. The eleven main hydraulic structures and controls include the Xuan Quan and Bao Dap headworks, three regulation structures (named Kenh Cau, Ba Thuy, and Cong Neo culverts), two partition structures (named Luc Dien and Cong Tranh culverts), three drainage structures (named An Tho, Cau Xe, and Cau Cat culverts), and one pumping station (named Mi Dong). These hydraulic structures and controls lie at different locations along the rivers and channels within the system (Figure 1). Due to the combined effects of different factors such as complicated river and channel networks, hydraulic structures and controls, human-induced changes, and inputs of pollutants, the regulation and operation of structures still face various issues, under high stress and with significant difficulty, in terms of both irrigation and drainage.

The regulation and operation of the hydraulic structures and controls in the BHHIS currently follow the regulation and operation rules No. 5471 (MARD 2016), under which a hydrological year is divided into seven regulation periods, with four regulation periods in the dry season and three in the wet season. In each regulation period, culverts are opened or closed to maintain and control the water elevation within the system. Thus, accurate estimation of the water level upstream and downstream of hydraulic structures and controls remains an important and challenging task, not only for operating the hydraulic structures and controls themselves but also for sustainable water management and the environment of the system under the combined impacts of (i) a decrease in water elevation in the Red river, (ii) urbanization and increasing pollution inside the system, and (iii) increasing salt intrusion in the downstream regions of the system.

Data collection

Water levels measured at 1, 7, 13, and 19 h each day during the period from 1/1/2000 to 31/12/2021 were collected for both the upstream and downstream locations of 10 culverts named An Tho, Cau Xe, Cau Cat, Neo, Ba Thuy, Tranh, Luc Dien, Kenh Cau, Bao Dap, and Xuan Quan (see Table 1). These data were provided by the Bac Hung Hai irrigation company. Figure 2 shows the probability density distribution of water level measurements at the upstream and downstream locations of the 10 culverts. The distribution has a normal shape at almost all culverts inside the system, while it differs slightly at the inlet culvert (i.e., Xuan Quan) and the three outlet culverts (i.e., An Tho, Cau Xe, and Cau Cat).
Table 1

Collective period of water level at upstream and downstream locations of 10 culverts in the BHHIS

| No. | Location | Culvert | River | Period | Abbreviation | Note |
|---|---|---|---|---|---|---|
| 1 | Upstream | Xuan Quan | Kim Son | 2000–2021 | UXQ | Inlet culvert |
| 2 | Downstream | Xuan Quan | Kim Son | 2000–2021 | DXQ | Inlet culvert |
| 3 | Upstream | Bao Dap | Kim Son | 2000–2021 | UBD | Target culverts |
| 4 | Downstream | Bao Dap | Kim Son | 2000–2021 | DBD | Target culverts |
| 5 | Upstream | Kenh Cau | Kim Son | 2000–2021 | UKC | Target culverts |
| 6 | Downstream | Kenh Cau | Kim Son | 2000–2021 | DKC | Target culverts |
| 7 | Upstream | Luc Dien | Dien Bien | 2000–2021 | ULD | Target culverts |
| 8 | Downstream | Luc Dien | Dien Bien | 2000–2021 | DLD | Target culverts |
| 9 | Upstream | Tranh | Tay Ke Sat | 2000–2021 | UT | Target culverts |
| 10 | Downstream | Tranh | Tay Ke Sat | 2000–2021 | DT | Target culverts |
| 11 | Upstream | Ba Thuy | Kim Son | 2000–2021 | UBT | Target culverts |
| 12 | Downstream | Ba Thuy | Kim Son | 2000–2021 | DBT | Target culverts |
| 13 | Upstream | Neo | Cuu An | 2000–2021 | UN | Target culverts |
| 14 | Downstream | Neo | Cuu An | 2000–2021 | DN | Target culverts |
| 15 | Upstream | Cau Cat | Kim Son | 2000–2021 | UCC | Outlet culverts |
| 16 | Downstream | Cau Cat | Kim Son | 2000–2021 | DCC | Outlet culverts |
| 17 | Upstream | Cau Xe | Cau Xe | 2000–2021 | UCX | Outlet culverts |
| 18 | Downstream | Cau Xe | Cau Xe | 2000–2021 | DCX | Outlet culverts |
| 19 | Upstream | An Tho | An Tho | 2000–2021 | UAT | Outlet culverts |
| 20 | Downstream | An Tho | An Tho | 2000–2021 | DAT | Outlet culverts |

Figure 2

Probability density distribution of water elevation, at: (a) UXQ, (b) DXQ, (c) UBD, (d) DBD, (e) UKC, (f) DKC, (g) ULD, (h) DLD, (i) UT, (k) DT, (l) UBT, (m) DBT, (n) UN, (o) DN, (p) UCC, (q) DCC, (r) UCX, (s) DCX, (t) UAT, and (u) DAT.

When the MLR and LSTM approaches are applied, measured water levels at the upstream and downstream locations of the Xuan Quan, Cau Cat, Cau Xe, and An Tho culverts are used as input data, while measured water levels at the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as output data. For the MLR, the whole dataset from the collection period is used to determine the regression coefficients. For the LSTM approach, about two-thirds of the collected data (corresponding to the period from 1/1/2000 to 31/12/2015) are used for the training step, while the remaining one-third (from 1/1/2016 to 31/12/2021) are used for the validation step of the model.
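The chronological split described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes a complete record with no gaps at the four daily measurement hours (1, 7, 13, and 19 h), and the function names `six_hourly_timestamps` and `chronological_split` are hypothetical.

```python
from datetime import datetime, timedelta

# Measurement hours reported in the text: 1, 7, 13 and 19 h each day.
START = datetime(2000, 1, 1, 1)
END = datetime(2021, 12, 31, 19)
STEP = timedelta(hours=6)
SPLIT = datetime(2016, 1, 1)  # first day of the validation period


def six_hourly_timestamps(start, end, step):
    """Return every timestamp from start to end (inclusive) at a fixed step."""
    stamps = []
    current = start
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


def chronological_split(stamps, split):
    """Split timestamps into (training, validation) without shuffling."""
    training = [t for t in stamps if t < split]
    validation = [t for t in stamps if t >= split]
    return training, validation


stamps = six_hourly_timestamps(START, END, STEP)
training, validation = chronological_split(stamps, SPLIT)
train_fraction = len(training) / len(stamps)
print(len(stamps), len(training), len(validation), round(train_fraction, 3))
```

Under the no-gap assumption, the calendar split gives 23,376 training samples and 8,768 validation samples per location, i.e., about 73% and 27% of the record, which is close to the two-thirds and one-third proportions quoted in the text.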

Multiple linear regression

Water levels at the upstream and downstream locations of the target culverts (Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo) are used as output data, while water levels at the upstream and downstream locations of the inlet and outlet culverts (i.e., Xuan Quan, Cau Cat, Cau Xe, and An Tho) are used as input data. It is assumed that the output data at each location are a linear function of the input data, of the form:
$$H_j = \alpha_{0,j} + \sum_{i=1}^{4} \left( \alpha_{i,j} H_i^{\mathrm{in}} + \beta_{i,j} H_i^{\mathrm{out}} \right) \tag{1}$$
where $H_j$ is the water level at target location number $j$, $\alpha_{0,j}$ is the constant coefficient, $\alpha_{i,j}$ and $\beta_{i,j}$ are the regression coefficients, $H_i^{\mathrm{in}}$ is the water level at the inlet and outlet culverts number $i$ inside the BHHIS, and $H_i^{\mathrm{out}}$ is the water level at the inlet and outlet culverts number $i$ outside the BHHIS. In detail, $H_i^{\mathrm{in}}$ consists of the water level at the downstream location of the Xuan Quan culvert and the water levels at the upstream locations of the Cau Cat, Cau Xe, and An Tho culverts, while $H_i^{\mathrm{out}}$ includes the water level at the upstream location of the Xuan Quan culvert and the water levels at the downstream locations of the Cau Cat, Cau Xe, and An Tho culverts. At each target location, the values of $\alpha_{0,j}$, $\alpha_{i,j}$, and $\beta_{i,j}$ are determined from the measured water level data over the long period from 1/1/2000 to 31/12/2021.
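A linear relation of this kind can be fitted by ordinary least squares. The sketch below is one possible way to do it, not the authors' implementation: it solves the normal equations with a small Gaussian elimination routine, and for brevity it uses two inputs instead of the eight inlet and outlet water levels of the study. The function names and the synthetic water levels are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b]; copied so the inputs are left untouched.
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_mlr(inputs, target):
    """Ordinary least squares: returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(row) for row in inputs]  # leading 1 for the intercept
    k = len(rows[0])
    # Normal equations (X^T X) w = X^T y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, target)) for a in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Evaluate the fitted linear relation for one set of input water levels."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic (made-up) water levels in m, generated from a known linear relation.
demo_inputs = [[2.1, 1.0], [2.5, 1.2], [3.0, 0.9], [2.8, 1.5], [3.4, 1.1]]
demo_target = [0.5 + 0.3 * a + 0.7 * b for a, b in demo_inputs]
coefficients = fit_mlr(demo_inputs, demo_target)
print([round(c, 6) for c in coefficients])
```

Because the synthetic target is an exact linear function of the inputs, the fit recovers the generating coefficients (0.5, 0.3, 0.7) up to floating-point error. For the real data, one such fit would be made per target location.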

Long short-term memory model

The LSTM model, a well-known variant of recurrent neural networks that is capable of selectively remembering patterns over long durations, has been widely applied for predicting time series of water level and water discharge in irrigation systems (e.g., Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021). The LSTM model was first introduced by Hochreiter & Schmidhuber (1997) to overcome the vanishing gradient problem. The building block of the LSTM is a memory cell, which essentially represents the hidden layer. In each memory cell, there is a recurrent edge that has the desirable weight to overcome the vanishing and exploding gradient problems. The values associated with this recurrent edge are called the cell state. The cell state from the previous time step (denoted by Ct-1) is modified to obtain the cell state at the current time step (denoted by Ct) without being multiplied directly by any weighting factor. There are three different types of gates (named input, forget, and output) in a cell of the LSTM model.

The input gate (denoted by $i_t$) and input node ($g_t$) are responsible for updating the cell state. They are computed as follows:
$$i_t = \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \tag{2}$$
$$g_t = \tanh\left(W_{xg} x_t + W_{hg} h_{t-1} + b_g\right) \tag{3}$$
where $W_{xi}$, $W_{hi}$, $W_{xg}$, $W_{hg}$ are the weight matrices in the LSTM network, $x_t$ is the input data at time $t$, $h_{t-1}$ indicates the hidden units at time $t-1$, $\sigma(\cdot)$ is the sigmoid function, $\tanh$ is the hyperbolic tangent, and $b_i$ and $b_g$ are components of the bias vector for the hidden units.
The cell state at time $t$ is computed as follows:
$$C_t = \left(C_{t-1} \odot f_t\right) \oplus \left(i_t \odot g_t\right) \tag{4}$$
where $\oplus$ refers to element-wise summation (element-wise addition), $\odot$ is the element-wise product (element-wise multiplication), and $f_t$ is the forget gate.
The forget gate ($f_t$) allows the memory cell to reset the cell state without growing indefinitely. The forget gate decides which information is allowed to go through and which information to suppress. The forget gate is computed as:
$$f_t = \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \tag{5}$$
The output gate (denoted by $o_t$) decides how to update the values of the hidden unit. In detail, the output gate combines the input data at time $t$ ($x_t$), the output of the hidden unit at time $t-1$ (denoted by $h_{t-1}$), and the components of the bias vector for the hidden units in the last iteration ($b_o$), under the form:
$$o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \tag{6}$$
Note that the hidden units at the current time step are computed as:
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{7}$$
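The gate computations of Equations (2) to (7) can be written out for a single time step as below. This is a plain-Python sketch for illustration only, not the trained model of the study. The weights and biases of the forget and output gates are named by analogy with the input gate (an assumption), and `make_params`, `affine`, and `lstm_step` are hypothetical helper names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x, x, W_h, h, b):
    """Compute W_x x + W_h h + b for list-based matrices and vectors."""
    out = []
    for row_x, row_h, bias in zip(W_x, W_h, b):
        total = bias
        total += sum(w * v for w, v in zip(row_x, x))
        total += sum(w * v for w, v in zip(row_h, h))
        out.append(total)
    return out


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step: gates, cell state update, and hidden units."""
    i_t = [sigmoid(z) for z in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    g_t = [math.tanh(z) for z in affine(p["W_xg"], x_t, p["W_hg"], h_prev, p["b_g"])]
    f_t = [sigmoid(z) for z in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    o_t = [sigmoid(z) for z in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    # Cell state: keep part of the old state, add the gated candidate values.
    c_t = [c * f + i * g for c, f, i, g in zip(c_prev, f_t, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def make_params(n_inputs, n_hidden, value):
    """Build a parameter dictionary with every weight and bias set to value."""
    params = {}
    for gate in "igfo":
        params["W_x" + gate] = [[value] * n_inputs for _ in range(n_hidden)]
        params["W_h" + gate] = [[value] * n_hidden for _ in range(n_hidden)]
        params["b_" + gate] = [value] * n_hidden
    return params


# Eight inputs (upstream and downstream levels at four culverts), two hidden units.
demo_params = make_params(8, 2, 0.0)
demo_x = [2.0] * 8  # made-up water levels in m
demo_h, demo_c = lstm_step(demo_x, [0.0, 0.0], [1.0, -1.0], demo_params)
print(demo_h, demo_c)
```

With all weights and biases set to zero, every gate equals σ(0) = 0.5 and the input node equals tanh(0) = 0, so the step simply halves the previous cell state. This makes the role of the forget gate easy to check by hand.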

The LSTM model has three main hyper-parameters: the number of epochs, the number of hidden units, and the learning rate. In the present study, the Adam optimization algorithm is chosen among the available optimization options because it is often used in practical applications (Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021; Pham Van & Nguyen-Van 2022), and the appropriate values of the above-mentioned hyper-parameters are determined by trial and error in the training step of the LSTM model. The training step uses two-thirds of the data (from 1/1/2000 to 31/12/2015). The LSTM model is then validated using the remaining one-third of the data (from 1/1/2016 to 31/12/2021).

Figure 3 shows the general schematic diagram of the LSTM model used to calculate water levels at different locations of interest in the BHHIS, including three main gates, i.e., an input, a forget, and an output gate. The input gate receives the water level data at the upstream and downstream of the Xuan Quan, Cau Cat, Cau Xe, and An Tho culverts, which are processed and analyzed in the forget gate interconnected with the memory cell that consists of several hidden layers. Results are presented in the output gate and consist of the water levels at the twelve locations associated with the upstream and downstream of the six culverts of interest inside the studied system.
Figure 3

General schematic diagram of the LSTM model for evaluating water levels at different locations of interest in the BHHIS.

Statistical metrics

Different statistical metrics named RMSE, MAE, ME, Willmott's score (WS), Pearson's correlation coefficient (r), and Nash–Sutcliffe efficiency (NSE) are used for quantitatively assessing the agreement between estimated and observed water levels at upstream and downstream of different culverts of interest in the BHHIS. The RMSE, MAE, ME, WS, r, and NSE are computed as follows:
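$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(E_i - O_i\right)^2} \tag{8}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|E_i - O_i\right| \tag{9}$$
$$\mathrm{ME} = \frac{1}{N} \sum_{i=1}^{N} \left(E_i - O_i\right) \tag{10}$$
$$\mathrm{WS} = 1 - \frac{\sum_{i=1}^{N} \left(E_i - O_i\right)^2}{\sum_{i=1}^{N} \left(\left|E_i - \bar{O}\right| + \left|O_i - \bar{O}\right|\right)^2} \tag{11}$$
$$r = \frac{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)\left(E_i - \bar{E}\right)}{\sqrt{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)^2 \sum_{i=1}^{N} \left(E_i - \bar{E}\right)^2}} \tag{12}$$
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} \left(O_i - E_i\right)^2}{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)^2} \tag{13}$$
where $O_i$ and $E_i$ are respectively the observed and estimated values of water level at point number $i$ in a time series, $\bar{E}$ and $\bar{O}$ are the mean values of the estimated and observed water level, respectively, and $N$ is the total number of points in the considered time series.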

RMSE, MAE, and ME are valuable indicators because they express the error in the units of the quantity of interest, which is helpful in the analysis of the results. The WS, r, and NSE coefficients are dimensionless; the NSE in particular determines the relative magnitude of the residual variance (or noise) compared to the variance of the observations, and together they provide further information on comparisons between observed and estimated values. In addition to these statistical errors, radar charts are used. A radar chart is a graphical method of displaying multivariate data as a two-dimensional chart, with the quantities of interest represented on axes starting from the same point, which makes it a useful way to display the statistical errors at all target locations.
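The six metrics can be computed together from a pair of observed and estimated series, as in the sketch below. It rests on two assumptions that are not stated explicitly in the text: ME is taken as estimated minus observed, and WS follows the index of agreement of Willmott (1981). The function name `error_metrics` and the sample values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def error_metrics(observed, estimated):
    """Return RMSE, MAE, ME, WS, r and NSE for two equally long series."""
    n = len(observed)
    diffs = [e - o for o, e in zip(observed, estimated)]  # assumed sign: est - obs
    o_bar, e_bar = mean(observed), mean(estimated)
    sq_err = sum(d * d for d in diffs)
    obs_var = sum((o - o_bar) ** 2 for o in observed)
    est_var = sum((e - e_bar) ** 2 for e in estimated)
    cov = sum((o - o_bar) * (e - e_bar) for o, e in zip(observed, estimated))
    ws_den = sum(
        (abs(e - o_bar) + abs(o - o_bar)) ** 2 for o, e in zip(observed, estimated)
    )
    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAE": sum(abs(d) for d in diffs) / n,
        "ME": sum(diffs) / n,
        "WS": 1.0 - sq_err / ws_den,
        "r": cov / math.sqrt(obs_var * est_var),
        "NSE": 1.0 - sq_err / obs_var,
    }


# Made-up water levels in m: every estimate is 0.5 m above the observation.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 3.5, 4.5]
metrics = error_metrics(observed, estimated)
print({name: round(value, 3) for name, value in metrics.items()})
```

For this constant offset of 0.5 m, RMSE, MAE, and ME all equal 0.5 m and r equals 1, while NSE drops to 0.8 and WS to about 0.952. The example shows why a high correlation alone does not imply a good estimate and why the metrics are reported together.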

Results of the multiple linear regression

Using the collected water level data described previously, the MLR is fitted for both the upstream and downstream locations of the culverts of interest in the BHHIS by minimizing the root mean square error between estimated values and measured data. Table 2 summarizes the resulting values of the regression coefficients for all locations of interest, while detailed values of the error estimates of water level are shown in Table 3.

Table 2

Values of coefficients in the multiple linear regression

 
 
Table 3

Error estimates of water level at upstream and downstream locations of different culverts of interest when using multiple linear regression

| Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UBD | 0.067 | 1.205 | 0.033 | 0.588 | 0.00 | 0.00 | 0.998 | 0.996 | 0.993 | 5.51 | 5.60 |
| DBD | 0.236 | 6.212 | 0.180 | 4.741 | 0.00 | 0.00 | 0.929 | 0.873 | 0.763 | 3.86 | 3.80 |
| UKC | 0.212 | 5.736 | 0.155 | 4.195 | 0.00 | 0.00 | 0.938 | 0.888 | 0.788 | 3.60 | 3.69 |
| DKC | 0.159 | 4.466 | 0.113 | 3.187 | 0.00 | 0.00 | 0.951 | 0.909 | 0.826 | 3.35 | 3.56 |
| ULD | 0.133 | 3.901 | 0.092 | 2.712 | 0.00 | 0.00 | 0.961 | 0.928 | 0.861 | 3.25 | 3.41 |
| DLD | 0.142 | 4.339 | 0.095 | 2.902 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 3.23 | 3.28 |
| UT | 0.092 | 2.697 | 0.058 | 1.715 | 0.00 | 0.00 | 0.980 | 0.962 | 0.925 | 3.20 | 3.40 |
| DT | 0.112 | 3.625 | 0.068 | 2.209 | 0.00 | 0.00 | 0.967 | 0.939 | 0.881 | 3.09 | 3.09 |
| UBT | 0.062 | 2.040 | 0.038 | 1.252 | 0.00 | 0.00 | 0.989 | 0.978 | 0.957 | 2.93 | 3.02 |
| DBT | 0.193 | 6.414 | 0.135 | 4.487 | 0.00 | 0.00 | 0.927 | 0.871 | 0.758 | 2.98 | 3.01 |
| UN | 0.117 | 4.062 | 0.081 | 2.811 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 2.76 | 2.87 |
| DN | 0.151 | 5.271 | 0.102 | 3.570 | 0.00 | 0.00 | 0.957 | 0.919 | 0.845 | 2.91 | 2.86 |

Figure 4 shows radar charts for the dimensional and dimensionless statistical errors of water level at the 12 locations, while Figure 5 illustrates the estimated versus observed water levels. The dimensional statistical errors of water level do not exceed 0.24 m for the RMSE and 0.18 m for the MAE. These errors are at most about 6% of the observed maximum water level at the considered locations. The values of ME are close to zero at all locations (Table 3), showing that the estimated values are not biased relative to the observed values. In terms of dimensionless statistical errors, WS is at least 0.93 and r is at least 0.87 at all locations (Table 3), revealing that the estimated values of the water level reproduce the observed variation of the water level well. The NSE ranges from about 0.79 to 0.99, except at the DBD and DBT locations, where the value of the NSE is about 0.76. The maximum values of the estimated water level are very close to the observations at all considered locations (see Table 3). The discrepancy between maximum values of the estimated and measured water levels ranges from −0.21 to 0.05 m. These results suggest that the MLR represents the observations of water level well at almost all locations of interest in the BHHIS.
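The relative errors quoted above can be checked from the tabulated values. The short sketch below recomputes the RMSE as a percentage of the observed maximum water level for each location, using the RMSE and observed Hmax columns of Table 3. That the percentage columns are defined this way is an inference from the text and from the agreement of the numbers, not something the source states as a formula; the names `TABLE_3` and `percent_of_max` are hypothetical.

```python
# (RMSE in m, MAE in m, observed maximum water level in m), copied from Table 3.
TABLE_3 = {
    "UBD": (0.067, 0.033, 5.60),
    "DBD": (0.236, 0.180, 3.80),
    "UKC": (0.212, 0.155, 3.69),
    "DKC": (0.159, 0.113, 3.56),
    "ULD": (0.133, 0.092, 3.41),
    "DLD": (0.142, 0.095, 3.28),
    "UT": (0.092, 0.058, 3.40),
    "DT": (0.112, 0.068, 3.09),
    "UBT": (0.062, 0.038, 3.02),
    "DBT": (0.193, 0.135, 3.01),
    "UN": (0.117, 0.081, 2.87),
    "DN": (0.151, 0.102, 2.86),
}


def percent_of_max(error_m, h_max_obs_m):
    """Express an error in metres as a percentage of the observed maximum level."""
    return 100.0 * error_m / h_max_obs_m


relative_rmse = {
    loc: percent_of_max(rmse, h_max) for loc, (rmse, _mae, h_max) in TABLE_3.items()
}
worst_location = max(relative_rmse, key=relative_rmse.get)
print(worst_location, round(relative_rmse[worst_location], 2))
```

The largest relative RMSE is found at DBT, at roughly 6.4% of the observed maximum water level, which is consistent with the "about 6%" figure given in the text. Small differences from the tabulated percentages come from the rounding of the RMSE values to three decimals.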
Figure 4

Radar chart for: (a) dimensional and (b) dimensionless statistical errors when using multiple linear regression.

Figure 5

Estimations versus observations of the water level at: (a) UBD, (b) DBD, (c) UKC, (d) DKC, (e) ULD, (f) DLD, (g) UT, (h) DT, (i) UBT, (k) DBT, (l) UN, and (m) DN.

Results of the LSTM model

Training results of the LSTM model

To identify the appropriate values of the hyper-parameters (i.e., learning rate, number of hidden units, and number of epochs) in the training step of the LSTM model, two-thirds of the collected data (corresponding to the measured water levels during the period from 1/1/2000 to 31/12/2015) were used. Using the trial-and-error method, the appropriate values of the hyper-parameters were found to be 0.001, 64, and 150 for the learning rate, number of hidden units, and number of epochs, respectively. The statistical errors of water level at all locations obtained with these hyper-parameter values are summarized in Table 4, while radar charts for the dimensional and dimensionless statistical errors are shown in Figure 6. Figure 7 shows the time series of the water level over the whole training period at the UBD and UN locations. Note that among the 12 considered locations, UBD and UN are the locations with the best and worst fits, respectively. The time series of the estimated water level at the other locations are shown in Fig. S.1.
Table 4

Error estimates of water level at upstream and downstream locations of different culverts of interest for the training step of the LSTM model

| Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UBD | 0.069 | 1.236 | 0.030 | 0.532 | −0.001 | −0.01 | 0.998 | 0.997 | 0.993 | 5.55 | 5.60 |
| DBD | 0.154 | 4.056 | 0.103 | 2.704 | 0.001 | 0.04 | 0.974 | 0.950 | 0.902 | 3.67 | 3.80 |
| UKC | 0.153 | 4.152 | 0.107 | 2.896 | −0.001 | −0.03 | 0.971 | 0.945 | 0.894 | 3.64 | 3.69 |
| DKC | 0.118 | 3.302 | 0.081 | 2.272 | 0.002 | 0.05 | 0.974 | 0.951 | 0.904 | 3.60 | 3.56 |
| ULD | 0.095 | 2.774 | 0.064 | 1.883 | 0.005 | 0.16 | 0.981 | 0.964 | 0.928 | 3.36 | 3.41 |
| DLD | 0.103 | 3.133 | 0.068 | 2.085 | 0.00 | 0.07 | 0.977 | 0.956 | 0.914 | 3.26 | 3.28 |
| UT | 0.063 | 1.840 | 0.042 | 1.232 | 0.00 | 0.08 | 0.991 | 0.982 | 0.964 | 3.39 | 3.40 |
| DT | 0.083 | 2.700 | 0.053 | 1.729 | 0.00 | 0.16 | 0.982 | 0.965 | 0.931 | 3.07 | 3.09 |
| UBT | 0.051 | 1.701 | 0.031 | 1.040 | 0.01 | 0.18 | 0.992 | 0.985 | 0.970 | 2.97 | 3.02 |
| DBT | 0.124 | 4.109 | 0.082 | 2.729 | 0.00 | −0.14 | 0.974 | 0.951 | 0.905 | 2.95 | 3.01 |
| UN | 0.096 | 3.353 | 0.065 | 2.266 | 0.00 | 0.14 | 0.970 | 0.943 | 0.889 | 2.78 | 2.87 |
| DN | 0.105 | 3.655 | 0.068 | 2.389 | 0.00 | −0.10 | 0.981 | 0.963 | 0.928 | 2.79 | 2.86 |

Figure 6

Radar chart for: (a) dimensional and (b) dimensionless statistical errors in the training step of the LSTM model.

Figure 7

Time series of the water level at: (a) UBD and (b) UN for the training step.

As summarized in Table 4, the RMSE of water level at all locations of interest varies between 0.05 and 0.16 m, while the MAE of water level ranges from 0.03 to 0.10 m. The values of ME range from −0.001 to 0.01 m. These dimensional statistical errors equal only about 4% of the observed maximum water level at the considered locations. The dimensionless statistical errors (including the WS, r, and NSE) range from 0.89 to 0.99 (Table 4), revealing that the LSTM model reproduces the observed values of the water level very well under different flow conditions and at different times in the training step (Figure 7). In addition, the discrepancy between the maximum values of estimated and observed water levels ranges from −0.13 to 0.04 m. These results suggest that appropriate values of the hyper-parameters are obtained in the training step of the LSTM model.
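The range of discrepancies between estimated and observed maximum water levels can be reproduced directly from the last two columns of Table 4. The following sketch does this arithmetic; the dictionary name `H_MAX_TRAINING` is hypothetical and the values are copied from the table.

```python
# (estimated, observed) maximum water levels in m, copied from Table 4.
H_MAX_TRAINING = {
    "UBD": (5.55, 5.60), "DBD": (3.67, 3.80), "UKC": (3.64, 3.69),
    "DKC": (3.60, 3.56), "ULD": (3.36, 3.41), "DLD": (3.26, 3.28),
    "UT": (3.39, 3.40), "DT": (3.07, 3.09), "UBT": (2.97, 3.02),
    "DBT": (2.95, 3.01), "UN": (2.78, 2.87), "DN": (2.79, 2.86),
}

# Estimated minus observed, rounded to the two decimals used in the table.
discrepancy = {
    loc: round(est - obs, 2) for loc, (est, obs) in H_MAX_TRAINING.items()
}
smallest = min(discrepancy.values())
largest = max(discrepancy.values())
print(smallest, largest)
```

The smallest discrepancy occurs at DBD and the largest at DKC, matching the range of −0.13 to 0.04 m reported in the text.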

Validation results of the LSTM model

The LSTM model is validated using the collected data on the water level in the period from 1/1/2016 to 31/12/2021. Note that the appropriate values of the hyper-parameters determined in the training step are used in the validation calculation. Detailed values of statistical errors of the water level for the validation step of the LSTM model are summarized in Table 5, while the radar charts for dimensional and dimensionless statistical errors are shown in Figure 8. Figure 9 shows the time series of the estimated and observed values of the water level at UBD and UN locations. The time series of the estimated and observed values of the water level at 10 other locations of interest are shown in Fig. S.2.
Table 5

Error estimates of water level at upstream and downstream locations of different culverts of interest for the validating step of the LSTM model

| Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UBD | 0.033 | 0.605 | 0.022 | 0.399 | 0.002 | 0.033 | 0.999 | 0.998 | 0.997 | 5.31 | 5.40 |
| DBD | 0.123 | 3.476 | 0.084 | 2.376 | 0.007 | 0.208 | 0.976 | 0.955 | 0.912 | 3.31 | 3.53 |
| UKC | 0.122 | 3.574 | 0.087 | 2.537 | 0.003 | 0.097 | 0.974 | 0.951 | 0.904 | 3.25 | 3.41 |
| DKC | 0.115 | 3.415 | 0.081 | 2.404 | 0.006 | 0.185 | 0.977 | 0.955 | 0.911 | 3.29 | 3.36 |
| ULD | 0.104 | 3.252 | 0.074 | 2.328 | −0.003 | −0.091 | 0.979 | 0.958 | 0.918 | 3.11 | 3.19 |
| DLD | 0.106 | 3.315 | 0.076 | 2.384 | −0.01 | −0.162 | 0.978 | 0.958 | 0.917 | 3.22 | 3.19 |
| UT | 0.070 | 2.333 | 0.050 | 1.667 | −0.01 | −0.298 | 0.989 | 0.979 | 0.957 | 3.05 | 3.01 |
| DT | 0.081 | 2.742 | 0.057 | 1.925 | −0.01 | −0.371 | 0.985 | 0.972 | 0.943 | 2.90 | 2.97 |
| UBT | 0.055 | 2.110 | 0.042 | 1.593 | −0.02 | −0.587 | 0.991 | 0.984 | 0.965 | 2.68 | 2.61 |
| DBT | 0.120 | 4.635 | 0.082 | 3.176 | 0.01 | 0.217 | 0.970 | 0.942 | 0.887 | 2.60 | 2.59 |
| UN | 0.100 | 4.006 | 0.073 | 2.903 | −0.01 | −0.459 | 0.967 | 0.940 | 0.882 | 2.48 | 2.50 |
| DN | 0.106 | 4.260 | 0.072 | 2.906 | 0.01 | 0.295 | 0.977 | 0.955 | 0.912 | 2.49 | 2.49 |
Figure 8

Radar chart for: (a) dimensional and (b) dimensionless statistical errors in the validation step of the LSTM model.

Figure 9

Time series of the water level at: (a) UBD and (b) UN for the validation step.

As in the training step, the dimensional statistical errors of the water level (RMSE, MAE, and ME) vary between −0.02 and 0.12 m (see Table 5). These errors are less than 4.6% of the observed magnitude of the water level at the locations of interest. The WS is close to unity, while the r coefficient between the estimated and observed water levels ranges from 0.94 to 0.99 (see Table 5), revealing that the LSTM model represents the trend of the observed water levels very well. The NSE ranges between 0.88 and 0.99, showing that the LSTM model reproduces the observed water levels well at all studied locations. At all 12 locations, the computed values of the maximum water level are very close to the observations. Indeed, the discrepancy between the maximum estimated and observed water levels varies between −0.22 and 0.07 m (see Table 5). These results are consistent with those obtained in the training step of the LSTM model, which demonstrates that appropriate values are achieved for the hyper-parameters of the LSTM model. Thus, the LSTM model can be used to evaluate the water level at different locations of interest in the BHHIS.
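The range of the maximum water level discrepancy and the relative size of the errors can be checked directly from the values in Table 5. The short sketch below copies the RMSE and Hmax columns from the table. It assumes that the discrepancy is defined as estimated minus observed Hmax and that the percentage errors are normalized by the observed Hmax, an assumption that matches the (%) columns of the table only to within rounding. The function names are illustrative.

```python
# (location, RMSE in m, estimated Hmax in m, observed Hmax in m), from Table 5
TABLE_5 = [
    ("UBD", 0.033, 5.31, 5.40),
    ("DBD", 0.123, 3.31, 3.53),
    ("UKC", 0.122, 3.25, 3.41),
    ("DKC", 0.115, 3.29, 3.36),
    ("ULD", 0.104, 3.11, 3.19),
    ("DLD", 0.106, 3.22, 3.19),
    ("UT", 0.070, 3.05, 3.01),
    ("DT", 0.081, 2.90, 2.97),
    ("UBT", 0.055, 2.68, 2.61),
    ("DBT", 0.120, 2.60, 2.59),
    ("UN", 0.100, 2.48, 2.50),
    ("DN", 0.106, 2.49, 2.49),
]


def hmax_discrepancy(rows):
    """Estimated minus observed maximum water level (m), rounded to 2 decimals."""
    return {loc: round(est - obs, 2) for loc, _, est, obs in rows}


def rmse_percent(rows):
    """RMSE as a percentage of the observed maximum water level (assumed basis)."""
    return {loc: 100.0 * rmse / obs for loc, rmse, _, obs in rows}


if __name__ == "__main__":
    disc = hmax_discrepancy(TABLE_5)
    pct = rmse_percent(TABLE_5)
    print(min(disc.values()), max(disc.values()))
    print(max(pct, key=pct.get), round(max(pct.values()), 2))
```

Under these assumptions the largest underestimation of the maximum water level occurs at DBD and the largest overestimation at UBT, while the largest relative RMSE occurs at DBT.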

Discussion

As can be seen in Tables 3–5, the dimensional statistical errors of the water level in the period from 2000 to 2021 at the 12 locations of interest in the BHHIS are up to 0.24 and 0.16 m for the MLR and LSTM models, respectively. The r coefficient and WS are close to unity for both methods. In addition, the NSE of the water level ranges from 0.76 to 0.99 when the MLR method is applied, while it varies between 0.88 and 0.99 when the LSTM model is used. The discrepancy between the maximum estimated and observed water levels is about 0.20 m for both methods. These results demonstrate that both the MLR and LSTM models can reproduce the observed water levels at all 12 locations of interest in the studied system.

Regarding the detailed statistical errors of the water level from the MLR and LSTM models, a considerable improvement in the estimated water levels is clearly obtained when using the LSTM model in comparison with the MLR. For example, the minimum NSE is 0.88 for the LSTM model, compared with 0.76 for the MLR. In addition, higher values of WS and the r coefficient are obtained at all 12 locations of interest within the BHHIS when the LSTM model is used.

Truong et al. (2021) used a gradient tree boosting-based (GTB) model to investigate water levels in the dry seasons from 2000 to 2020 at the upstream and downstream locations of five culverts, namely Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo, in the BHHIS. The results showed that the GTB model reproduced the measured water levels acceptably, with the coefficient of determination ranging from 0.86 to 0.97 at the upstream and downstream locations of these five culverts. However, it must be emphasized that the previous study focused only on water levels in the dry seasons, when flow hydrodynamics and water elevations in the BHHIS are relatively stable. The present research showed that both the MLR and LSTM models reproduced the water levels very well under various flow conditions in both dry and flood seasons over the long period from 1/1/2000 to 31/12/2021. Therefore, these results are strongly believed to support not only irrigation but also drainage purposes in the studied system.

Among the 12 locations of interest, the LSTM model showed a remarkable discrepancy in the estimated water level between the upstream location of Neo culvert (denoted by UN) and the others (Figure 1). This can be explained by the following reasons. Firstly, the water level at UN is affected by different factors such as flow dynamics, water discharge, and the operation of culverts and structures in both upstream and downstream regions. The impacts of these factors, however, are not taken into account in the calculation when using the LSTM model. Secondly, the division of water discharge at the bifurcation/confluence location (see Figure 1) between the Cuu An and Nam Ke Sat rivers may be another reason for the discrepancy. Finally, accelerating climate change and extreme events (e.g., floods, droughts), as well as changes in water demands, water users, and the physical system, which cannot be taken into account in the present study, may be an additional reason for the discrepancy between UN and the other locations.

This study presents an MLR model and an LSTM model that can be used for evaluating water levels in the irrigation and drainage system of interest, the BHHIS, Vietnam. The main remarks and conclusions of the study are summarized as follows:

  • Using the water level data in the period from 1/1/2000 to 31/12/2021, an MLR model was identified for 12 locations associated with the upstream and downstream locations of the six culverts inside the BHHIS. The results showed that the dimensional statistical errors of the water level (RMSE, MAE, and ME) are only about 6% of the maximum water level monitored at the locations of interest. The values of WS and the r coefficient at all locations varied between 0.90 and 0.99, while the NSE ranged from 0.76 to 0.99.

  • For the LSTM model, the appropriate values of the hyper-parameters were determined using the water level data in the period from 1/1/2000 to 31/12/2015. The model was then validated using the measured water levels in the period from 1/1/2016 to 31/12/2021. The LSTM model reproduced the measured water levels very well in both the training and validation steps. The dimensional statistical errors of the water level were less than 4.6% of the maximum water level measured at the locations of interest. The r coefficient and WS were close to unity, while the NSE ranged between 0.88 and 0.99.

  • The LSTM model proved to be a more suitable approach than the MLR for evaluating water levels at different locations of interest in the BHHIS. The LSTM model used only water levels as input data, which makes it a useful tool for evaluating water levels at both upstream and downstream locations of the culverts of interest. This will help in operating (opening and closing) the hydraulic structures and controls for both irrigation and drainage purposes. Another advantage of the LSTM model over hydraulic models is that the LSTM model can self-update with new data, whereas updating hydraulic models is significantly more cumbersome.

The authors would like to thank the Bac Hung Hai irrigation company for sharing the water elevation data used for different calculations in the present study. The authors also would like to express their thanks to the anonymous reviewers for their useful comments and suggestions with constructive criticism that have helped improve the clarity of the manuscript.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Hochreiter S. & Schmidhuber J. 1997 Long short-term memory. Neural Computation 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Kang M. & Park S. 2014 Modeling water flows in a serial irrigation reservoirs system considering irrigation return flows and reservoir operations. Agricultural Water Management 143, 131–141. https://doi.org/10.1016/j.agwat.2014.07.003.
Malterre P., Navarro G., Playam E. & Ed F. B. 2007 Control of irrigation canals: why and how? In International Workshop on Numerical Modelling of Hydrodynamics for Water Resources, June 18–21, 2007, Zaragoza, Spain, pp. 271–292.
MARD – Ministry of Agriculture and Rural development 2016 Regulation and Operation of the Bac Hung Hai Irrigation System, pp. 34 (No. 5471, in Vietnamese).
Mattar M., Roy D. K., Al-Ghobari H. M. & Dewidar A. Z. 2022 Machine learning and regression-based technique for predicting sprinkler irrigation's wind drift and evaporation losses. Agricultural Water Management 265, 107529. https://doi.org/10.1016/j.agwat.2022.107529.
Mohammad R. P. R., Khodadad D., Mojtaba H., Gholamali K., Nader M., Mojtaba G., Mehdi K., Naser D. & Colby B. 2020 Prediction of soil water infiltration using multiple linear regression and random forest in a dry flood plain, eastern Iran. Catena 194, 104715. https://doi.org/10.1016/j.catena.2020.104715.
Mouatadid S., Adamowski J. F., Tiwari M. K. & Quilty J. M. 2019 Coupling the maximum overlap discrete wavelet transform and long short-term memory networks for irrigation flow forecasting. Agricultural Water Management 219, 72–85. https://doi.org/10.1016/j.agwat.2019.03.045.
Ohana-Levi N., Ben-Gal A., Munitz S. & Netzer Y. 2022 Grapevine crop evapotranspiration and crop coefficient forecasting using linear and non-linear multiple regression models. Agricultural Water Management 262, 107317. https://doi.org/10.1016/j.agwat.2021.107317.
Pereira L. S., Cordery I. & Iacovides I. 2012 Improved indicators of water use performance and productivity for sustainable water conservation and saving. Agricultural Water Management 108, 39–51. https://doi.org/10.1016/j.agwat.2011.08.022.
Pham Van C. & Nguyen-Van G. 2022 Three different models to evaluate water discharge: an application to a river section at Vinh Tuy location in the Lo river basin, Vietnam. Journal of Hydro-Environment Research 40, 38–50. https://doi.org/10.1016/j.jher.2021.12.002.
Pham Van C., Giang N. V., Van N. T., Chin L. V., Doanh N. N. & Drogoul A. 2018 Modelling water flows in the Bac Hung Hai irrigation system. In International Symposium on Lowland Technology (ISLT2018), September 26–28, 2018, Hanoi, Vietnam, pp. 1–8.
Truong V. H., Ly Q. V., Le V. C., Vu T. B., Le T. T. T., Tran T. T. & Goethals P. 2021 Machine learning-based method for forecasting water levels in irrigation and drainage systems. Environmental Technology & Innovation 23, 101762. https://doi.org/10.1016/j.eti.2021.101762.
Willmott C. J. 1981 On the validation of models. Physical Geography 2 (2), 184–194. https://doi.org/10.1080/02723646.1981.10642213.
Zhang J., Zhu Y., Zhang X., Ye M. & Yang J. 2018 Developing a long short-term memory-based model for predicting water table depth in agricultural areas. Journal of Hydrology 561, 918–929. https://doi.org/10.1016/j.jhydrol.2018.04.065.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data