Abstract
Multiple linear regression (MLR) and long short-term memory (LSTM) models are presented for evaluating water levels in irrigation and drainage systems from the water levels available at inlet and outlet locations. The Bac Hung Hai irrigation and drainage system is chosen as a case study for demonstrating both models. Six statistical metrics, namely root mean square error (RMSE), mean absolute error (MAE), mean error (ME), Willmott's score (WS), Pearson's correlation coefficient (r), and Nash–Sutcliffe efficiency (NSE), are used to quantify the agreement between estimated and observed water levels at 12 locations of interest within the system over the period from 2000 to 2021 (at a 6-hour interval). The results show that both the MLR and LSTM models can evaluate water levels with high accuracy. The dimensional statistical errors amount to only about 6% of the maximum water level observed at the locations of interest for both models. The dimensionless statistical metrics range from 0.76 to 0.99 across all 12 locations of interest in the studied system. In addition, both models are benchmarked and could be applied to other agricultural systems.
HIGHLIGHTS
Multiple linear regression and long short-term memory models are presented for evaluating water levels in irrigation and drainage systems.
Both the multiple linear regression and the long short-term memory models reproduced the observed water levels well.
The LSTM model proved to be a more suitable approach than the multiple linear regression for evaluating water levels at different locations of interest.
INTRODUCTION
The water elevation is one of the key flow characteristics in irrigation and drainage systems. It affects not only the regulation and operation of the hydraulic structures and controls (e.g., culverts, sluice gates, pumps) in these systems but also the management of irrigation and drainage water in an economically efficient and socially equitable manner (Pereira et al. 2012). However, climate change, socio-economic development, urbanization, and changes in irrigated and drained areas have altered water demands, water usage, and water supply priorities (Kang & Park 2014). To accommodate these changes and meet the objectives of sustainable water management, water elevation should be estimated as accurately as possible in practical applications, since it governs the regulation and operation of structures in irrigation and drainage systems, water reallocation, and management policies such as water-saving practices (Malterre et al. 2007).
Field measurements can be used to collect water elevation data in irrigation and drainage systems. Measurements at different locations over a long period are useful for investigating flow characteristics in detail and for proposing suitable strategies for sustainable water and environmental management in the systems. However, such data are very difficult to obtain because field campaigns are time-consuming and costly, and because measurements are hard to carry out under extreme conditions such as flood events. Hydraulic models are also used to simulate water elevation in irrigation and drainage systems (e.g., Kang & Park 2014; Pham Van et al. 2018). They allow a detailed representation of the river and channel networks of the system of interest, together with the related structures and their operation rules. However, they often require detailed bathymetry, the geometry of river cross-sections, and sediment properties of the riverbed for determining appropriate values of the bottom friction. Such data are not always available in irrigation and drainage systems (Kang & Park 2014; Pham Van et al. 2018). Moreover, updating hydraulic models is significantly more cumbersome. Recently, data-driven models such as long short-term memory (LSTM) and regression analysis have been extensively used for forecasting quantities of interest in irrigation and drainage systems, with different levels of complexity (e.g., Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021).
Long short-term memory (LSTM), a popular extension of recurrent neural networks (Pham Van & Nguyen-Van 2022), has been widely applied in simulating water levels and flow in irrigation and drainage systems. For instance, Zhang et al. (2018) applied the LSTM model to predict water table depth in the Hetao irrigation district, China. Mouatadid et al. (2019) combined the LSTM model and the discrete wavelet transform for computing the water flow in the Palos de la Frontera irrigation system, Spain. Recently, Truong et al. (2021) used a gradient tree boosting-based model to calculate the water level in the dry season of the Bac Hung Hai irrigation and drainage system (BHHIS), Vietnam. These examples suggest that the LSTM model can be used for evaluating water levels under various flow conditions (in both dry and flood seasons) at different locations in the irrigation and drainage system of interest.
Regression analysis, including both linear and non-linear models, is also used for forecasting quantities of interest in irrigation and drainage systems. The literature shows that most studies using regression analysis focus on investigating evaporation losses (e.g., Mattar et al. 2022), evapotranspiration and crop coefficient (e.g., Ohana-Levi et al. 2022), and soil water infiltration (e.g., Mohammad et al. 2020) in the irrigation and drainage systems. The use of both regression analysis and the LSTM model for evaluating water elevation in irrigation and drainage systems has been limited. It is interesting to note that the present study is the first one in which both MLR and LSTM models are implemented together and applied for evaluating water levels in the BHHIS.
The main objective of the present study is to evaluate the water level in the BHHIS with the MLR and LSTM models, using the water level data available for the period from 1/1/2000 to 31/12/2021. In detail, water levels at upstream and downstream locations of the inlet and outlet culverts (i.e., Xuan Quan, Cau Cat, Cau Xe, and An Tho) are considered as input data in both models, while water levels at upstream and downstream locations of the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as target values. The present study also focuses on water levels associated with various flow conditions in both dry and flood seasons as well as over the long period from 2000 to 2021. Six statistical indexes, including root mean square error (RMSE), mean absolute error (MAE), mean error (ME), Willmott's score (Willmott 1981) denoted by WS, Nash–Sutcliffe efficiency (NSE), and Pearson's correlation coefficient (r) between the observed and estimated water levels, are used for the quantitative assessment of the quality of the estimated water levels. Radar charts are also used to display both dimensional and dimensionless statistical errors of water levels at the 12 locations upstream and downstream of the six culverts of interest inside the BHHIS.
STUDY SYSTEM AND DATA COLLECTION
The Bac Hung Hai irrigation and drainage system
Map of: (a) the Bac Hung Hai irrigation and drainage system and (b) main rivers, channels together with hydraulic structures and controls in the system.
In dry seasons, from November to March, water in the BHHIS is normally taken from the Red River through the Xuan Quan headworks. After use in the system, the water is drained into the Thai Binh River through the Cau Cat, Cau Xe, and An Tho culverts and into the Luoc River by the Mi Dong pumping station. However, due to the decrease in water elevation in the Red River, especially in recent years, the maximum observed water discharge taken through the Xuan Quan culvert equals only 60% of the designed water discharge (corresponding to approximately 45 m3/s). Thus, water is also taken into the BHHIS from the Thai Binh River through the Cau Cat, Cau Xe, and An Tho culverts to secure enough water for the different agricultural irrigation purposes. This means that water in the BHHIS can be taken from both the upstream headworks and the downstream culverts, so that the flow can run in two directions depending on the time and period within the dry season. Conversely, water in the system mainly flows from upstream to downstream in flood seasons (from April to October).
In terms of structures and hydraulic controls inside the BHHIS, there are (i) 11 main hydraulic structures and controls (Figure 1(b)), (ii) 400 pumping stations as well as 800 culverts for both irrigation and drainage, and (iii) thousands of kilometers of small channels as well as dykes. The eleven main hydraulic structures and controls include the Xuan Quan and Bao Dap headworks, three regulation structures (the Kenh Cau, Ba Thuy, and Cong Neo culverts), two partition structures (the Luc Dien and Cong Tranh culverts), three drainage structures (the An Tho, Cau Xe, and Cau Cat culverts), and one pumping station (Mi Dong). These hydraulic structures and controls lie at different locations along the rivers and channels within the system (Figure 1). Due to the combined effects of factors such as complicated river and channel networks, hydraulic structures and controls, human-induced changes, and inputs of pollutants, the regulation and operation of the structures still face significant difficulties in terms of both irrigation and drainage.
The regulation and operation of the hydraulic structures and controls in the BHHIS currently follow the regulation and operation rules No. 5471 (MARD 2016), under which a hydrological year is divided into seven regulation periods, with four regulation periods in the dry season and three in the wet season. Depending on the regulation period, culverts are opened or closed to maintain and control the water elevation within the system. Thus, accurate estimation of the water level upstream and downstream of hydraulic structures and controls remains an important and challenging task, not only for operating the hydraulic structures and controls themselves but also for sustainable water and environmental management of the system under the combined impacts of (i) the decrease in water elevation in the Red River, (ii) urbanization and increasing pollution inside the system, and (iii) increasing salt intrusion in the downstream regions of the system.
Data collection
Collective period of water level at upstream and downstream locations of 10 culverts in the BHHIS
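No. | Location | Culvert | River | Period | Abbreviation | Note |
---|---|---|---|---|---|---|
1 | Upstream | Xuan Quan | Kim Son | 2000–2021 | UXQ | Inlet culvert |
2 | Downstream | Xuan Quan | Kim Son | 2000–2021 | DXQ | Inlet culvert |
3 | Upstream | Bao Dap | Kim Son | 2000–2021 | UBD | Target culvert |
4 | Downstream | Bao Dap | Kim Son | 2000–2021 | DBD | Target culvert |
5 | Upstream | Kenh Cau | Kim Son | 2000–2021 | UKC | Target culvert |
6 | Downstream | Kenh Cau | Kim Son | 2000–2021 | DKC | Target culvert |
7 | Upstream | Luc Dien | Dien Bien | 2000–2021 | ULD | Target culvert |
8 | Downstream | Luc Dien | Dien Bien | 2000–2021 | DLD | Target culvert |
9 | Upstream | Tranh | Tay Ke Sat | 2000–2021 | UT | Target culvert |
10 | Downstream | Tranh | Tay Ke Sat | 2000–2021 | DT | Target culvert |
11 | Upstream | Ba Thuy | Kim Son | 2000–2021 | UBT | Target culvert |
12 | Downstream | Ba Thuy | Kim Son | 2000–2021 | DBT | Target culvert |
13 | Upstream | Neo | Cuu An | 2000–2021 | UN | Target culvert |
14 | Downstream | Neo | Cuu An | 2000–2021 | DN | Target culvert |
15 | Upstream | Cau Cat | Kim Son | 2000–2021 | UCC | Outlet culvert |
16 | Downstream | Cau Cat | Kim Son | 2000–2021 | DCC | Outlet culvert |
17 | Upstream | Cau Xe | Cau Xe | 2000–2021 | UCX | Outlet culvert |
18 | Downstream | Cau Xe | Cau Xe | 2000–2021 | DCX | Outlet culvert |
19 | Upstream | An Tho | An Tho | 2000–2021 | UAT | Outlet culvert |
20 | Downstream | An Tho | An Tho | 2000–2021 | DAT | Outlet culvert |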
Probability density distribution of water elevation, at: (a) UXQ, (b) DXQ, (c) UBD, (d) DBD, (e) UKC, (f) DKC, (g) ULD, (h) DLD, (i) UT, (k) DT, (l) UBT, (m) DBT, (n) UN, (o) DN, (p) UCC, (q) DCC, (r) UCX, (s) DCX, (t) UAT, and (u) DAT.
When the MLR and LSTM approaches are applied, measured water levels at upstream and downstream locations of the Xuan Quan, Cau Cat, Cau Xe, and An Tho culverts are used as input data, while measured water levels at the Bao Dap, Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo culverts are used as output data. In detail, the whole data set of the collection period is used for determining appropriate regression coefficients when the MLR is used. When the LSTM approach is applied, two-thirds of the collected data (corresponding to the period from 1/1/2000 to 31/12/2015) are used for the training step, while one-third of the collected data (from 1/1/2016 to 31/12/2021) is used for the validation step of the model.
METHOD
Multiple linear regression
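As an illustration of the regression setup described in the data collection section, the following minimal sketch fits one ordinary least-squares model for a single target location from eight input water levels (upstream and downstream of the four inlet and outlet culverts). The synthetic data, the helper names `fit_mlr` and `predict_mlr`, and the inclusion of an intercept term are assumptions made for illustration only; they are not the authors' implementation or measurements.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate the fitted linear model on the inputs X."""
    return coef[0] + X @ coef[1:]

# Synthetic stand-in data (hypothetical): 8 input water levels, 1 target water level.
rng = np.random.default_rng(0)
n_samples, n_inputs = 500, 8
X = rng.uniform(0.5, 5.0, size=(n_samples, n_inputs))
true_coef = np.array([0.2, 0.4, 0.1, 0.05, 0.1, 0.05, 0.1, 0.1, 0.1])
y = true_coef[0] + X @ true_coef[1:] + rng.normal(0.0, 0.01, n_samples)

coef = fit_mlr(X, y)
y_hat = predict_mlr(coef, X)
mean_error = float(np.mean(y_hat - y))  # mean error of the fit on its own training data
```

A least-squares fit with an intercept has residuals that sum to zero on the data used for fitting, so the mean error of such a fit is zero up to rounding.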






Long short-term memory model
The LSTM model, a well-known variant of recurrent neural networks that is capable of selectively remembering patterns over long durations, has been widely applied for predicting time series of water level and water discharge in irrigation systems (e.g., Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021). The LSTM model was first introduced by Hochreiter & Schmidhuber (1997) to overcome the vanishing gradient problem. The building block of the LSTM is a memory cell, which essentially represents the hidden layer. In each memory cell, there is a recurrent edge with a suitable weight to overcome the vanishing and exploding gradient problems. The values associated with this recurrent edge are called the cell state. The cell state from the previous time step (denoted by Ct-1) is modified to obtain the cell state at the current time step (denoted by Ct) without being multiplied directly by any weighting factor. There are three different types of gates (named input, forget, and output) in a cell of the LSTM model.
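The gate mechanism described above can be sketched as one forward step of a standard LSTM cell. This is a generic textbook formulation, not the configuration used in the study: the function name `lstm_step`, the layer sizes (8 inputs, 4 hidden units), and the random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell values
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c_t = f * c_prev + i * g     # cell state update: element-wise, no weight matrix on c_prev
    h_t = o * np.tanh(c_t)       # hidden state passed to the next time step
    return h_t, c_t

# Hypothetical sizes: 8 input water levels, 4 hidden units, 5 time steps.
rng = np.random.default_rng(1)
D, H = 8, 4
W = rng.normal(0.0, 0.1, (4 * H, D))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.uniform(0.5, 5.0, size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

In this sketch the previous cell state enters the update only through the element-wise product with the forget gate, which is what lets information persist over many time steps when the forget gate stays close to one.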
The LSTM model has three main hyper-parameters: the number of epochs, the number of hidden units, and the learning rate. In the present study, the Adam optimization algorithm is chosen among the available optimization options because it is often used in practical applications (Zhang et al. 2018; Mouatadid et al. 2019; Truong et al. 2021; Pham Van & Nguyen-Van 2022), and the appropriate values of the above-mentioned hyper-parameters are determined by trial and error in the training step of the LSTM model. The training step uses two-thirds of the data (from 1/1/2000 to 31/12/2015). The LSTM model is then validated using the remaining one-third of the data (from 1/1/2016 to 31/12/2021).
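The chronological split described above can be sketched as follows. The sketch assumes a gap-free series with records at fixed 6-hour steps starting at midnight on 1/1/2000; the actual record times and any missing values in the measured data are not stated, so the counts are indicative only, and the helper name `six_hourly_timestamps` is hypothetical.

```python
from datetime import datetime, timedelta

def six_hourly_timestamps(start, end):
    """Timestamps every 6 hours from start (inclusive) to end (exclusive)."""
    stamps = []
    t = start
    while t < end:
        stamps.append(t)
        t += timedelta(hours=6)
    return stamps

# Assumed gap-free record covering 1/1/2000 to 31/12/2021.
stamps = six_hourly_timestamps(datetime(2000, 1, 1), datetime(2022, 1, 1))

# Split by date, not at random, so that validation data lie strictly after training data.
split = datetime(2016, 1, 1)
train = [t for t in stamps if t < split]
valid = [t for t in stamps if t >= split]
train_fraction = len(train) / len(stamps)
```

Under these assumptions the training period covers 16 of the 22 years, i.e. about 73% of the records, which corresponds to the roughly two-thirds/one-third division used in the text.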
General schematic diagram of the LSTM model for evaluating water levels at different locations of interest in the BHHIS.
Statistical metrics




RMSE, MAE, and ME are valuable indicators because they express the error in the units of the quantity of interest, which is helpful in the analysis of the results. The WS, r, and NSE, the last of which gives the relative magnitude of the residual variance (or noise) compared to the variance of the observations, provide complementary, dimensionless information on the agreement between observed and estimated values. In addition to these statistical errors, radar charts are used: they display multivariate data as a two-dimensional chart with the quantities of interest represented on axes starting from the same point, which makes them a convenient way to show the statistical errors at all target locations at once.
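For reference, the six metrics can be computed as in the sketch below, which uses their commonly cited definitions. The sign convention for ME (estimated minus observed) and the function name `error_metrics` are assumptions; WS follows the index of agreement of Willmott (1981).

```python
import numpy as np

def error_metrics(obs, est):
    """Common definitions of RMSE, MAE, ME, WS, r and NSE for paired series."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    diff = est - obs                      # assumed sign convention: estimated minus observed
    obs_mean = obs.mean()
    sq_err = np.sum(diff ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "MAE": float(np.mean(np.abs(diff))),
        "ME": float(np.mean(diff)),
        # Willmott's index of agreement
        "WS": float(1.0 - sq_err / np.sum((np.abs(est - obs_mean) + np.abs(obs - obs_mean)) ** 2)),
        # Pearson's correlation coefficient
        "r": float(np.corrcoef(obs, est)[0, 1]),
        # Nash-Sutcliffe efficiency: 1 minus residual variance over observation variance
        "NSE": float(1.0 - sq_err / np.sum((obs - obs_mean) ** 2)),
    }

# Toy example (hypothetical values): every estimate is 0.1 m too high.
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.1, 2.1, 3.1, 4.1]
metrics = error_metrics(obs, est)
```

In the toy example a constant offset leaves r equal to one while RMSE, MAE, and ME all equal the offset, which shows why the dimensional and dimensionless metrics are reported together.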
RESULTS AND DISCUSSION
Results of the multiple linear regression
Using the collected water level data described previously, an MLR is fitted for both the upstream and downstream locations of the culverts of interest in the BHHIS by minimizing the root mean square error between estimated values and measured data. Table 2 summarizes the resulting values of the regression coefficients for all locations of interest, while detailed values of the error estimates of the water level are shown in Table 3.
Table 3. Error estimates of water level at upstream and downstream locations of different culverts of interest when using multiple linear regression
Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
---|---|---|---|---|---|---|---|---|---|---|---|
UBD | 0.067 | 1.205 | 0.033 | 0.588 | 0.00 | 0.00 | 0.998 | 0.996 | 0.993 | 5.51 | 5.60 |
DBD | 0.236 | 6.212 | 0.180 | 4.741 | 0.00 | 0.00 | 0.929 | 0.873 | 0.763 | 3.86 | 3.80 |
UKC | 0.212 | 5.736 | 0.155 | 4.195 | 0.00 | 0.00 | 0.938 | 0.888 | 0.788 | 3.60 | 3.69 |
DKC | 0.159 | 4.466 | 0.113 | 3.187 | 0.00 | 0.00 | 0.951 | 0.909 | 0.826 | 3.35 | 3.56 |
ULD | 0.133 | 3.901 | 0.092 | 2.712 | 0.00 | 0.00 | 0.961 | 0.928 | 0.861 | 3.25 | 3.41 |
DLD | 0.142 | 4.339 | 0.095 | 2.902 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 3.23 | 3.28 |
UT | 0.092 | 2.697 | 0.058 | 1.715 | 0.00 | 0.00 | 0.980 | 0.962 | 0.925 | 3.20 | 3.40 |
DT | 0.112 | 3.625 | 0.068 | 2.209 | 0.00 | 0.00 | 0.967 | 0.939 | 0.881 | 3.09 | 3.09 |
UBT | 0.062 | 2.040 | 0.038 | 1.252 | 0.00 | 0.00 | 0.989 | 0.978 | 0.957 | 2.93 | 3.02 |
DBT | 0.193 | 6.414 | 0.135 | 4.487 | 0.00 | 0.00 | 0.927 | 0.871 | 0.758 | 2.98 | 3.01 |
UN | 0.117 | 4.062 | 0.081 | 2.811 | 0.00 | 0.00 | 0.955 | 0.916 | 0.839 | 2.76 | 2.87 |
DN | 0.151 | 5.271 | 0.102 | 3.570 | 0.00 | 0.00 | 0.957 | 0.919 | 0.845 | 2.91 | 2.86 |
Radar chart for: (a) dimensional and (b) dimensionless statistical errors when using multiple linear regression.
Estimations versus observations of the water level at: (a) UBD, (b) DBD, (c) UKC, (d) DKC, (e) ULD, (f) DLD, (g) UT, (h) DT, (i) UBT, (k) DBT, (l) UN, and (m) DN.
Results of the LSTM model
Training results of the LSTM model
Table 4. Error estimates of water level at upstream and downstream locations of different culverts of interest for the training step of the LSTM model
Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
---|---|---|---|---|---|---|---|---|---|---|---|
UBD | 0.069 | 1.236 | 0.030 | 0.532 | −0.001 | −0.01 | 0.998 | 0.997 | 0.993 | 5.55 | 5.60 |
DBD | 0.154 | 4.056 | 0.103 | 2.704 | 0.001 | 0.04 | 0.974 | 0.950 | 0.902 | 3.67 | 3.80 |
UKC | 0.153 | 4.152 | 0.107 | 2.896 | −0.001 | −0.03 | 0.971 | 0.945 | 0.894 | 3.64 | 3.69 |
DKC | 0.118 | 3.302 | 0.081 | 2.272 | 0.002 | 0.05 | 0.974 | 0.951 | 0.904 | 3.60 | 3.56 |
ULD | 0.095 | 2.774 | 0.064 | 1.883 | 0.005 | 0.16 | 0.981 | 0.964 | 0.928 | 3.36 | 3.41 |
DLD | 0.103 | 3.133 | 0.068 | 2.085 | 0.00 | 0.07 | 0.977 | 0.956 | 0.914 | 3.26 | 3.28 |
UT | 0.063 | 1.840 | 0.042 | 1.232 | 0.00 | 0.08 | 0.991 | 0.982 | 0.964 | 3.39 | 3.40 |
DT | 0.083 | 2.700 | 0.053 | 1.729 | 0.00 | 0.16 | 0.982 | 0.965 | 0.931 | 3.07 | 3.09 |
UBT | 0.051 | 1.701 | 0.031 | 1.040 | 0.01 | 0.18 | 0.992 | 0.985 | 0.970 | 2.97 | 3.02 |
DBT | 0.124 | 4.109 | 0.082 | 2.729 | 0.00 | −0.14 | 0.974 | 0.951 | 0.905 | 2.95 | 3.01 |
UN | 0.096 | 3.353 | 0.065 | 2.266 | 0.00 | 0.14 | 0.970 | 0.943 | 0.889 | 2.78 | 2.87 |
DN | 0.105 | 3.655 | 0.068 | 2.389 | 0.00 | −0.10 | 0.981 | 0.963 | 0.928 | 2.79 | 2.86 |
Radar chart for: (a) dimensional and (b) dimensionless statistical errors in the training step of the LSTM model.
Time series of the water level at: (a) UBD and (b) UN for the training step.
As summarized in Table 4, the RMSE of the water level at the locations of interest varies between 0.05 and 0.16 m, while the MAE varies from 0.03 to 0.10 m. The values of ME range from −0.001 to 0.01 m. These dimensional statistical errors equal only about 4% of the observed magnitude of the water level at the considered locations. The dimensionless statistical metrics (WS, r, and NSE) range from 0.89 to 0.99 (Table 4), revealing that the LSTM model reproduces the observed water levels very well under different flow conditions and at different times in the training step (Figure 7). Indeed, the discrepancy between the maximum values of estimated and observed water levels ranges from −0.13 to 0.04 m. These results suggest that appropriate values of the hyper-parameters are obtained in the training step of the LSTM model.
Validation results of the LSTM model
Table 5. Error estimates of water level at upstream and downstream locations of different culverts of interest for the validating step of the LSTM model
Location | RMSE (m) | RMSE (%) | MAE (m) | MAE (%) | ME (m) | ME (%) | WS | r | NSE | Hmax est. (m) | Hmax obs. (m) |
---|---|---|---|---|---|---|---|---|---|---|---|
UBD | 0.033 | 0.605 | 0.022 | 0.399 | 0.002 | 0.033 | 0.999 | 0.998 | 0.997 | 5.31 | 5.40 |
DBD | 0.123 | 3.476 | 0.084 | 2.376 | 0.007 | 0.208 | 0.976 | 0.955 | 0.912 | 3.31 | 3.53 |
UKC | 0.122 | 3.574 | 0.087 | 2.537 | 0.003 | 0.097 | 0.974 | 0.951 | 0.904 | 3.25 | 3.41 |
DKC | 0.115 | 3.415 | 0.081 | 2.404 | 0.006 | 0.185 | 0.977 | 0.955 | 0.911 | 3.29 | 3.36 |
ULD | 0.104 | 3.252 | 0.074 | 2.328 | −0.003 | −0.091 | 0.979 | 0.958 | 0.918 | 3.11 | 3.19 |
DLD | 0.106 | 3.315 | 0.076 | 2.384 | −0.01 | −0.162 | 0.978 | 0.958 | 0.917 | 3.22 | 3.19 |
UT | 0.070 | 2.333 | 0.050 | 1.667 | −0.01 | −0.298 | 0.989 | 0.979 | 0.957 | 3.05 | 3.01 |
DT | 0.081 | 2.742 | 0.057 | 1.925 | −0.01 | −0.371 | 0.985 | 0.972 | 0.943 | 2.90 | 2.97 |
UBT | 0.055 | 2.110 | 0.042 | 1.593 | −0.02 | −0.587 | 0.991 | 0.984 | 0.965 | 2.68 | 2.61 |
DBT | 0.120 | 4.635 | 0.082 | 3.176 | 0.01 | 0.217 | 0.970 | 0.942 | 0.887 | 2.60 | 2.59 |
UN | 0.100 | 4.006 | 0.073 | 2.903 | −0.01 | −0.459 | 0.967 | 0.940 | 0.882 | 2.48 | 2.50 |
DN | 0.106 | 4.260 | 0.072 | 2.906 | 0.01 | 0.295 | 0.977 | 0.955 | 0.912 | 2.49 | 2.49 |
Radar chart for: (a) dimensional and (b) dimensionless statistical errors in the validation step of the LSTM model.
Time series of the water level at: (a) UBD and (b) UN for the validation step.
As in the training step, the dimensional statistical errors of the water level (RMSE, MAE, and ME) are small, varying between −0.02 and 0.12 m (see Table 5). These errors amount to at most about 4.6% of the observed magnitude of the water level at the locations of interest. The WS is close to unity, while the r coefficient between the estimated and observed water levels ranges from 0.94 to 0.99 (see Table 5), revealing that the LSTM model captures the variation of the observed water levels very well. The values of NSE range between 0.88 and 0.99, showing that the LSTM model reproduces the observed water levels well at all studied locations. At all 12 locations, the computed maximum water levels are very close to the observations. Indeed, the discrepancy between the maximum values of estimated and observed water levels varies between −0.22 and 0.07 m (see Table 5). These results are consistent with those obtained in the training step of the LSTM model. This demonstrates that appropriate values are achieved for the hyper-parameters of the LSTM model. Thus, the LSTM model can be used to evaluate the water level at different locations of interest in the BHHIS.
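The figures quoted above can be cross-checked directly against the validation table. The sketch below takes the RMSE and maximum water levels listed in that table and assumes that each percentage is the error divided by the observed maximum water level; this normalization is inferred from the tabulated values and is not stated explicitly in the text.

```python
# (location, RMSE in m, estimated Hmax in m, observed Hmax in m), copied from the validation table
rows = [
    ("UBD", 0.033, 5.31, 5.40), ("DBD", 0.123, 3.31, 3.53),
    ("UKC", 0.122, 3.25, 3.41), ("DKC", 0.115, 3.29, 3.36),
    ("ULD", 0.104, 3.11, 3.19), ("DLD", 0.106, 3.22, 3.19),
    ("UT", 0.070, 3.05, 3.01), ("DT", 0.081, 2.90, 2.97),
    ("UBT", 0.055, 2.68, 2.61), ("DBT", 0.120, 2.60, 2.59),
    ("UN", 0.100, 2.48, 2.50), ("DN", 0.106, 2.49, 2.49),
]

# Assumed normalization: RMSE as a percentage of the observed maximum water level.
rmse_pct = {loc: 100.0 * rmse / h_obs for loc, rmse, _, h_obs in rows}

# Discrepancy between estimated and observed maximum water levels, in metres.
hmax_diff = {loc: round(h_est - h_obs, 2) for loc, _, h_est, h_obs in rows}

largest_pct_location = max(rmse_pct, key=rmse_pct.get)
diff_range = (min(hmax_diff.values()), max(hmax_diff.values()))
```

Because the tabulated RMSE values are rounded to three decimals, the recomputed percentages agree with the table only to within that rounding.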
Discussion
As can be seen in Tables 3–5, the dimensional statistical errors of the water level in the period from 2000 to 2021 at the 12 locations of interest in the BHHIS reach at most 0.24 and 0.16 m in magnitude for the MLR and LSTM models, respectively. The r coefficient and WS are close to unity for both methods. In addition, the NSE of the water level ranges from 0.76 to 0.99 when the MLR method is applied, while it varies between 0.88 and 0.99 when the LSTM model is used. The discrepancy between the maximum values of estimated and observed water levels is within about 0.20 m for both methods. These results demonstrate that both the MLR and LSTM models can reproduce the observed water levels at all 12 locations of interest in the studied system.
Regarding detailed statistical errors of the water level from the MLR and LSTM models, it is also clearly observed that a considerable improvement in estimated water levels is obtained when using the LSTM model in comparison with the MLR. For example, a minimum value of 0.88 is achieved for the NSE when using the LSTM model, instead of 0.76 for the case of the MLR. In addition, higher values of WS and r coefficient at all twelve locations of interest within the BHHIS are also obtained if the LSTM model is used.
Truong et al. (2021) used the gradient tree boosting-based (GTB) model to investigate water levels in the dry seasons from 2000 to 2020 at upstream and downstream locations of five culverts named Kenh Cau, Luc Dien, Tranh, Ba Thuy, and Neo in the BHHIS. The results showed that the GTB model reproduced acceptably the measured water levels, with the values of the coefficient of determination ranging from 0.86 to 0.97 at the upstream and downstream locations of the five culverts mentioned above. However, it must be emphasized that the previous study focused only on water levels in the dry seasons when flow hydrodynamics and water elevations in the BHHIS are relatively stable. The present research showed that both MLR and LSTM models reproduced very well the water levels under various flow conditions in both dry and flood seasons in the long period from 1/1/2000 to 31/12/2021. Therefore, these results are strongly believed to support not only irrigation but also drainage purposes in the studied system.
Among the 12 locations of interest, the LSTM model showed a noticeably different level of agreement in the estimated water level at the upstream location of the Neo culvert (denoted by UN) than at the other locations (Figure 1). This can be explained by the following reasons. Firstly, the water level at UN is affected by different factors such as flow dynamics, water discharge, and the operation of culverts and structures in both the upstream and downstream regions. The impacts of these factors, however, are not taken into account in the calculation when using the LSTM model. Secondly, the division of water discharge at the bifurcation/confluence location (see Figure 1) between the Cuu An and Nam Ke Sat rivers may be another reason for the discrepancy. Finally, accelerating climate change and extreme events (e.g., floods, droughts), changes in water demands and water users, and changes in the physical system, none of which could be taken into account in the present study, may be an additional reason for the discrepancy between UN and the other locations.
SUMMARY AND CONCLUSIONS
This study presents an MLR model and an LSTM model that can be used for evaluating water levels in the irrigation and drainage system of interest, the BHHIS, Vietnam. The main remarks and conclusions of the study are summarized as follows:
Using the water level data in the period from 1/1/2000 to 31/12/2021, an MLR was identified for the 12 locations upstream and downstream of the six culverts inside the BHHIS. The results showed that the dimensional statistical errors of the water level (RMSE, MAE, and ME) are only about 6% of the maximum water level monitored at the locations of interest. The values of WS and the r coefficient at all locations varied between 0.87 and 0.99, while the NSE ranged from 0.76 to 0.99.
When using the LSTM model, the appropriate values of the hyper-parameters of the model were determined using the water level data in the period from 1/1/2000 to 31/12/2015. The model was then validated using the measured water levels in the remaining period from 1/1/2016 to 31/12/2021. The LSTM model reproduced the measured water levels very well in both the training and validation steps. The dimensional statistical errors of the water level were at most about 4.6% of the maximum water level measured at the locations of interest. The r coefficient and WS were close to unity, while the NSE ranged between 0.88 and 0.99.
The LSTM model proved to be a more suitable approach than the MLR for evaluating water levels at different locations of interest in the BHHIS. The LSTM model uses only water levels as input data, which makes it a useful tool for evaluating water levels at both upstream and downstream locations of the culverts of interest. This will help in operating (opening and closing) the hydraulic structures and controls for both irrigation and drainage purposes. Another advantage of the LSTM model over hydraulic models is that it can be updated readily with new data, whereas updating hydraulic models is significantly more cumbersome.
ACKNOWLEDGEMENTS
The authors would like to thank the Bac Hung Hai irrigation company for sharing the water elevation data used for different calculations in the present study. The authors also would like to express their thanks to the anonymous reviewers for their useful comments and suggestions with constructive criticism that have helped improve the clarity of the manuscript.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.