Abstract
This study examines a sediment rating curve (SRC), multiple regression (MR), and a long short-term memory (LSTM) model for estimating daily suspended sediment concentration (SSC). Daily SSC data at Yen Thuong and daily flow data at five locations in the Ca River Basin, Vietnam, are used to demonstrate the approaches. Using the daily flow and SSC data for the period from 2009 to 2019, appropriate coefficients for each method are identified using five widely used performance criteria. The results show that the SRC and MR approaches reproduce the observed values acceptably, with the RMSE, MAE, and ME of daily SSC being less than 5% of the daily SSC magnitude observed at the station, while NSE ranges from 0.47 to 0.63 and the r coefficient varies between 0.69 and 0.80. The LSTM model represents the observed values of daily SSC very well: the two dimensionless criteria are greater than 0.94 and the three dimensional criteria are smaller than 2.0% of the observed magnitude of daily SSC in both the training and validation steps. The LSTM model is found to be the best among the three investigated approaches and is then applied to estimate daily SSC values for the period from 1969 to 2008 and the year 2020.
HIGHLIGHTS
Multiple approaches were implemented for estimating daily suspended sediment concentration.
Sediment rating curve and multiple regression reproduced the observed values of daily SSC acceptably.
A very good reproduction of the observed values of daily SSC was obtained when using the LSTM model.
INTRODUCTION
Suspended sediment (SS), generally consisting of fine-grained sediment particles, is an inherent component of, and an important variable in, the aquatic environment of rivers. SS can degrade water quality through the adsorption of different substances (Elskens et al. 2014) and can affect riverbed evolution, river navigability, hydraulic structures, and ecological habitats (Walling 1974). The transport of SS is an inherently complicated process because (i) flow dynamics vary, (ii) there are different sediment sources (e.g., sediments originating from terrestrial erosion in river basins, riverbeds, and riverbanks, as well as sediments re-mobilized from river basins), and (iii) non-linear relationships exist between related variables, geomorphological characteristics of river basins, and the sediment transport process (Winterwerp 2013; Rajaee & Jafari 2020). Calculation of the suspended sediment concentration (SSC) associated with SS transport is essential for environmental protection: it supports water quality management, identification of pollution sources, erosion control, and habitat preservation, and it helps ensure the safety and sustainability of infrastructure (Zounemat-Kermani et al. 2016). Moreover, the estimation of SSC is still one of the most difficult issues in hydraulic engineering and water resources management (Afan et al. 2016). Therefore, accurate estimation of SSC that accounts for the above sources and physical processes remains a challenge, although much attention (e.g., Winterwerp 2013; Pham Van et al. 2016; Yilmaz et al. 2018) has been paid to the study and calculation of SSC at different time scales (daily or monthly).
Since the transport of SS is a complex physical process, SSC can be estimated using either physical-based or data-driven models with different levels of complexity and efficiency (e.g., Paniconi & Putti 2015; Pham Van et al. 2016; AlDahoul et al. 2021; Nourani & Behfar 2021; Wai et al. 2022). Physical-based models for sediment transport are complex and time-consuming, often limited in their ability to predict SSC accurately in river basins due to the non-linear nature of sediment transport (Winterwerp 2013; Paniconi & Putti 2015). These models rely on reliable data and require calibration of numerous parameters, making them challenging for practical applications (Winterwerp 2013; Pham Van et al. 2016) as well as resulting in restricted use in numerous river basins worldwide. Therefore, robust and automated methods should be considered as alternatives to address these limitations.
Regarding practical application aspects, where a reduction in the complexity of the problem under consideration is always expected rather than requiring an exact solution, a sediment rating curve (SRC) describing a relationship between water discharge (Q) and SSC is widely used to estimate daily SSC from given flow values (Harrington & Harrington 2013). As reported in many previous studies (e.g., Asselman 2000; Lohani et al. 2007; Kisi et al. 2008; Liu et al. 2013; Zounemat-Kermani et al. 2016; Malik et al. 2017; Yilmaz et al. 2018), SRC is successfully applied to calculate daily SSC in various river basins around the world. With respect to the Ca River Basin, previous studies (e.g., Phuong et al. 2019, 2023) also use the SRC for preliminary investigation of temporal variability of SSC at two locations along the river in dry, wet, and transitional (between dry and wet) seasons from August 2017 to July 2018. It should be noted, however, that the SSC data used to determine the SRC's coefficients are not yet fully characteristic of the variation in Q and SSC over a long period of time because samples were collected only 4–6 times per month in the wet season and 1–4 times per month in the dry season. The use of separate SRC for each season of a hydrological year is indeed a difficult issue, both in terms of use and practical applications.
Multiple regression (MR), which assumes daily SSC at the location of interest as a function of daily flow at related locations in linear or non-linear form, is also used to compute daily SSC in river basins. Aytek & Kisi (2008) used MR in the linear form to evaluate daily SSC at two stations on the Tongue River in Montana, USA. Rajaee et al. (2011) also used MR in the linear form to predict daily SSC at Wapello station in the Iowa River, USA. Ulke et al. (2009) applied MR in both linear and non-linear forms to predict daily SSC and missing data in the Gediz River, Turkey. Malik et al. (2017) simulated daily SSC at the Tekra location on the Pranhita River, India. Recently, Stoichev et al. (2019) conducted an assessment of the spatial distribution of SSC in Aveiro Lagoon, Portugal. In a related study, Wang et al. (2022) also employed the MR to evaluate SSC reduction in the Yellow River, China, considering the influence of climate change and anthropogenic activities. These examples suggest that MR in the linear or non-linear form can be used to evaluate daily SSC in the river basin of interest.
It is interesting to note that SRC and MR are unable to evaluate daily SSC under extreme conditions because they are basically linear or non-linear forms determined based on an assumption, i.e., data are stationary (Rajaee & Jafari 2020). These methods also have a limited ability to capture the noise in SSC data. Moreover, SRC and MR often differ from one location to another and from one river basin to another in terms of the coefficients of SRC and MR (Walling 1974; Ferguson 1987; Asselman 2000; Quilbé et al. 2006; Harrington & Harrington 2013). Furthermore, inaccuracies in the estimation of daily SSC are related to the statistical method used to fit the SRC and MR, as well as the scatter around the regression curve (Asselman 2000). Thus, considerable efforts are still needed to accurately quantify SRC and MR in order to identify an appropriate method for locations of interest in river basins and to gain a better understanding of SS transport associated with various flow conditions in space and time.
To overcome the non-stationarity of SSC data, artificial intelligence (AI) approaches have been applied widely to water resources predictions in general and SSC in particular over the last two decades because of their accurate prediction capability (Afan et al. 2016; Rajaee & Jafari 2020). Machine learning models, one of the two main categories of AI approaches (statistical and machine learning models), have been used extensively to address non-linear hydrological processes such as the transport of SS. These models focus less on the physical characteristics of sediment transport processes and more on black-box methods that establish optimal mathematical relationships between input datasets and output results. They include artificial neural network, genetic algorithms, adaptive neuro-fuzzy inference system, support vector machine, classification and regression tree, long short-term memory (LSTM), bidirectional LSTM, convolutional neural network, recurrent neural networks, instance-based learning, artificial bee colony, teaching-learning based optimization, gene programming, gene expression programming, wavelet-ANN, wavelet neuro-fuzzy, and reduced error pruning tree (Rajaee & Jafari 2020). Among these, the LSTM is selected for this study for three reasons. Firstly, the LSTM is a widely used data-driven model in the field of AI. Secondly, the LSTM excels at learning temporal dependencies, making it well suited to time series data and sequential patterns, as demonstrated in recent studies (AlDahoul et al. 2021; Kim & Sandhu 2022; Pham Van & Le 2022; Latif et al. 2023). Finally, the LSTM offers these advantages without introducing excessive complexity when compared to other models such as the bidirectional LSTM.
The LSTM model can simulate the non-linear and complicated transport processes of SS with high accuracy (AlDahoul et al. 2021; Huang et al. 2021; Nourani & Behfar 2021; Latif et al. 2023). For instance, AlDahoul et al. (2021) used the LSTM model to forecast SSC in Malaysia's Johor River utilizing only water discharge as input datasets. Huang et al. (2021) applied the LSTM model to improve the predictive accuracy of SSC in the Shihmen Reservoir, China. Nourani & Behfar (2021) employed the LSTM model to study the runoff-sediment process at three gauging stations in the Missouri and upper Mississippi regions, USA, at both daily and monthly scales. Li & Vanrolleghem (2022) proposed an influent generator based on the LSTM model to generate daily SSC in Quebec City, Canada. Kim & Sandhu (2022) estimated SSC at various locations within the Sacramento-San Joaquin Delta of California, USA, based on the LSTM model. Latif et al. (2023) compared the LSTM model with several distinct AI-based models (e.g., ANN, SVM) in predicting SSC in the Johor River, Malaysia. All these studies demonstrate that the LSTM model can be used to evaluate daily SSC in the studied river basin. Moreover, transferring the LSTM model from one river basin to another remains a challenge, and further applications are needed to test and extend the model's capabilities.
The main objective of the present study is to investigate SRC, MR in both linear and non-linear forms, and LSTM models that can be used to estimate daily SSC at the Yen Thuong location in the Ca River Basin. Specifically, the study aims to (i) determine an appropriate approach among the three studied here and (ii) evaluate SSC in the missing period of 1969–2008 and in the year 2020 at Yen Thuong. Daily hydrological data (i.e., Q at four upstream locations and at Yen Thuong itself, and SSC at Yen Thuong) in the period from 1969 to 2020 are used to determine the coefficients in each approach and for related calculation purposes. Five common performance criteria, namely Nash–Sutcliffe efficiency (NSE), Pearson's correlation coefficient (r), root mean square error (RMSE), mean error (ME), and mean absolute error (MAE), are used to quantitatively assess the performance of each approach in comparison with the observed data.
CA RIVER BASIN AND COLLECTED DATA
The Ca River Basin
Map of the river basin, together with meteorological and hydrological stations.
In the Ca River Basin, the rainy season occurs from May to October in the upstream region and from June to November in the downstream area. Rainfall in a year often has two peaks, with the greatest value occurring in late September or early October, while the second rainfall peak appears in late May or early June (Pham Van et al. 2022). Monthly precipitation is usually the highest in May and June, while decreasing slightly in July and August. The total rainfall in May and June reaches up to 20% of the annual rainfall. Total precipitation in September and October is high, reaching from 40 to 50% of the annual rainfall. In the dry season, total seasonal precipitation is very low and is only about 15–20% of annual precipitation. The lowest precipitation is usually recorded in February and March, with the total precipitation of these two months being only about 2% of the annual precipitation amount. The meteo-hydrological characteristics in the river basin are also affected by climate change.
Various reservoirs (e.g., Ban Ve, Ban Mong, Khe Bo, and Ngan Truoi, etc.) have been built to address difficult problems of water supply, flood control, inundation, drought, and environment in the Ca River Basin. These multiple-purpose reservoirs have significant impacts on sediment transport and geochemical properties between upstream and downstream of the reservoirs. Measured data of SS at several locations along the Ca River and its tributaries show significant discrepancies in SSC between flood and dry seasons (Phuong et al. 2019; Pham Van et al. 2022). For instance, at the Yen Thuong location (see Figure 1), daily SSC varies in a wide range from 0.26 to 6,560 mg/L. The assessment of SSC will help to quantitatively evaluate the change and trend of sediment in the river basin.
Data collection
Collected data of water discharge and suspended sediment concentration at different locations in the Ca river
No. | Name | Longitude | Latitude | River | Variable | Data collected period | Range | Mean | Std. | Skewness | Kurtosis | Correlation coefficient, r | Unit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Muong Xen | 104°07′00″ | 19°24′00″ | Nam Mo | Q | 1969–2020 | 7.37–1,840 | 77.44 | 95.38 | 6.26 | 73.16 | 0.524 | m³/s |
2 | Quy Chau | 105°10′00″ | 19°34′00″ | Hieu | Q | 1969–2020 | 10.6–3,370 | 74.27 | 130.45 | 10.87 | 190.31 | 0.498 | m³/s |
3 | Nghia Khanh | 105°12′00″ | 19°26′00″ | Hieu | Q | 1969–2020 | 8.01–4,290 | 115.29 | 227.89 | 8.38 | 97.80 | 0.605 | m³/s |
4 | Dua | 105°02′00″ | 18°59′00″ | Ca | Q | 1969–2020 | 41.4–5,570 | 427.14 | 569.80 | 4.14 | 25.30 | 0.735 | m³/s |
5 | Yen Thuong | 105°23′00″ | 18°41′00″ | Ca | Q | 1969–2020 | 72.9–5,890 | 479.57 | 636.36 | 4.15 | 24.12 | 0.761 | m³/s |
5 | Yen Thuong | 105°23′00″ | 18°41′00″ | Ca | SSC | 2009–2019 | 0.26–6,560 | 91.16 | 270.90 | 9.14 | 136.51 | 1.0 | mg/L |
Probability density of water discharge, at (a) Muong Xen, (b) Quy Chau, (c) Nghia Khanh, (d) Dua, (e) Yen Thuong and of SSC at (f) Yen Thuong.
METHODS
SRC with single expression for both rising and falling climbs
The single SRC relates daily SSC to water discharge Q through a power function:

$$\mathrm{SSC} = a\,Q^{b} \qquad (1)$$

where a is the rating coefficient and b is the exponent coefficient. As shown in Equation (1), the SRC is a non-linear regression if the exponent coefficient differs from unity; otherwise, it is a linear regression. In this study, different values of the exponent coefficient b are tested with the same flow and SSC datasets to determine an appropriate value for the location of interest. In other words, both linear and non-linear forms of the power function of the single SRC are examined.
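The following is a minimal sketch of how a power-function SRC of the form SSC = a·Q^b can be evaluated and how the rating coefficient can be estimated for a trial exponent. The fitting procedure is not specified in the text, so the least-squares estimate of a for a fixed b is an assumption, and the function names and the synthetic data are hypothetical.

```python
import numpy as np

def src_estimate(q, a, b):
    """Power-function sediment rating curve: SSC = a * Q**b."""
    return a * np.asarray(q, dtype=float) ** b

def fit_rating_coefficient(q, ssc, b):
    """Least-squares estimate of a for a fixed trial exponent b (assumed method).

    Minimising sum((ssc - a * q**b)**2) over a gives
    a = sum(ssc * q**b) / sum(q**(2*b)).
    """
    x = np.asarray(q, dtype=float) ** b
    y = np.asarray(ssc, dtype=float)
    return float(np.sum(x * y) / np.sum(x * x))

# Synthetic, noise-free data generated from known coefficients (illustration only).
q_demo = np.array([100.0, 250.0, 500.0, 1000.0, 2500.0])
ssc_demo = src_estimate(q_demo, a=0.0022, b=1.605)

# Recover a for the same exponent that generated the data.
a_hat = fit_rating_coefficient(q_demo, ssc_demo, b=1.605)
print(f"estimated a = {a_hat:.4f}")
```

Repeating the fit for several trial exponents and comparing the resulting errors would mirror the testing of different b values described above.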
SRC with different expressions for rising and falling climbs
The relationship between Q and SSC has an apparent and significant scatter, showing that there is a hysteresis of SSC with different values for the rising and falling climbs of the discharge-sediment relationship (Asselman 2000; Harrington & Harrington 2013). In this case, different coefficients are used for the rising and falling climbs of the SRC. In other words, the power function under the form of Equation (1) is also applied with different values for the rating and exponent coefficients of the rising and falling climbs. Detailed results using SRC with different expressions for rising and falling climbs at the location of interest are presented in Section 4.
MR between SSC and discharge
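The regression equation is not reproduced in this section. The sketch below assumes the form SSC = α0 + Σ αi·Qi^β over the five discharge stations, which is inferred from the coefficient layout (β, α0, α1 to α5) reported with the MR results and is not confirmed by the text. Ordinary least squares is likewise an assumed fitting method, and all function names and data are hypothetical.

```python
import numpy as np

def fit_mr(q_stations, ssc, beta):
    """Fit SSC = alpha0 + sum_i alpha_i * Q_i**beta by ordinary least squares.

    q_stations: array of shape (n_days, n_stations); ssc: array of shape (n_days,).
    Returns the coefficient vector [alpha0, alpha1, ..., alpha_n].
    """
    x = np.asarray(q_stations, dtype=float) ** beta
    design = np.column_stack([np.ones(x.shape[0]), x])  # intercept column first
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(ssc, dtype=float), rcond=None)
    return coeffs

def predict_mr(q_stations, coeffs, beta):
    """Evaluate the assumed regression form for given coefficients."""
    x = np.asarray(q_stations, dtype=float) ** beta
    return coeffs[0] + x @ coeffs[1:]

# Synthetic discharge at five stations and noise-free SSC (illustration only).
rng = np.random.default_rng(0)
q_demo = rng.uniform(50.0, 2000.0, size=(200, 5))
true_coeffs = np.array([10.0, 5e-3, 1e-4, 2e-5, 6e-4, 2e-3])
ssc_demo = predict_mr(q_demo, true_coeffs, beta=1.6)

fitted = fit_mr(q_demo, ssc_demo, beta=1.6)
ssc_fit = predict_mr(q_demo, fitted, beta=1.6)
```

With an intercept term, least-squares residuals average to zero, so a fit of this kind would produce a mean error close to zero on the fitting data.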
The LSTM model
Daily SSC at Yen Thuong can also be computed using the LSTM model, a well-known variant of the recurrent neural network. The LSTM model (Hochreiter & Schmidhuber 1997) uses a memory cell in the hidden layer to overcome the vanishing and exploding gradient problems. The architecture of the model mainly consists of input, forget, and output gates, which respectively update the cell state, reset the cell state so that it does not grow indefinitely, and decide how to update the values of the hidden unit. Further information on the LSTM model can be found in relevant references (e.g., AlDahoul et al. 2021; Huang et al. 2021; Kim & Sandhu 2022; Li & Vanrolleghem 2022; Pham Van & Le 2022; Latif et al. 2023).
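To make the role of the three gates concrete, the sketch below implements one time step of a standard LSTM cell in NumPy. It follows the textbook formulation rather than any implementation used in this study; the weight layout, the random weights, and the small hidden size of 4 are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, w, u, b):
    """One time step of a standard LSTM cell.

    x: (n_in,), h_prev and c_prev: (n_hidden,)
    w: (4*n_hidden, n_in), u: (4*n_hidden, n_hidden), b: (4*n_hidden,)
    Rows are stacked as input gate, forget gate, candidate, output gate.
    """
    n = h_prev.shape[0]
    z = w @ x + u @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information enters the cell
    f = sigmoid(z[n:2 * n])        # forget gate: how much of the old cell state is kept
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much of the cell state is exposed
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # updated hidden state
    return h, c

# Five inputs (e.g., scaled discharge at five stations) and a small hidden size.
rng = np.random.default_rng(42)
n_in, n_hidden = 5, 4
w = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
u = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
sequence = rng.uniform(0.0, 1.0, size=(3, n_in))  # three synthetic time steps
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, w, u, b)
```

In practice the weights are learned by a deep learning framework; the loop only shows how the hidden and cell states are carried from one day to the next.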
The LSTM model has three dominant hyper-parameters: the number of epochs, the number of hidden units, and the learning rate. These hyper-parameters are determined by trial and error based on the five performance criteria. In detail, the hyper-parameters are first calibrated using 70% of the datasets in the training step. The model is then validated using the remaining 30% of the datasets. Two types of input datasets, consisting of water discharge at (i) Yen Thuong only and (ii) five stations (Muong Xen, Quy Chau, Nghia Khanh, Dua, and Yen Thuong), are examined to evaluate the effects of the input datasets on the output results when the LSTM model is applied.
Evaluation criteria




Dimensional and dimensionless errors are used to evaluate the performance of each method. The three dimensional criteria (i.e., RMSE, ME, and MAE) are valuable indicators since they express the error in the units of the variable, which is helpful in analyzing the results. The two dimensionless criteria (i.e., NSE and r), which determine the relative magnitude of residual variance (or noise) compared to the variance of observations, are used to provide comprehensive information on comparisons between observed and estimated values (Pham Van & Le 2022).
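A minimal sketch of the five criteria, using their standard textbook definitions, is given below. The sign convention for ME (observed minus estimated) is an assumption, and the function name and sample values are hypothetical.

```python
import numpy as np

def evaluate(obs, est):
    """Return RMSE, MAE, ME, NSE, and Pearson's r for paired series."""
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    resid = obs - est  # assumed sign convention: observed minus estimated
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    me = float(np.mean(resid))
    # NSE compares residual variance with the variance of the observations.
    nse = float(1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2))
    r = float(np.corrcoef(obs, est)[0, 1])
    return {"RMSE": rmse, "MAE": mae, "ME": me, "NSE": nse, "r": r}

# Small synthetic example (values in mg/L, illustration only).
obs_demo = [10.0, 20.0, 30.0, 40.0]
est_demo = [12.0, 18.0, 33.0, 37.0]
scores = evaluate(obs_demo, est_demo)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

RMSE, MAE, and ME carry the units of SSC (mg/L), whereas NSE and r are unitless, with values closer to 1 indicating better agreement.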
RESULTS
Results of SRC with single expression
Estimated values of coefficients when using different SRC with single expression for both rising and falling climbs
No. | Sediment rating curve | a | b | Abbreviation |
---|---|---|---|---|
1 | Linear SRC | 0.32 | 1.0 | SRC1 |
2 | Non-linear SRC | 0.0022 | 1.605 | SRC2 |
3 | Non-linear SRC | 8.40 × 10⁻⁵ | 2.0 | SRC3 |
Estimated errors of SSC when using SRC with single expression for both rising and falling climbs
Abbreviation | RMSE (mg/L) | RMSE (%) | MAE (mg/L) | MAE (%) | ME (mg/L) | ME (%) | r | NSE | Est. SSC magnitude (mg/L) | Obs. SSC magnitude (mg/L) |
---|---|---|---|---|---|---|---|---|---|---|
SRC1 | 187.07 | 2.85 | 96.72 | 1.47 | −64.21 | −0.98 | 0.761 | 0.523 | 1,908 | 6,560 |
SRC2 | 177.36 | 2.70 | 50.53 | 0.77 | 17.98 | 0.27 | 0.759 | 0.571 | 2,500 | 6,560 |
SRC3 | 186.40 | 2.84 | 57.90 | 0.88 | 37.54 | 0.57 | 0.737 | 0.531 | 2,914 | 6,560 |
Estimated results of daily SSC when using linear and non-linear regressions for SRC with single expression for both rising and falling climbs.
As shown in Table 3, SRC2 outperforms the other sediment rating curves in four of the five criteria (namely RMSE, MAE, ME, and NSE). SRC1 performs best according to the r criterion, although the difference in r between SRC1 and SRC2 is negligible (0.761 versus 0.759). In terms of SSC magnitude, the estimate from SRC3 (2,914 mg/L) is the closest to the observed value (6,560 mg/L), although it is still less than half of it. This inconsistency among the criteria may arise from various factors, including the data distribution and the use of different values of the exponent coefficient b.
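As a rough arithmetic check on the estimated SSC magnitudes, note that a power function with positive a and b increases with Q, so its largest estimate occurs at the largest discharge. The sketch below assumes that the maximum discharge in the fitting period equals the upper end of the range tabulated for Yen Thuong (5,890 m³/s); the variable names are hypothetical.

```python
Q_MAX = 5890.0  # m3/s, upper end of the discharge range tabulated for Yen Thuong

# (a, b) pairs as tabulated for the three single-expression curves.
curves = {
    "SRC1": (0.32, 1.0),
    "SRC2": (0.0022, 1.605),
    "SRC3": (8.40e-5, 2.0),
}

# Peak estimate of each curve, evaluated at the maximum discharge.
peaks = {name: a * Q_MAX ** b for name, (a, b) in curves.items()}
for name, peak in peaks.items():
    print(f"{name}: {peak:.0f} mg/L")
```

Under this assumption SRC3 gives about 2,914 mg/L, matching the tabulated estimate, while SRC1 and SRC2 give about 1,885 and roughly 2,470 mg/L, slightly below the tabulated 1,908 and 2,500 mg/L, presumably because the published coefficients are rounded. In every case the peak estimate remains well below the observed 6,560 mg/L.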
Both linear and non-linear SRC are investigated, and the non-linear SRC yields a slight improvement in the estimated values of daily SSC. On the other hand, the plot of observed values of Q versus SSC (in the period from 2009 to 2019) shows apparent and significant scatter (see Figure 3), revealing a hysteresis of the SSC with different values for the rising and falling climbs of the hydrographs. This complicated relationship between SSC and Q is consistent with the results obtained for the dry and rainy seasons of 2017 in previous studies (Phuong et al. 2019, 2023). Therefore, different values of coefficients a and b should be tested for the rising and falling climbs of the SRC.
Results of SRC with different expressions
In order to identify coefficients a and b for the rising and falling climbs of the SRC, the time series of daily SSC data collected in the period from 2009 to 2019 is first divided into two datasets, one for the rising climb and the other for the falling climb. This division is performed using the differences and approximate derivatives function (diff) in MATLAB. Then, coefficients a and b for each climb are determined in the same way as for the single SRC. Finally, appropriate values of coefficients a and b are found for each climb of the hydrograph, as summarized in Table 4, which shows that the exponent coefficient b is more or less the same for the falling and rising climbs, while the rating coefficient a differs significantly between the climbs. In addition, the exponent coefficient is also similar when using single or multiple sediment rating curves (see Tables 3 and 4).
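The division into rising and falling climbs can be sketched with first differences, here using NumPy's `np.diff` as a stand-in for the MATLAB function. Applying the difference to the discharge series and treating days with zero change as falling are assumptions, and the function name and sample values are hypothetical.

```python
import numpy as np

def split_by_climb(q):
    """Label each day as rising or falling from the first difference of Q.

    A day is 'rising' when discharge increased relative to the previous day
    and 'falling' otherwise (zero change counted as falling, by assumption).
    The first day has no previous value and belongs to neither group.
    """
    dq = np.diff(np.asarray(q, dtype=float))
    rising = np.concatenate([[False], dq > 0])
    falling = np.concatenate([[False], dq <= 0])
    return rising, falling

# Synthetic daily discharge (m3/s) and SSC (mg/L), illustration only.
q_demo = np.array([100.0, 150.0, 300.0, 250.0, 200.0, 200.0, 400.0])
ssc_demo = np.array([5.0, 12.0, 40.0, 22.0, 14.0, 13.0, 70.0])

rising, falling = split_by_climb(q_demo)
q_rise, ssc_rise = q_demo[rising], ssc_demo[rising]
q_fall, ssc_fall = q_demo[falling], ssc_demo[falling]
```

Each pair of subsets can then be passed to the same fitting routine used for the single SRC to obtain separate coefficients for the two climbs.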
Estimated values of coefficients when using different coefficients for falling and rising climbs of the SRC
No. | State | Sediment rating curve | a | b | Abbreviation |
---|---|---|---|---|---|
1 | Falling climb | Non-linear | 0.0032 | 1.59 | SRC (fall) |
2 | Rising climb | Non-linear | 0.0018 | 1.60 | SRC (rise) |
Error estimates of daily SSC when using different coefficients for falling and rising climbs of the SRC
Abbreviation | RMSE (mg/L) | RMSE (%) | MAE (mg/L) | MAE (%) | ME (mg/L) | ME (%) | r | NSE | Est. SSC magnitude (mg/L) | Obs. SSC magnitude (mg/L) |
---|---|---|---|---|---|---|---|---|---|---|
SRC (fall) | 210.46 | 3.21 | 64.93 | 0.99 | 17.15 | 0.26 | 0.793 | 0.625 | 2,714 | 6,560 |
SRC (rise) | 105.88 | 4.64 | 31.55 | 1.38 | 10.80 | 0.47 | 0.759 | 0.560 | 1,871 | 2,280 |
Estimated results of daily SSC when using different coefficients for (a) falling and (b) rising climbs of the SRC.
Results of MR
Estimated values of coefficients when using MR of water discharge for estimating SSC at Yen Thuong
β | α0 | α1 (Muong Xen) | α2 (Quy Chau) | α3 (Nghia Khanh) | α4 (Dua) | α5 (Yen Thuong) | Abbr. |
---|---|---|---|---|---|---|---|
1.0 | −70.98 | 27.15 × 102 | −15.39 × 102 | 14.36 × 102 | 4.05 × 102 | 24.75 × 102 | MRQ1 |
1.6 | 11.17 | 5.87 × 10−3 | 0.12 × 10−3 | 0.015 × 10−3 | 0.57 × 10−3 | 1.73 × 10−3 | MRQ2 |
2.0 | 35.48 | 3.24 × 10−4 | 0.25 × 10−4 | −0.107 × 10−4 | 0.25 × 10−4 | 0.60 × 10−4 | MRQ3 |
Estimated errors of daily SSC when using the MR of Q
Abbr. | RMSE (mg/L) | RMSE (%) | MAE (mg/L) | MAE (%) | ME (mg/L) | ME (%) | r | NSE | Est. SSC magnitude (mg/L) | Obs. SSC magnitude (mg/L) |
---|---|---|---|---|---|---|---|---|---|---|
MRQ1 | 172.53 | 2.63 | 57.29 | 0.87 | −5.30 | −0.08 | 0.771 | 0.594 | 1,737 | 6,560 |
MRQ2 | 171.68 | 2.62 | 52.80 | 0.80 | 0.00 | 0.00 | 0.773 | 0.598 | 2,314 | 6,560 |
MRQ3 | 196.83 | 3.00 | 80.32 | 1.22 | 0.00 | 0.00 | 0.687 | 0.472 | 3,287 | 6,560 |
Time series of daily SSC when using the MR, with (a) β = 1.0, (b) β = 1.6, and (c) β = 2.0.
The MR produces similar results for daily SSC when the exponent coefficient β = 1.0 or β = 1.6 is used (Table 7). The absolute values of the three dimensional errors range from 0 to 172.5 mg/L (approximately 0 to 2.6% of the observed magnitude of SSC at Yen Thuong). The value of NSE is about 0.60, while the coefficient r equals 0.77. In the case β = 2.0, the three dimensional errors are larger and the two dimensionless criteria are smaller (Table 7). Overall, the value β = 1.6 is found to be the best among the tested values.
A closer look at the estimated results shows that MRQ2 performs best in all five criteria (i.e., NSE, r, RMSE, MAE, and ME) among the three multiple regressions under investigation (Table 7). However, in terms of SSC magnitude, the estimate from MRQ3 (3,287 mg/L) is the closest to the observed value (6,560 mg/L). Once again, this inconsistency may be attributed to the data distribution and the use of different values of the exponent coefficient β.
Observed data (SSCObs.) versus estimated values (SSCEst.) of SSC when using the MR, with (a) β = 1.0, (b) β = 1.6, and (c) β = 2.0.
Results of the LSTM
For the LSTM model, the hyper-parameters (learning rate, number of hidden units, and number of epochs) are determined using the available observed data of flow and SSC in the period from 1/1/2009 to 31/12/2019. In detail, 70% of the datasets (corresponding to the period from 1/1/2009 to 12/9/2016) are used for the training step, while the remaining 30% of the datasets (from 13/9/2016 to 31/12/2019) are used for the validation step, as in many previous studies (e.g., Klemes 1986; Kisi et al. 2012; Buyukyildiz & Kumcu 2017; Choubin et al. 2018; Khosravi et al. 2018; Pham Van & Le 2022; Pham Van & Nguyen-Van 2022). Two kinds of input data are considered when using the LSTM model. The first uses only water discharge at the Yen Thuong location as the input data, while the second uses the flow datasets at all five locations. As for SRC and MR, appropriate values of the hyper-parameters are determined by trial and error using the five performance criteria, resulting in learning rate = 0.001, number of hidden units = 128, and number of epochs = 350.
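The chronological 70/30 split can be checked with a few lines of date arithmetic. The rounding rule used to convert 70% of the record into a whole number of days is an assumption, and the dictionary keys are hypothetical labels for the reported hyper-parameter values.

```python
from datetime import date, timedelta

start, end = date(2009, 1, 1), date(2019, 12, 31)
n_days = (end - start).days + 1        # length of the daily record
n_train = round(0.7 * n_days)          # assumed rounding to whole days

last_train = start + timedelta(days=n_train - 1)
first_valid = last_train + timedelta(days=1)

# Hyper-parameter values reported above (key names are illustrative).
hyper_params = {"learning_rate": 0.001, "hidden_units": 128, "epochs": 350}

print(n_days, n_train, last_train.isoformat(), first_valid.isoformat())
```

With this rule the training period ends on 12 September 2016 and the validation period starts on 13 September 2016, which matches the dates given above. The split is chronological rather than random, so the validation data always follow the training data in time.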
Error estimates of daily SSC when using water discharge at Yen Thuong as input data in the LSTM model
Step | RMSE (mg/L) | RMSE (%) | MAE (mg/L) | MAE (%) | ME (mg/L) | ME (%) | r | NSE | Est. SSC magnitude (mg/L) | Obs. SSC magnitude (mg/L) | Abbr. |
---|---|---|---|---|---|---|---|---|---|---|---|
Training | 61.74 | 0.94 | 29.89 | 0.46 | −1.80 | −0.03 | 0.978 | 0.956 | 6,265 | 6,560 | LSTM (T,1) |
Validation | 50.00 | 1.92 | 28.06 | 1.08 | 3.88 | 0.15 | 0.970 | 0.940 | 2,636 | 2,610 | LSTM (V,1) |
Time series of SSC, for (a) training and (b) validation steps when using only Q at Yen Thuong as input datasets in the LSTM model.
Observed data (SSCObs.) versus estimated values (SSCEst.) of daily SSC for (a) training and (b) validation steps when using only Q at Yen Thuong as input data in the LSTM model.
Error estimates of daily SSC when using water discharge at five stations as input data in the LSTM model
Step | RMSE (mg/L) | RMSE (%) | MAE (mg/L) | MAE (%) | ME (mg/L) | ME (%) | r | NSE | Est. SSC magnitude (mg/L) | Obs. SSC magnitude (mg/L) | Abbr. |
---|---|---|---|---|---|---|---|---|---|---|---|
Training | 19.38 | 0.30 | 11.58 | 0.18 | −0.84 | −0.01 | 0.998 | 0.996 | 6,561 | 6,560 | LSTM (T,5) |
Validation | 16.02 | 0.61 | 11.15 | 0.43 | 1.45 | 0.06 | 0.997 | 0.994 | 2,610 | 2,610 | LSTM (V,5) |
Time series of daily SSC, for (a) training and (b) validation steps when using flow at five stations as input datasets in the LSTM model.
Observed data (SSC_Obs.) versus estimated values (SSC_Est.) of daily SSC for (a) training and (b) validation steps when using Q at five stations as input data in the LSTM model.
Radar plots for (a) dimensional and (b) dimensionless performance criteria of daily SSC when applying different methods.
APPLICATION AND DISCUSSION
Application of the LSTM model to estimate daily SSC
Statistical properties of Q and SSC in different periods
Variable | Approach | Range | Mean | Standard deviation | Covariance | Skewness | Kurtosis | Period |
---|---|---|---|---|---|---|---|---|
Water discharge (m3/s) | Obs. | 72.9–5,890 | 479.57 | 636.36 | 404,959.12 | 4.15 | 24.12 | 2009–2019 |
Water discharge (m3/s) | Obs. | 42.7–9,030 | 529.48 | 694.19 | 481,901.41 | 3.90 | 24.40 | 1969–2008, 2020 |
Suspended sediment concentration (mg/L) | Obs. | 0.26–6,560 | 91.16 | 270.90 | 73,385.36 | 9.14 | 136.51 | 2009–2019 |
Suspended sediment concentration (mg/L) | LSTM | 0.29–9,228 | 238.95 | 581.02 | 337,585.04 | 6.47 | 60.38 | 1969–2008, 2020 |
Daily time series of (a) water discharge and (b) SSC using the LSTM model at the location of interest in the period from 1969 to 2020.
Discussion
Impact of rising and falling climbs on the SRC
As mentioned previously (see Figure 3), the plot of Q versus SSC at the location of interest shows significant scatter, even when the collected data are divided into two sub-datasets corresponding to the falling and rising climbs. To address this issue, different values of the rating and exponent coefficients were investigated for each climb, showing that a separate SRC for the falling and rising climbs represents the observed values of SSC better than a single SRC. For instance, NSE increases from 0.571 for the single SRC (denoted SRC1 in Table 3) to 0.625 for the falling climb. This result is consistent with those reported in previous studies (e.g., Duan et al. 2013; Phuong et al. 2023). The improvement in estimating daily SSC remains slight, however, because the datasets for each climb still present significant scatter, as shown in Figure 4. This scatter results from the non-linear variation of the flow as well as from the exclusion of related factors (e.g., bed evolution, sediment transport processes, the interaction between suspended and bedload sediments) from the calculations when the SRC is applied.
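As an illustration of fitting a separate power-form SRC, SSC = a·Q^b, to each climb, a minimal sketch is given below. It rests on two assumptions that are not taken from the present study: the coefficients are fitted by ordinary least squares in log-log space, and a day is assigned to the rising climb when its discharge exceeds that of the previous day. The function names `fit_src`, `split_climbs`, and `fit_src_by_climb` are hypothetical.

```python
import numpy as np

def fit_src(q, ssc):
    """Fit SSC = a * Q**b by least squares in log-log space; returns (a, b).

    Assumption: log-log regression is used here for simplicity; a non-linear
    fit in the original units would give somewhat different coefficients.
    """
    q = np.asarray(q, dtype=float)
    ssc = np.asarray(ssc, dtype=float)
    # polyfit returns the slope (exponent b) first, then the intercept (log a)
    b, log_a = np.polyfit(np.log(q), np.log(ssc), 1)
    return float(np.exp(log_a)), float(b)

def split_climbs(q):
    """Boolean mask that is True on days assigned to the rising climb.

    Assumed rule: discharge larger than on the previous day. The first day
    has no predecessor and is therefore assigned to the falling climb.
    """
    q = np.asarray(q, dtype=float)
    dq = np.diff(q, prepend=q[0])
    return dq > 0

def fit_src_by_climb(q, ssc):
    """Fit one rating curve per climb; returns {'rising': (a, b), 'falling': (a, b)}."""
    q = np.asarray(q, dtype=float)
    ssc = np.asarray(ssc, dtype=float)
    rising = split_climbs(q)
    return {
        "rising": fit_src(q[rising], ssc[rising]),
        "falling": fit_src(q[~rising], ssc[~rising]),
    }
```

Calling `fit_src(q, ssc)` on the whole record corresponds to a single SRC, while `fit_src_by_climb(q, ssc)` gives one pair of rating and exponent coefficients per climb, so the two sets of estimates can be compared with the same performance criteria.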
Previous studies (e.g., Phuong et al. 2019) investigated the impact of climate change and reservoirs on the SSC at the Dua and Yen Thuong locations, deriving different sediment rating curves for the dry and rainy seasons, as well as for the transition between them, from data collected in the period from August 2017 to July 2018. Those data were collected 4–6 times per month in the rainy season and 1–4 times per month in the dry season. Such data, however, do not fully characterize the variation in Q and SSC over a long period of time. In contrast, large values of the NSE and r coefficients are obtained in the present study, especially when using the LSTM model (e.g., NSE and r coefficients are greater than 0.94 for both the training and validation steps), in comparison with a value of 0.63 for the determination coefficient reported by Phuong et al. (2023). This means that a significant improvement in the estimated values of daily SSC was obtained for the studied location. Thus, the present results of daily SSC are believed to be useful for investigating temporal variations of SSC at both daily and yearly scales.
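For readers who wish to reproduce such comparisons, the five criteria can be computed as in the following sketch. Two points are assumptions: the error is taken as estimated minus observed, and the percentages are taken relative to the largest observed SSC, which is consistent with the tabulated values (e.g., 61.74/6,560 ≈ 0.94%). The function name `performance_criteria` is hypothetical.

```python
import numpy as np

def performance_criteria(obs, est):
    """Return RMSE, MAE, ME, r, and NSE for observed and estimated series.

    Assumptions: error = estimated - observed, and the percentage values are
    the dimensional errors divided by the maximum observed value.
    """
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - obs

    rmse = float(np.sqrt(np.mean(err ** 2)))   # root mean square error
    mae = float(np.mean(np.abs(err)))          # mean absolute error
    me = float(np.mean(err))                   # mean error (bias)
    r = float(np.corrcoef(obs, est)[0, 1])     # Pearson correlation coefficient
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean
    nse = float(1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2))

    magnitude = float(obs.max())               # observed magnitude (assumed: maximum)
    return {
        "RMSE": rmse, "MAE": mae, "ME": me, "r": r, "NSE": nse,
        "RMSE_pct": 100.0 * rmse / magnitude,
        "MAE_pct": 100.0 * mae / magnitude,
        "ME_pct": 100.0 * me / magnitude,
    }
```

Note that r measures only the linear association between the two series, whereas NSE also penalizes bias, so an estimate with a constant offset can have r = 1 and still a low NSE.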
Impact of input data in LSTM model
Two kinds of input data were used in the LSTM model. The model simulates the observed values of daily SSC at the location of interest much better when the flow datasets at all five locations are included in the input data (e.g., NSE in the validation step increases from 0.940 to 0.994). This is not surprising, because additional input datasets provide the model with more information. In addition, the LSTM model represented the observed values of daily SSC much better than the single or multiple SRC when using the same input datasets (i.e., the time series of water discharge at the Yen Thuong location only). When water discharge at all five locations was used, the LSTM model also reproduced the observations of SSC better than the SRC and MR. These results suggest that the LSTM model is more suitable than the SRC and MR at the location of interest. The LSTM model is also flexible in terms of its input datasets, which makes it an applicable tool for other locations in river basins as well as for practical applications.
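The flexibility with respect to input datasets can be illustrated by how the discharge series are arranged before being passed to a recurrent model. The sketch below builds overlapping windows of shape (samples, time steps, stations), the layout commonly expected by recurrent layers, for either one or five stations. The look-back length `lookback` and the function name `make_windows` are hypothetical, and the windowing itself is an assumption rather than a description of the configuration used in this study.

```python
import numpy as np

def make_windows(q, ssc, lookback):
    """Arrange daily discharge into overlapping windows for a sequence model.

    q        : array of shape (days,) or (days, stations) with daily discharge
    ssc      : array of shape (days,) with daily SSC at the target station
    lookback : number of consecutive days per window (hypothetical choice)

    Returns x of shape (samples, lookback, stations) and y of shape (samples,),
    where each target is the SSC on the last day of its window.
    """
    q = np.asarray(q, dtype=float)
    if q.ndim == 1:
        q = q[:, None]                  # a single station becomes one feature column
    ssc = np.asarray(ssc, dtype=float)

    n_samples = q.shape[0] - lookback + 1
    x = np.stack([q[i:i + lookback] for i in range(n_samples)])
    y = ssc[lookback - 1:]
    return x, y
```

With this arrangement, switching from discharge at Yen Thuong only to discharge at all five stations changes only the size of the last axis of `x`; the rest of the workflow stays the same.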
CONCLUSION
Three approaches, namely the SRC, MR in linear and non-linear forms, and the LSTM model, were implemented to estimate daily SSC at the Yen Thuong location in the Ca River Basin, Vietnam. The main findings of the present study can be summarized as follows:
Using the daily data of Q and SSC in the period from 1/1/2009 to 31/12/2019, both linear and non-linear forms of the single SRC were examined. The best agreement between estimated and observed data was found for the power function, with values of 0.0022 and 1.605 for the rating and exponent coefficients, respectively. The values of the three popular performance errors (i.e., MAE, ME, and RMSE) were smaller than 3% of the observed magnitude of the daily SSC. In addition, values of 0.76 and 0.57 were obtained for the r and NSE coefficients, respectively.
Different forms of the SRC were investigated separately for the rising and falling climbs of the discharge-sediment hydrograph. An improvement in the estimated values of daily SSC was obtained, especially for the falling climb, for which r = 0.80, NSE = 0.63, and the values of the three dimensional criteria were less than 5% of the observed SSC magnitude at the gauging station.
In terms of the MR, daily SSC at the location of interest was found to have a close relationship with the water discharge at five locations in either a linear form (exponent coefficient equal to unity) or a power form (exponent coefficient equal to 1.60). In both cases, the values of the dimensional criteria were less than 3% of the observed magnitude of the daily SSC, while NSE and r were equal to 0.60 and 0.77, respectively.
When the LSTM model was applied, a very good reproduction of the observed values of daily SSC was obtained. The values of ME, MAE, and RMSE were less than 2% of the observed magnitude of daily SSC. The values of r and NSE varied between 0.94 and 0.99 when datasets of the flow at either one or five related locations were used as input in the LSTM model.
Among the three investigated methods, the LSTM model was found to be the most suitable approach for estimating daily SSC from the available flow datasets at the studied location. The model uses only flow datasets as input, which makes it a useful tool for calculating daily SSC in river basins.
ACKNOWLEDGEMENTS
The authors would like to thank the North Central Regional Hydro-meteorological Center for sharing data used for different calculations in the present study. The authors would like to thank the editor and five anonymous reviewers for their fruitful suggestions and valuable comments.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.