The prediction of saltwater intrusion in estuaries plays an important role in supporting decision-makers and farmers in building strategies and policies for agricultural development and water resource management. The objective of this study is to develop machine learning models, namely the gated recurrent unit (GRU), GRU-GWO (grey wolf optimiser), and GRU-SFO (sailfish optimiser), to predict saltwater intrusion 1, 7, 15, and 30 days ahead in the Mekong estuary of Vietnam. Several statistical indices, namely root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2), were applied to evaluate the accuracy of the saltwater intrusion models. The results showed that the GWO and SFO optimisation algorithms successfully improved the accuracy of the GRU model in predicting saltwater intrusion. The R2 value of the proposed models ranged from 0.89 to 0.91 for the 1-day forecast, from 0.81 to 0.85 for the 7-day forecast, from 0.67 to 0.76 for the 15-day forecast, and from 0.52 to 0.55 for the 30-day forecast. The results indicated the ability of the GRU model to predict saltwater intrusion.

  • Saltwater salinity forecasting was done using hybrid machine learning.

  • Model performance was evaluated using RMSE, R2, and MAE.

  • GRU, GRU-GWO, and GRU-SFO achieved high accuracy in saltwater salinity forecasting.

River water resources play an important role in human consumption, urbanisation, and industrial and agricultural development, and provide habitats for aquatic species in many regions of the world (Roy & Sharan 2016; Jung et al. 2022). Reduced water quality in rivers causes significant damage to the environment and human health (Melesse et al. 2020). Monitoring and evaluating river water quality is therefore necessary to build appropriate water resource management strategies, particularly because more than a billion people have only limited access to drinking water. This is why building a powerful model to predict the quality of river water resources is very important.
Figure 1

Location of the study area.

Figure 2

Database used to predict saltwater intrusion.


Estuaries are transitional systems that regulate the volume of fresh water entering the sea and of salt water entering from the sea, which merges into the river flow at zero salinity (Saccotelli et al. 2024). Estuarine systems provide a multitude of important ecosystem services, for example, cultural services, raw materials, and fish resources. Among environmental threats, saltwater intrusion is considered one of the most important (Alizadeh et al. 2018; Keyes et al. 2021). It increases the salt content of the water and extends the reach of salt intrusion along rivers. Monitoring environmental health in estuaries is therefore necessary for the management of water resources and the protection of ecosystems for sustainable development.

According to the literature, water salinity prediction models fall into two categories: physically based models and data-driven models. Physically based models include QUAL2K (Fan et al. 2009; Sahoo & Swain 2021) and MIKE (Etemad-Shahidi et al. 2008; Paliwal & Patra 2011). Although these models have been widely used in previous studies to assess water quality, including water salinity, they are complex and time-consuming and, in particular, require detailed data over a long period of time. Their application over large areas is therefore a great challenge, especially in developing countries where data are scarce. To reduce these limitations, several recent studies have used statistical models based on linear and non-linear relationships between input and output variables. These models have been widely applied because of their simplicity and their ability to provide fast predictions from limited data sets. Moreover, they are very effective in determining linear or simple non-linear relationships between variables, and they can be used in systems with limited capacity (Riefer 1982; Chambers & Hastie 2017). However, several studies have pointed out that these models struggle to represent complex multivariate non-linear relationships. In many cases, the relationships between water quality parameters are non-linear, random, and delayed, so it is very difficult to build a statistical model to predict events. Moreover, these models are very sensitive to noisy or missing data. Finally, a given statistical model is often developed for a specific region, which makes it difficult to apply in other regions (Rogger et al. 2012).

To overcome the limitations of statistical models, several studies have developed machine learning models that explore the hidden and complex relationships between input and output variables. These models have several advantages over traditional models (Jiang et al. 2022). The data required by machine learning models are relatively easy to collect, especially given the advancement of remote sensing. In addition, machine learning models are less sensitive than physics-based models to missing data, and they can handle huge amounts of data at different scales (Rajput et al. 2023). These models include random forest (RF) (Hidayat & Astsauri 2022; Khan et al. 2022) and support vector machine (Jiang et al. 2019; Shafiei et al. 2022). Several studies have used machine learning to estimate the salinity of water in estuaries. For example, Saccotelli et al. (2024) used a support vector machine to predict water salinity in the estuary of the Po Goro River. Tran et al. (2021) used different machine learning methods, such as K-nearest neighbours, RF, support vector machine, and long short-term memory, to monitor saltwater intrusion in the Ham Long River in the Mekong Delta (MD) of Vietnam. Fang et al. (2017) built a hybrid model by integrating a support vector machine with a genetic algorithm to predict saltwater intrusion in the Min River Estuary. Melesse et al. (2020) built two individual models (M5 Prime (M5P) and RF) and eight hybrid models (bagging-M5P, bagging-RF, random subspace (RS)-M5P, RS-RF, random committee (RC)-M5P, RC-RF, additive regression (AR)-M5P, and AR-RF) to predict the electrical conductivity (EC) value in the Babol-Rood River. In addition to these machine learning models, deep learning is playing an increasingly important role in solving environmental problems such as water quality prediction. 
These approaches include recurrent neural networks such as the gated recurrent unit (GRU) (Wang et al. 2024) and the long short-term memory (LSTM) network (Wang et al. 2023), as well as temporal convolution-based models such as the temporal convolutional network (TCN) (Yao et al. 2024). Wang et al. (2024) used the improved sparrow search algorithm (SSA) and the attention (AT) mechanism to improve the performance of GRU for water quality prediction. Wang et al. (2023) applied variational mode decomposition and the improved grasshopper optimisation algorithm (IGOA) to improve the performance of the LSTM neural network for water quality prediction. Yao et al. (2024) used the TCN to capture temporal dependencies efficiently and indicated that this model is faster to train than traditional models. Deep learning models are particularly effective when processing large quantities of data with complex non-linear relationships. Despite their scientific advances and practical value, studies using machine learning and deep learning to estimate or predict saltwater intrusion still have limitations. First, the machine learning and deep learning models developed in previous studies have mainly been applied to simulate salinity under different development scenarios. That is, researchers have used machine learning to predict and evaluate the extent of salinity changes in the aquatic environment under certain development scenarios, such as dam construction and adjustment of the water source. Therefore, the forecasting ability of machine learning models has not been explored much. Second, most previous studies use machine learning to monitor salinity at the weekly or monthly level because of the large computational time required (Sarkar et al. 2024; Vaferi et al. 2024). Prediction of saltwater intrusion at a finer level, such as the daily level, is very rare. 
Several studies have pointed out that predicting saltwater intrusion at the daily level plays an important role in supporting farmers in the development and management of water resources in the delta. For example, farmers need the daily salinity level to plan irrigation for agriculture.

Therefore, the objective of this study is to develop a machine learning model based on the GRU and two optimisation algorithms (sailfish optimiser (SFO) and grey wolf optimiser (GWO)) to predict the daily intrusion of saltwater in the Hau River Estuary in the MD. The novelty of this study is that it is the first to apply a hybrid machine learning model to predict saltwater intrusion in the MD, where saltwater intrusion is an increasingly important issue due to climate change and sea level rise. The main contribution of this study is the development and evaluation of the potential of hybrid machine learning to predict saltwater intrusion. The predictions not only provide knowledge of saltwater in the river but can also support other hydrological studies.

Study area

The Mekong River, approximately 4,900 km long, is the longest in Southeast Asia. It flows from the Tibetan Plateau (China) through five countries (Myanmar, Lao PDR, Thailand, Cambodia, and Vietnam) and drains through the extensive MD into the East Sea (Dinh et al. 2020). The river basin covers an area of 795,000 km², which presents opportunities and challenges for the countries it traverses (Figure 1).

The Lower Mekong River Basin experiences a tropical monsoonal climate characterised by two distinct seasons, wet and dry. This climate pattern influences agricultural activities and water resource management in the region. The Vietnamese Mekong Delta (VMD), located downstream of the Mekong River with an area of approximately 39,400 km², is a vital agricultural area that benefits from an annual average rainfall of 1,400–2,200 mm. While the rainy season from May to October provides most of the precipitation, the dry season from November to April presents particular challenges for water management and agriculture. The runoff generated in the basin, approximately 500 km³, flows into the East Sea through two main rivers, the Tien River and the Hau River. The MD region has a relatively flat topography, with elevations ranging from 0 to 2 m above sea level. The tidal regime at the river mouths is mixed, combining diurnal and semi-diurnal tides, with amplitudes reaching up to 3 m. The MD is highly vulnerable to climate change and sea level rise, which affect the livelihoods of millions. The flat topography and ongoing land subsidence due to excessive groundwater extraction, with rates reaching 18–20 mm/year (Minderhoud et al. 2018), exacerbate the risks associated with rising sea levels. According to the Ministry of Natural Resources and Environment of Vietnam, projections indicate that the MD will face increased flood risks and increased saltwater intrusion in the coming years, requiring proactive measures to mitigate these challenges. If sea levels rise by 100 cm due to climate change, the MD will have the highest risk of flooding, with 47.29% of its area at risk. The areas with the highest risk of flooding are Cà Mau Province, with 79.62% of its area at risk, and Kiên Giang Province, with 75.68%. 
Saltwater intrusion under the impact of sea level rise is projected to push the saline boundary (2.5 mg/L) 10 and 20 km further inland along the main rivers by the mid-2030s and the 2090s, respectively, especially during the dry season. Furthermore, changes in water flow due to hydroelectric operations and economic activities further complicate the hydrological dynamics. These factors exacerbate the hazards of saltwater intrusion, drought (Quang et al. 2021), and water pollution, making them more complex.

Geodatabase

The MD is considered one of the regions most affected by climate change and rising sea level in the world. Several previous studies have pointed out that by the end of the twenty-first century, the sea level will rise by about 50–80 cm in the MD, which will cause increasingly serious salt intrusion problems, especially during the dry season. Furthermore, to meet the growing demand for water resources, several countries upstream of the Mekong River have built small- and medium-sized water conservation facilities. This reduces the water level downstream, which aggravates salt intrusion. Therefore, the prediction of salt intrusion plays an important role in supporting local governments and farmers in managing natural resources for agricultural development. In this study, salinity data from 1 January 2015 to 30 June 2020 at the Tra Vinh station, available from the Ministry of Natural Resources and Environment, were used to predict saline intrusion with forecast times of 1, 5, 7, 15, and 30 days (Figure 2).

Methodology

GRU networks

The GRU is a gating mechanism for recurrent neural networks (RNNs) proposed by Cho (2014). Its structure includes two gates. The first is the reset gate, which determines how much past information to ignore. The previous hidden state, concatenated with the input data, passes through a sigmoid (to keep only the relevant coordinates), and the result is then multiplied by the old state (Yang et al. 2023).

The second is the update gate, whose main function is to determine how much information from the past is important for the future. This function matters because it allows the model to carry information forward from the past, which mitigates the vanishing gradient problem. The input data and the old hidden state are concatenated and passed through a sigmoid function whose role is to determine which components are important (He et al. 2024).

Both the reset and update gates take as input the current time step Xt and the hidden state of the previous time step Ht−1. Their outputs are calculated by a fully connected layer with a sigmoid activation function.

Together, the reset gate and the update gate change the way the hidden states are calculated compared with standard RNNs.
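The gate mechanics described above can be sketched as a single GRU time step in NumPy. This is a minimal illustration, not the TensorFlow implementation used in the study: the parameter names (`W_*`, `U_*`, `b_*`), the helper `init_gru_params`, and the blending convention `h_t = z * h_prev + (1 - z) * h_tilde` are assumptions, and some texts swap the roles of `z` and `1 - z`.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + np.exp(-x))


def init_gru_params(n_in, n_hid, seed=0):
    """Hypothetical helper: small random weights for the reset (r),
    update (z), and candidate (h) transformations."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("r", "z", "h"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(n_hid, n_in))
        params[f"U_{gate}"] = rng.normal(0.0, 0.1, size=(n_hid, n_hid))
        params[f"b_{gate}"] = np.zeros(n_hid)
    return params


def gru_step(x_t, h_prev, params):
    """One GRU time step for a single sample.

    x_t:    input at time t, shape (n_in,)
    h_prev: hidden state at time t-1, shape (n_hid,)
    """
    # Reset gate: how much of the previous state to ignore
    r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    # Update gate: how much of the previous state to carry forward
    z = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    # Candidate state, computed from the input and the reset-scaled old state
    h_tilde = np.tanh(
        params["W_h"] @ x_t + params["U_h"] @ (r * h_prev) + params["b_h"]
    )
    # New state: blend of the old state and the candidate
    return z * h_prev + (1.0 - z) * h_tilde
```

When the update gate saturates near 1, the old state is copied almost unchanged, which is the mechanism that lets information (and gradients) survive over many time steps.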

Grey wolf optimisation

The GWO algorithm is a metaheuristic optimisation method developed by Mirjalili et al. (2014). It is inspired by the social hierarchy and hunting techniques of grey wolves in the wild. In nature, grey wolves live in packs organised in a hierarchical order with four types: alpha, beta, delta, and omega wolves (Mirjalili et al. 2014). Alpha wolves are the leaders of the pack, and beta and delta wolves support the alpha wolves in decision-making. Omega wolves rank lowest and must obey the alpha, beta, and delta wolves. In the GWO algorithm, each wolf position is a candidate solution. The position of the alpha wolf is the best solution found, and it is updated in each iteration. The position of the beta wolf is the second best solution, and that of the delta wolf is the third best. The alpha, beta, and delta wolves guide the omega wolves, which include the pawns, the old wolves, the hunters, and the guardians and represent the remaining candidate solutions (Zhang et al. 2017; Chen et al. 2019). The optimisation process of GWO is divided into three main stages: searching for prey, encircling it, and attacking it. The alpha, beta, and delta positions are closer to the prey, whereas the omega positions are farther from it, although this can change in each iteration (Liu et al. 2021). GWO is considered a powerful algorithm with several advantages over other swarm intelligence algorithms. It requires few parameters, which makes it simple to use. It also converges quickly, which saves computation time. In addition, GWO balances the exploration and exploitation processes. 
Finally, several studies have indicated that GWO can be easily applied to a range of optimisation problems because of its flexible design, and its performance has been demonstrated in several different fields. That is why it was selected to improve the performance of the GRU model for predicting water quality (Hao & Sobhani 2021).
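The hierarchy-guided search described above can be sketched as follows. This is a minimal NumPy illustration of a GWO-style minimiser under stated assumptions, not the code used in the study: the function name `gwo_minimise`, the population size, the iteration count, and the linear decay of the coefficient `a` from 2 to 0 are illustrative choices.

```python
import numpy as np


def gwo_minimise(objective, lower, upper, n_wolves=10, n_iter=50, seed=0):
    """Minimise `objective` inside the box [lower, upper] with a basic GWO loop."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    # Alpha, beta, delta: the three best positions found so far
    leaders = wolves[:3].copy()
    leader_fit = np.full(3, np.inf)
    for it in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        # Keep the three best positions among old leaders and current wolves
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        top = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[top].copy(), pool_fit[top].copy()
        # `a` shrinks from 2 towards 0: exploration first, exploitation later
        a = 2.0 * (1.0 - it / n_iter)
        new_positions = np.zeros_like(wolves)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)  # distance to this leader
            new_positions += leader - A * D
        # Each wolf moves to the average of the three leader-guided positions
        wolves = np.clip(new_positions / 3.0, lower, upper)
    return leaders[0], float(leader_fit[0])
```

In a set-up like the one described in this study, `objective` would be a hypothetical function that rounds the candidate position to an integer number of GRU units, trains the network, and returns the validation error; here any callable that maps a vector to a scalar works.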

Sailfish optimiser algorithm

SFO is a metaheuristic algorithm developed by Shadravan et al. (2019). It is inspired by the group hunting behaviour of sailfish in nature. The SFO algorithm includes two populations: sailfish and sardines (Shadravan et al. 2019). During the hunt, the sailfish population intensifies the search around the best solutions found so far, while the sardine population diversifies the search space. This means that the sailfish move around the sardines, which diversifies their chances of success when hunting (Kumar et al. 2022). The sailfish represent potential solutions, and the sardines represent the optimal solution. In the SFO algorithm, the populations of sailfish and sardines are randomly initialised. After the populations, the parameter that represents the attack power of the sailfish is initialised. The value of this parameter decreases after each iteration and is kept in the matrix. Each school of fish updates its position at each iteration until the termination conditions are met (Kumar et al. 2022; Rajoriya & Gupta 2023). In the SFO algorithm, the elite sailfish is the best sailfish, and the injured sardine is the best sardine. The positions of sardines and sailfish change based on the positions of the elite fish. During each iteration, the positions of the elite sailfish and the injured sardine are updated if better positions are observed. When the algorithm terminates, the best positions of the sailfish and sardines are returned from the matrix (Kumar et al. 2022; Ikram et al. 2023). SFO is considered a simple algorithm and has the advantage of balancing the exploration and exploitation processes. This balance maintains swarm diversity, ensures a high convergence speed, and helps avoid local optima. In addition, the algorithm has been successfully used to optimise various machine learning algorithms for environmental problems. 
Therefore, it is selected to optimise the GRU in this study to predict water quality (Kumar et al. 2022).
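The two-population search described above can be sketched as follows. This is a heavily simplified illustration, not the implementation used in the study: the update rules follow the form commonly attributed to Shadravan et al. (2019), the constants `A = 4` and `eps = 0.001` are assumed defaults, every sardine is updated in every iteration, and only the single best sardine may replace the worst sailfish. The function name `sfo_minimise` is hypothetical.

```python
import numpy as np


def sfo_minimise(objective, lower, upper, n_sailfish=5, n_sardines=20,
                 n_iter=100, A=4.0, eps=0.001, seed=0):
    """Simplified sailfish-style minimiser over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Both populations are initialised at random
    sailfish = rng.uniform(lower, upper, size=(n_sailfish, dim))
    sardines = rng.uniform(lower, upper, size=(n_sardines, dim))
    sf_fit = np.array([objective(x) for x in sailfish])
    sd_fit = np.array([objective(x) for x in sardines])
    i = int(np.argmin(sf_fit))
    best_x, best_val = sailfish[i].copy(), float(sf_fit[i])
    # Prey density is constant here because no sardine is removed
    prey_density = 1.0 - n_sailfish / (n_sailfish + n_sardines)
    for it in range(n_iter):
        elite = sailfish[np.argmin(sf_fit)].copy()    # best sailfish
        injured = sardines[np.argmin(sd_fit)].copy()  # best sardine
        # Sailfish move around the elite sailfish and the injured sardine
        lam = 2.0 * rng.random((n_sailfish, 1)) * prey_density - prey_density
        pull = rng.random((n_sailfish, 1)) * (elite + injured) / 2.0
        sailfish = np.clip(elite - lam * (pull - sailfish), lower, upper)
        # Attack power decreases linearly with the iteration count
        attack_power = A * (1.0 - 2.0 * it * eps)
        r = rng.random((n_sardines, 1))
        sardines = np.clip(r * (elite - sardines + attack_power), lower, upper)
        sf_fit = np.array([objective(x) for x in sailfish])
        sd_fit = np.array([objective(x) for x in sardines])
        # A sardine that beats the worst sailfish takes its place
        worst, top = int(np.argmax(sf_fit)), int(np.argmin(sd_fit))
        if sd_fit[top] < sf_fit[worst]:
            sailfish[worst], sf_fit[worst] = sardines[top].copy(), sd_fit[top]
        i = int(np.argmin(sf_fit))
        if sf_fit[i] < best_val:
            best_x, best_val = sailfish[i].copy(), float(sf_fit[i])
    return best_x, best_val
```

The best-so-far solution is tracked separately because, in this simplified form, every sailfish (including the elite one) moves in each iteration.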

Experimental set-up for GRU

This study uses machine learning and optimisation algorithms to predict saltwater intrusion. The prediction process was carried out in Python. The GRU model was coded using the TensorFlow library, while the integration of the GWO and SFO optimisation algorithms into the GRU model was implemented in custom code.

For the machine learning algorithms, automatic hyperparameter optimisation was performed. The overall workflow is shown in Figure 3.
Figure 3

Methodology used to predict saltwater in this study.


Initially, the data set was divided into two parts: 80% of the data to build the machine learning models and 20% to validate them. This approach has been widely applied in previous studies to simplify model optimisation and maximise the use of available data (Vermeulen & Van Niekerk 2017; Wang et al. 2021). The model hyperparameters were optimised directly while integrating the GWO and SFO algorithms with the GRU model, which reduces the need for intermediate data. Moreover, keeping the held-out data out of the training process is intended to ensure that the proposed models perform well on unseen data. This is very important for reproducing the models in other regions of the world.
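A minimal sketch of such an 80/20 split is given below. It assumes that the split is chronological (earlier samples for model building, later samples held out), which suits time series but is an assumption rather than a detail stated here; the function name `chronological_split` is hypothetical.

```python
import numpy as np


def chronological_split(X, y, train_fraction=0.8):
    """Split time-ordered samples into an earlier and a later part."""
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    cut = int(len(X) * train_fraction)
    # No shuffling: the held-out part always follows the training part in time
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])
```

Avoiding shuffling prevents information from later dates leaking into the part of the data used to fit the model.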

The GRU network was used to predict saltwater in the estuary of the Mekong River because it is designed to model sequential data. The strategy developed in this study trains the model on data from the previous 15 days to predict salinity 1, 5, 7, 15, and 30 days later. The choice of a 15-day input window for the different forecast horizons (1, 5, 7, 15, and 30 days) rests on several reasons related to the nature of the time series and the capabilities of the GRU model. Estuaries are dynamic systems in which salinity depends on river discharge, tide, and human activities. Using the previous 15 days therefore allows the models to capture recent trends, which provide a rich temporal context for predicting future salinity with high accuracy. Moreover, using a shorter window (less than 15 days) may reduce model accuracy, while using a longer window may waste resources. Predicted values are not fed back as inputs to predict subsequent values, which simplifies the model structure and the training process; the models use only past observations to predict each future value. This approach can be more stable and easier to train, especially when the prediction task is simple or the relationship between past and future values is simple. It should be noted that the accuracy of the GRU model depends on the tuning of its parameters. The salinity prediction model is constructed through the following main steps. First, the data are processed and analysed on the basis of the daily and monthly salinity cycles. The daily cycle is calculated by dividing the day of the month by 31, while the monthly cycle is calculated by dividing the month by 12, to standardise the data and better represent temporal fluctuations. 
The training data include the years 2015, 2016, and 2019, while the testing data are taken from 2020 to evaluate the predictive ability of the model. The GRU model is chosen as the baseline with a standard configuration of four layers, using the tanh activation function, the sigmoid recurrent activation function, and a Glorot uniform kernel weight initialiser. Each GRU layer has 50 units, and the dropout is set to 0.2 to avoid overfitting. However, because of the limited number of samples and data features, optimisation was applied to the number of GRU units for each combination of input and output days. Two optimisation algorithms, GWO and SFO, were applied to find the optimal number of GRU units for each specific case. The results of the GWO algorithm show that the optimal numbers of units for 15 input days and outputs of 1, 7, 15, and 30 days are 64, 86, 64, and 24, respectively. Similarly, the SFO algorithm provides good results; for 15 input days and outputs of 1, 7, 15, and 30 days, the optimal numbers of units are 48, 86, 36, and 64, respectively. The two optimisation algorithms not only reduce overfitting but also significantly improve the prediction performance for different input and output periods. This demonstrates the flexibility and efficiency of the optimisation algorithms applied to salinity prediction problems. The remaining settings, namely the 'orthogonal' recurrent initialiser and the bias units, were set by trial and error. The proposed models were coded in Python with the TensorFlow library. The algorithms were run on a workstation with the following configuration: 2 × CPU INTEL XEON PLATINUM 8168 UP 3.7 GHz – 48 CORE/96 THREAD; 256G/2133 ECC REGISTERED DDR4 (8 × 32G); SSD SAMSUNG 990 PRO HEATSINK 1TB PCIe NVMe 4.0 × 4–7450MB/s; 2 × NVIDIA RTX A5000 24GB GDDR6.
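The input construction described above (a 15-day window of past observations, with each horizon predicted directly rather than recursively) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names `make_direct_windows` and `cycle_features` are hypothetical, and the calendar features assume that the daily and monthly cycles are simple ratios (day of month over 31, month over 12).

```python
from datetime import date

import numpy as np


def make_direct_windows(values, lookback=15, horizon=1):
    """Build (X, y) pairs for direct multi-step forecasting.

    X[i] holds `lookback` consecutive observations; y[i] is the observation
    `horizon` steps after the last element of X[i]. Predictions are never
    fed back as inputs.
    """
    values = np.asarray(values, dtype=float)
    n_samples = len(values) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    X = np.stack([values[i:i + lookback] for i in range(n_samples)])
    first_target = lookback + horizon - 1
    y = values[first_target:first_target + n_samples]
    # Trailing axis of size 1: (samples, time steps, features) for a GRU layer
    return X[..., np.newaxis], y


def cycle_features(dates):
    """Assumed calendar encoding: day of month / 31 and month / 12."""
    return np.array([[d.day / 31.0, d.month / 12.0] for d in dates])
```

For example, `make_direct_windows(series, lookback=15, horizon=7)` would give one training set for the 7-day model, and a separate call with `horizon=30` would give the set for the 30-day model, so that each horizon has its own network.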

Performance metrics

In this study, four statistical indices, namely RMSE, MAE, R2, and NSE, were used to evaluate the performance of the proposed models. Among them, RMSE and MAE quantify the error of the machine learning models by measuring the difference between the predicted EC value and the observed EC value. The indices were calculated by the following equations:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\mathrm{predicted},i} - Y_{\mathrm{observed},i}\right)^{2}}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_{\mathrm{predicted},i} - Y_{\mathrm{observed},i}\right|$$

Here, Ypredicted and Yobserved are the predicted value and the observed value for point i, and n is the number of observation points.

R2 is a statistical measure of the proportion of the variance of the observed data that is explained by the models. That is, this index evaluates the degree of agreement between the observed values and the predicted values. Its calculation uses the average observed value together with the predicted and observed values Y for point i, where m is the number of observation points.

NSE is an effective measure for evaluating the accuracy of a prediction model: it compares the cumulative squared error between the predicted and observed values with the variance of the observations.
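The four indices can be sketched in NumPy as follows. This is an illustration rather than the code used in the study: RMSE, MAE, and NSE follow their standard definitions, while R2 is computed here as the squared Pearson correlation between observed and predicted values, which is an assumption because other definitions of R2 coincide with NSE. The function names are hypothetical.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def mae(observed, predicted):
    """Mean absolute error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(predicted - observed)))


def r2_pearson(observed, predicted):
    """Assumed R2 definition: squared Pearson correlation coefficient."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    squared_error = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - squared_error / spread)
```

RMSE and MAE share the units of the salinity data, so lower is better, whereas R2 and NSE are dimensionless and approach 1 for a perfect model.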

Assessment of the performance of machine learning

Figure 4 presents the R2 values of the models for predicting water salinity 1, 7, 15, and 30 days ahead. The results showed that prediction accuracy decreased as the forecast horizon increased. For 1 day ahead, the GRU-GWO model was better than the other models with an R2 value of 0.914, followed by the GRU-SFO model with an R2 value of 0.912 and the GRU model with an R2 value of 0.9. For 7 days ahead, the GRU-GWO model was better than the others with an R2 value of 0.85, followed by GRU-SFO with an R2 value of 0.83, and GRU with an R2 value of 0.81. For 15 days ahead, the GRU-GWO model was superior to the other models with an R2 value of 0.76, followed by GRU-SFO with an R2 value of 0.73, and GRU with an R2 value of 0.67. For 30 days ahead, the GRU-GWO model continued to be more accurate than the other models with an R2 value of 0.55, followed by GRU-SFO with an R2 value of 0.53, and GRU with an R2 value of 0.52.
Figure 4

R2 value for the test data set by the proposed models.


In terms of days ahead, for the GRU-GWO model, the R2 value decreases from 0.914 for 1 day ahead to 0.85 for 7 days ahead, 0.76 for 15 days ahead, and 0.55 for 30 days ahead. For the GRU-SFO model, the R2 value decreases from 0.912 for 1 day ahead to 0.83 for 7 days ahead, 0.73 for 15 days ahead, and 0.53 for 30 days ahead. For the GRU model, the value decreases from 0.9 for 1 day ahead to 0.81 for 7 days ahead, 0.67 for 15 days ahead, and 0.52 for 30 days ahead.

In addition to the R2 index, this study uses the MAE and RMSE indices to assess the precision of the proposed models (Table 1). For 1 day ahead, the GRU-GWO model was better than the other models in terms of MAE (0.55) and RMSE (0.76), followed by GRU-SFO with MAE (0.56) and RMSE (0.77), and GRU with MAE (0.59) and RMSE (0.78). For 7 days ahead, the GRU-GWO model was again more accurate than the other models with MAE (0.68) and RMSE (0.97), followed by GRU-SFO with MAE (0.75) and RMSE (1.03), and GRU with MAE (0.76) and RMSE (1.05). For 15 days ahead, the GRU-GWO model was superior to the other models with MAE (0.89) and RMSE (1.21), followed by GRU-SFO with MAE (0.98) and RMSE (1.32), and GRU with MAE (0.99) and RMSE (1.43). For 30 days ahead, the GRU-GWO model continued to be better than the other models with MAE (1.25) and RMSE (1.63), followed by GRU-SFO with MAE (1.26) and RMSE (1.67), and GRU with MAE (1.375) and RMSE (1.772).

Table 1

Performance of models proposed with the MAE and RMSE values

Model | MAE (1 day) | RMSE (1 day) | R2 (1 day) | MAE (7 days) | RMSE (7 days) | R2 (7 days) | MAE (15 days) | RMSE (15 days) | R2 (15 days) | MAE (30 days) | RMSE (30 days) | R2 (30 days)
GRU | 0.59 | 0.78 | 0.9 | 0.76 | 1.05 | 0.81 | 0.99 | 1.43 | 0.67 | 1.375 | 1.772 | 0.52
GRU-GWO | 0.55 | 0.76 | 0.914 | 0.68 | 0.97 | 0.85 | 0.89 | 1.21 | 0.76 | 1.25 | 1.63 | 0.55
GRU-SFO | 0.56 | 0.77 | 0.912 | 0.75 | 1.03 | 0.83 | 0.98 | 1.32 | 0.73 | 1.26 | 1.67 | 0.53
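As a reading aid for Table 1, the relative RMSE reduction of each hybrid model with respect to the baseline GRU can be computed as sketched below. The RMSE values are transcribed from Table 1; the dictionary layout and the function name `rmse_reduction_percent` are illustrative assumptions.

```python
# RMSE values transcribed from Table 1, ordered by forecast horizon
HORIZONS = (1, 7, 15, 30)  # days ahead
RMSE = {
    "GRU": (0.78, 1.05, 1.43, 1.772),
    "GRU-GWO": (0.76, 0.97, 1.21, 1.63),
    "GRU-SFO": (0.77, 1.03, 1.32, 1.67),
}


def rmse_reduction_percent(model, baseline="GRU"):
    """Relative RMSE reduction (%) of `model` versus `baseline`, per horizon."""
    return {
        horizon: 100.0 * (base - value) / base
        for horizon, base, value in zip(HORIZONS, RMSE[baseline], RMSE[model])
    }


if __name__ == "__main__":
    for name in ("GRU-GWO", "GRU-SFO"):
        reductions = rmse_reduction_percent(name)
        print(name, {h: round(v, 1) for h, v in reductions.items()})
```

Expressing the differences as percentages makes the horizons comparable, since the absolute RMSE grows with the forecast horizon.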

Machine learning for estuary salinity prediction

Figure 5 shows the results of the salinity prediction for 1, 7, 15, and 30 days ahead using the GRU, GRU-GWO, and GRU-SFO models. The results show that all models are able to predict salinity fluctuations well over the study period. The forecast curves of the models are relatively consistent with the observed values. In particular, the GRU-GWO and GRU-SFO models have better forecasting performance than the GRU model, as shown by the curves of their predicted values against the observed values, especially in periods of large fluctuations. Meanwhile, the predicted value of the GRU model tends to be lower than the observed value, especially during peak periods.
Figure 5

1, 7, 15, and 30 days ahead salinity prediction.


For the 7-day forecast, the two hybrid models, GRU-GWO and GRU-SFO, show good agreement between the predicted and observed values. During peak periods, the predicted value deviates by less than 1 g/L from the observed value. Although the GRU model also has good predictive ability, its predictions have a larger error than those of the two hybrid models.

For the 15-day forecast, the errors of all three models (GRU-GWO, GRU-SFO, and GRU) increase, but the GRU-GWO and GRU-SFO models maintain relatively good accuracy, with errors of about 1–2 g/L compared with the observed values during peak phases. The GRU model, however, has an error greater than 2 g/L compared with the observed value during these periods.

For the 30-day forecast, there is a clear deterioration in the accuracy of the proposed models. However, the two models GRU-GWO and GRU-SFO still show higher accuracy than the GRU model. The predictions of these two models can be 2 g/L higher than the observed values during the peak periods, while the predicted value of the GRU model can differ from the observed value by about 3 g/L.

The Mekong River Delta (MRD), which is home to nearly 20 million people, is the most important agricultural and aquaculture region in Vietnam. Rice is the main crop in the region, with an area of ∼1.9 million hectares, and contributes approximately 50% (∼23 million tons) of Vietnam's total rice production. However, the MRD is severely affected by climate change and sea level rise, as well as by the reduction of river sediments due to upstream hydroelectric dams. These risks cause the loss of arable land and reduce rice production in the MRD (Darby et al. 2016). According to Darby et al. (2016), climate change is projected to increase both rainfall and air temperature in the Mekong River basin. As a result, the changes in streamflow projected by different climate models for the period 2032–2042 compared with the period 1982–1992 range from −11 to +15% in the wet season and from −10 to +33% in the dry season. Climate change also causes the polar ice caps to melt and the seawater to expand, thereby raising sea levels. Data from satellite imagery and ground monitoring stations show that sea levels rose at a rate of 1.7 mm/year over the period 1900–2009, with an increase of 3 mm/year over the period 1993–2009 (Church & White 2011). By 2100, sea levels are projected to rise by 39–85 cm compared with the average sea level during the period 1986–2005 (Mengel et al. 2016). Rising sea levels also cause saltwater intrusion. For the MD region, recent research predicts that sea levels will increase by 46–77 cm by the end of the twenty-first century (Thuc et al. 2016). With its flat terrain and dense canal system, the MD is very sensitive to rising sea levels and increasing saltwater intrusion, especially during the dry months of October to May every year (Oppenheimer et al. 2015). 
Approximately 1.8 million acres of land in the MD are estimated to be affected by saltwater intrusion during the dry season, of which 1.3 million hectares of aquaculture area will have salt concentrations increased by more than 5 g/L. Therefore, forecasting saltwater intrusion on a daily scale will help farmers to allocate irrigation water during the day and week, serving agricultural development.

This study proposes two optimisation algorithms, GWO and SFO, to improve the performance of the GRU model in predicting salinity intrusion into the Mekong River Estuary in the Tra Vinh province. Although both algorithms succeeded in improving the performance of the GRU model, the GRU-GWO model performed better than the GRU-SFO model. GWO is considered to converge faster than other metaheuristic algorithms because it updates the positions of the wolves during the hunting process, and it is capable of balancing the exploration and exploitation processes. Balancing these two processes is very important in reducing the risk of becoming trapped in local optima and in increasing the overall accuracy of the model (Faris et al. 2018; Miao et al. 2020). Although SFO is also considered an effective algorithm, it is quite complex and requires intricate parameter adjustments during the optimisation process. Additionally, the SFO algorithm focuses on exploring the search space, leading to an imbalance between exploration and exploitation, which can slow convergence to optimal solutions (Zhang & Mo 2021; Zhang & Mo 2022).
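
The exploration and exploitation balance described above can be illustrated with a minimal sketch of the standard GWO position update (Mirjalili et al. 2014). This is not the implementation used in this study: the function names `gwo` and `sphere`, the toy objective, the pack size, the iteration count, and the bounds are all illustrative assumptions. In a GRU-GWO setting the objective would presumably be a validation error expressed as a function of the GRU hyperparameters; that coupling is not shown here.

```python
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(objective, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimise `objective` with a basic grey wolf optimiser (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(n_iter):
        # Rank the pack; the three best wolves (alpha, beta, delta) lead the hunt.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        score = objective(leaders[0])
        if score < best_score:
            best_pos, best_score = list(leaders[0]), score

        # 'a' decays linearly from 2 towards 0: |A| can exceed 1 early
        # (exploration) and shrinks later (exploitation around the leaders).
        a = 2.0 * (1.0 - t / n_iter)

        new_wolves = []
        for w in wolves:
            new_w = []
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * D
                # Average the three leader-guided moves and clamp to the bounds.
                new_w.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(new_w)
        wolves = new_wolves

    # Evaluate the final pack once more before returning.
    final = min(wolves, key=objective)
    if objective(final) < best_score:
        best_pos, best_score = list(final), objective(final)
    return best_pos, best_score
```

Calling `gwo(sphere, dim=5, bounds=(-10.0, 10.0))` returns the best position found and its objective value. The single control parameter `a` is what gives GWO its comparatively simple tuning, in contrast to the more numerous parameters mentioned above for SFO.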

In this study, the forecast horizon for water quality depends on the objectives and needs of farmers. Predictions 1–7 days ahead correspond to immediate operational decisions, for example, water resource management for drainage. At these horizons, the proposed models achieved R2 values of 0.89–0.91 for the 1-day forecast and 0.81–0.85 for the 7-day forecast, which confirms their reliability for practical applications. This is consistent with previous studies such as Saccotelli et al. (2024), who demonstrated the effectiveness of short-term models in providing information on daily changes in water quality. For medium-term horizons of 7–15 days, the results of these models are useful for management strategies such as preventing the effects of water quality changes on the agricultural system. Although the accuracy of the models decreased slightly, with R2 values from 0.73 to 0.75, they were still effective. These results are consistent with previous studies that highlighted the effectiveness of machine learning for water quality prediction over the medium-term horizon (Tran et al. 2021). However, at the longer prediction horizon (30 days), the models faced significant challenges due to changes in environmental factors. At 30 days, the accuracy of the proposed models decreased markedly, with an R2 value of 0.55. This reflects the limitations of the proposed models in predicting water quality over a long period, which has also been reported in previous studies (Fang et al. 2017).
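
To make the notion of a forecast horizon concrete, the following sketch shows one common way to turn a daily salinity series into supervised samples for a given number of days ahead. It is a hedged illustration only: the function name `make_supervised`, the `lookback` window length, and the toy series are assumptions, and the actual input construction used for the GRU models in this study may differ.

```python
def make_supervised(series, lookback, horizon):
    """Build (inputs, targets) pairs from a daily series.

    Each input is `lookback` consecutive daily values; the target is the
    value observed `horizon` days after the last day of the input window.
    Both `lookback` and this windowing scheme are illustrative assumptions.
    """
    inputs, targets = [], []
    n_samples = len(series) - lookback - horizon + 1
    for start in range(n_samples):
        end = start + lookback
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Toy daily series (hypothetical values, not observations from this study).
toy_series = [float(day) for day in range(60)]
for horizon in (1, 7, 15, 30):
    X, y = make_supervised(toy_series, lookback=14, horizon=horizon)
    print(horizon, len(X))
```

For a fixed record length, a longer horizon leaves fewer training samples and a larger gap between the last observed day and the target, which is one reason accuracy tends to fall as the horizon grows.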

To evaluate the performance of the models proposed in this study, a brief comparison is made between the results of this study and those of previous studies that used machine learning and deep learning to predict water quality, including saline intrusion. Khan et al. (2022) used the RF model to evaluate water quality in the Indus River of Pakistan. With an R2 value of 0.82, the RF model performed less well than our model, which had an R2 value of 0.91. Fang et al. (2017) integrated the genetic optimisation algorithm with the support vector machine model to predict estuary salinity in the Min River Estuary, China. The GA-SVM model had an R2 value of 0.85, which is lower than that obtained in our study. Tran et al. (2021) used different machine learning algorithms, namely random forest regression (RFR), extreme gradient boosting regression (XGBR), CatBoost regression (CBR), and light gradient boosting regression (LGBR) models, to predict groundwater salinity in the MRD of Vietnam. Their results indicated that the CBR model was the most accurate, with an R2 value of 0.84, which is also lower than that in our study. Although the models differ in structure, data, and regional characteristics, the hybrid model in our study performed better than those in the previous studies. Therefore, the models in this study can be used to assess water quality and to support decision-makers in water resource management.
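
For reference, the three indices used in this comparison can be computed as in the sketch below. It assumes the common definition of R2 as one minus the ratio of the residual sum of squares to the total sum of squares; some studies instead report the squared correlation coefficient, so R2 values from different papers are not always strictly comparable. The function names and sample values are illustrative, not data from this study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical salinity values in g/L, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(obs, pred), mae(obs, pred), r2(obs, pred))
```

RMSE and MAE are expressed in the units of the data (g/L here), whereas R2 is dimensionless, which is why R2 is the index used when comparing results across rivers with different salinity ranges.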

Machine learning has been successful in predicting saltwater intrusion, not only in this study but also in previous studies (Nguyen et al. 2023; Zhou & Li 2024). However, extending this method to other regions of the world needs careful consideration. First, climate change and the construction of upstream dams affect saline intrusion, which makes it difficult to predict future saltwater intrusion. In general, previous studies have emphasised that machine learning/deep learning models perform better than traditional hydraulic models in predicting hydrological problems and natural disasters (Nguyen et al. 2023). However, it remains to be seen whether machine learning/deep learning models can accurately predict saltwater intrusion under the above conditions. Theoretically, if data related to climate change, sea level rise, and upstream dam construction are available and can be integrated into saltwater intrusion forecasting models, it should be possible to forecast saltwater intrusion under these conditions.

The application of machine learning/deep learning models in different regions of the world also needs to be carefully considered. Each model is suited to a certain set of data, determined by the geographical, climatic, hydrological, and socioeconomic characteristics of each region, and there may be no common conditions that represent all regions (Nguyen et al. 2024a, b). Therefore, many previous studies have emphasised the need to collect data from different regions to use as input data for machine learning models. However, it is also important to emphasise that data collection is not straightforward, because it is constrained by financial issues and data-sharing policies, especially in developing countries. Some studies also propose using traditional models to generate input data for machine learning models to minimise the costs associated with data collection in different regions. In addition, building hybrid models, such as integrating optimisation algorithms into machine learning/deep learning models to improve model performance, is also considered an effective solution to this problem (Nguyen et al. 2024a, b).

The MD is the downstream part of the Mekong River, with low and relatively flat terrain. In addition to its main streams, the Tien River and the Hau River, the MD has a dense canal system with an average density of 4 km per km2. This creates favourable conditions for saltwater intrusion because tides bring saltwater deep into the rivers and inland fields, especially in the dry season, when the flow from the upstream of the Mekong River decreases. The highest salinity usually occurs in April or May due to the influence of tides in the East Sea, the West Sea, or both. The combination of low upstream flow and tides that carry seawater deep into the inland fields makes saltwater intrusion more serious. The propagation of saltwater into the river also follows the rhythm of tidal propagation. At a fixed location, there are usually two salinity peaks and two salinity troughs during the day, following the water level changes (the salinity peak usually appears 1–2 h later than the peak tide level), and salinity gradually decreases further upstream. At the river mouth, salinity also has a daily cycle, a 15-day cycle, and a monthly cycle similar to the tidal cycle. Due to the influence of meteorological factors, especially the east wind in February–March, the peak tide and average water levels increase by 20–30 cm, leading to an increase in salinity (Nguyen et al. 2014, 2020; Eslami et al. 2019).

Although this study has successfully built a machine learning model for salinity forecasting in the Mekong estuary, it also has limitations in the use of data. First, the salinity in estuaries can be affected by many factors such as rainfall, tide, discharge, and water level; in future work, we will try to integrate these factors into the machine learning model as variables that affect salinity intrusion. Second, this study is limited by the availability of data. It is based on salinity data collected from 2015 to 2020 at the Mekong estuary in the Tra Vinh province. Incorporating a longer historical data set would help to capture long-term trends and rare events. Furthermore, collecting data from other estuaries of the Mekong River with different characteristics would improve the generalisability of the proposed model, enabling it to predict salinity in different regions. This is consistent with the long-term goal of the study, which is to develop a machine learning model to forecast salinity intrusion in different regions of the MD and the Red River Delta.

This study develops a methodological framework for forecasting salinity intrusion in the Mekong River Estuary, where salinity intrusion is becoming increasingly severe in the context of climate change and sea level rise. The results of this study can be useful for the management of water quality in estuaries.

The Mekong River is the main source of irrigation and domestic water in the MD. However, the deterioration of its water supply, particularly saltwater intrusion into the estuary of the river in the context of climate change, causes significant damage to the environment and agricultural development in the region. Therefore, saltwater intrusion prediction plays an important role in supporting local authorities and farmers in building effective measures to reduce the effects of saltwater intrusion. The aim of this study is to develop machine learning models, namely GRU, GRU-GWO, and GRU-SFO, to predict saltwater intrusion for 1, 7, 15, and 30 days ahead in the Mekong estuary of Vietnam. This study is considered the first to use GRU, GRU-GWO, and GRU-SFO to predict saltwater intrusion into the estuary in the MRD. The results are as follows:

  • This study highlighted the strong capability of the GRU model in predicting saltwater intrusion. The approach can be adapted to predict saltwater intrusion in other regions.

  • Both the GWO and the SFO algorithms were successful in improving the performance of the GRU model. The resulting hybrid models can be used to predict saltwater intrusion in other estuaries, especially in data-limited regions.

Among the proposed models, the GRU-GWO model performed better than the other models in predicting saltwater intrusion at all forecast horizons (1, 7, 15, and 30 days ahead). The use of this model can support local authorities and farmers in managing water resources for agricultural development.

The results of this study may provide an effective tool to predict saltwater intrusion into estuaries around the world. The methodology used in this study can also be extended to other environmental problems.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Alizadeh M. J., Kavianpour M. R., Danesh M., Adolf J., Shamshirband S. & Chau K.-W. (2018) Effect of river flow on the quality of estuarine and coastal waters using machine learning models, Engineering Applications of Computational Fluid Mechanics, 12 (1), 810–823.
Chen W., Hong H., Panahi M., Shahabi H., Wang Y., Shirzadi A., Pirasteh S., Alesheikh A. A., Khosravi K. & Panahi S. (2019) Spatial prediction of landslide susceptibility using GIS-based data mining techniques of ANFIS with whale optimization algorithm (WOA) and grey wolf optimizer (GWO), Applied Sciences, 9 (18), 3755.
Cho K. (2014) On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv preprint arXiv:1409.1259.
Church J. A. & White N. J. (2011) Sea-level rise from the late 19th to the early 21st century, Surveys in Geophysics, 32, 585–602.
Darby S. E., Hackney C. R., Leyland J., Kummu M., Lauri H., Parsons D. R., Best J. L., Nicholas A. P. & Aalto R. (2016) Fluvial sediment supply to a mega-delta reduced by shifting tropical-cyclone activity, Nature, 539 (7628), 276–279.
Dinh K. D., Anh T. N., Nguyen N. Y., Bui D. D. & Srinivasan R. (2020) Evaluation of grid-based rainfall products and water balances over the Mekong River basin, Remote Sensing, 12 (11), 1858.
Eslami S., Hoekstra P., Nguyen Trung N., Ahmed Kantoush S., Van Binh D., Duc Dung D., Tran Quang T. & van der Vegt M. (2019) Tidal amplification and salt intrusion in the Mekong Delta driven by anthropogenic sediment starvation, Scientific Reports, 9 (1), 18746.
Etemad-Shahidi A., Dorostkar A. & Liu W.-C. (2008) Prediction of salinity intrusion in Danshuei estuarine system, Hydrology Research, 39 (5–6), 497–505.
Fang Y., Chen X. & Cheng N.-S. (2017) Estuary salinity prediction using a coupled GA-SVM model: a case study of the Min River Estuary, China, Water Science and Technology: Water Supply, 17 (1), 52–60.
Faris H., Aljarah I., Al-Betar M. A. & Mirjalili S. (2018) Grey wolf optimizer: a review of recent variants and applications, Neural Computing and Applications, 30, 413–435.
He F., Wan Q., Wang Y., Wu J., Zhang X. & Feng Y. (2024) Daily runoff prediction with a seasonal decomposition-based deep GRU method, Water, 16 (4), 618.
Ikram R. M. A., Dehrashid A. A., Zhang B., Chen Z., Le B. N. & Moayedi H. (2023) A novel swarm intelligence: cuckoo optimization algorithm (COA) and sailfish optimizer (SFO) in landslide susceptibility assessment, Stochastic Environmental Research and Risk Assessment, 37 (5), 1717–1743.
Jung C., Ahn S., Sheng Z., Ayana E. K., Srinivasan R. & Yeganantham D. (2022) Evaluate river water salinity in a semi-arid agricultural watershed by coupling ensemble machine learning technique with SWAT model, JAWRA Journal of the American Water Resources Association, 58 (6), 1175–1188.
Keyes A. A., McLaughlin J. P., Barner A. K. & Dee L. E. (2021) An ecological network approach to predict ecosystem service vulnerability to species losses, Nature Communications, 12 (1), 1586.
Khan M. A., Shah M. I., Javed M. F., Khan M. I., Rasheed S., El-Shorbagy M., El-Zahar E. R. & Malik M. (2022) Application of random forest for modelling of surface water salinity, Ain Shams Engineering Journal, 13 (4), 101635.
Kumar B. S., Santhi S. & Narayana S. (2022) Sailfish optimizer algorithm (SFO) for optimized clustering in wireless sensor network (WSN), Journal of Engineering, Design and Technology, 20 (6), 1449–1467.
Melesse A. M., Khosravi K., Tiefenbacher J. P., Heddam S., Kim S., Mosavi A. & Pham B. T. (2020) River water salinity prediction using hybrid machine learning models, Water, 12 (10), 2951.
Mengel M., Levermann A., Frieler K., Robinson A., Marzeion B. & Winkelmann R. (2016) Future sea level rise constrained by observations and long-term commitment, Proceedings of the National Academy of Sciences, 113 (10), 2597–2602.
Minderhoud P., Coumou L., Erban L., Middelkoop H., Stouthamer E. & Addink E. (2018) The relation between land use and subsidence in the Vietnamese Mekong Delta, Science of the Total Environment, 634, 715–726.
Mirjalili S., Mirjalili S. M. & Lewis A. (2014) Grey wolf optimizer, Advances in Engineering Software, 69, 46–61.
Nguyen H. D., Van C. P., Nguyen T. G., Dang D. K., Pham T. T. N., Nguyen Q.-H. & Bui Q.-T. (2023) Soil salinity prediction using hybrid machine learning and remote sensing in Ben Tre province on Vietnam's Mekong River Delta, Environmental Science and Pollution Research, 30 (29), 74340–74357.
Nguyen H. D., Dang D. K., Nguyen N. Y., Pham Van C., Van Nguyen T. T., Nguyen Q.-H., Nguyen X. L., Pham L. T., Pham V. T. & Bui Q.-T. (2024a) Integration of machine learning and hydrodynamic modeling to solve the extrapolation problem in flood depth estimation, Journal of Water and Climate Change, 15 (1), 284–304.
Nguyen H. D., Dang D. K., Nguyen Q.-H., Phan-Van T., Bui Q.-T., Petrisor A.-I. & Nghiem S. V. (2024b) Monitoring the effects of climate, land cover and land use changes on multi-hazards in the Gianh River watershed, Vietnam, Environmental Research Letters, 19 (10), 104033.
Oppenheimer M., Campos M., Warren R., Birkmann J., Luber G., O'Neill B., Takahashi K., Brklacich M., Semenov S. & Licker R. (2015) Emergent Risks and Key Vulnerabilities. Climate Change 2014 Impacts, Adaptation and Vulnerability: Part A: Global and Sectoral Aspects. Cambridge, UK: Cambridge University Press, pp. 1039–1100.
Quang C., Hoa H., Giang N. & Hoa N. (2021) Assessment of meteorological drought in the Vietnamese Mekong Delta in period 1985–2018, IOP Conference Series: Earth and Environmental Science, 652, 012020.
Rajput S. P., Webber J. L., Bostani A., Mehbodniya A., Arumugam M., Nanjundan P. & Wendimagegen A. (2023) Using machine learning architecture to optimize and model the treatment process for saline water level analysis, Water Reuse, 13 (1), 51–67.
Rogger M., Kohl B., Pirkl H., Viglione A., Komma J., Kirnbauer R., Merz R. & Blöschl G. (2012) Runoff models and flood frequency statistics for design flood estimation in Austria – Do they tell a consistent story? Journal of Hydrology, 456, 30–43.
Saccotelli L., Verri G., De Lorenzis A., Cherubini C., Caccioppoli R., Coppini G. & Maglietta R. (2024) Enhancing estuary salinity prediction: a machine learning and deep learning based approach, Applied Computing and Geosciences, 23, 100173.
Sahoo D. & Swain R. (2021) ‘Water quality modelling using QUAL-2 K at Bray Marina, UK’, International Conference on Hydraulics, Water Resources and Coastal Engineering, Singapore: Springer.
Shadravan S., Naji H. R. & Bardsiri V. K. (2019) The sailfish optimizer: a novel nature-inspired metaheuristic algorithm for solving constrained engineering optimization problems, Engineering Applications of Artificial Intelligence, 80, 20–34.
Thuc T., Van Thang N., Huong H. T. L., Van Khiem M., Hien N. X. & Phong D. H. (2016) Climate Change and Sea Level Rise Scenarios for Vietnam. Hanoi, Vietnam: Ministry of Natural Resources and Environment.
Tran D. A., Tsujimura M., Ha N. T., Van Binh D., Dang T. D., Doan Q.-V., Bui D. T., Ngoc T. A., Thuc P. T. B. & Pham T. D. (2021) Evaluating the predictive power of different machine learning algorithms for groundwater salinity prediction of multi-layer coastal aquifers in the Mekong Delta, Vietnam, Ecological Indicators, 127, 107790.
Vaferi B., Dehbashi M., Alibak A. H. & Yousefzadeh R. (2024) Exploring the performance of machine learning models to predict carbon monoxide solubility in underground pure/saline water, Marine and Petroleum Geology, 162, 106742.
Wang J., Peng J., Li H., Yin C., Liu W., Wang T. & Zhang H. (2021) Soil salinity mapping using machine learning algorithms with the Sentinel-2 MSI in arid areas, China, Remote Sensing, 13 (2), 305.
Wang Z., Wang Q. & Wu T. (2023) A novel hybrid model for water quality prediction based on VMD and IGOA optimized for LSTM, Frontiers of Environmental Science & Engineering, 17 (7), 88.
Yao Z., Wang Z., Huang J., Xu N., Cui X. & Wu T. (2024) Interpretable prediction, classification and regulation of water quality: a case study of Poyang lake, China, Science of the Total Environment, 951, 175407.
Zhang Y. & Mo Y. (2022) Chaotic adaptive sailfish optimizer with genetic characteristics for global optimization, The Journal of Supercomputing, 78 (8), 10950–10996.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).