Abstract
Salinity is of paramount importance in shaping water quality, ecosystem health, and the capacity to sustain diverse human and environmental demands in estuarine environments. Electrical conductivity (EC) is commonly used as an indirect measure of salinity and as a proxy for estimating other ion constituents within the Sacramento-San Joaquin Delta (Delta) of California, United States. This study investigates and contrasts four machine learning (ML) models (Regression Trees, Random Forest, Gradient Boosting, and Artificial Neural Networks) for approximating ion constituent concentrations from EC measurements, with emphasis on improving the conversion for constituents that exhibit pronounced non-linear relationships with EC. Among the four models, the Artificial Neural Networks model performs best in predicting ion constituents from EC, especially for those with strong non-linear relationships with EC. All four ML models are more accurate than traditional parametric regression equations in estimating ion concentrations. Furthermore, an interactive web browser-based dashboard is developed that enables users with or without programming expertise to simulate ion levels within the Delta. By providing more precise ion constituent estimates, this research improves the understanding of salinity's effects on water quality in the Delta and supports well-informed water management decisions.
HIGHLIGHTS
The study applies four machine learning models to enhance the prediction of ion concentrations in the Sacramento-San Joaquin Delta based on electrical conductivity measurements.
The research introduces an interactive web-based dashboard, facilitating simulations of ion levels and providing a user-friendly platform for understanding salinity's impact on water quality in the Delta.
INTRODUCTION
Estuaries, as dynamic and complex environments where rivers meet the sea, host a wide variety of ecological processes and support diverse ecosystems. These transitional zones are characterized by the interplay between fresh water and saltwater, resulting in a gradient of salinity levels. Salinity in estuarine areas is a critical factor influencing the distribution and survival of aquatic species (Attrill & Rundle 2002), and biogeochemical cycles (Nixon 1981). Furthermore, estuaries are often integral to human activities, such as commercial and recreational fishing, transportation, and coastal development. Consequently, understanding and accurately simulating salinity in estuaries is essential for preserving ecosystem integrity, managing water resources, and ensuring the sustainability of human activities in these areas.
In light of these considerations, researchers and water resource managers have been focusing on developing effective methods to estimate and predict salinity in estuarine environments (Cloern et al. 2014). Numerical models, remote sensing techniques, and data-driven approaches are some of the tools employed to simulate and monitor salinity dynamics. Accurate salinity simulations are crucial in informing decisions related to water resource allocation, environmental protection, and the sustainable management of estuarine ecosystems. In this context, the present study investigates salinity in the Sacramento-San Joaquin Delta (Delta) of California, a critical estuarine region, with the aim of advancing the understanding of its influence on water quality and facilitating well-informed water management decisions.
Salinity in the Delta plays a vital role in determining water quality, the overall health of the ecosystem, and its ability to support various human and environmental needs, such as agricultural irrigation, habitat preservation, and drinking water supply (Rabalais et al. 2002; Kemp et al. 2005; Savenije 2005; Cloern & Jassby 2012). Salinity is a measure of the concentration of dissolved salts in water, and its accurate estimation is crucial for effective water resource management. Typically, salinity is measured indirectly as electrical conductivity (EC) using automatic sensors and reported as specific conductance (Hutton et al. 2016; Rath et al. 2017; Bañón et al. 2021). EC, which represents the ability of water to conduct electric current, correlates strongly with the concentration of dissolved ions in water.
EC can be employed as a predictor for other ion constituents in the Delta, including total dissolved solids (TDS), dissolved chloride (Cl−), dissolved sulfate (SO42−), dissolved sodium (Na+), dissolved calcium (Ca2+), dissolved magnesium (Mg2+), dissolved potassium (K+), dissolved bromide (Br−), and alkalinity. These ion constituents have diverse impacts on water quality and its suitability for various uses. For example, high concentrations of chloride and sulfate ions can adversely affect crop yields (Maas & Hoffman 1977), while elevated levels of sodium and calcium ions can lead to issues such as soil degradation and scaling in water infrastructure.
The maximum safe levels of ion constituents in water depend on the intended use. For aquatic species living in the Delta, excessive concentrations of certain ions can be toxic, leading to negative effects on their survival, growth, and reproduction. The United States Environmental Protection Agency (US EPA) provides guidelines for maximum ion concentrations in fresh water to protect aquatic life (US EPA 2018). For drinking water, the World Health Organization (WHO) and US EPA establish maximum contaminant levels (MCLs) for various ions to ensure water safety and protect public health (WHO 2011; US EPA 2021). Given these guidelines, failure to accurately simulate ion concentrations can have severe implications. Incorrect estimations of ions like chloride or sulfate could result in the inappropriate allocation of water for agricultural use, leading to reduced crop yields or soil degradation. Similarly, elevated levels of certain ions could be toxic to aquatic life, disrupting local food chains and reducing biodiversity. Inaccurate simulations could also compromise the safety of drinking water, putting public health at risk. Therefore, the need for precise and reliable ion concentration simulation is not merely academic but has direct, tangible implications for environmental sustainability and human well-being.
However, the concentration of each ion constituent is measured from discrete water samples (i.e., grab samples) and is available much less frequently than EC. This limitation poses a challenge for continuous monitoring of water quality and informed decision-making. The need to convert EC to other constituents arises from the increasing interest in reporting model results in terms of other constituents, as salinity is currently modeled in terms of EC. Therefore, developing accurate and reliable regression models between EC and other ion constituents is critical for understanding the impacts of salinity on water quality in the Delta and for making informed management decisions.
Various regression models have been developed in previous studies, with EC as the predictor and individual constituents as the predictand, based on grab sample data in the Delta (Jung 2000; Suits 2002; Hutton 2006; Hutton et al. 2022; Denton 2015). Most recently, Hutton et al. (2022) developed simplified statistical equations to estimate salinity constituent concentrations from EC, assuming that three sources govern the salinity level in the Delta: seawater intrusion (Ocean source), fresh water (Sacramento River), and the agricultural source (drainage-influenced San Joaquin River). While their work significantly contributed to the field, the accuracy of their equations for ions with strong non-linear relationships with EC, such as alkalinity and potassium (K+), requires improvement.
This study aims to address these limitations by developing machine learning (ML) models that can emulate and potentially improve upon the regression equations developed by Hutton et al. (2022) to simulate ion constituents from EC. A recent study (Namadi et al. 2022) demonstrated the potential of ML algorithms as an alternative to parametric regression models for predicting ion constituents in the Delta. That study focused on the South Delta, testing ML models at seven stations during a short period from 2018 to 2020 when grab samples were regularly collected. Its findings provided preliminary evidence that ML can be successfully applied to estimate ion constituents from EC in the region.
Building on these promising results, the current study aims to further test the effectiveness of ML models for ion constituent prediction by expanding the analysis to a larger area and a more extended period. By using a more comprehensive dataset, we hope to validate the applicability of ML models to a broader context and establish their utility for estimating ion constituents across the entire Sacramento-San Joaquin Delta. Ultimately, this research will contribute to enhancing water quality monitoring and support better-informed water management decisions, ensuring the protection of aquatic life and the safety of drinking water supplies.
METHODOLOGY
Study locations and study dataset
Map showing 30 study stations in the interior Delta. Note: The inset map shows the location of the San Francisco Bay and Sacramento-San Joaquin Delta (Bay-Delta), containing the Delta study area (highlighted in the red rectangle).
To assemble the most comprehensive ion sample dataset in the Delta to date, this study combines grab samples from three sources. The primary dataset is derived from Hutton et al. (2022), which includes ion grab samples, EC, and X2 position collected between 1959 and 2018 at 19 stations within the study area (Figure 1). The second dataset consists of samples collected by the Department of Water Resources between 2018 and 2020 at seven stations in the South Delta sub-regions (Stations: 5, 8, 9, 10, 11, 12, 13) (Figure 1). The third dataset covers 13 stations in the interior Delta, with samples collected from 2018 to 2022 (Figure 1).
Incorporating the second and third datasets provides several advantages, including an increased sample size, data from a wider range of hydrologic conditions, and coverage of the most recent critical drought years (i.e., 2021–2022). Additionally, the combined dataset captures a large variation of ions in the Delta, allowing for validation and testing of the models under more extreme conditions. Ultimately, this approach enhances the robustness of our models and improves their predictive capabilities. The final dataset for this study contains ion data spanning from 1959 to 2022 and encompasses 30 locations in the study area.

Scatter plots showing the relationship between salinity (represented by EC) and ion constituents with linear relationship with EC (Group 1).
Scatter plots showing the relationship between salinity (represented by EC) and ion constituents with bifurcation relationship with EC (Group 2).
Scatter plots showing the relationship between salinity (represented by EC) and ion constituents with non-linear relationship with EC (Group 3).
Model development
This study employs four non-parametric supervised ML techniques to estimate ion constituents based on the EC at the study stations: regression trees (RT), random forest (RF), gradient boosting (GB), and artificial neural network (ANN). The equations from Hutton et al. (2022) serve as benchmark models for comparison.
Due to the complexity of the Delta's channel network and bathymetry, as well as the varying impacts of ocean tides, channel diversions, island drainage, pumping, and San Joaquin River inflow on local hydrodynamics, the source of the water and the proportions of water quality constituents at each study location can differ significantly. Consequently, we utilize subregions as categorical variables (Old and Middle River (OMR), South Delta, and San Joaquin Corridor) within the input data for the ML models.
Since ML algorithms cannot directly process categorical variables, we employ one-hot encoding to convert the names of the three subregions into binary vectors. This encoding assigns a separate column to each subregion category, with 0 and 1 indicating the absence or presence of that category, respectively. This approach eliminates the arbitrary assignment of numerical values that could mislead the learning algorithm.
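As a minimal sketch of the encoding described above, the snippet below converts subregion names into binary vectors using only the Python standard library. The subregion names come from the text; the helper `one_hot` and the sample records are hypothetical and serve only as an illustration, not as the implementation used in the study.

```python
# Subregion categories named in the text; their order here is arbitrary.
SUBREGIONS = ["OMR", "South Delta", "San Joaquin Corridor"]


def one_hot(value, categories):
    """Return a binary vector with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Made-up sample records, one subregion label per grab sample.
samples = ["South Delta", "OMR", "San Joaquin Corridor", "OMR"]
encoded = [one_hot(s, SUBREGIONS) for s in samples]

for name, vector in zip(samples, encoded):
    print(f"{name:22s} -> {vector}")
```

Each row contains exactly one 1, so no ordering or magnitude is implied among the subregions.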
Additionally, we include the month, water year type (WYT), and X2 position at the time each sample was taken as input features to evaluate their potential impacts on model outcomes. The Sacramento River X2 position is a water quality management indicator that represents the distance from the Golden Gate Bridge to the location where the salinity level reaches 2 parts per thousand (ppt). This indicator helps to monitor the fresh water–saltwater interface and its influence on the Delta's water quality. WYT is a classification based on water availability, which varies from year to year depending on factors such as precipitation and snowpack. WYT categories include wet, above normal, below normal, dry, and critical. These variables provide essential information about the hydrologic conditions affecting the study area.
Thus, our study consists of two numerical predictors (EC and X2 position) and three categorical predictors (location, month, and WYT), with nine targets (ion constituents). We maintain consistency with Hutton et al. (2022) by selecting the same predictors used in their study.
The input–output datasets are randomly divided into two groups for training (80% of the dataset) and testing (20% of the dataset). We evaluate the performance of the four ML models using two criteria: R2 (Equation (1)) and mean absolute error (MAE) (Equation (2)). R2 values range from 0 to 1, with values closer to 1 indicating that model simulations capture most of the variability in the observed data. MAE is a positive number, with values close to 0 signifying that the model-simulated values are very close to the observed values.
Using both R2 and MAE allows us to assess the performance of the models from different perspectives. R2 measures the proportion of variance in the observed data that can be explained by the model, while MAE provides a direct measure of the average error between observed and predicted values. Considering both criteria enables us to ensure that the selected model captures the overall data trends (as indicated by a high R2 value) while also minimizing the average error between observed and predicted values (as indicated by a low MAE value). This comprehensive evaluation helps to identify the most accurate and reliable model for predicting ion constituents in the Delta.
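The two criteria can be sketched as follows, assuming the common definitions: R2 as the coefficient of determination (one minus the ratio of residual to total sum of squares) and MAE as the mean of the absolute differences. Equations (1) and (2) may define them slightly differently, and the observed and simulated values below are made up for illustration.

```python
def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(observed, simulated):
    """Average absolute difference between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


# Made-up concentrations (mg/l), for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
simulated = [110.0, 140.0, 205.0, 245.0]

print("R2 :", round(r_squared(observed, simulated), 4))   # 0.98
print("MAE:", mean_absolute_error(observed, simulated))   # 7.5
```

Note that under this definition R2 can fall below 0 for a model that performs worse than predicting the observed mean, so the 0 to 1 range applies to models that do at least as well as that baseline.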





Decision trees
Decision trees are popular ML methods that can be applied to both regression and classification problems. In regression, the method stratifies the predictor space into several rectangular regions and predicts, for every observation falling in a region, the mean response of the training observations in that region (Loh 2011; James et al. 2013). Tree-based ML models are useful for interpretation, as their results indicate the importance of predictors. The split points, which are the specific values at which the tree divides the data into different branches, suggest the best threshold for each predictor.
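To illustrate how a single split point might be chosen, the sketch below searches one numerical predictor for the threshold that minimizes the summed squared error around the two region means. It is a simplified, single-split illustration with made-up EC and ion values, not the tree-building procedure used in the study; the helper names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best single split on x."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [yy for xx, yy in pairs if xx <= threshold]
        right = [yy for xx, yy in pairs if xx > threshold]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total


# Made-up EC values and ion concentrations forming two clusters.
ec = [200, 300, 400, 900, 1000, 1100]
ion = [10.0, 12.0, 11.0, 40.0, 42.0, 41.0]

threshold, total = best_split(ec, ion)
print("best threshold:", threshold)  # 650.0
print("total SSE     :", total)      # 4.0
```

A full regression tree applies the same search recursively within each resulting region and across all predictors.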
Random forest
Random Forest (RF) is an ensemble learning method that has demonstrated strong predictive performance in addressing a wide range of classification and regression analysis problems (Breiman 2001; Liaw & Matthew 2002). One of the key features of RF is the use of the bootstrap technique, which is a resampling method that helps reduce the variance of statistical learning methods.
Bootstrap works by creating multiple samples of the original dataset by randomly selecting observations with replacement. This means that each sample can contain multiple copies of the same observation, and some observations may be left out altogether. Each bootstrap sample therefore acts as a new dataset resampled from the original one (James et al. 2013). Fitting a model to each sample and averaging the resulting predictions reduces the variance of the statistical learning method (Tibshirani & Efron 1993).

In addition to bootstrapping, RF also incorporates the concept of feature randomness. At each split in a decision tree, only a random subset of features is considered, which further increases the diversity of individual trees and reduces overfitting (Cutler et al. 2007). This combination of bootstrapping and feature randomness results in a robust and accurate ensemble model. The RF model was implemented using the ‘sklearn’ library in the Python environment.
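The two sources of randomness described above can be sketched with the standard library alone. The snippet draws a bootstrap sample of row indices and a random subset of candidate features; the predictor names follow the text, while the sample size, subset size, seed, and helper names are assumptions chosen for illustration. It does not reproduce the ‘sklearn’ implementation.

```python
import random

# Predictor names follow the text; everything else is illustrative.
FEATURES = ["EC", "X2", "location", "month", "WYT"]


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(features, k, rng):
    """Pick k distinct candidate features, as done at a single split."""
    return rng.sample(features, k)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_samples = 10

for tree in range(3):
    rows = bootstrap_indices(n_samples, rng)
    # Rows never drawn are "out of bag" for this tree.
    out_of_bag = sorted(set(range(n_samples)) - set(rows))
    # Drawn once per tree here for brevity; a forest redraws at every split.
    candidates = random_feature_subset(FEATURES, 2, rng)
    print(f"tree {tree}: rows={rows} out_of_bag={out_of_bag} "
          f"candidates={candidates}")
```

Because each tree sees different rows and different candidate features, the trees make partly independent errors, which is what makes averaging them effective.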
Gradient boosting



Artificial neural network
Artificial neural network (ANN) models have emerged as popular predictive tools in various domains, providing valuable insights for model identification, analysis, and forecasting. The ANN's effectiveness is largely due to its ability to model non-linear relationships between dependent and independent variables, which is particularly beneficial when dealing with complex real-world problems (Hopfield 1988; Zhang et al. 2015). Over the years, ANNs have found applications in diverse fields such as finance, environmental modeling, healthcare, and engineering (Gurney 1997; Zhang et al. 1998; Maier & Dandy 2000). The structure of an ANN is inspired by the neural network of the human brain, which allows it to iteratively adapt its internal parameters to improve predictive performance. The learning process involves adjusting the weights and biases within the network to minimize the error between the predicted and actual outputs (Haykin 1998; Bishop & Nasrabadi 2006). As a result, ANNs can uncover hidden patterns and relationships in data that might be missed by traditional linear regression or other ML techniques (Cybenko 1989; Hornik et al. 1989).
Moreover, ANNs have demonstrated the ability to handle noisy or incomplete data, making them particularly suitable for modeling complex systems where data quality may be an issue (Gardner & Dorling 1998; Karlik & Olgac 2011). Furthermore, ANNs have the advantage of being universal function approximators, meaning that they can theoretically approximate any continuous function to a desired level of accuracy, given an appropriate network structure and sufficient training data (Hornik et al. 1989). In summary, the versatility, robustness, and adaptability of ANNs have led to their widespread adoption as powerful predictive models for tackling a broad range of classification and regression analysis problems across various fields.
Having described the four ML models (RT, RF, GB, and ANN), we summarize their key features in Table 1. The table offers a side-by-side view of the models' underlying architectures, loss functions, learning algorithms, scalability, and other relevant features, highlighting both their unique attributes and their commonalities.
Comparative overview of key features across selected ML models
Feature/Model | Decision trees | Random forests | Gradient boosting | Artificial neural networks
---|---|---|---|---
Model type | Tree-based | Ensemble | Ensemble | Neural network
Basic unit | Decision tree | Decision trees | Weak learners | Neurons
Hidden layers | None | None | None | One or more
Loss function | Gini/Entropy | Gini/Entropy | Various | MSE, cross-entropy, etc.
Learning algorithm | ID3, CART, etc. | Bagging | Boosting | Gradient descent, Adam, etc.
Regularization | Pruning | Voting/Averaging | Shrinkage | Dropout, weight decay, etc.
Scalability | Moderate | High | Moderate to high | High
Robustness | Moderate | High | High | Varies
Interpretability | High | Moderate | Low | Low
Speed/Efficiency (Training) | Fast | Moderate | Moderate | Varies
Speed/Efficiency (Inference) | Fast | Fast | Fast | Fast
Applications | Classification, regression | Classification, regression, anomaly detection | Classification, regression, ranking | Classification, regression, NLP, image processing
K-fold cross-validation
After finalizing the ion constituent simulation models using various ML techniques, we compared their performance based on two evaluation criteria: R2 and MAE. Based on these criteria, we selected the best-performing model for our study. To further validate the performance and robustness of our chosen model, we employed the K-fold cross-validation method.
K-fold cross-validation is a widely used technique for assessing the performance of a model in ML and statistical modeling. It involves dividing the original dataset into K equal-sized subsets or ‘folds’ and then iteratively training the model on K–1 folds while using the remaining fold as a validation set. The process is repeated K times, with each fold serving as the validation set exactly once (Stone 1974; Geisser 1975; Efron 1983). In this study, we used K = 5, dividing the dataset into five equal-sized folds. The folds were created randomly so that each is representative of the whole dataset, and a fixed random seed was used for reproducibility. Each fold contains 20% of the data, so every iteration uses four folds (80% of the data) for training and the remaining fold (20%) for validation. We opted for random K-fold cross-validation as opposed to stratified K-fold because our dataset did not exhibit significant imbalances in the distribution of target variables that would necessitate stratification.
The main advantage of K-fold cross-validation is that it allows us to assess the model's performance on different subsets of the data, providing a more comprehensive understanding of its generalization capability. This helps to detect overfitting and to check that the model is not biased toward a specific subset of the data. Because every observation is used for validation exactly once across the K iterations, the method provides a more reliable performance estimate than a single train-test split (Hastie et al. 2009).
After completing all five iterations, we averaged the performance metrics (R2 and MAE) across the five validation sets to obtain the final performance estimate of our selected model.
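The procedure can be sketched as below, using only the standard library. The function names, the seed value, and the scoring callback `fit_and_score` are hypothetical; in practice the callback would train the selected model on the training indices and return R2 or MAE on the validation indices.

```python
import random


def k_fold_indices(n_samples, k, seed):
    """Yield (training, validation) index lists for random K-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


def cross_validate(n_samples, k, seed, fit_and_score):
    """Average the score returned by `fit_and_score` over the k folds."""
    scores = [fit_and_score(train, val)
              for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# With 100 samples and K = 5, each iteration trains on 80 and validates on 20.
for train, val in k_fold_indices(n_samples=100, k=5, seed=0):
    print(len(train), len(val))

# Placeholder scorer that only reports the validation share of the data.
share = cross_validate(100, 5, 0, lambda train, val: len(val) / 100)
print("mean validation share:", share)
```

Shuffling once before forming the folds ensures that every observation appears in exactly one validation set.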
Dashboard
In this study, we have developed an interactive and user-friendly dashboard to provide a convenient tool for users with or without programming knowledge to simulate ion levels in the Delta using our ML models. The dashboard allows users to explore the results of the four ML simulators (RT, RF, GB, and ANN) for nine ion constituents based on the selected hydrological conditions. Users can interactively input the five predictor variables (EC, Sacramento X2, Location, Month, and WYT) to generate simulations of ion levels from the four ML models.
To access the dashboard, users can navigate to the following URL using a web browser: https://dwrdashion.azurewebsites.net/Dashboard. A step-by-step guide for using the dashboard is also available at the website to assist first-time users in effectively leveraging its features.
Our pre-trained models are stored on a GitHub repository, and their functionality is made available through Microsoft Azure. By connecting the Azure server to the GitHub repository, the models are hosted and executed on the Azure server, ensuring seamless integration and accessibility.
The development of the interactive and user-friendly dashboard offers several advantages that make it a valuable resource for stakeholders, researchers, and decision-makers. Some of these advantages include:
Accessibility: The dashboard is designed to be accessible to users with various levels of technical expertise, removing the barrier of programming knowledge and allowing a wider audience to benefit from the simulation results.
Real-time results: By leveraging the power of Microsoft Azure, the dashboard provides real-time results based on user inputs, enabling users to explore different scenarios and assess the impacts of various hydrological conditions on ion levels in the Delta.
Scalability: Microsoft Azure offers robust scalability, allowing the dashboard to handle a large number of users and data inputs without compromising performance or reliability. This feature ensures that the tool remains available and responsive even during periods of high demand (Microsoft 2021).
Ease of maintenance and updates: Hosting the pre-trained models on GitHub and utilizing Microsoft Azure for the dashboard makes it easier to maintain and update the models as new data becomes available or as improvements are made to the algorithms. This ensures that users have access to the latest and most accurate information at all times.
Data security and privacy: Microsoft Azure is a secure and reliable platform that adheres to strict security standards, ensuring that user data and the models are protected and confidential information is not compromised (Microsoft 2021).
Collaboration: The dashboard provides a common platform for stakeholders, researchers, and decision-makers to collaborate and share insights, fostering a data-driven approach to understanding and addressing the challenges faced in managing the Delta's water system.
In summary, the dashboard, in combination with Microsoft Azure, offers a powerful, accessible, and scalable solution for simulating ion levels in the Delta, promoting data-driven decision-making, and facilitating collaboration among various stakeholders.
RESULTS
This section first presents the performance of the equations developed by Hutton et al. (2022) in simulating nine ion constituents at 30 locations in the Delta. The performance of the proposed models is evaluated next.
Simulation of ion constituents using the benchmark model
The performance of the benchmark model on nine ion constituents is evaluated using two metrics, R2 and MAE (Table 2). The ion constituents are divided into three groups (Group 1, Group 2, and Group 3), as mentioned in the methodology section. The number of samples (sample size) and data range for each ion are provided in Table 2. The standard deviation (SD) for each ion is given, which represents the variability or dispersion of the ion concentration values in the dataset.
Performance of the benchmark model in simulating ion constituents in the Delta
Group | Ion | Sample size | Data range | SD | R2 | MAE
---|---|---|---|---|---|---
Group 1 | TDS | 1,466 | 49–2,120 | 204 | 0.99 | 12.7
Group 1 | Mg2+ | 1,336 | 2–102 | 8.6 | 0.96 | 1.24
Group 2 | Na+ | 1,575 | 6–343 | 44 | 0.94 | 4.77
Group 2 | Ca2+ | 1,335 | 5.8–244 | 18 | 0.87 | 3.31
Group 2 | Cl− | 1,972 | 4–775 | 77 | 0.92 | 10.26
Group 3 | SO42− | 1,066 | 5–350 | 46.5 | 0.52 | 14.61
Group 3 | Br− | 1,239 | 0.01–2.3 | 0.22 | 0.90 | 0.04
Group 3 | Alkalinity | 1,039 | 26–198 | 27.6 | 0.79 | 9.52
Group 3 | K+ | 1,148 | 0.87–11 | 1.35 | 0.62 | 0.51
The sample size for each ion constituent varies, with alkalinity having the minimum sample size of 1,039 grab samples and Cl− having the maximum sample size of 1,972 samples. The benchmark model, based on the equations by Hutton et al. (2022), demonstrates the best performance for ion constituents in Group 1, which have a strong linear relationship with EC. The performance of the model decreases for ion constituents in Groups 2 and 3.
In particular, the R2 values for SO42− and K+ are 0.52 and 0.62, respectively, suggesting that there is significant room for improvement in the estimation of these two constituents. Although the R2 value for TDS is quite high at 0.99, it does not necessarily imply that the model is near perfect. The second metric, MAE, is 12.7 milligrams per liter (mg/l), which is not negligible. One of the objectives of this study is to reduce the MAE between observed and simulated values, even for ion constituents yielding high R2 values in Hutton et al.’s (2022) study.
Simulation of ion constituents via proposed models
This section compares the performance of the four alternative ion simulation models with each other in order to select the best model. Once the best model is identified, its performance is compared against the benchmark model of Hutton et al. (2022). The generalization performance of a model developed via an ML method is its ability to predict test data not used in training. Assessing this performance is crucial for selecting the most suitable model and measuring its usefulness. Test error, which is the model prediction error over a test sample of data not used in training, serves as a key metric in this evaluation. A common approach for training and testing a model is to randomly divide the data into two parts: training data and test data. The training data are used to fit the models, while the test data are used to assess the model generalization error by comparing simulated ion concentrations to observed values not used in model development. This approach helps to reveal overfitting, so that the chosen model provides meaningful results for a variety of conditions.
In this study, a random hyperparameter search was performed to optimize the ANN model for each ion constituent. Table 3 lists the selected number of neurons (N) and activation function (Act) in each hidden layer for each ion constituent model. The Group 1 models use 30 neurons in every hidden layer, whereas several models in Groups 2 and 3 (Ca2+, SO42−, Br−, and K+) use up to 40 or 44 neurons in their first hidden layers. This pattern suggests that, as the non-linearity and complexity of the relationship between EC and an ion constituent increase, the ANN tends to require more neurons to simulate that constituent accurately. The pattern is not uniform, however: the Na+, Cl−, and alkalinity models also use 30 neurons per layer. The observation is broadly consistent with the expectation that more complex relationships between input and output variables call for more neurons in the hidden layers of an ANN.
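A random search of this kind can be sketched as follows. The search space below (four hidden layers, the neuron counts and activation names that appear in Table 3) is an assumption, as are the function names and the number of trials; `placeholder_error` is a stand-in for training an ANN with the sampled configuration and returning its validation error, and has no physical meaning.

```python
import random

# Assumed search space, built from values that appear in Table 3.
ACTIVATIONS = ["elu", "relu", "tanh", "sigmoid"]
NEURON_CHOICES = [22, 30, 40, 44]
N_HIDDEN_LAYERS = 4


def sample_config(rng):
    """Draw one (neurons, activation) pair per hidden layer at random."""
    return [(rng.choice(NEURON_CHOICES), rng.choice(ACTIVATIONS))
            for _ in range(N_HIDDEN_LAYERS)]


def random_search(evaluate, n_trials, seed=0):
    """Return the sampled configuration with the lowest error."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        error = evaluate(config)
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error


def placeholder_error(config):
    """Hypothetical stand-in for 'train the ANN, return validation MAE'."""
    return abs(sum(neurons for neurons, _ in config) - 120)


best_config, best_error = random_search(placeholder_error, n_trials=50)
print("best configuration:", best_config)
print("placeholder error :", best_error)
```

Unlike a grid search, random search evaluates only a fixed number of sampled configurations, which keeps the cost manageable when each evaluation requires training a network.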
Table 3. Optimal number of neurons (N) and activation functions (Act) in each hidden layer of each ion constituent model

Hidden layer | TDS: N | TDS: Act | Mg2+: N | Mg2+: Act | Na+: N | Na+: Act
---|---|---|---|---|---|---
1 | 30 | elu | 30 | relu | 30 | tanh
2 | 30 | sigmoid | 30 | elu | 30 | elu
3 | 30 | elu | 30 | tanh | 30 | sigmoid
4 | 30 | relu | 30 | relu | 30 | elu

Hidden layer | Ca2+: N | Ca2+: Act | Cl−: N | Cl−: Act | SO42−: N | SO42−: Act
---|---|---|---|---|---|---
1 | 40 | elu | 30 | relu | 44 | relu
2 | 40 | sigmoid | 30 | elu | 44 | relu
3 | 40 | relu | 30 | sigmoid | 44 | relu
4 | 30 | tanh | 30 | elu | 22 | relu

Hidden layer | Br−: N | Br−: Act | Alkalinity: N | Alkalinity: Act | K+: N | K+: Act
---|---|---|---|---|---|---
1 | 44 | elu | 30 | tanh | 44 | relu
2 | 44 | sigmoid | 30 | relu | 44 | relu
3 | 30 | elu | 30 | tanh | 44 | relu
4 | 30 | tanh | 30 | elu | 22 | relu
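A random hyperparameter search of the kind described above can be sketched as follows. The four activation names are those appearing in Table 3 and the four hidden layers match its rows; the neuron range, the number of trials, and `placeholder_score` are assumptions made for illustration. In practice the scoring function would train an ANN with the sampled architecture and return a validation error.

```python
import random

ACTIVATIONS = ["elu", "relu", "tanh", "sigmoid"]   # activations seen in Table 3
NEURON_CHOICES = list(range(10, 51, 2))            # assumed search range
N_HIDDEN_LAYERS = 4                                # rows of Table 3

def sample_architecture(rng):
    """Draw one (neurons, activation) pair per hidden layer, uniformly at random."""
    return [(rng.choice(NEURON_CHOICES), rng.choice(ACTIVATIONS))
            for _ in range(N_HIDDEN_LAYERS)]

def random_search(score_fn, n_trials=50, seed=0):
    """Keep the sampled architecture with the lowest score (lower is better)."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = score_fn(arch)
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

def placeholder_score(arch):
    """Hypothetical stand-in for 'train the ANN and return its validation MAE'."""
    return abs(sum(neurons for neurons, _ in arch) - 120)

best_arch, best_score = random_search(placeholder_score)
print(best_arch, best_score)
```

Random search samples the space rather than enumerating it, which keeps the cost fixed at `n_trials` model fits regardless of how many combinations exist.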
Performance of four alternative models (train for 80% and test for 20% of samples) to simulate Cl−. RT, regression trees; RF, random forest; GB, gradient boosting; ANN, artificial neural network.
Performance of four alternative models (train for 80% and test for 20% of samples) to simulate SO42−.
Performance of four alternative models (train for 80% and test for 20% of samples) to simulate Br−.
Overall, the results demonstrate that the ANN model outperforms the other models in simulating Group 1 constituents (TDS and Mg2+), Group 2 constituents (Na+, Ca2+, and Cl−), and two constituents in Group 3 (Br− and SO42−), based on both R2 and MAE values during testing. For alkalinity and K+ in Group 3, the RF model exhibited slightly better performance. These findings indicate the potential of ANNs in improving the estimation of ion concentrations in various groups when compared to traditional RT, RF, and GB models.
MAE values for the nine ion constituents across the 5-fold cross-validation using the selected ANN models vs. MAE of benchmark model.
A comparison of the box plots with the stars reveals that the ANN models consistently outperform the benchmark model in terms of MAE during the K-fold cross-validation iterations. This finding underscores the superiority of the ANN models in simulating ion concentrations more accurately compared to the benchmark model, further highlighting the potential of ANNs in enhancing the estimation of ion concentrations in various groups.
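The 5-fold cross-validation behind these per-fold MAE values can be sketched as follows. This is an illustrative outline under stated assumptions: the helper names, the synthetic data, and the mean predictor standing in for the ANN are not from the study.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]

def cross_validate(xs, ys, fit, predict, k=5):
    """Return one held-out MAE per fold."""
    fold_maes = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [abs(ys[i] - predict(model, xs[i])) for i in test_idx]
        fold_maes.append(sum(errors) / len(errors))
    return fold_maes

# Synthetic data and a trivial mean predictor, for illustration only.
xs = [float(i) for i in range(23)]
ys = [2.0 * x + 1.0 for x in xs]
fit_mean = lambda train_x, train_y: sum(train_y) / len(train_y)
predict_mean = lambda model, x: model

fold_maes = cross_validate(xs, ys, fit_mean, predict_mean)
print(fold_maes)
```

The spread of the five values, rather than any single one, is what a box plot of cross-validation MAE summarizes.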


ANN model performance on simulating the concentrations of nine ion constituents based on percent improvement from the benchmark model represented by R2 and MAE.
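One plausible way to compute a percent improvement over the benchmark is sketched below. The exact formula is not stated in the text, so the definition here is an assumption: the change is expressed relative to the benchmark value, with the sign chosen so that a positive number always means the candidate model is better. The input values are hypothetical.

```python
def percent_improvement(benchmark, candidate, lower_is_better):
    """Percent change relative to the benchmark; positive means the candidate is better."""
    change = benchmark - candidate if lower_is_better else candidate - benchmark
    return 100.0 * change / benchmark

# Hypothetical metric values, for illustration only.
print(percent_improvement(4.0, 3.0, lower_is_better=True))     # MAE falls from 4.0 to 3.0
print(percent_improvement(0.5, 0.75, lower_is_better=False))   # R2 rises from 0.5 to 0.75
```

Treating MAE as lower-is-better and R2 as higher-is-better keeps the two metrics on a common "improvement" scale.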
Ion simulator dashboard
Screenshot of the interactive dashboard interface, displaying the results of the four ML models (ANN, RT, RF, and GB) for simulating nine ion constituents in the Delta region.
The dashboard features dropdown menus, sliders, and other interactive elements that allow users to easily customize their queries. For instance, the EC value can be adjusted via a slider, while the location and WYT can be selected from dropdown lists. After the desired inputs are selected, the user can click a ‘Compute’ button, and the ion concentration predictions for each of the four models will be displayed in graphical form, as preferred by the user.
The dashboard's interactive features enable users to adjust input parameters and visualize the outcomes for different hypothetical hydrological conditions, comparing the performance of ANN, RT, RF, and GB models.
DISCUSSION
Numerous studies have explored the use of parametric regression equations to predict ion constituents in various water bodies (Jung 2000; Suits 2002; Hutton 2006; Denton 2015; Hutton et al. 2022). Each successive study has built upon the previous work to improve the predictive performance of these equations. Hutton et al. (2022) represent the latest advancement in this line of research, offering the most accurate parametric regression models to date.
Contrary to previous studies that relied on parametric regression equations, our study found that ML models not only simplify the prediction process but also improve the accuracy of ion constituent estimates. Specifically, our models showed better performance for ions that have non-linear relationships with EC, addressing a significant limitation in existing models. This demonstrates the adaptability and robustness of ML algorithms in dealing with complex hydrological data.
In a previous study, we (Namadi et al. 2022) took the pioneering step of applying ML models to simulate ion constituents' levels in the South Delta, using a dataset that spanned from 2018 to 2020. Although that study laid important groundwork, its scope was limited, both in terms of geographic coverage and time span. The current study addresses these limitations and significantly expands on this foundational work. By integrating three datasets, we have been able to cover a more extensive range of stations and a prolonged period of time, thus ensuring a more representative sample of water quality conditions in the region. Specifically, we included data from 30 stations and extended the time span to 64 years, thereby providing a robust foundation for our analysis.
To simulate ion constituents, we employed four different ML models: RT, RF, GB, and ANN. The rationale behind testing four distinct algorithms lies in determining which model is best suited to address our research problem, as each model offers unique advantages. However, it is crucial to further elaborate on the limitations of these models for a more comprehensive understanding.
RTs are easily interpretable, as they provide a clear visualization of the decision-making process. However, they can suffer from overfitting, which may limit their generalization performance and pose a problem when applying these models to new, untested locations. RFs use an ensemble of decision trees to alleviate the overfitting problem associated with a single RT, but the ensemble approach makes them harder to interpret.
GB combines multiple weak learners to produce a strong learner. It offers improved predictive performance compared to individual trees but may be computationally intensive, making it less suitable for real-time applications where quick predictions are needed. ANNs are powerful models capable of capturing complex, non-linear relationships; however, they can be more difficult to interpret and require a substantial amount of data for training, limiting their usefulness in settings with limited data availability.
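The idea of combining weak learners can be illustrated with a minimal gradient-boosting sketch for squared-error loss, in which each round fits a one-split tree (a stump) to the current residuals. This is not the GB implementation used in the study; the data, learning rate, number of rounds, and function names are assumptions.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D inputs by squared error.
    Assumes xs contains at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds, learning_rate):
    """Start from the mean, then repeatedly add a damped stump fitted to residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate

def predict_gb(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mae(model, xs, ys):
    return sum(abs(y - predict_gb(model, x)) for x, y in zip(xs, ys)) / len(ys)

# Synthetic non-linear data, for illustration only.
xs = [i / 2 for i in range(20)]
ys = [x ** 2 for x in xs]

single = fit_gradient_boosting(xs, ys, n_rounds=1, learning_rate=1.0)
boosted = fit_gradient_boosting(xs, ys, n_rounds=200, learning_rate=0.1)
mae_single, mae_boosted = mae(single, xs, ys), mae(boosted, xs, ys)
print(mae_single, mae_boosted)
```

Each stump alone is a crude two-level approximation, but the sum of many damped stumps tracks the curve far more closely. The sequential loop also shows where the training cost of GB comes from.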
Moreover, all the models have inherent limitations when handling missing or imbalanced data, a frequent issue in hydrological studies. These limitations could potentially impact the reliability and applicability of our findings. Therefore, acknowledging these constraints not only provides a more balanced view of our study but also helps identify areas for future research and iterative model refinement.
In our evaluation, we used two performance metrics, R2 and MAE, to assess the models' performance. The use of both metrics is essential because relying on just one of them can be misleading. While R2 measures the proportion of the variance in the dependent variable that is predictable from the independent variables, it may not fully capture the model's accuracy, especially when the errors are large. On the other hand, MAE provides a more direct measure of the average magnitude of the errors, making it a valuable complementary metric to R2 in determining the model's overall performance.
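The complementary roles of the two metrics can be seen in a short sketch. It assumes R2 is computed as the coefficient of determination (one minus the ratio of residual to total sum of squares), which the text does not state explicitly; the observed and simulated values are hypothetical.

```python
def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, simulated):
    """Average magnitude of the errors, in the units of the observations."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)

# Hypothetical concentrations spanning a wide range, with a constant bias of 50.
observed = [10.0, 100.0, 1000.0, 5000.0]
simulated = [o + 50.0 for o in observed]

r2 = r_squared(observed, simulated)
mae = mean_absolute_error(observed, simulated)
print(r2, mae)
```

Because the observations vary over a wide range, R2 stays above 0.999 even though every prediction is off by 50, which is five times the smallest observed value. MAE exposes that bias directly.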
The results indicate that ANNs outperformed the other models for simulating Group 1 (TDS and Mg2+) and Group 2 (Na+, Ca2+, and Cl−) ion constituents, as well as Br− and SO42− in Group 3. For alkalinity and K+, the ANN produced results comparable to those of RF. One possible explanation is that ANNs are adept at capturing complex, non-linear relationships in data, a feature particularly useful for hydrological variables like ion concentrations that may not follow linear patterns. Unlike traditional regression models, ANNs can learn intermediate features of the data in their hidden layers, enabling more accurate predictions. Moreover, the architecture of the ANN allows for more intricate connections between variables, which could be critical in capturing the multi-faceted relationships in ion concentrations. The use of K-fold cross-validation, with K = 5, supported the robustness of the ANN model, indicating that the model is not overfitted and can generalize well to new data.
While the benchmark model provided satisfactory results for Groups 1 and 2, the ANN models performed even better, with notably smaller errors (measured by MAE). For the Group 3 ion constituents (SO42−, Br−, alkalinity, and K+), the ANN models showed a remarkable improvement over the benchmark model, improving MAE by 20–59%. This marked increase in performance emphasizes the distinct advantages of using ANN models over traditional parametric regression equations, particularly in capturing the complex, non-linear relationships between ion constituents and EC.
The development of a user-friendly dashboard has made it possible for users with or without programming knowledge to interact with and visualize the results of the ML simulators. The dashboard, which is hosted on Microsoft Azure, offers a convenient way to explore hypothetical hydrological conditions and compare the results of different ML models for ion constituents.
Despite the promising results achieved in this study, there are some limitations that should be acknowledged. One of the main limitations is that the ML models developed in this study were trained for 30 different water quality stations in the Delta. Consequently, the applicability of these models is restricted to these specific stations. This limitation may hinder the utility of the models in predicting ion constituents in areas of the Delta not covered by the current dataset. To address this limitation, follow-up work is planned to expand the scope and applicability of the ML models. Specifically, we will develop and apply ML models to additional locations in the Delta, thus ensuring a more comprehensive understanding of water quality dynamics in the region. This will involve collecting new data and updating the models to accommodate the additional information, allowing for more accurate predictions in previously unexplored areas. Also, this may include the use of transfer learning techniques, which can adapt the models to new locations using a limited amount of new data.
Furthermore, it is essential to recognize that water quality conditions and their drivers may change over time due to various factors such as climate change, land use changes, or evolving water management practices. To ensure the continued relevance and accuracy of our models, we will regularly update them with the most recent data and evaluate their performance against emerging trends and conditions. To maintain this ongoing relevance, our update process will adhere to a structured framework. This involves initial data collection from reliable sources, preprocessing the acquired data, retraining the models with both old and new data, and conducting a rigorous testing phase for model validation. Finally, the updated models will be deployed to replace the older versions in the dashboard.
By addressing these limitations in our follow-up work, we aim to provide even more valuable tools for understanding the impacts of salinity on water quality in the Delta and informing water management decisions that protect aquatic life and ensure the safety of drinking water supplies. For instance, water resource managers and policymakers can use the models and dashboard to simulate various water management scenarios and assess their impacts on ion constituent concentrations in the Delta. This will enable them to make well-informed decisions regarding the allocation of water resources and the implementation of water conservation measures to maintain optimal water quality and preserve aquatic ecosystems.
Furthermore, public health officials can use the dashboard to monitor water quality in real time and identify areas where ion concentrations exceed regulatory standards. This will allow for prompt action to protect public health by issuing advisories, implementing treatment processes, or taking other necessary measures to ensure the safety of drinking water supplies.
CONCLUSION
This study represents a paradigm shift in the approach to water quality modeling in the Delta region. By leveraging ML techniques, our research not only demonstrates equal or better performance compared to traditional parametric regression equations but also introduces a level of adaptability previously absent in the field. Particularly, the use of ANN and RF models showcased superior ability in simulating ion constituent concentrations, addressing the limitations of non-linear relationships that have constrained previous models.
A pioneering aspect of our work is the creation of an interactive dashboard accessible to users across various disciplines and levels of technical expertise. This tool serves as a nexus for data-driven, informed decision-making, significantly demystifying the complexities of water quality dynamics for stakeholders, researchers, and policymakers. This platform has immediate utility, providing real-time guidance for water management scenarios, thus potentially leading to more sustainable practices and better public health outcomes.
Our research marks a substantial contribution to water quality modeling by introducing ML as a robust alternative to traditional modeling techniques. It opens up exciting avenues for future work, such as extending ML models to other geographical locations within the Delta, refining models based on real-time data, and possibly incorporating additional variables like climate change factors. Such directions could lead to a more comprehensive, adaptable, and forward-looking approach to water quality management in the Delta and beyond. These advancements not only deepen our understanding of water quality in the Delta but also offer actionable insights for real-world applications, ranging from resource allocation to environmental conservation and public health protection.
ACKNOWLEDGEMENTS
We appreciate the support of the Municipal Water Quality Investigations (MWQI) program. Our sincere gratitude is extended to Tetra Tech, Inc. for generously providing the data essential for this study. We also wish to thank our colleagues in the Water Quality Evaluation Section of the North Central Region Office, whose diligent efforts in collecting the grab samples were invaluable to our research.
DATA AVAILABILITY STATEMENT
All relevant data are available from https://github.com/PeymanHNamadi/Ion_Study_Dashboard/tree/main.
CONFLICT OF INTEREST
The authors declare there is no conflict.