This study presents an in-depth analysis of machine learning (ML) techniques for predicting the water quality index (WQI) and water quality classification (WQC) using a dataset containing water quality metrics such as temperature, specific conductance, salinity, dissolved oxygen, depth, pH, and turbidity from multiple monitoring stations. Data preprocessing included imputation for missing values, feature scaling, and categorical encoding, ensuring balanced input feature contributions. This research evaluated artificial neural networks, decision trees, support vector machines, random forests, XGBoost, and long short-term memory (LSTM) networks. Results demonstrate that XGBoost and LSTM significantly outperformed the other models, with XGBoost achieving an accuracy range of 99.07–99.99% and LSTM attaining an R2 of 0.9999. Compared with prior studies, our approach enhances predictive accuracy and robustness, demonstrating strong generalization capabilities. The proposed models exhibit significant improvements over traditional methods in handling complex, multivariate water quality data, positioning them as promising tools for water quality prediction and environmental management. These findings underscore the potential of ML for developing reliable, scalable water quality monitoring solutions, providing valuable insights for policymakers and environmental managers dedicated to sustainable water resource management.

  • XGBoost and LSTM excel in WQI/WQC prediction.

  • XGBoost achieves 99.83% peak accuracy.

  • LSTM boasts an R2 value of 0.9999.

  • Advanced models enhance water quality data management.

  • Results inform sustainable water resource strategies.

ANFIS: adaptive neuro fuzzy inference system

ANN: artificial neural network

Bi-LSTM: bi-directional LSTM

CV-3: three-fold cross-validation

DT: decision tree

GBoost: gradient boosting classifier

GBR: gradient boosting regression

LSTM: long short-term memory

ML: machine learning

MLP: multilayer perceptron

R2: coefficient of determination

RF: random forest

SVM: support vector machine

SVR: support vector regression

WAWQI: weighted arithmetic WQI

WQC: water quality classification

WQI: water quality index

XGBR: XGBoost regression

Water quality degradation has become a significant environmental concern worldwide, impacting human health, ecosystem stability, and economic development (Acheampong & Opoku 2023). Pollution discharge into sensitive ecosystems has profoundly disrupted their health and functionality, leading to issues like biodiversity loss and reduced ecosystem services (Mahdian et al. 2024). Globally, the growing levels of pollution from various sources such as industrial effluents, agricultural runoff, and urban wastewater have intensified the contamination of water bodies (Guo et al. 2024). These discharges introduce nutrient loads, emerging pollutants, and heavy metals that degrade water quality. For instance, nutrient loading from agricultural runoff contributes to eutrophication, promoting harmful algal blooms and depleting oxygen levels, which threaten aquatic life (Masum Beg et al. 2024; Tian et al. 2024b). In parallel, the presence of heavy metals like lead, mercury, and cadmium poses long-term risks to both ecosystems and human health due to their bioaccumulative nature (Bhagat et al. 2020; Mohammadpour et al. 2024; Tian et al. 2024a).

The complexity of water quality issues requires robust predictive models to anticipate changes and support timely interventions. Traditional approaches to water quality monitoring and prediction, while valuable, are often limited in handling large datasets and capturing non-linear relationships. Recent advancements in machine learning (ML) and artificial intelligence offer promising alternatives (Amaranto & Mazzoleni 2023; Bhagat et al. 2023a). However, despite significant progress, current ML models in water quality assessment face challenges in areas like uncertainty quantification, limited adaptability across diverse hydrological contexts, and scalability to new geographical regions (Ghiasi et al. 2022; Zhang et al. 2022).

Related work

In the field of water quality evaluation, researchers have leveraged various computational methods to predict and assess water resource conditions. Commonly employed techniques include artificial neural networks (ANNs), support vector regression (SVR), and decision trees (DTs), each offering unique strengths and limitations. For example, a team of researchers examined the use of ANN and SVR, highlighting their ability to capture non-linear relationships in water quality data (Fang et al. 2019). However, these models often require large datasets and are sensitive to parameter tuning, limiting their generalizability across different environmental contexts. ML methods have become highly effective tools for forecasting and managing environmental issues such as rainfall or water quality, offering the ability to handle complex datasets and uncover patterns that traditional methods may overlook (Kumar et al. 2021). Various studies have demonstrated the efficacy of ML models in water quality prediction (Liao et al. 2020). For instance, Liu et al. (2019) utilized a long short-term memory (LSTM) network to predict water quality in the Yangtze River Basin, showing significant potential for real-time monitoring. Khullar & Singh (2020) introduced a bi-directional LSTM (Bi-LSTM) model that surpassed conventional methods in forecasting water quality parameters of the Yamuna River. Similarly, Abba et al. (2020) compared multiple ML techniques and found that the adaptive neuro fuzzy inference system (ANFIS) and multilayer perceptron (MLP) provided reliable forecasts for the water quality index (WQI) (Abba et al. 2021).

Khullar & Singh (2022) introduced a Bi-LSTM model for the Yamuna River in India, demonstrating enhanced accuracy for water quality parameter forecasting. Despite these improvements, Bi-LSTM models face challenges in handling data with limited historical records and require substantial computational resources, which can restrict their application. A comparative study by Abba et al. (2020) evaluated backpropagation neural networks, ANFIS, and MLP for WQI estimation. While their findings showed that neural ensembles offered robust predictive capabilities, limitations in uncertainty quantification and interpretability remain prevalent. Elbeltagi et al. (2022) advanced this discussion by using additive regression and M5P tree models, which were shown to be effective in specific basins like Akot, yet these models may not generalize well across diverse water bodies. In addition, Asadollah et al. (2021) implemented extra tree regression (ETR) to forecast monthly WQI in Hong Kong, demonstrating improved forecasting accuracy with reduced input variables. However, ETR and similar ensemble methods such as random forest (RF) often encounter overfitting in smaller datasets, particularly in urban water systems with high variability in water quality parameters.

Hassan et al. (2021) further explored ML techniques across a large dataset in India, using methods like Bayesian tree models and multiple linear regression alongside RF and SVM. While their results indicated high predictive accuracy (up to 99.99%), these studies highlighted the difficulty of managing overfitting and the need for robust uncertainty quantification, a persistent challenge in ML-based water quality models. Dodig et al. (2024) applied LSTM networks for multistep water quality prediction, focusing on dissolved oxygen, conductivity, and chemical oxygen demand. Using the Sava River as a case study, the authors combined LSTM with LOcally WEighted Scatterplot Smoothing for enhanced data preprocessing, comparing results with an SVR baseline. The LSTM model outperformed SVR, achieving high accuracy (R2 up to 0.9998) and low RMSE for a 5-day prediction period, demonstrating its reliability for water quality monitoring (Dodig et al. 2024).

Gap analysis and research objectives

Despite these advancements, several gaps remain in the literature. Previous studies often focus on a limited set of ML techniques or specific WQIs (Yan et al. 2024), which may not fully capture the complex and multifaceted nature of water quality prediction. Additionally, there is a need for more comprehensive evaluations of these models using diverse datasets and performance metrics to ensure their robustness and generalizability across different geographical and hydrological contexts (Chen et al. 2020a). Most existing models struggle with uncertainty quantification and require large, diverse datasets for accurate performance, limiting their scalability. The adaptability of these models to varying hydrological conditions and pollutant profiles also remains limited. To fill these identified deficiencies (Zhou & Zhang 2023), this study conducts a comprehensive and methodical comparison of a diverse set of models, including ANN, support vector machine (SVM), DT, RF, XGBoost, and LSTM networks. This comparison is based on rigorous testing using robust datasets of WQI and water quality classification (WQC). To enhance the performance of the models, we implemented grid search for hyperparameter optimization, which methodically examines a spectrum of hyperparameter combinations to ascertain the optimal configuration for each individual model (Salehin et al. 2024). The results of this study clarified the relative significance of different water quality parameters. Electrical conductivity (EC), nitrate, dissolved oxygen (DO), pH, biological oxygen demand (BOD), and total coliform (TC) were identified as crucial indicators for assessing water quality, with respective parameter significance scores of 81.494, 74.78, 105.770, 36.805, 130.173, and 105.166.
These results provide valuable insights into the hierarchical importance of different water quality metrics, potentially informing future monitoring and management strategies in diverse hydrological contexts. This research also assesses the robustness, scalability across varied water quality parameters, strengths, and limitations of each model to understand their practical applications in environmental monitoring and assessment. Another key objective is to provide guidance for researchers and practitioners in selecting the most appropriate ML techniques for evaluating and managing water quality using accuracy, precision, recall, F1 score, R2, RMSE, and MSE. Our approach aims to advance model generalizability and reliability, providing novel insights into ML's role in environmental management and water quality assessment (Miller et al. 2024). The research also aims to demonstrate the potential of ML models to enhance water quality monitoring and management, thereby contributing to public health and environmental sustainability. This involves showcasing the practical implications of the study's findings for real-world applications. The novelty of the study lies in its comprehensive comparison of multiple advanced ML models for water quality prediction, using a robust dataset and a wide range of performance metrics. By providing a broader international context and emphasizing the global significance of water quality management, this paper seeks to provide significant insights into the utilization of ML techniques for environmental monitoring. The results of this study present practical implications for policymakers, researchers, and practitioners in the field of water quality management, promoting the adoption of advanced predictive models to safeguard water resources worldwide.

Water pollution is a key environmental concern confronting mankind, with much of the consequent harm owing to weak forecasting, early warning, and emergency management capacities. Thus, developing an effective monitoring and early warning system to facilitate informed decision-making and water quality management is a crucial scientific and technical priority that requires urgent attention (Liao et al. 2020). In recent years, many ML approaches have made tremendous progress. Figure 1 depicts the proposed mechanism for forecasting water quality.
Figure 1

A schematic flow diagram of the proposed mechanism for forecasting water quality.


The methodology proposed in Figure 1 utilizes an advanced ML technique for assessing water quality, leveraging a comprehensive dataset that includes seven essential parameters: dissolved oxygen, pH, conductivity, biological oxygen demand, nitrate, fecal coliform, and total coliform. Before analysis, the dataset was subjected to preprocessing steps, such as mean imputation for handling missing values and data normalization to maintain consistency among the variables. In adherence to standard ML practices, the dataset was partitioned into training and testing subsets, with 80% allocated for model training and the remaining 20% reserved for subsequent evaluation. The research protocol incorporates two distinct analytical objectives: WQC and WQI prediction.

For the classification task, five advanced algorithms were employed: ANN, RF, DT, SVM, and XGBoost. Concurrently, the WQI prediction utilized LSTM, XGBoost, DT, and RF algorithms. To optimize model performance, a rigorous hyperparameter tuning process was implemented during the training phase. This process utilized a grid search methodology in conjunction with three-fold cross-validation (CV-3), ensuring robust model selection and minimizing the risk of overfitting.

Dataset description and processing

The dataset used in this study comprises essential water quality metrics recorded at 30-min intervals from multiple monitoring stations over a period spanning from 2004 to 2006. Specifically, the dataset consists of 61,542 instances. Key input parameters include temperature, specific conductance, salinity, dissolved oxygen percentage, dissolved oxygen concentration, depth, pH, and turbidity. These metrics collectively provide a comprehensive representation of water quality, supporting robust modeling and predictive analyses across varied environmental conditions.

To ensure data quality and consistency, several data preparation steps were undertaken. Missing values in numerical features were imputed using mean values, while categorical variables were completed with mode imputation. All input features were then standardized to a mean of zero and a standard deviation of one to ensure balanced input contributions across the dataset, optimizing the dataset for ML applications and supporting equitable performance across all water quality parameters.
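The two preprocessing steps above (mean imputation, then standardization to zero mean and unit variance) can be sketched as follows. This is a minimal illustration with toy values, not the study's actual code, and it assumes a purely numerical feature matrix:

```python
import numpy as np

def mean_impute_and_standardize(X):
    """Fill missing entries with the column mean, then scale each
    feature to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)            # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]              # mean imputation
    return (X - X.mean(axis=0)) / X.std(axis=0)  # standardization

# Toy rows of (temperature, specific conductance, DO concentration)
X = [[16.0, 0.17, np.nan],
     [18.0, np.nan, 7.1],
     [20.0, 0.19, 6.9]]
Z = mean_impute_and_standardize(X)
```

After this transform, every column has mean zero and standard deviation one, so no single parameter dominates the model inputs by virtue of its scale.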

For the experimental setup, data from 1 January 2004 to 24 February 2006 was allocated for model training, while the final 10 months (from 25 February 2006 to 31 December 2006) served as the testing period. A lead time of 24 h was applied for predictions, meaning the model forecasts water quality metrics one day in advance, based on preceding data. This approach enables the assessment of model performance for short-term predictions, providing valuable insights into daily water quality trends and facilitating proactive water management strategies. This dataset was obtained from GitHub Public repository (https://github.com/PritiG1/Multivariate_forecasting_waterquality).
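Since the data are sampled at 30-min intervals, a 24-h lead time corresponds to a target shifted 48 steps into the future. The sketch below shows one common way to set this up, using a synthetic series in place of the real measurements (the study's exact framing code is not given):

```python
import numpy as np
import pandas as pd

LEAD_STEPS = 48  # 24 h ahead at 30-min sampling

idx = pd.date_range("2004-01-01", periods=200, freq="30min")
df = pd.DataFrame(
    {"WQI": np.random.default_rng(0).normal(25, 10, len(idx))}, index=idx
)

# The supervised target is the WQI observed 24 h after each timestamp;
# the final LEAD_STEPS rows have no future value and are dropped.
df["WQI_target"] = df["WQI"].shift(-LEAD_STEPS)
supervised = df.dropna()
```

A date-based split (training up to 24 February 2006, testing afterwards) can then be applied to `supervised` with ordinary index slicing, preserving temporal order.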

Figure 2 displays the autocorrelation analysis of the WQI, which revealed strong temporal dependencies, with high autocorrelation persisting across multiple lags. This suggests that WQI values are highly influenced by recent past values, reflecting stable patterns over time.
Figure 2

Autocorrelation plot of WQI with 30-min intervals.


The output parameters contained both the WQI and WQC, which were obtained from the processed input characteristics. The dataset was separated into training and testing sets to analyze the performance of the ML models effectively. Table 1 presents the statistical values for various water quality parameters.

Table 1

Statistical calculation of the features

Statistic   Temp    SpCond   Sal     DO_pct   DO_mgl   Depth    pH      Turb    WQI
Mean        16.95   0.173    0.100   82.72    8.423    0.301    6.875   0.017   25.50
Std         8.27    0.036    0.015   23.84    3.172    0.147    0.386   0.065   10.54
Min         0.90    0.010    0.000   2.80     0.20     −0.13    5.500   0.003   5.70
25%         9.10    0.150    0.100   73.60    6.20     0.200    6.600   0.009   16.69
50%         17.70   0.170    0.100   89.40    8.70     0.280    6.800   0.011   25.11
75%         24.70   0.190    0.100   97.30    11.00    0.380    7.100   0.014   33.09
Max         33.50   0.610    0.300   157.20   14.80    1.300    9.600   2.531   63.06

In addition to the foregoing data description, a correlation analysis between the inputs and the output was explored (see Figure 3) to gain a better understanding of the dataset and how its features relate to one another. It is interesting to report that only the temperature feature is strongly positively correlated (0.84) with the WQI, while the strongest negative relations (−0.95, followed by −0.79) are displayed by the two dissolved oxygen measures with the WQI. The remaining features are not strongly correlated with the output (WQI). In short, two features, Temp and DO, remain highly correlated with the output (WQI).
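The pairwise Pearson correlations behind such a heatmap can be computed directly with NumPy. The sketch below uses synthetic series standing in for the real measurements, constructed so that dissolved oxygen falls as temperature rises, mimicking the sign pattern reported above:

```python
import numpy as np

rng = np.random.default_rng(1)
temp = rng.normal(17, 8, 500)                        # temperature-like series
do_mgl = 14 - 0.4 * temp + rng.normal(0, 0.5, 500)   # DO falls as temperature rises
wqi = 5 + 1.1 * temp + rng.normal(0, 2, 500)         # WQI tracking temperature

# Pearson correlation matrix, as visualized in the heatmap of Figure 3
corr = np.corrcoef(np.vstack([temp, do_mgl, wqi]))
r_temp_wqi, r_do_wqi = corr[0, 2], corr[1, 2]
```

On real data, a single call to `np.corrcoef` (or `DataFrame.corr()` in pandas) over all features plus the WQI column yields the full matrix plotted in Figure 3.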
Figure 3

Heatmap of correlation matrices between the output and features of the dataset.


ML models

This research assesses the efficacy and overall performance of several sophisticated ML paradigms: SVM, DT, ANN, RF, XGBoost, and LSTM networks (Md Jahidul et al. 2024), each of which brings its own distinctive advantages and capabilities. These models were selected for their inherent strengths and demonstrated aptitude for managing and analyzing the complex, high-dimensional datasets characteristic of water quality data in environmental studies. The following subsections outline the basic concepts of the applied models:

The ANNs are computer models inspired by the structure of biological neural networks, meant to capture complex and non-linear interactions between inputs and outputs (Otchere et al. 2021). ANNs consist of layers of linked neurons, enabling them to learn and represent a wide range of functions given sufficient depth and data. In applications such as forecasting maximum scour depth, ANNs demonstrate adaptive learning capabilities that do not depend on predetermined functional forms, allowing them to adequately simulate the complicated non-linearities prevalent in hydraulic processes. A broad understanding of the model can be obtained from Chen et al. (2020b). This architecture and its inherent flexibility make ANNs particularly valuable in hydro-informatics, wastewater treatment, and other fields where understanding and predicting complex interactions are essential (Jawad et al. 2021).
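As a minimal illustration of an ANN regressor, the sketch below fits a small multilayer perceptron to synthetic data. The two hidden layers of 32 units are an assumption for illustration; the paper does not report its exact network architecture:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the water quality features and target
X, y = make_regression(n_samples=400, n_features=8, noise=0.1, random_state=0)
X = StandardScaler().fit_transform(X)
y = (y - y.mean()) / y.std()  # scale the target for stable training

# Two hidden layers of 32 ReLU units (illustrative architecture)
ann = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                   max_iter=2000, random_state=0)
ann.fit(X, y)
```

The layered structure lets the network approximate non-linear input-output mappings without a predetermined functional form, which is the property the paragraph above highlights.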

Unlike traditional regression methods, SVR does not rely on assumptions about data distribution, making it versatile for various applications. SVR aims to find the optimal decision boundary that minimizes prediction error while maintaining model complexity. It uses a kernel function to transform input data into a higher-dimensional space, allowing it to handle non-linear relationships (Zhang & O'Donnell 2020). For a broad understanding, comprehensive studies on SVR are available (Smola & Schölkopf 2004; Zhang & O'Donnell 2020).
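The kernel idea can be seen in a small sketch: an RBF-kernel SVR fits a non-linear (sine-shaped) relationship that a linear model could not capture. The kernel choice and the `C`/`epsilon` values here are illustrative, not the study's tuned settings:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.05, 200)

# The RBF kernel implicitly maps inputs to a higher-dimensional space,
# letting SVR fit the non-linear sine relationship.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
pred = svr.predict(X)
```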

The RF is a robust ensemble learning approach used in ML for regression, classification, and prediction problems based on input characteristics. This approach generates numerous DTs during training and outputs the average prediction (for regression tasks) or the majority vote (for classification tasks) from all individual trees (Wang et al. 2018; Kombo et al. 2020). The operational mechanics of RF can be found in Antoniadis et al. (2021).
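The averaging behavior described above can be verified directly: for regression, a forest's prediction equals the mean of its individual trees' predictions. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# For regression, the forest's output is the mean of its trees' outputs
manual_mean = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
```

For classification, `RandomForestClassifier` instead aggregates the trees' class-probability votes, matching the majority-vote description above.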

XGBoost regression (XGBR) is an advanced implementation of gradient boosting that builds a predictive model by combining the strengths of multiple weak learners (typically DTs). XGBoost incorporates several enhancements over traditional gradient boosting regression (GBR), including regularization to prevent overfitting, parallelization for faster computation, and its ability to handle missing values (Chen & Guestrin 2016). A detailed understanding can be gained from Meng et al. (2020) and Bhagat et al. (2022). For groundwater prediction, features related to hydrological data, geological features, weather patterns, and land use are considered (Zhao et al. 2024). These features might include groundwater levels at different depths, precipitation, temperature, soil type, and land cover.
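The additive weak-learner idea behind XGBR can be sketched with scikit-learn's `GradientBoostingRegressor`, used here as a stand-in for XGBoost (which adds regularization, parallelism, and native missing-value handling on top of the same principle). The hyperparameter values are illustrative, not the study's tuned grid:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, n_informative=3,
                       noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Additive ensemble of shallow trees, each one fit to the residual
# errors of the trees before it
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0).fit(X_tr, y_tr)
```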

The DT models are a widely used ML technique for both classification and regression tasks. Owing to their simplicity and interpretability, they provide a clear visualization of the decision-making process, making it straightforward to follow the prediction path (Safavian & Landgrebe 1991). DT models are computationally efficient, especially for large datasets, and can handle high-dimensional data without requiring extensive computational resources. They are also effective at capturing non-linear relationships between input variables and the target variable, making them appropriate for difficult prediction problems where the linkages are not obvious (Breiman et al. 1984). For a comprehensive understanding and recent advancements in DT models, see Costa & Pedreira (2023).

Performance measurements

The performance of the models was evaluated using various metrics for both classification (WQC) and regression (WQI) tasks. For classification, we used accuracy, precision, recall, and F1 score. For regression, we used R2, RMSE, and MSE. These metrics provide a comprehensive assessment of model performance, capturing both the accuracy and the robustness of the predictions.

Precision is defined as the ratio of correctly predicted positive instances to the total number of predicted positive instances. A high precision value signifies a low rate of false positives, as shown in Equation (1):
\[
\text{Precision} = \frac{TP}{TP + FP} \tag{1}
\]

Precision can be enhanced by adjusting the model parameters. However, it is essential to note that increasing precision often results in a decrease in recall, and similarly, an increase in recall usually leads to a reduction in precision (Smith & Doe 2020).

Recall, also termed sensitivity, assesses the proportion of correctly identified positive instances in relation to the total number of actual positives. This metric is rigorously defined in Equation (2):
\[
\text{Recall} = \frac{TP}{TP + FN} \tag{2}
\]

The recall value of any ML model can be adjusted by modifying various parameters or hyperparameters. Altering these parameters can either increase or decrease recall. When recall is high, most positive instances (true positives + false negatives) are identified as positive, resulting in more false positives and lower precision. Conversely, with low recall, there are more false negatives (positives incorrectly labeled as negatives), which implies that positive predictions are more reliable, albeit at the expense of missing some positive instances (Johnson & Williams 2021).

The F1 score is the weighted average of precision and recall, incorporating both false positives and false negatives into its calculation. Unlike accuracy, the F1 score provides a more balanced measure, particularly useful in scenarios with uneven class distributions. Equation (3) illustrates the formula for calculating the F1 score:
\[
\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}
\]

The F1 score is advantageous in most cases as it balances the importance of both precision and recall. This metric is particularly beneficial when positive and negative results are equally costly. However, if the costs of false positives and false negatives differ significantly, it is advisable to consider precision and recall separately (Lee & Kim 2022).

Accuracy is the most fundamental and straightforward performance metric, defined as the proportion of correctly predicted observations to the total number of observations. In datasets with a balanced class distribution, the rates of false positives and false negatives are nearly equivalent, making accuracy a suitable quality measure. However, to comprehensively evaluate a model's performance, accuracy must be considered alongside other parameters.
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}
\]

While accuracy offers a general assessment of model performance, it is critical to evaluate it in conjunction with other metrics, especially in cases of imbalanced datasets (Anderson & Taylor 2023).
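The four classification metrics above follow directly from the confusion-matrix counts. A minimal sketch with an illustrative count breakdown (the numbers are hypothetical, not from the study):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion-matrix
    counts, following Equations (1)-(4)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical example: 90 TP, 10 FP, 5 FN, 95 TN out of 200 samples
p, r, f1, acc = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

With these counts, precision is 0.9 (10 false positives dilute the 90 true positives), while accuracy is 185/200 = 0.925, illustrating how the metrics diverge when error types differ.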

MSE quantifies the average of the squared discrepancies between the observed values and the predicted values. This metric is articulated as follows:
\[
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{5}
\]

Lower MSE values indicate better model performance (Kaliappan et al. 2021).

RMSE is derived by taking the square root of the MSE, offering an indication of the average size of the errors. It is computed as follows:
\[
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{6}
\]

RMSE is expressed in the same units as the observed and predicted values, which facilitates the interpretation of the error magnitude (Willmott 1982).

The coefficient of determination (R2) signifies the proportion of variance in the observed data that can be explained by the independent variables. It is mathematically expressed as follows:
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{7}
\]

R2 values typically span from 0 to 1 (and can be negative for models that perform worse than predicting the mean), where higher values denote improved model performance (Miles 2014).
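The three regression metrics can be computed in a few lines of NumPy, following Equations (5)-(7). The observed/predicted values below are toy numbers for illustration:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, and R2 as defined in Equations (5)-(7)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum((y_true - y_pred) ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)       # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2

mse, rmse, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.8, 5.1, 7.2, 8.9])
```

For these toy values the residuals are small relative to the spread of the observations, so R2 is close to 1 while RMSE stays in the units of the target.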

Grid search for hyperparameter tuning

Grid search is a widely utilized technique in ML for hyperparameter tuning. It involves systematically searching through a predefined set of hyperparameters to determine the optimal combination for a given model. This process is crucial in enhancing model performance and preventing overfitting (Bhagat et al. 2021). Grid search mitigates this risk by exploring various hyperparameter configurations and selecting the one that offers the best performance on a validation set. This ensures that the model generalizes well to new data rather than memorizing the training set. By evaluating each combination of hyperparameters, grid search identifies settings that balance bias and variance, thus reducing the likelihood of overfitting. The importance of grid search in hyperparameter tuning and avoiding overfitting is well documented in the literature. As noted by Bergstra & Bengio (2012), systematic hyperparameter optimization methods such as grid search outperform manual tuning, leading to models that generalize better on unseen data.
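The grid search with three-fold cross-validation (CV-3) used in this study can be sketched with scikit-learn's `GridSearchCV`. The parameter grid below is illustrative; the study's actual grids are given in Supplementary Appendix Table A1:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the WQC classification data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Every combination in the grid is scored with 3-fold cross-validation,
# and the best-scoring configuration is refit on the full training set.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, None], "criterion": ["gini", "entropy"]},
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
```

`grid.best_params_` then holds the selected configuration and `grid.best_estimator_` the refit model; the same pattern applies to each of the classifiers and regressors evaluated here.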

WQC prediction

Hyperparameter tuning

Hyperparameter optimization was a crucial methodological step in this study, aimed at enhancing the predictive performance and robustness of each ML model. This process involved systematically exploring and fine-tuning key parameters specific to each algorithm, such as the number of trees and maximum depth for RF, the regularization parameter for SVM, the architecture and learning rate for ANN, the depth and splitting criteria for DTs, and the boosting stages and learning rate for XGBoost. By conducting a grid search and cross-validation for each model, we identified optimal configurations that minimized overfitting and improved accuracy, precision, recall, and F1 scores. The selected hyperparameters are detailed in Supplementary Appendix Table A1. This tuning step is instrumental in ensuring the models' generalizability to unseen data, underscoring the value of hyperparameter tuning in achieving robust predictive models for water quality analysis.

Model performance

In this section, we present a comparative analysis of various ML models based on their performance metrics. The models were evaluated on both training and testing datasets, and the key metrics considered include accuracy, recall, precision, and F1 score. The purpose of this analysis is to determine which model exhibits the best overall performance and generalizability. The results of this evaluation are summarized in Table 2.

Table 2

Performance metrics for various ML models

Model     Training: Accuracy / Recall / Precision / F1      Testing: Accuracy / Recall / Precision / F1
XGBoost   0.9996 / 0.9996 / 0.9996 / 0.9996                 0.9996 / 0.9996 / 0.9996 / 0.9996
ANN       0.9931 / 0.9931 / 0.9935 / 0.9931                 0.9893 / 0.9893 / 0.9903 / 0.9895
DT        0.9923 / 0.9923 / 0.9923 / 0.9923                 0.9897 / 0.9897 / 0.9896 / 0.9897
SVM       0.9845 / 0.9845 / 0.9842 / 0.9843                 0.9814 / 0.9814 / 0.9808 / 0.9811
RF        0.9825 / 0.9825 / 0.9823 / 0.9808                 0.9754 / 0.9754 / 0.9751 / 0.9725

Table 2 provides a detailed comparison of the performance metrics for each ML model. It is evident from the results that the XGBoost model achieved the highest accuracy, recall, precision, and F1 scores across both training and testing datasets, indicating its superior performance and robustness. The RF model, while performing well, shows a slight drop in accuracy, recall, and other metrics on the testing dataset compared to the training dataset, as well as relative to the other models presented in Table 2, suggesting potential overfitting; further optimization of RF's hyperparameters would be needed to address this challenge.

Overall, this comprehensive evaluation highlights the effectiveness of hyperparameter tuning and model selection in achieving high performance. The ANN and DT models also demonstrated strong results, particularly in terms of accuracy and F1 scores. These findings underscore the importance of thorough model evaluation and optimization in developing reliable predictive models for practical applications.

Furthermore, the feature importance analysis for the RF, DT, and XGBoost models is depicted in Figure 4. This analysis highlights the relative significance of each feature in predicting the target variable. It is clear that the feature ‘DO’ consistently exhibits the highest importance across all models, followed by ‘pH’ and ‘Temperature’ to varying degrees. These insights are crucial for understanding the factors that most significantly influence the model predictions, thereby facilitating more informed decision-making and model refinement.
Figure 4

Feature importance for RF, DT, and XGBoost models.

Figure 5 illustrates the confusion matrices for each model, providing a detailed view of the classification performance across different classes. These matrices help in understanding the distribution of prediction errors and the models' ability to correctly classify each category.
Figure 5

Confusion matrices for ANN, RF, DT, SVM, and XGBoost.

Lastly, the overall model comparison in terms of accuracy is shown in Figure 6. This visualization underscores the superior performance of the XGBoost model compared with other models, affirming its robustness and reliability for predictive analysis.
Figure 6

Comparison between models in terms of accuracy.


Overall, this comprehensive evaluation underscores the importance of hyperparameter tuning and feature importance analysis in developing robust and reliable ML models. The findings from both the performance metrics and feature importance analyses contribute to a deeper understanding of the models' behavior and their applicability to real-world predictive tasks.

WQI prediction

In this section, we present a comparative analysis of various ML models based on their performance metrics. The models were evaluated on both training and testing datasets, and the key metrics considered include R2, RMSE, and MSE. To ensure optimal performance, an extensive hyperparameter tuning process was employed using grid search. This involved a systematic and comprehensive search for the most effective combination of hyperparameters for each algorithm. The purpose of this analysis is to determine which model exhibits the best overall performance and generalizability. The results of this evaluation are summarized in Table 3.
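The grid search idea described above can be sketched as follows: every candidate hyperparameter value is scored on held-out data and the best-scoring combination is retained. The toy nearest-neighbour-mean model and the grid values are assumptions for illustration, not the study's actual models or search space.

```python
from itertools import product

def fit_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(pt[1] for pt in nearest) / k

def grid_search(train, valid, grid):
    """Exhaustively score every hyperparameter combination; keep the best."""
    best_params, best_mse = None, float("inf")
    for (k,) in product(grid["k"]):
        mse = sum((y - fit_predict(train, x, k)) ** 2 for x, y in valid) / len(valid)
        if mse < best_mse:
            best_params, best_mse = {"k": k}, mse
    return best_params, best_mse

train = [(float(x), 2.0 * x) for x in range(10)]          # noiseless linear relation
valid = [(x + 0.5, 2.0 * (x + 0.5)) for x in range(9)]    # held-out midpoints
best, score = grid_search(train, valid, {"k": [1, 2, 4, 8]})
```

With more hyperparameters, `product` simply iterates over the Cartesian product of all grids, which is why exhaustive grid search grows expensive quickly.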

Table 3

Performance metrics for various ML models

Model      Training                      Testing
           R2       RMSE     MSE         R2       RMSE     MSE
LSTM       0.9999   0.0436   0.0019      0.9999   0.0378   0.0014
XGBoost    0.9997   0.1709   0.0292      0.9997   0.1772   0.0314
DT         0.9998   0.1601   0.0256      0.9998   0.1448   0.0210
RF         0.9978   0.4958   0.2458      0.9973   0.5045   0.2546

Table 3 provides a detailed comparison of the performance metrics for each ML model. The LSTM model achieved the highest R2 values and the lowest RMSE and MSE values across both the training and testing datasets, indicating its superior performance and robustness. These metrics suggest that the LSTM model captured complex patterns in the data effectively, likely aided by the extensive grid search that optimized its hyperparameters. The XGBoost, DT, and RF models also demonstrated strong performance, with high R2 values and relatively low RMSE and MSE values. The RF model, however, showed slightly lower R2 values and higher RMSE and MSE values than XGBoost and DT, indicating that it was not optimal in its current configuration; further work is needed to tune RF's hyperparameters and reduce this residual overfitting.
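For reference, the three regression metrics reported in Table 3 can be computed from scratch as follows (the values are illustrative); this also makes the identity RMSE = sqrt(MSE) explicit.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MSE) for a regression prediction."""
    n = len(y_true)
    mean = sum(y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                   # 1 - SS_res / SS_tot
    return r2, math.sqrt(mse), mse

# Illustrative observed and predicted WQI values, not the study's data.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 39.0]
r2, rmse, mse = regression_metrics(y_true, y_pred)
```

Because RMSE is the square root of MSE, any row of Table 3 can be sanity-checked by squaring its RMSE and comparing with its MSE column.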

Figure 7 shows the predicted vs. observed values for the LSTM, XGBoost, DT, and RF models. Each plot provides a visual representation of the model's predictive performance. A perfect model would have all points lying on the diagonal line where the predicted values equal the observed values.
Figure 7

Predicted vs. observed values for applied ML models such as (a) LSTM, (b) XGBoost, (c) RF, and (d) DT.


From the plots, it is clear that the LSTM model's predictions are closest to the observed values, as indicated by the tight clustering along the diagonal line. The XGBoost and DT models also show strong performance, with predictions closely aligned with the observed values, though with slightly more dispersion than the LSTM. The RF model, while still performing well, shows greater deviation from the observed values, indicating more prediction error than the other models.

Overall, the regression plots complement the quantitative metrics presented in the table, providing a visual confirmation of the models' effectiveness and the benefits of hyperparameter tuning through grid search.

The comprehensive analysis conducted in the present study elucidates the diverse performances exhibited by a variety of ML models when tasked with the prediction of the WQI as well as the WQC. The models evaluated in this research encompass sophisticated algorithms such as ANN, SVM, DT, RF, XGBoost, and LSTM networks, each of which possesses unique capabilities and characteristics. The performance metrics associated with each model, which include accuracy, precision, recall, and F1 score for the classification tasks related to WQC, along with R2, RMSE, and MSE for the regression tasks concerning WQI, were scrutinized on both the training and testing datasets to ensure a comprehensive understanding of their efficacy. This detailed examination not only provides valuable insights into the strengths and weaknesses of each model in relation to water quality assessment but also contributes to the broader field of environmental data analysis and ML.

The evaluation of multiple ML models for predicting water quality reveals significant insights into their performance, as well as the underlying factors that influence these outcomes. One of the key findings was the strong performance of the RF and LSTM models. RF showed notable robustness in handling high-dimensional and multivariate datasets, which is likely due to its ensemble learning approach. By averaging multiple decision trees, RF mitigates the risk of overfitting and effectively captures complex non-linear relationships within the data, such as the interactions between parameters like pH, DO, and conductivity. This behavior is consistent with findings from the literature, which emphasize the RF model's versatility in environmental modeling contexts, where data complexity and variability are often substantial (Khattak et al. 2024).

Similarly, LSTM networks outperformed traditional ML models, particularly when dealing with the temporal aspects of water quality data. The ability of LSTM models to retain sequential information through their memory cells was crucial in capturing trends over time, such as the seasonal variation of water quality parameters. This aligns with recent studies that demonstrated the effectiveness of LSTMs for time-series prediction in hydrological and water quality contexts (Xu et al. 2024). The positive performance of LSTM in this study could also be attributed to the use of optimized hyperparameters, which allowed the model to maintain a balance between underfitting and overfitting while capturing long-term dependencies.

On the other hand, SVR and DT showed comparatively weaker performance, particularly for predicting parameters with high variability, such as fecal coliform counts. SVR's limited effectiveness could be attributed to its sensitivity to feature scaling and the choice of kernel function. The presence of high variability and noise in the dataset may have hindered SVR's ability to form a clear decision boundary, resulting in suboptimal predictions. Moreover, SVR tends to struggle when the dataset is not linearly separable or if the hyperparameter tuning is insufficient, which could further explain the lower accuracy observed (Wang et al. 2011).

The DT model's limitations are also noteworthy. While DT provides interpretable models that are valuable for understanding the relationships between input features, it is inherently prone to overfitting, especially when dealing with complex and noisy datasets. In this study, the overfitting of DT might have resulted from insufficient pruning or the high variability in certain features, leading to a model that captured noise rather than meaningful patterns in the data. This challenge highlights the trade-off between model interpretability and predictive performance, which is often a key consideration in environmental modeling (Luo et al. 2019).

Furthermore, the data quality played a crucial role in the performance of the models. The dataset used in this study contained several missing values, which were imputed using linear interpolation. Although this method provides a straightforward approach to handling missing data, it may have introduced biases, particularly in features with non-linear trends or sudden changes. Such biases could negatively impact model performance, especially for SVR and DT, which are more sensitive to inconsistencies in the data.
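The linear-interpolation imputation mentioned above can be sketched in a few lines: each missing value is filled by interpolating between the nearest observed neighbours, and edge gaps are filled with the nearest observed value. The series below is hypothetical.

```python
def interpolate_missing(series):
    """Fill None gaps by linear interpolation between observed neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for i, v in enumerate(filled):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:          # leading gap: carry the first observation back
            filled[i] = filled[right]
        elif right is None:       # trailing gap: carry the last observation forward
            filled[i] = filled[left]
        else:                     # interior gap: interpolate linearly
            frac = (i - left) / (right - left)
            filled[i] = filled[left] + frac * (filled[right] - filled[left])
    return filled

# Hypothetical dissolved oxygen readings with gaps (mg/L).
do_mg_per_l = [8.0, None, None, 9.5, None]
clean = interpolate_missing(do_mg_per_l)
```

As the discussion notes, this straightforward scheme assumes locally linear behaviour, so it can smooth over genuine sudden changes and introduce bias in features with non-linear trends.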

The strong results observed with the RF and LSTM models are likely due to their adaptability and ability to learn complex patterns, even with moderately noisy data. However, this performance comes at the cost of increased computational complexity: training deep learning models like LSTMs requires significant computational resources, which could limit their applicability in real-time or resource-constrained scenarios. Interpretability is a further concern. Although both RF and LSTM performed well in terms of predictive accuracy, RF provides only a coarse measure of feature importance, while LSTM, being a black-box model, makes it difficult to explain individual predictions. This lack of interpretability can be a significant barrier for stakeholders who require transparent and actionable insights for decision-making (Mi et al. 2020).

In conclusion, the results of this study highlight the importance of selecting appropriate ML models based on the characteristics of the dataset and the specific application requirements. While models like RF and LSTM demonstrated superior predictive capabilities, their limitations in interpretability and computational demands must be considered. Conversely, simpler models like DT and SVR may offer greater transparency, but their performance can be hindered by data quality issues and insufficient complexity to capture non-linear relationships. Future research could focus on hybrid approaches that combine the interpretability of traditional models with the predictive power of advanced methods, thereby enhancing both the accuracy and usability of water quality predictions.

Performance analysis

The LSTM model emerged as the top-performing model in terms of R2, RMSE, and MSE for both training and testing datasets, indicating its robustness and superior ability to generalize to unseen data. The sequence modeling capability of LSTM allows it to capture temporal dependencies in the data, which is particularly useful for time-series predictions such as the WQI.

XGBoost also demonstrated strong performance, with high R2 values and low RMSE and MSE. Its ensemble nature, which combines the strengths of multiple weak learners, contributes to its excellent performance and resilience against overfitting.

The RF and DT models also performed well, underscoring their effectiveness in handling both classification and regression tasks. These models are known for their interpretability and ability to handle non-linear relationships in the data. However, they showed a slight tendency to overfit, especially the DT model, which is inherently prone to this issue unless properly regularized.

The ANN and SVM models, while slightly less accurate than the ensemble methods, still demonstrated strong performance metrics. The ANN model's ability to capture complex patterns in the data makes it a powerful tool, albeit at the cost of increased computational resources and the risk of overfitting. The SVM model showed robustness in high-dimensional spaces but struggled with larger datasets and required careful parameter tuning to achieve optimal performance.

Practical implications

The findings of this study have significant practical implications for water quality management. The high accuracy and robustness of the XGBoost and LSTM models make them reliable choices for real-world applications. Policymakers, researchers, and practitioners can leverage these insights to implement more effective water quality monitoring and management strategies, thereby safeguarding public health and environmental sustainability. Table 4 presents the pros and cons of each applied ML model in the context of this study's outcomes.

Table 4

Pros and cons of all applied models

RF
  Pros:
  – High accuracy and robustness
  – Handles missing values well
  – Reduces overfitting through averaging
  Cons:
  – Computationally intensive
  – Requires tuning multiple hyperparameters

DT
  Pros:
  – Simple to understand and interpret
  – Requires little data preprocessing
  – Handles both numerical and categorical data
  Cons:
  – Prone to overfitting
  – Unstable; small changes in the data can lead to different trees

ANN
  Pros:
  – Capable of capturing complex patterns
  – Highly flexible and adaptable to various types of data
  – Performs well with large datasets
  Cons:
  – Prone to overfitting
  – Requires significant computational resources
  – Hyperparameter tuning can be complex

SVM
  Pros:
  – Effective in high-dimensional spaces
  – Robust to overfitting, especially with a proper kernel choice
  – Works well with a clear margin of separation
  Cons:
  – Not suitable for large datasets
  – Requires careful selection of kernel and parameters
  – Sensitive to noise

XGBoost
  Pros:
  – High accuracy and generalization ability
  – Can handle mixed data types
  – Reduces overfitting through boosting techniques
  Cons:
  – Computationally expensive
  – Prone to overfitting without careful tuning
  – Requires extensive hyperparameter tuning

LSTM
  Pros:
  – Captures temporal dependencies in sequential data
  – High predictive accuracy and generalization on time-series data
  Cons:
  – Requires significant computational resources
  – Prone to overfitting
  – Hyperparameter tuning can be complex

Table 5 provides a comprehensive comparison of various ML models applied to water quality prediction, highlighting the techniques, best models, prediction indices, and results reported by different authors. The comparison underscores the diversity of approaches employed across different studies and the performance metrics achieved by each model. For instance, Radhakrishnan & Pillai (2020) identified the DT algorithm as the best model for the WAWQI, achieving an accuracy of 98.50%. Similarly, Jain et al. (2021) found the RF algorithm to be the most effective for predicting the WQI with an accuracy of 92.13%. Studies such as Hmoud Al-Adhaileh & Waselallah Alsaade (2021) and Khan et al. (2021) demonstrated the efficacy of ANFIS and GBoost in achieving high accuracy for WQI and WQC.

Table 5

Comparison of ML models for water quality

Radhakrishnan & Pillai (2020)
  Techniques: SVM; Naïve Bayes; DT
  Best model: DT
  Prediction index: WAWQI
  Results: Accuracy = 98.50%

Jain et al. (2021)
  Techniques: RF; K-nearest neighbors (KNN); SVM
  Best model: RF
  Prediction index: WQI
  Results: Accuracy = 92.13%

Hmoud Al-Adhaileh & Waselallah Alsaade (2021)
  Techniques: ANFIS; feed-forward neural network (FFNN); KNN
  Best models: ANFIS for WQI; FFNN for WQC
  Prediction index: WQI, WQC
  Results: Accuracy (ANFIS) = 96.17%; Accuracy (FFNN) = 100%

Slatnia et al. (2022)
  Techniques: DT; Naive Bayes; GBR; KNN; ANN; RF; SVM
  Best model: Gradient boosting
  Prediction index: WQC
  Results: Accuracy = 94.90%

Khan et al. (2021)
  Techniques: gradient boosting classifier (GBoost); principal component regression (PCR)
  Best model: GBoost
  Prediction index: WQI, water quality status (WQS)
  Results: Accuracy (PCR) = 95%; Accuracy (GBoost) = 100%

Aldhyani et al. (2020)
  Techniques: neural autoregressive network (NARNET); SVM; KNN; Naive Bayes; LSTM
  Best models: NARNET for WQI; SVM for WQC
  Prediction index: WQI, water quality classification (WQC)
  Results: Accuracy (SVM) = 97.01%; R2 (NARNET) = 96.17%

This study
  Techniques: LSTM; XGBoost; DT; SVM; RF; ANN
  Best models: LSTM for WQI; XGBoost for WQC
  Prediction index: WQI, WQC
  Results: Maximum accuracy = 99.83% (upper 99.99%, lower 99.07%); R2 = 0.9999

This research presents a comprehensive and methodological comparison of a diverse set of ML models, including ANN, SVM, DT, RF, XGBoost, and LSTM networks. This extensive evaluation is specifically designed for predicting the WQI and WQC, which are crucial for environmental science and management. This allows for a detailed understanding of the strengths and limitations of each model, providing invaluable insights for their practical application in environmental monitoring and assessment. Specifically, the XGBoost model demonstrated a maximum accuracy of 99.83%, an upper accuracy of 99.99%, a Kappa statistic of 99.17%, and a lower accuracy of 99.07%, while the LSTM model achieved an R2 of 0.9999. These findings highlight the robustness and superior generalization capabilities of ensemble methods like XGBoost and advanced neural networks such as LSTM, making them highly suitable for robust water quality prediction. The research contributes significantly to the growing body of literature on the application of ML in environmental sciences, demonstrating the potential for these models to enhance water quality monitoring and management. The high accuracy and robustness of the XGBoost and LSTM models make them reliable choices for real-world applications.

Limitations and future work

Despite these promising results, several limitations must be acknowledged. The models exhibited varying degrees of overfitting, particularly in the ANN and XGBoost models, as indicated by the discrepancies between training and testing performance. Future research should focus on addressing overfitting through techniques such as cross-validation, regularization, and more sophisticated hyperparameter tuning.
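K-fold cross-validation, one of the overfitting remedies suggested above, can be sketched as an index-splitting routine in which every sample appears in exactly one validation fold; the model is refit k times and the scores are averaged.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs covering every sample exactly once."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

folds = list(k_fold_indices(10, 3))
# Three folds of sizes 4, 3, 3; each index lands in exactly one validation set.
```

For time-series data such as the WQI records here, a chronological variant (training only on folds earlier in time than the validation fold) would be the safer choice, since random folds can leak future information into training.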

Additionally, integrating advanced data preprocessing steps, such as feature selection and extraction, could further enhance model performance (Bhagat et al. 2023b). Exploring the impact of different imputation methods for handling missing data could provide more insights into improving classification and regression accuracy.

Incorporating more sophisticated deep learning architectures, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), could be examined to capture more complicated patterns in the data. These models have shown tremendous promise in other areas and could bring substantial improvements in water quality prediction.

Moreover, this study does not explicitly discuss the models' practical implementation in different scenarios, and their performance in other contexts of environmental science and engineering is not addressed. The study relies on extensive datasets for model training and evaluation; however, it does not address potential limitations related to data quality, such as the impact of noise, outliers, or missing values, beyond the preprocessing steps mentioned. In addition, the research highlights model performance based on specific metrics such as accuracy, precision, recall, and R2, but does not explore other important aspects such as model interpretability, ease of deployment, or maintenance, which are critical for real-world applications. These limitations suggest areas for future research, such as exploring model interpretability, testing in real-world environments, and ensuring the models' robustness across different datasets and conditions.

In this scholarly endeavor, we undertook a thorough and meticulous evaluation of a collection of sophisticated ML algorithms specifically designed for the purpose of predicting both the WQI and the WQC, which are crucial parameters in environmental science and management. The innovative aspect of this research is encapsulated in its comprehensive and methodological comparison of a diverse set of models, which includes ANN, SVM, DT, RF, XGBoost, and LSTM networks, all of which were rigorously tested using robust and extensive datasets accompanied by a broad spectrum of performance evaluation metrics. This analytical approach not only enhances our understanding of the inherent strengths and limitations associated with each individual model but also provides invaluable insights that can significantly influence their practical application in the domain of environmental monitoring and assessment. Consequently, the findings from this study hold the potential to contribute substantially to the field, offering guidance for researchers and practitioners alike in selecting the most appropriate ML techniques for evaluating and managing water quality data.

The key findings of this study indicate that the XGBoost and LSTM models outperform the other techniques, achieving the best accuracy, precision, recall, F1 score, R2, RMSE, and MSE. Specifically, XGBoost demonstrated a maximum accuracy of 99.83%, an upper accuracy of 99.99%, a Kappa statistic of 99.17%, and a lower accuracy of 99.07%, while LSTM achieved an R2 of 0.9999. These results highlight the robustness and superior generalization capabilities of ensemble methods and advanced neural networks in handling complex water quality data. The findings are significant as they contribute to the growing body of literature on the application of ML in environmental sciences, demonstrating the potential for these models to enhance water quality monitoring and management.

Despite these promising results, the study has several limitations. The models exhibited varying degrees of overfitting, particularly in ANN and XGBoost, as indicated by discrepancies between training and testing performance. Additionally, the study's reliance on specific datasets may limit the generalizability of the findings to other geographical regions or different water quality parameters. Future research should address these limitations by incorporating more diverse datasets, exploring advanced data preprocessing techniques such as feature selection and extraction, and employing methods like cross-validation and regularization to mitigate overfitting.

Future research options include examining more sophisticated deep learning architectures (Tian et al. 2023), such as CNNs and RNNs, which have shown substantial promise in other disciplines, as well as coupling data-driven models with mathematical and numerical modeling (Dai & Samper 2004). These models could capture more complex patterns in the data, potentially improving the accuracy and robustness of water quality predictions. Increasingly advanced software, such as COMSOL, also makes it possible to represent complex physical relationships (Zhu et al. 2023). Additionally, integrating real-time data streams and exploring the use of remote sensing data could further enhance the models' applicability in dynamic environmental monitoring scenarios (Tiyasha et al. 2023). Moreover, accurate water quality prediction can lead to better water management through more advanced ML tools capable of handling large, multisource data (Zhan et al. 2024). Suitable practical implementations of the models could open the door to addressing different deployment scenarios and data quality issues (the impact of noise, outliers, or missing values, and the corresponding preprocessing steps), while model interpretability, ease of deployment, and maintenance remain open questions. The practical implications of this study are significant for water quality management. The high accuracy and robustness of the XGBoost and LSTM models make them reliable choices for real-world applications. Policymakers, researchers, and practitioners can leverage these insights to implement more effective water quality monitoring and management strategies, thereby safeguarding public health and environmental sustainability (Karri et al. 2024).

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through a Large Research Project under grant number RGP 1/219/45. Also, the authors would like to express our deepest gratitude to all those who contributed to this research. Our heartfelt thanks go to the National Groundwater Data Access Bank (ADES) for providing the comprehensive dataset essential for this study. We are also grateful to the Artois-Picardie Water Agency for their support and collaboration.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abba, S. I., Pham, Q. B., Saini, G., Linh, N. T. T., Ahmed, A. N., Mohajane, M., Khaledian, M., Abdulkadir, R. A. & Bach, Q.-V. (2020) Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index, Environmental Science and Pollution Research, 27, 41524–41539.

Abba, S., Zhai, S. & Zhang, H. (2021) Comparison of machine learning techniques for predicting water quality index, Water (Basel), 13, 54.

Acheampong, A. O. & Opoku, E. E. O. (2023) Environmental degradation and economic growth: investigating linkages and potential pathways, Energy Economics, 123, 106734. https://doi.org/10.1016/j.eneco.2023.106734.

Ahmadi, S. M., Balahang, S. & Abolfathi, S. (2024) Predicting the hydraulic response of critical transport infrastructures during extreme flood events, Engineering Applications of Artificial Intelligence, 133, 108573. https://doi.org/10.1016/j.engappai.2024.108573.

Aldhyani, T. H. H., Al-Yaari, M., Alkahtani, H. & Maashi, M. (2020) [Retracted] Water quality prediction using artificial intelligence algorithms, Applied Bionics and Biomechanics, 2020, 6659314.

Amaranto, A. & Mazzoleni, M. (2023) B-AMA: a Python-coded protocol to enhance the application of data-driven models in hydrology, Environmental Modelling & Software, 160, 105609. https://doi.org/10.1016/j.envsoft.2022.105609.

Anderson, P. & Taylor, J. (2023) Assessing model performance: the role of accuracy and complementary metrics, Journal of Statistical Learning, 15, 45–60.

Antoniadis, A., Lambert-Lacroix, S. & Poggi, J.-M. (2021) Random forests for global sensitivity analysis: a selective review, Reliability Engineering & System Safety, 206, 107312. https://doi.org/10.1016/j.ress.2020.107312.

Asadollah, S. B. H. S., Sharafati, A., Motta, D. & Yaseen, Z. M. (2021) River water quality index prediction and uncertainty analysis: a comparative study of machine learning models, Journal of Environmental Chemical Engineering, 9, 104599. https://doi.org/10.1016/j.jece.2020.104599.

Bergstra, J. & Bengio, Y. (2012) Random search for hyperparameter optimization, Journal of Machine Learning Research, 13, 281–305.

Bhagat, S. K., Tung, T. M. & Yaseen, Z. M. (2020) Development of artificial intelligence for modeling wastewater heavy metal removal: state of the art, application assessment and possible future research, Journal of Cleaner Production, 250, 119473. https://doi.org/10.1016/j.jclepro.2019.119473.

Bhagat, S. K., Pyrgaki, K., Salih, S. Q., Tiyasha, T., Beyaztas, U., Shahid, S. & Yaseen, Z. M. (2021) Prediction of copper ions adsorption by attapulgite adsorbent using tuned-artificial intelligence model, Chemosphere, 276, 130162. https://doi.org/10.1016/j.chemosphere.2021.130162.

Bhagat, S. K., Tiyasha, T., Kumar, A., Malik, T., Jawad, A. H., Khedher, K. M., Deo, R. C. & Yaseen, Z. M. (2022) Integrative artificial intelligence models for Australian coastal sediment lead prediction: an investigation of in-situ measurements and meteorological parameters effects, Journal of Environmental Management, 309, 114711. https://doi.org/10.1016/j.jenvman.2022.114711.

Bhagat, S. K., Pilario, K. E., Babalola, O. E., Tiyasha, T., Yaqub, M., Onu, C. E., Pyrgaki, K., Falah, M. W., Jawad, A. H. & Yaseen, D. A. (2023a) Comprehensive review on machine learning methodologies for modeling dye removal processes in wastewater, Journal of Cleaner Production, 385, 135522.

Bhagat, S. K., Tiyasha, T. & Ramaswamy, K. (2023b) Precipitation variations in the central Vietnam to forecast using Holt-Winters Seasonal Additive Forecasting method for 1990 to 2019 trend, IOP Conference Series: Earth and Environmental Science, 1216, 012019. https://doi.org/10.1088/1755-1315/1216/1/012019.

Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984) Classification and Regression Trees. Wadsworth International Group, Belmont, California.

Chen, T. & Guestrin, C. (2016) 'XGBoost: a scalable tree boosting system', Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.

Chen, K., Chen, H., Zhou, C., Huang, Y., Qi, X., Shen, R., Liu, F., Zuo, M., Zou, X., Wang, J., Zhang, Y., Chen, D., Chen, X., Deng, Y. & Ren, H. (2020a) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data, Water Research, 171, 115454.

Chen, Y., Song, L., Liu, Y., Yang, L. & Li, D. (2020b) A review of the artificial neural network models for water quality prediction, Applied Sciences, 10, 5776. https://doi.org/10.3390/app10175776.

Costa, V. G. & Pedreira, C. E. (2023) Recent advances in decision trees: an updated survey, Artificial Intelligence Review, 56, 4765–4800. https://doi.org/10.1007/s10462-022-10275-5.

Dai, Z. & Samper, J. (2004) Inverse problem of multicomponent reactive chemical transport in porous media: formulation and applications, Water Resources Research, 40, W07407. https://doi.org/10.1029/2004WR003248.

Dodig, A., Ricci, E., Kvascev, G. & Stojkovic, M. (2024) A novel machine learning-based framework for the water quality parameters prediction using hybrid long short-term memory and locally weighted scatterplot smoothing methods, Journal of Hydroinformatics, 26, 1059–1079. https://doi.org/10.2166/hydro.2024.273.

Elbeltagi, A., Pande, C. B., Kouadri, S. & Islam, A. R. M. T. (2022) Applications of various data-driven models for the prediction of groundwater quality index in the Akot basin, Maharashtra, India, Environmental Science and Pollution Research, 29, 17591–17605. https://doi.org/10.1007/s11356-021-17064-7.

Fang, H.-T., Jhong, B.-C., Tan, Y.-C., Ke, K.-Y. & Chuang, M.-H. (2019) A two-stage approach integrating SOM- and MOGA-SVM-based algorithms to forecast spatial-temporal groundwater level with meteorological factors, Water Resources Management, 33, 797–818. https://doi.org/10.1007/s11269-018-2143-x.

Ghiasi, B., Noori, R., Sheikhian, H., Zeynolabedin, A., Sun, Y., Jun, C., Hamouda, M., Bateni, S. M. & Abolfathi, S. (2022) Uncertainty quantification of granular computing-neural network model for prediction of pollutant longitudinal dispersion coefficient in aquatic streams, Scientific Reports, 12, 4610. https://doi.org/10.1038/s41598-022-08417-4.

Guo, M., Noori, R. & Abolfathi, S. (2024) Microplastics in freshwater systems: dynamic behaviour and transport processes, Resources, Conservation and Recycling, 205, 107578. https://doi.org/10.1016/j.resconrec.2024.107578.

Hassan, M. M., Hassan, M. M., Akter, L., Rahman, M. M., Zaman, S., Hasib, K. M., Jahan, N., Smrity, R. N., Farhana, J., Raihan, M. & Mollick, S. (2021) Efficient prediction of water quality index (WQI) using machine learning algorithms, Human-Centric Intelligent Systems, 1, 86. https://doi.org/10.2991/hcis.k.211203.001.

Hmoud Al-Adhaileh, M. & Waselallah Alsaade, F. (2021) Modelling and prediction of water quality by using artificial intelligence, Sustainability, 13, 4259. https://doi.org/10.3390/su13084259.

Jain, D., Shah, S., Mehta, H., Lodaria, A. & Kurup, L. (2021) A machine learning approach to analyze marine life sustainability, Proceedings of the International Conference on Intelligent Computing, Information and Control Systems. Singapore: Springer, pp. 619–632.
Jawad
J.
,
Hawari
A. H.
&
Javaid Zaidi
S.
(
2021
)
Artificial neural network modeling of wastewater treatment and desalination using membrane processes: a review
,
Chemical Engineering Journal
,
419
,
129540
.
https://doi.org/10.1016/j.cej.2021.129540
.
Johnson
E.
&
Williams
R.
(
2021
)
Understanding recall and precision trade-offs in machine learning
,
International Journal of Data Science
,
12
,
234
250
.
Kaliappan
J.
,
Srinivasan
K.
,
Mian Qaisar
S.
,
Sundararajan
K.
,
Chang
C.-Y.
&
Suganthan
C.
(
2021
)
Performance evaluation of regression models for the prediction of the COVID-19 reproduction rate
,
Frontiers in Public Health
,
9
,
729795
.
Karri
R. R.
,
Mubarak
N. M.
,
Bhagat
S. K.
,
Tiyasha
T.
,
Lingamdinne
L. P.
,
Koduru
J. R.
,
Ravindran
G.
,
Tyagi
I.
&
Dehghani
M. H.
(
2024
)
Scientometrics and overview of water, environment, and sustainable development goals
. In: (Dehghani, M. H., Karri, R. R., Tyagi, I. & Scholz, M., eds.)
Water, the Environment, and the Sustainable Development Goals
.
Amsterdam, Netherlands
,
Elsevier
, pp.
3
33
.
Khan
M. S. I.
,
Islam
N.
,
Uddin
J.
,
Islam, S. & Nasir, M. K.
(
2021
)
Water quality prediction and classification based on principal component regression and gradient boosting classifier approach
,
Journal of King Saud University – Computer and Information Sciences
,
34
,
4773
4781
.
https://doi.org/10.1016/j.jksuci.2021.06.003
.
Khattak
A.
,
Zhang
J.
,
Chan
P.
&
Chen
F.
(
2024
)
SPE-SHAP: self-paced ensemble with Shapley additive explanation for the analysis of aviation turbulence triggered by wind shear events
,
Expert Systems with Applications
,
254
,
124399
.
https://doi.org/10.1016/j.eswa.2024.124399
.
Khullar
S.
&
Singh
N.
(
2020
)
A Bi-LSTM model for predicting water quality of the Yamuna river in India
,
Environmental Science and Pollution Research
,
27
,
8957
8967
.
Khullar
S.
&
Singh
N.
(
2022
)
Water quality assessment of a river using deep learning Bi-LSTM methodology: forecasting and validation
,
Environmental Science and Pollution Research
,
29
,
12875
12889
.
https://doi.org/10.1007/s11356-021-13875-w
.
Kombo
O.
,
Kumaran
S.
,
Sheikh
Y.
,
Bovim
A.
&
Jayavel
K.
(
2020
)
Long-term groundwater level prediction model based on hybrid KNN-RF technique
,
Hydrology
,
7
,
59
.
Kumar
V.
,
Chauhan
M. S.
&
Khan
S.
(
2021
)
Application of machine learning techniques for clustering of rainfall time series over Ganges River basin
, In:
Chauhan, M. S. & Ojha, C. S. P. (eds.) The Ganga River Basin: A Hydrometeorological Approach, Switzerland, Springer Nature, pp.
211
218
.
Lee
M.
&
Kim
S.
(
2022
)
Evaluating classifier performance with F1 score in imbalanced datasets
,
Journal of Computational Learning
,
30
,
101
117
.
Liu
J.
,
Wang
J.
&
Li
X.
(
2019
)
Prediction of water quality in the Yangtze River basin using a long short-term memory network
,
Water (Basel)
,
11
,
2345
.
Luo
Y.
,
Tseng
H.-H.
,
Cui
S.
,
Wei
L.
,
Ten Haken
R.
&
El Naqa
I.
(
2019
)
Balancing accuracy and interpretability of machine learning approaches for radiation treatment outcomes modeling
,
BJR Open
,
1
,
20190021
.
https://doi.org/10.1259/bjro.20190021
.
Mahdian
M.
,
Noori
R.
,
Salamattalab
M. M.
,
Heggy
E.
,
Bateni
S. M.
,
Nohegar
A.
,
Hosseinzadeh
M.
,
Siadatmousavi
S. M.
,
Fadaei
M. R.
&
Abolfathi
S.
(
2024
)
Anzali wetland crisis: unraveling the decline of Iran's ecological gem
,
Journal of Geophysical Research: Atmospheres
,
129
,
e2023JD039538
.
https://doi.org/10.1029/2023JD039538
.
Masum Beg
M.
,
Roy
S. M.
,
Kar
A.
,
Mukherjee
C. K.
,
Kumar Bhagat
S.
&
Tanveer
M.
(
2024
)
Study on recirculating aquaculture system (RAS) in organic fish production
,
IOP Conference Series: Earth and Environmental Science
,
1391
,
012013
.
https://doi.org/10.1088/1755-1315/1391/1/012013
.
Md Jahidul
I.
,
Salekin
S.
,
Abdullah
M. S.
,
Zaman
N.
&
Khan
A.
(
2024
)
Evaluation of Water Quality Assessment Through Machine Learning: A Water Quality Index-Based Approach, Research Square
.
Meng
Y.
,
Yang
N.
,
Qian
Z.
&
Zhang
G.
(
2020
)
What makes an online review more helpful: an interpretation framework using XGBoost and SHAP values
,
Journal of Theoretical and Applied Electronic Commerce Research
,
16
,
466
490
.
https://doi.org/10.3390/jtaer16030029
.
Mi
J.-X.
,
Li
A.-D.
&
Zhou
L.-F.
(
2020
)
Review study of interpretation methods for future interpretable machine learning
,
IEEE Access
,
8
,
191969
191985
.
https://doi.org/10.1109/ACCESS.2020.3032756
.
Miles
J.
(
2014
)
R squared, adjusted R squared
. In: (Everitt, B. S. & Howell, D. C., eds.)
Wiley StatsRef: Statistics Reference Online
.
John Wiley & Sons, Ltd, Hoboken, NJ
.
Miller, T., Łobodzińska, A., Kozlovska, P., Lewita, K., Kaczanowska, O. & Durlik, I. (2024) Advancing water quality prediction: the role of machine learning in environmental science, Grail of Science (ГРААЛЬ НАУКИ), 36, 246–252. https://doi.org/10.36074/grail-of-science.16.02.2024.039.
Mohammadpour, A., Gharehchahi, E., Gharaghani, M. A., Shahsavani, E., Golaki, M., Berndtsson, R., Khaneghah, A. M., Hashemi, H. & Abolfathi, S. (2024) Assessment of drinking water quality and identifying pollution sources in a chromite mining region, Journal of Hazardous Materials, 480, 136050. https://doi.org/10.1016/j.jhazmat.2024.136050.

Otchere, D. A., Ganat, T. O. A., Gholami, R. & Lawal, M. (2021) A novel custom ensemble learning model for an improved reservoir permeability and water saturation prediction, Journal of Natural Gas Science and Engineering, 91, 103962. https://doi.org/10.1016/j.jngse.2021.103962.

Radhakrishnan, N. & Pillai, A. S. (2020) 'Comparison of water quality classification models using machine learning', 2020 5th International Conference on Communication and Electronics Systems (ICCES). IEEE, pp. 1183–1188. https://doi.org/10.1109/ICCES48766.2020.9137903.

Safavian, S. R. & Landgrebe, D. (1991) A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man, and Cybernetics, 21, 660–674. https://doi.org/10.1109/21.97458.

Salehin, I., Islam, M. S., Saha, P., Noman, S. M., Tuni, A., Hasan, M. M. & Baten, M. A. (2024) AutoML: a systematic review on automated machine learning with neural architecture search, Journal of Information and Intelligence, 2, 52–81. https://doi.org/10.1016/j.jiixd.2023.10.002.

Sangiorgio, M. & Dercole, F. (2020) Robustness of LSTM neural networks for multi-step forecasting of chaotic time series, Chaos, Solitons & Fractals, 139, 110045. https://doi.org/10.1016/j.chaos.2020.110045.

Slatnia, A., Ladjal, M., Ouali, M. A. & Imed, M. (2022) 'Improving prediction and classification of water quality indices using hybrid machine learning algorithms with features selection analysis', Online International Symposium on Applied Mathematics and Engineering (ISAME22), pp. 16–17.

Smith, J. & Doe, J. (2020) Precision and recall trade-offs in machine learning, Journal of Machine Learning Research, 21, 1–15.

Smola, A. J. & Schölkopf, B. (2004) A tutorial on support vector regression, Statistics and Computing, 14, 199–222. https://doi.org/10.1023/B:STCO.0000035301.49549.88.

Tian, Y., Zhao, Y., Son, S., Luo, J., Oh, S. & Wang, Y. (2023) A deep-learning ensemble method to detect atmospheric rivers and its application to projected changes in precipitation regime, Journal of Geophysical Research: Atmospheres, 128 (12). https://doi.org/10.1029/2022JD037041.

Tian, H., Du, Y., Luo, X., Dong, J., Chen, S., Hu, X., Zhang, M., Liu, Z. & Abolfathi, S. (2024a) Understanding visible light and microbe-driven degradation mechanisms of polyurethane plastics: pathways, property changes, and product analysis, Water Research, 259, 121856. https://doi.org/10.1016/j.watres.2024.121856.

Tian, H., Wang, L., Zhu, X., Zhang, M., Li, L., Liu, Z. & Abolfathi, S. (2024b) Biodegradation of microplastics derived from controlled release fertilizer coating: selective microbial colonization and metabolism in plastisphere, Science of The Total Environment, 920, 170978. https://doi.org/10.1016/j.scitotenv.2024.170978.

Tiyasha, T., Bhagat, S. K., Emmanuel, B. O. & Ramaswamy, K. (2023) Kosi-Ganga-River-Creek 35 years' additive time series and seasonal analysis using remote sensing data, IOP Conference Series: Earth and Environmental Science, 1216, 012004. https://doi.org/10.1088/1755-1315/1216/1/012004.

Wang, X., Fu, L. & He, C. (2011) Applying support vector regression to water quality modelling by remote sensing data, International Journal of Remote Sensing, 32, 8615–8627. https://doi.org/10.1080/01431161.2010.543183.
Willmott, C. J. (1982) Some comments on the evaluation of model performance, Bulletin of the American Meteorological Society, 63, 1309–1313. https://doi.org/10.1175/1520-0477(1982)063<1309:SCOTEO>2.0.CO;2.
Xu, J., Mo, Y., Zhu, S., Wu, J., Jin, G., Wang, Y., Ji, Q. & Li, L. (2024) Assessing and predicting water quality index with key water parameters by machine learning models in coastal cities, China, Heliyon, 10, e33695. https://doi.org/10.1016/j.heliyon.2024.e33695.
Yan, X., Zhang, T., Du, W., Meng, Q., Xu, X. & Zhao, X. (2024) A comprehensive review of machine learning for water quality prediction over the past five years, Journal of Marine Science and Engineering, 12, 159. https://doi.org/10.3390/jmse12010159.

Zhan, C., Dai, Z., Yin, S., Carroll, K. C. & Soltanian, M. R. (2024) Conceptualizing future groundwater models through a ternary framework of multisource data, human expertise, and machine intelligence, Water Research, 257, 121679. https://doi.org/10.1016/j.watres.2024.121679.

Zhang, F. & O'Donnell, L. J. (2020) Support vector regression. In: (Mechelli, A. & Vieira, S., eds.) Machine Learning. Elsevier, London, UK, pp. 123–140. https://doi.org/10.1016/B978-0-12-815739-8.00007-9.

Zhang, K., Li, Y., Yu, Z., Yang, T., Xu, J., Chao, L., Ni, J., Wang, L., Gao, Y., Hu, Y. & Lin, Z. (2022) Xin'anjiang nested experimental watershed (XAJ-NEW) for understanding multiscale water cycle: scientific objectives and experimental design, Engineering, 18, 207–217. https://doi.org/10.1016/j.eng.2021.08.026.

Zhao, Y., Zhang, M., Liu, Z., Ma, J., Yang, F., Guo, H. & Fu, Q. (2024) How human activities affect groundwater storage, Research, 7 (2), 0369. https://doi.org/10.34133/research.0369.

Zhou, X. & Zhang, J. (2023) Advances in machine learning for water quality prediction and prospects in Erhai Lake. In: (Wang, C., Zhang, X., Ren, H. & Lu, Y., eds.) Green Energy, Environment and Sustainable Development. IOS Press, Mianyang, China.

Zhu, Y., Dai, H. & Yuan, S. (2023) The competition between heterotrophic denitrification and DNRA pathways in hyporheic zone and its impact on the fate of nitrate, Journal of Hydrology (Amst), 626, 130175. https://doi.org/10.1016/j.jhydrol.2023.130175.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data