Artificial intelligence (AI) has become a useful tool in numerous domains, including environmental science. This review explores how machine learning and deep learning are applied to calculating and modelling water quality indexes (WQIs) and to water quality classification. WQIs are used to assess the overall status of water bodies and compliance with environmental regulations. Given the large amount of monitoring data involved, traditional methods for calculating WQIs can be labour-intensive and subject to human error. AI offers a compelling alternative, with the potential to enhance accuracy, reduce time, and provide insights into complex environmental data. This paper examines recent progress in applying AI to water quality assessment through WQIs, including the creation of predictive models that incorporate diverse water quality parameters and the implementation of AI in real-time monitoring systems. The challenges of deploying AI, such as data availability, model transparency, and system integration, are also discussed. Through a detailed analysis of recent studies and practical implementations, this review assesses the potential of AI to contribute to water quality management and suggests directions for future research.

Water quality monitoring is an important responsibility of every state: it helps ensure that the population has access to safe water and that demand is met without placing undue pressure on water resources. UN Sustainable Development Goal 6 aims to ensure the availability and sustainable management of water and sanitation for all. It focuses on providing safe and affordable drinking water, access to proper sanitation facilities, and promoting good hygiene practices. The goal also emphasizes the importance of protecting and restoring water ecosystems to maintain water quality and encourages efficient and sustainable management of water resources.

Water management is complex: it includes monitoring water quality and quantity parameters as well as biodiversity and aquatic life, identifying pollution sources, removing pollutants, sanitation, flood protection, resource allocation, etc.

The main sources of water pollution are discharges from urban agglomerations, leakage, and run-off from agriculture and industrial activities. Water monitoring covers a large number of hydrological, physical, chemical, and biological parameters, some of which are measured on site and others by sample analysis in the laboratory. Most countries have monitoring programs that specify sampling locations, the parameters to be determined, and the sampling frequency, all of which entail effort and costs for labour, reagents, equipment, etc.

Each country or region has its own quality standards that define limit values for parameters and classification systems to evaluate the state of a water body and its suitability for different uses. In the European Union, for example, this field is regulated by the Water Framework Directive (WFD), and monitoring data are reported to the European Environment Agency by all Member States. The classification of water bodies follows the One Out – All Out principle, which means that the overall status is determined by the worst-ranked parameter: a water body cannot have ‘good’ status if one parameter ranks as ‘poor’. Other countries have developed classification systems based on water quality indexes (WQIs), which are dimensionless numbers that aggregate the values of several selected indicators.
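The One Out – All Out rule can be illustrated with a minimal Python sketch. The five status labels and the example parameter rankings are assumptions made for illustration; they are not taken from the reviewed papers.

```python
# Status classes ordered from best to worst. The labels are an illustrative
# assumption, not a quotation from any regulation.
CLASSES = ["high", "good", "moderate", "poor", "bad"]

def one_out_all_out(parameter_status):
    """Return the overall status: the worst class among all parameters."""
    return max(parameter_status.values(), key=CLASSES.index)

# Hypothetical water body: three parameters rank 'good' or better, one 'poor'.
status = {"DO": "high", "pH": "good", "NO3": "good", "PO4": "poor"}
print(one_out_all_out(status))  # prints "poor"
```

A single ‘poor’ parameter therefore determines the overall result, regardless of how the other parameters rank.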

There are different WQIs, but the process of calculating the WQI usually includes the following steps:

  • Selection of relevant water quality parameters;

  • Assigning a weight to each parameter;

  • Calculation of sub-indexes (e.g. based on limit values for a certain quality class);

  • Aggregation of sub-indexes into the WQI.

According to the value of the WQI, the water is then categorized into quality classes, depending on the calculation method and water uses.
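The four steps and the final categorization can be sketched in Python as follows. This is a simplified weighted arithmetic form, not the formula of any specific index in Table 1. The limit values, weights, and class boundaries are hypothetical, and the sketch assumes that a lower index means better water.

```python
# Steps 1-2: selected parameters with hypothetical limit values (mg/L) and weights.
LIMITS = {"BOD": 5.0, "NO3": 50.0, "TDS": 500.0}
WEIGHTS = {"BOD": 0.5, "NO3": 0.3, "TDS": 0.2}

# Illustrative class boundaries (upper bound, label); assumed, not standardized.
CLASS_BOUNDS = [(25, "excellent"), (50, "good"), (75, "poor"), (100, "very poor")]

def sub_index(value, limit):
    """Step 3: express a measurement as a percentage of its limit value."""
    return 100.0 * value / limit

def wqi(sample):
    """Step 4: weighted arithmetic mean of the sub-indexes."""
    total_weight = sum(WEIGHTS[p] for p in sample)
    weighted = sum(WEIGHTS[p] * sub_index(v, LIMITS[p]) for p, v in sample.items())
    return weighted / total_weight

def classify(index):
    """Map the index value to a quality class using the assumed boundaries."""
    for upper, label in CLASS_BOUNDS:
        if index <= upper:
            return label
    return "unsuitable"

sample = {"BOD": 2.0, "NO3": 25.0, "TDS": 300.0}  # hypothetical measurements
score = wqi(sample)
print(round(score, 1), classify(score))  # prints "47.0 good"
```

Real indexes differ in how sub-indexes are scaled and aggregated, which is one reason why results obtained with different WQI methods are hard to compare.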

The WQI method is laborious and has several limitations that may lead to inaccurate results, but for many years it has been a good instrument to assess the overall water quality and its long-term trends.

Artificial intelligence (AI) refers to the simulation of human intelligence processes by machines, particularly computer systems; these processes include learning, reasoning, self-correction, perception, and interaction. AI can analyse huge amounts of data in a short time, identify patterns or anomalies, calculate indicators, provide visual representations of data, etc., which can support the assessment of water quality as well as the identification of pollution sources and remediation measures.

The recent development of AI tools for assessing water quality has the potential to bring significant improvements in this sector. It may reduce monitoring efforts and costs and increase the accuracy of WQI prediction and water quality classification (WQC) in several ways, some of which are mentioned below:

  • Advanced analysis of monitoring data, followed by calculation of the WQI, may indicate which parameters have the strongest influence on the WQI value and allow the number of monitored parameters or their sampling frequency to be reduced;

  • Complex modelling of WQI values may allow the prediction of the WQI;

  • AI algorithms can assign datasets to quality classes based on raw monitoring data, without the need to calculate WQIs.

In addition, the combination of AI and remote sensing may in the future make it possible to replace traditional monitoring methods with satellite data and real-time data from on-site sensors (IoT – Internet of Things), reducing the cost and effort of water quality monitoring.

The objective of this paper is to conduct a comprehensive review of recently published research utilizing AI techniques in practical WQI and WQC applications. By examining results from diverse geographical locations and datasets, this study aims to provide a detailed analysis of available tools, their applicability, limitations, and areas for future research.

Databases such as scholar.google.com, scopus.com, and sciencedirect.com were searched for publications on ‘artificial intelligence water quality index’ and ‘machine learning water quality index’ for the period 2015–2024, using the terminology indicated in Table 1. Ninety publications were retrieved and screened for relevance to the topic of the review and for the information they provide on the WQI methodology and the AI tools used. Fifty-six original research articles were included in the review, as they contained sufficient and relevant information about the water quality datasets and the methods used to process them. The papers were reviewed for:

  • Location and type of study site;

  • Reference period of monitoring data;

  • WQI methodology and included parameters;

  • AI tools used in data processing and results regarding their performance.

Table 1

Terminology used in reviewed papers (abbreviations in brackets)

Category | Terms (abbreviations in brackets)
--- | ---
Water types | Coastal Water (CW), Drinking Water (DW), Groundwater (GW), Irrigation Water (IW), Surface Water (SW), Wastewater Treatment Plant (WWTP)
Water quality parameters | Dissolved Oxygen (DO), Hydrogen Potential (pH), Total Dissolved Solids (TDS), Electrical Conductivity (EC), Total Hardness (TH), Calcium (Ca) ions, Sodium (Na) ions, Potassium (K), Magnesium (Mg) ions, Total Alkalinity (TA), Chemical Oxygen Demand (COD), Suspended Solids (SS), Temperature (T), Biological Oxygen Demand (BOD/BOD5), Nitrates (NO3), Nitrites (NO2), Ammonium (NH4), Ammonia (NH3), Dissolved Inorganic Nitrogen (DIN), Total Organic Nitrogen (TON), Phosphates (PO4), Total Phosphorus (TP), Sulphates (SO4), Chloride (Cl), Bicarbonate (HCO3), Turbidity (TU), Transparency (TR), Faecal Coliform (FC), Total Coliform Bacteria (TC), Salinity (SAL), Molybdate Reactive Phosphorus (MRP), Chlorophyll a (CHL), Fats, Oils, and Grease (FOG), Manganese (Mn), Arsenic (As), Nickel (Ni), Boron (B), Lead (Pb), Zinc (Zn), Fluoride (F), Iron (Fe), Chromium (Cr), Cadmium (Cd), Copper (Cu), Selenium (Se), Mercury (Hg), Cobalt (Co), Potential Salinity (PS), Sodium Adsorption Ratio (SAR), Exchangeable Sodium Percentage (ESP), Magnesium Adsorption Ratio (MAR), Anionic Surfactant (LAS), Kelly Index (KI), Residual Sodium Carbonate (RSC), Magnesium Hazard (MH), Prussiate (CN), Sulphide (S)
WQI calculation method | Department of Environment Water Quality Index (DOE-WQI), Weighted Arithmetic Water Quality Index (WA-WQI), Canadian Council of Ministers of the Environment Water Quality Index (CCME-WQI), National Sanitation Foundation Water Quality Index (NSF-WQI), Oregon Water Quality Index (OWQI), Minimum Operator Index (MOI), Irish Water Quality Index (IEWQI), Groundwater Quality Index (GQI/GWQI), Raw Water Quality Index (RWQI), Raw Water Quality Index Fuzzy (RWQIF), Irrigation Water Quality Indices (IWQIs), British Columbia WQI (BCWQI), Vietnam Water Quality Index (VN_WQI), Entropy Water Quality Index (EWQI), Weighted Quadratic Mean (WQM) WQI, Santiago-Guadalajara River (SGR-WQI), Nemerow Pollution Index (NPI), Water Quality Index-Department of Environment (Malaysia) (WQI-JAS), WQI + Fuzzy Hierarchical Analysis Process of the Water Quality Index (FAHP-WQI), Fuzzy-GIS-based Groundwater Quality Index (FGQI), Log-Weighted Quadratic Mean (LWQM/LQM), Sinusoidal Weighted Mean (SWM), Scottish Research Development Department (SRDD) Index, West Java (WJ) Index, Heavy Metal Pollution Index (HPI), Normalized Difference Water Index (NDWI), Automated Water Extraction Index with no shadows (AWEI-nsh)
Classification | Water Quality Classification (WQC)
AI tools | Adaptive Boosting (AdaBoost), Adaptive Neuro-Fuzzy Inference System (ANFIS), Additive Regression (AR), Artificial Intelligence (AI), Artificial Neural Networks (ANN), Back Propagation Neural Networks (BPNN), Bagging Classifier (BC), Bagged Tree Model (BTM), Bootstrap, CATBoost, Convolutional Neural Network (CNN), Cubist Regression Trees (CB), Decision Tree Regressor (DT), Deep Feed-Forward Neural Network (DFFNN), Deep Neural Network (DNN), Discriminant Analysis (DA), Elastic Net Regression (ENR), Empirical Predictive Modeling (EPM), Ensemble Trees (ET), Extra Trees Regression (ETR), Extreme Learning Machine (ELM), Factor Analysis (FA), Feed-Forward Neural Network (FFNN), Gaussian Naïve Bayes (GNB), Gaussian Process Regression (GPR), Generalized Additive Models (GAM), Gradient Boosting Regressor (GB)/Gradient Boosting Machine (GBM), Gradient Boosted Trees (GBT), Histogram-based Gradient Boosting (HGBM), Isolation Forest (IF), K-Nearest Neighbors (KNN) Regressor, Kernel Approximation Regression (KAR), Kernel Density Estimation (KDE), Lasso Regression (LR), Levenberg–Marquardt three-layer back propagation algorithm (LMBP), Light Gradient Boosting (LightGBM/LGB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Locally Weighted Linear Regression (LWLR), Logistic Regression (LR), Long Short-Term Memory (LSTM), M5P Tree (M5P), Machine Learning (ML), Mamdani Fuzzy Logic (MFL), Monte Carlo Simulation (MCS) (for model uncertainty), Multilayer Perceptron (MLP), Multilinear Regression (MLR), Multinomial Logistic Regression (MNLR), Multivariate Adaptive Regression Splines (MARS), Naïve Bayes (NB), Neural Net (NN), Neural Network Ensemble (NNE), Partial Least Squares Regression (PLSR), Particle Swarm Optimization (PSO), Polynomial Regression (PR), Principal Components Regression (PCR), Radial Basis Function Neural Network (RBFNN), Random Forest (RF), Random Subspace (RSS), Recurrent Neural Networks (RNN), Reduced Error Pruning Tree (REPT), Regression Trees (RT), Ridge Regression (RR), Stepwise Regression (SW), Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Support Vector Regressor (SVR), Takagi-Sugeno Fuzzy Neural Network, Extreme Gradient Boosting (XGBoost) Regressor (XGBR), Wavelet De-noising Technique-Based Augmented Neuro-Fuzzy Inference System (WDT-ANFIS)

The steps involved in the application of machine learning (ML) tools in water quality assessment are presented schematically in Figure 1.
Figure 1

General representation of ML application on water quality data (author's compilation).


WQI prediction tools are based on regression, which means that data from the past are analysed to identify patterns and relationships between the independent variables (water quality parameters) and the dependent variable (WQI value) and to predict how values will evolve. For a model to be reliable, it needs to be built on sufficient data covering a relevant period of time. As an example, if water monitoring data are available for a period of 10 years, the first seven years can be used for model training, and the last three years to verify whether the WQI values predicted by the model are in line with those calculated from actual data.
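The 7-year/3-year scheme described above can be sketched in Python as follows. The monitoring records are synthetic, the rule that generates the WQI is invented for illustration, and the k-nearest neighbours regressor is chosen only because it fits in a few lines. The sketch is not the procedure of any reviewed study.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the synthetic data are reproducible

# Synthetic monthly records for 2010-2019: (year, [DO, BOD], WQI).
# The linear rule used to generate WQI is invented purely for illustration.
records = []
for year in range(2010, 2020):
    for _ in range(12):
        do = rng.uniform(4.0, 10.0)
        bod = rng.uniform(1.0, 8.0)
        wqi = 40.0 + 6.0 * do - 4.0 * bod + rng.gauss(0.0, 1.0)
        records.append((year, [do, bod], wqi))

# Chronological split: first seven years for training, last three for testing.
train = [r for r in records if r[0] <= 2016]
test = [r for r in records if r[0] > 2016]

def knn_predict(x, training, k=5):
    """Average the WQI of the k training samples closest to x.

    Features are used unscaled because both have similar ranges here;
    real data would normally be standardized first.
    """
    nearest = sorted(training, key=lambda r: math.dist(x, r[1]))[:k]
    return sum(r[2] for r in nearest) / k

observed = [r[2] for r in test]
predicted = [knn_predict(r[1], train) for r in test]

# Goodness of fit on the held-out years: RMSE and coefficient of determination.
mean_obs = sum(observed) / len(observed)
ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
ss_tot = sum((o - mean_obs) ** 2 for o in observed)
rmse = math.sqrt(ss_res / len(observed))
r2 = 1.0 - ss_res / ss_tot
print(f"RMSE = {rmse:.2f}, R2 = {r2:.3f}")
```

Splitting by date rather than at random keeps future observations out of the training set, which is what makes the comparison on the last three years a test of prediction.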

WQC tools are ML models that assign a label to a dataset, for instance ‘good water quality’. These models also need to be trained on real data and checked for accuracy.
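A minimal classifier can be sketched with a Gaussian naïve Bayes model written in plain Python. The samples, the two parameters, and the rule that assigns the ‘good’ and ‘poor’ labels are synthetic assumptions. In practice, labels would come from a regulatory classification or from WQI classes.

```python
import math
import random
from collections import defaultdict

rng = random.Random(1)  # fixed seed for reproducible synthetic data

def make_sample():
    """Synthetic sample: [DO, BOD] with a label from an invented rule."""
    do = rng.uniform(3.0, 10.0)
    bod = rng.uniform(1.0, 9.0)
    label = "good" if do - bod > 1.0 else "poor"
    return [do, bod], label

data = [make_sample() for _ in range(300)]
train, test = data[:200], data[200:]

def fit(samples):
    """Estimate the prior, feature means and feature variances of each class."""
    grouped = defaultdict(list)
    for x, y in samples:
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_like = sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                       for v, m, var in zip(x, means, variances))
        return math.log(prior) + log_like
    return max(model, key=log_posterior)

model = fit(train)
correct = sum(predict(model, x) == y for x, y in test)
accuracy = correct / len(test)  # share of held-out samples labelled correctly
print(f"accuracy = {accuracy:.2f}")
```

The accuracy on the held-out samples plays the same role as the verification step for regression models: it measures performance on data the model has not seen.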

A selection of AI tools and their application is presented in Table 2.

Table 2

Selected AI tools and their possible application for WQI and/or WQC

AI tool | Used for | Features | Results
--- | --- | --- | ---
Random forest (RF) | WQI, WQC | Handles high-dimensional data well, robust against overfitting, and provides feature importance | Achieved high accuracy (Shams et al. 2023; Solangi et al. 2024)
Support vector machine (SVM) | WQI, WQC | Effective in high-dimensional spaces, robust against overfitting, especially with the right kernel | Demonstrated high accuracy in predicting WQI and classifying water quality (Haghiabi et al. 2018)
Artificial neural network (ANN) | WQI, individual parameters | Capable of capturing complex non-linear relationships, adaptable to various types of data | Effective in various studies with high accuracy in prediction tasks (Rana et al. 2023)
Long short-term memory (LSTM) | WQI | Excellent for time-series data, capable of learning long-term dependencies | Achieved high accuracy in studies often outperforming other models in time-series predictions (Nguyen et al. 2023)
Extreme gradient boosting (XGBoost) | WQI, WQC | High performance, efficient computation, and strong predictive power | Consistently high accuracy often outperforming other models (Solangi et al. 2024)
Decision tree classifier (DT) | WQC | Simple to interpret, handles both numerical and categorical data, useful for feature selection | High accuracy particularly effective in combination with ensemble methods (Solangi et al. 2024)
K-nearest neighbors (KNN) | WQC | Simple and intuitive, effective for small datasets | Good accuracy though performance can degrade with high-dimensional data (Zamri et al. 2022)
Adaptive neuro-fuzzy inference system (ANFIS) | WQI, WQC | Combines neural networks and fuzzy logic principles, effective for modeling complex relationships | Demonstrated good performance in various studies (Haghiabi et al. 2018)
CatBoost | WQC | Handles categorical data well, robust against overfitting, and efficient computation | Achieved high accuracy in studies often used in ensemble methods for improved performance (Nasir et al. 2022)
Multilayer perceptron (MLP) | WQI, WQC | Capable of learning complex patterns, adaptable to various types of data | Effective with high accuracy in prediction tasks (Palabıyık & Akkan 2024)
Logistic regression (LR) | WQC | Simple to implement, interpretable, and effective for binary classification problems | Good accuracy often used as a baseline model (Nallakaruppan et al. 2024)
Naive Bayes (NB) | WQC | Simple, fast, and effective for large datasets | Good accuracy, particularly effective for text classification and categorical data (Ilić et al. 2022)

The research papers examined in this study encompassed a wide variety of geographical locations, time-spans of data collection, WQI calculation methods, and AI tools employed for data processing.

Most of the studies use monitoring data from India (23.2%), followed by Pakistan (12.5%) and China, Ireland, and Malaysia (8.9% each) (Figure 2). Just over half of the studies concern surface water (SW) (51.8%), followed by groundwater (GW) (33.9%), coastal water (CW) (8.9%), irrigation water (IW) (3.6%), and drinking water (DW) (1.8%) (Figure 2).
Figure 2

Distribution of study areas (a) and types of water (b).


Three main categories can be distinguished based on the purpose and methodology of the studies:

  • Complex datasets with a large number of parameters are used to calculate WQIs; the influence of individual parameters on the final result is then analysed, and results obtained with all data are compared with those from smaller datasets that still allow accurate prediction of water quality, in order to reduce the number of variables.

  • Different methodologies for WQI calculation are applied to the same dataset in order to assess their performance and improve the calculation method.

  • Different ML methods are applied to the same dataset and the same WQI methodology, and their performance is assessed by comparing the results of the AI models with those given by real monitoring data.

For instance, in a study carried out on monitoring data from Thailand for the period 2016–2021, it was possible to reduce the number of parameters for WQI calculation from 13 to four, using ANN and the Bootstrap method (Chawishborwornworng et al. 2024). A long-term study on lake water in Finland (1980–2023), using the British Columbia WQI, showed that the long short-term memory (LSTM) model was less sensitive than SVR, RF, and ANN to the removal of COD and TP from the inputs (Kim et al. 2024). Another study in China reduced the number of parameters from 22 to nine, using redundancy analysis (RDA) on a dataset from 2017 (Li et al. 2021). A study on irrigation water in Vietnam found that coliform, dissolved oxygen, turbidity, and total suspended solids are the most important parameters for water quality assessment (Lap et al. 2023). In another study, electrical conductivity had the highest influence on WQI for groundwater, while pH had the lowest influence (Raheja et al. 2022). However, there is also a report in which 14 of the 17 parameters included in the WQI calculations were found to be significant, although the model also performed well with only 12 parameters instead of 17 (Fernández Del Castillo et al. 2022). A noteworthy result is that, despite the superior performance of the 10-parameter model, a significantly simpler model incorporating only BOD, turbidity, and phosphate demonstrated a remarkably high level of accuracy in predicting river water quality (Asadollah et al. 2021).
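A very simple way to screen parameters for their influence on the WQI is to rank them by the absolute Pearson correlation with the index. The sketch below uses this as a stand-in for the more elaborate techniques applied in the studies above (for example ANN with bootstrapping, or redundancy analysis). The dataset and the rule that generates the WQI are synthetic assumptions.

```python
import math
import random

rng = random.Random(42)  # fixed seed for reproducible synthetic data

# Synthetic dataset: WQI depends strongly on BOD, weakly on turbidity (TU),
# and not at all on pH. The generating rule is invented for illustration only.
rows = []
for _ in range(200):
    bod = rng.uniform(1.0, 10.0)
    tu = rng.uniform(0.0, 50.0)
    ph = rng.uniform(6.5, 8.5)
    wqi = 100.0 - 5.0 * bod - 0.4 * tu + rng.gauss(0.0, 2.0)
    rows.append({"BOD": bod, "TU": tu, "pH": ph, "WQI": wqi})

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

target = [r["WQI"] for r in rows]
strength = {p: abs(pearson([r[p] for r in rows], target))
            for p in ("BOD", "TU", "pH")}

# Parameters ordered from most to least strongly associated with the WQI.
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking)
```

Parameters at the bottom of such a ranking are candidates for removal, but correlation only captures linear, one-at-a-time effects, so a reduced parameter set still has to be validated by retraining the model and comparing its accuracy with that of the full model.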

Some of the reviewed papers aim to improve the water quality index methodology with the help of AI tools. The fuzzy-GIS-based groundwater quality index (FGQI), combining geographical information system (GIS) with the groundwater quality index, is proposed as a reliable tool for groundwater quality assessment (Jha et al. 2020). Another study made the first attempt to include heavy metal concentrations in groundwater assessment, demonstrating the robustness of the model (Sajib et al. 2023). The automated water extraction index with no shadows (AWEI-nsh) proposed by another group, using remote sensing data, may be applied when shadows are excluded from the image (Li et al. 2021). The WQI model could also be improved by adjusting parameter weight values and using new aggregation functions, namely sinusoidal weighted mean and log-weighted quadratic mean (Ding et al. 2023). In addition, using ML methods, satellite data can be used to improve water quality monitoring in coastal waters that are optically complex (Hafeez et al. 2019).

A comparison between models using a minimum number of parameters (coliform, pH, temperature, turbidity, and total dissolved solids) indicates that gradient boosting and polynomial regression performed better in predicting WQI, whereas MLP performed better in predicting WQC (Ahmed et al. 2019). However, it should be mentioned that the regression model only included a relatively short period of time (2009–2012). A study carried out by Abba et al. (2020) on a limited number of parameters indicated that the Adaptive Neuro-Fuzzy Inference System (ANFIS) model performed best at one location and BPNN at two other locations, which seems to indicate that the results may be specific to certain regions.

Two studies indicate that ML methods are efficient tools for predicting irrigation water quality and that AdaBoost can predict all parameters (El Bilali et al. 2021; Trabelsi & Bel Hadj Ali 2022).

Several studies report successful application of regression models to predict WQI, but it is not clear how regression was possible for monitoring data collected during one year or even one set of samples collected from different locations (Kadam et al. 2019, 2022; Hussein et al. 2023; Ibrahim et al. 2023; Uddin et al. 2023b; Abbas et al. 2024). Aslam et al. (2022) used data for two years (2020–2021) and found that the hybrid RT-ANN algorithm can produce good results for short-term data, but the stability of the model would be improved using long-term datasets.

For both regression models and classifiers, the research data show that no model performs best for a majority of the datasets. In fact, almost every study reports a different method as giving the best results, which seems to indicate that the outcomes depend not only on the AI tool but also on the characteristics of the dataset and perhaps on the WQI or classification methodology. Each study involved a different type of water, a different set of variables, or a different WQI calculation method, so it is not possible to make comparisons between studies.

The AI tools reported by several studies to perform best in the case of WQI prediction are gradient boosting (Ahmed et al. 2019; Nguyen et al. 2023; Abbas et al. 2024), ANFIS (Tiwari et al. 2018; Hmoud Al-Adhaileh & Waselallah Alsaade 2021; Ibrahim et al. 2023), and SVM (Li et al. 2021; Shamsuddin et al. 2022; Ibrahim et al. 2023).

In the case of classifiers, several studies indicate good performance of XGBoost or other gradient boosting methods (Shams et al. 2023; Uddin et al. 2023b; Singh et al. 2024; Solangi et al. 2024), as well as SVM (Derdour et al. 2022; Hussein et al. 2023).

The review has revealed a diversity of datasets, water quality assessment methodologies, and AI tools used for modelling, so this field appears to be far from worldwide standardization.

ML methods, including gradient boosting and polynomial regression, have shown effectiveness in predicting WQI, although results can vary based on the dataset and region. The research indicates that no single AI model consistently outperforms others across different datasets, suggesting that outcomes depend on the specific AI tool, dataset characteristics, and WQI methodology used. Overall, AI tools like gradient boosting, ANFIS, and SVM have been reported to perform well in WQI prediction, while classifiers like XGBoost and SVM have shown good performance in classification tasks.

The reviewed studies and the main results are presented in Table 3.

Table 3

Reviewed studies and their main results

| Location/type of water^a | Period | WQI method/parameters | AI tools | Results | Observations/study cases | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| Ukraine/SW | 2000–2021 | WQI/BOD5, SS, DO, NO3, NO2, SO4, PO4, Cl, NH4 | AAN, ELM, DTR, RF, BBA, GP, KNN, SVR, XGBR | GP, SVR, and XGBR 100% reliability (R2 = 1) | Southern Bug River | Masood et al. (2023) |
| Ireland/CW | 2019–2020 | Eight WQI models/NH3, TR, TON, T, DO, NH4, pH, SAL, MRP, BOD, CHL | MCS, GPR | WQM and RMS models were found to exhibit a higher prediction accuracy | Cork Harbour coastal water | Uddin et al. (2023a) |
| China/GW | Not specified | WQI Uddin/pH, NH3, Mn, Ni, B, Pb, Zn, F, COD, Fe | DT, RF, XGBR | XGBR model surpasses DT and RF models in water quality prediction | Groundwater Quality at the Yopurga Landfill | Zheng et al. (2024) |
| Malaysia/SW | 2012–2018 | DOE-WQI/DO, NH3, BOD, COD, SS, pH | ANN, SVM, RF, NB | RF classifier outperformed NB, ANN, and SVM | Langat Basin in Selangor | Suwadi et al. (2022) |
| Pakistan/SW | 2012–2019 | WA-WQI, CCME-WQI, NSF-WQI, OWQI, MOI/pH, DO, EC, TU, FC, T | DT, KNN, LR, MLP, NB | DT algorithm had the highest classification accuracy of 99.6% | Rawal Dam, Islamabad | Ahmed et al. (2021) |
| Malaysia/SW | 2001–2010 | DOE-WQI custom/DO, BOD, COD, SS, NH3, pH | RBFNN, BPNN | The BPNN model performed the best results of R2 = 0.7007 | Langat River and Klang River | Hameed et al. (2017) |
| Vietnam/IW | 2005–2018 | BOD5, NH4, PO4, TU, TSS, TC, DO | XGBoost, LSTM, RNN | Coefficients of determination ranging from 0.84 (RNN) to 0.96 (XGBoost) | Red River Delta irrigation water | Nguyen et al. (2023) |
| Malaysia/SW | 2005–2014 | DOE-WQI/DO, BOD, COD, NH3, SS, pH | BPNN | CopulaGAN and TVAE outperformed other methods | Selangor River and Skudai River | Chia et al. (2023) |
| India/GW | 2013–2014 | GQI-10, GQI-7/TDS, NO3, Ca, Mg, Na, Cl, K, F, SO4, TH | Fuzzy-GIS-based Groundwater Quality Index (FGQI) | FGQI model can predict groundwater quality better than GQI-10 and GQI-7 models | Tiruchirappalli district, Tamil Nadu state in the southern part of India | Jha et al. (2020) |
| Ireland/CW | 2017–2022 | IEWQI/pH, T, SAL, BOD5, DO, TR, DIN, MRP, TON | LR, Regression Trees, SVM, GPR, KARs, ET, NN | IEWQI model is effective in evaluating the impact of various anthropogenic pressures | Cork Harbour | Uddin et al. (2023c) |
| Thailand/SW | 2016–2021 | pH, DO, T, BOD, FC, TSS, TDS, TC, TH, TU, TP, NO3, NH3 | ANN | Bootstrap ANN had excellent performance compared with other models in terms of accuracy R = 0.993 | Lower Mun River Basin | Chawishborwornworng et al. (2024) |
| Hong Kong/SW | 1999–2015 | CHL, SS, TU/Remote sensing data | SVR, RF, ANN, CB, EPM | ANN exhibits the best performance R = 0.9, Machine learning methods outperformed the multivariate regression models | Pearl River Estuary | Hafeez et al. (2019) |
| Ireland/CW | 2022 | IEWQI/pH, DO, SAL, BOD5, T, TR, TON, MRP, DIN | IF, KDE | R2 increased from 0.92 to 0.95 when data outliers were removed | Cork Harbour | Uddin et al. (2024) |
| Algeria/GW | Not specified | EC, pH, Na, K, SO4, NO3, Ca, Mg, Cl, HCO3 | DT, KNN, DA, SVM, ET | SVM classifier obtained the highest forecast accuracy, with 95.4% | 12 municipalities of the Wilaya of Naâma in Algeria | Derdour et al. (2022) |
| Iran/GW | 2019 | GQI, GWQI/a data-fusion index based on four pollutants: Mn, As, Pb, and Fe | Mamdani fuzzy logic (MFL), SVM, ANN, RF | RF (R2 = 0.995) and MFL (R2 = 0.921) had the best and worst performances, respectively | Gulfepe-Zarinabad sub-basin in northwest Iran, 28 groundwater samples | Nadiri et al. (2022) |
| Bangladesh/GW | Not specified | GWQI/T, pH, EC, TDS, Zn, Fe, Mn, Cr, Cd, Cu | ET, GPR, LR, SVM, ANN, RT | The GWQI model had high sensitivity (R2 = 1.0) | Savar sub-district of Bangladesh groundwater | Sajib et al. (2023) |
| India/SW | Not specified | DO, TC, BOD, NO3, pH, EC | NN, RF, MNLR, SVM, BTM | MLR highest accuracy at 99.83%, SVM lowest accuracy at 96.98% | Datasets from the Kaggle website | Hassan et al. (2021) |
| Pakistan/SW | 2009–2012 | FC, pH, T, TU, TDS, NO3 | MLR, PR, RF, GB, SVM, RR, ENR, NN, MLP, GNB, LR, SGD, KNN, DT, BC | GB and PR performed better in predicting WQI, MLP performed better in predicting WQC | Rawal Water Lake | Ahmed et al. (2019) |
| India/GW | Not specified | TU, SO4, TH, MGH, DO, BOD, COD, NO3, As | CNN, DNN, RNN | >95% accuracy based on R2 values, RNN less precise | Gold mining sites of Kolar Gold Fields, Karnataka | Gupta et al. (2023) |
| Brazil/SW | 2009–2014 | RWQI, RWQIF/CHL, FC, colour, Cyanobacteria, Fe, Mn, pH, TU | Fuzzy logic | The Spearman correlation coefficient between RWQI and RWQIF was 89% | 24 water sources, associated with WWTP in the southeast of Brazil | Oliveira et al. (2019) |
| Egypt/GW | 2020 | IWQI/T, pH, EC, TDS, K, Na, Mg, Ca, Cl, SO4, HCO3, CO3, NO3 | ANFIS, SVM | ANFIS and SVM achieved R2 0.99 and 0.97 in training and 0.97 and 0.76 in testing | El Kharga Oasis, Western Desert of Egypt | Ibrahim et al. (2023) |
| China/SW | 2016 | pH, HCO3, TP, TN, BOD, NH3, Fe, Cu, Zn, volatile phenol, DO, TDS, Cl, SO4, Na, Ca, Mg, COD, PO4, Cr, remote sensing data | PSO + remote sensing spectral indices (difference index, DI; ratio index, RI; and normalized difference index, NDI) | The model based on RI, DI, and NDI values of the 1.6 order is much better than the others at predicting the water quality index of the study area (R2 = 0.92) | In the Ebinur Lake Watershed, there are two prominent absorption features situated around 700 and 950 nm | Wang et al. (2017) |
| China/SW | 2012–2015 | HPI/COD, BOD, NH3, petroleum, TP, F, LAS, Pb, Cu, Zn, Se, As, Cd, Cr | Takagi-Sugeno fuzzy neural network | Ammonia nitrogen and total phosphorus were the main contaminants in the Huangshui River | The Huangshui River is a major tributary of the upper Yellow River | Zhao et al. (2022) |
| Tunisia/GW, IW | 2019–2021 | IWQ/TDS, PS, SAR, ESP, MAR, T, pH, EC | RF, SVR, ANN, AdaBoost | AdaBoost model is best for predicting all parameters (r 0.88–0.89) | Downstream Medjerda River Basin | Trabelsi & Bel Hadj Ali (2022) |
| Morocco/GW | 2009–2019 | IWQ/EC, pH, T, Cl, SO4, CO3, HCO3, NO3, NO2, NH4, Na, K, Ca, Mg | AdaBoost, SVR, RF, ANN | AdaBoost performed best, followed by RF | Berrechid Aquifer Groundwater | El Bilali et al. (2021) |
| India/SW | 1999–2010 | WQI/DO, pH, BOD, NH3, T | BPNN, ANFIS, SVR, MLR, NNE | ANFIS was best for Nizamuddin station, while BPNN was best for Palla and Udi (Chambal) R (>0.9) | Nizamuddin, Palla and Udi (Chambal), across the Yamuna River, India | Abba et al. (2020) |
| Finland/SW | 1980–2023 | BCWQI/EC, DO, COD, pH, TP, TU, SD | SVR, RF, ANN, LSTM | LSTM is the least sensitive model to exclusion of COD and TP, R = 0.91 | Lake Paijanne, Finland | Kim et al. (2024) |
| Egypt/GW | Not specified | GWQI/pH, EC, TDS, Na, SAR, PI, KI, RSC, MH | SVM | SVM WQI accuracy values ranging from 0.88 to 0.90 | Abu-Sweir and Abu-Hammad, Ismalia, Egypt | Abu El-Magd et al. (2023) |
| Pakistan/GW | 2022 | WQI/EC, pH, TDS, HCO3, Cl, SO4, Ca, Mg, Na, K, NO3, F, Fe, As | DT, SVM, KNN, ET, DA | SVM performed as best classifier, accuracy 90.8% for raw data and 89.2% for normalized data | Sakrand, province of Sindh | Hussein et al. (2023) |
| China/SW | 2017 | WQI/NH3, COD, BOD5, DO, TN, TP, TH, SS, Chroma, TU, PO4, Cr, SO3, Fe, Cu, Zn, volatile phenol, Cl, Co, SAL, TDS, pH | RF, SVM, PLSR, PLSR-SVM; Sentinel-2 MSI data at 10 m resolution, NDWI | PLSR-SVM provided better WQI than the other models R2v = 0.87 | TDS, COD, and TN are the most influential in WQI. Proposed AWEI-nsh | Li et al. (2021) |
| Malaysia/SW | 2009–2010 | T, EC, SAL, NO3, TU, PO4, Cl, K, Na, Mg, Fe, FC | ANFIS, RBF-ANN, MLP-ANN, WDT-ANFIS | WDT-ANFIS model predicted well all the parameters (R2 ≥ 0.9) | Johor River Basin | Najah Ahmed et al. (2019) |
| Pakistan/GW | 2022 | WQI/TDS, Na, K, Ca, Mg, HCO3, SO4, Cl, pH, EC, NO3, well depth | RF, GB, SVM, XGBoost, KNN, DT | RF and GB lead with 95 and 96% accuracy, SVM 92%. KNN 84%, DT 77% | 422 data samples from Mirpurkhas | Abbas et al. (2024) |
| Pakistan/GW | Not specified | EC, pH, TDS, Ca, Mg, TH, Cl, NO3, NO2, SO4 | LR, DT, XGBoost, RF, KNN | DT and XGBoost achieve accuracies of 100%. RF 88%, KNN 75%, LR 50% | Pano Aqil city, Pakistan, Indus River | Solangi et al. (2024) |
| India/SW | 2005–2014 | WQI/DO, pH, EC, BOD, NO3, FC, T, TC | ANFIS, FFNN, KNN | ANFIS accuracy 96.17% for predicting WQI, FFNN 100% accuracy for WQC | Different locations in India (1,679 samples from 666 different sources) | Hmoud Al-Adhaileh & Waselallah Alsaade (2021) |
| Pakistan/SW | Not specified | COD, TOC, NH3, As, Ni, Zn, oil and grease | AdaBoost, KNN, GB, RF, SVR, BR | GB performed best R2 = 0.88 training, R2 = 0.85 testing | Aik-Stream, industrially polluted, 150 sites | Ejaz et al. (2024) |
| China/SW | 2021 | WQI/DO, COD, BOD5, NH3, TP, TN, F, CN, S, Se, As, Cu, Zn, Hg, Cd, Cr, Pb, pH | LightGBM | Proposal on new aggregation functions and machine learning to improve water quality assessment | 17 sampling sites in the Chaobai River Basin | Ding et al. (2023) |
| Ireland/CW | 2019 | Seven WQI models/T, pH, DO, TON, NH4, MRP, BOD5, TR, CHL, DIN | SVM, NB, RF, KNN, XGBoost | KNN (100% correct) and XGBoost (99.9% correct) were best for all seven WQI models | 29 monitoring sites of the Cork Harbour | Uddin et al. (2023b) |
| India/SW | 1996–2012 | pH, EC, Cl, NO3, NH4, FC | Fuzzy C-means FCM-ANFIS and subtractive clustering SC-ANFIS | SC-ANFIS (R2 = 0.9919) gave more accurate results than FCM-ANFIS | Eight different monitoring stations across River Satluj in northern India | Tiwari et al. (2018) |
| Algeria/GW | 1999–2020 | WQI/TDS, EC, T, pH, TH, Ca, Mg, Na, K, Cl, HCO3, SO4 | MLR, RF, M5P, RSS, AR, ANN, SVR, LWLR | MLR model has higher accuracy in first scenario, RF better in second scenario R = 0.9984 | Illizi region, southeast Algeria, 114 samples from 57 wells | Kouadri et al. (2021) |
| Vietnam/SW | 2007–2020 | VN_WQI/pH, TU, T, TSS, DO, BOD5, NH4, PO4, COD, TC | MLR, SVM, DT, RF, MLP | The RF model delivers the highest accuracy in predicting the WQI, achieving a similarity score of 0.94 | An Kim Hai irrigation system, north of Vietnam | Lap et al. (2023) |
| India/GW | 2016 | EWQI and WQI/pH, EC, TH, Ca, Mg, Na, K, HCO3, Cl, SO4, NO3, F | DNN, GBM, XGBoost | DNN outperformed in predicting both indices | Haryana state (India) 392 datasets | Raheja et al. (2022) |
| Egypt/IW | 2020 | IWQI/pH, EC, Ca, Mg, Na, K, HCO3, Cl, SO4 | SVM, XGBoost, RF, SW, PCR, PLS | SW emerged as the best followed by PCR and PLS | Bahr El-Baqr, Egypt, 105 water samples | Mokhtar et al. (2022) |
| India/GW | 2015 | WQI/pH, EC, TDS, TH, Ca, Mg, Na, K, Cl, HCO3, SO4, NO3, PO4 | ANN, MLR, LMBP in ANN | The precision level is high in the ANN model | Shivganga River basin, Ghat region, 34 well samples | Kadam et al. (2019) |
| India/GW | Not specified | WA-WQI/pH, TU, EC, TDS, pH, TH, Cl, F | ANN, SVM, RF, XGBoost, MLR | XGBoost model gave best results: training R2 = 0.969, testing R2 = 0.987 | Ujjain city of Madhya Pradesh in India, 54 samples from the urban area | Mohseni et al. (2024) |
| India/SW | 2004–2014 | WQI/pH, EC, DO, BOD, NO3, TC | DT, RF, GBT, ANN, SVM for WQC | Average accuracy >80; GBT performed better than other models | Major rivers and their tributaries of India (n = 3,595) | Singh et al. (2024) |
| Hong Kong/SW | 1998–2017 | BOD, COD, DO, EC, NO3, NO2, PO4, pH, T, TU | ETR, SVR, DTR | ETR model generally yields more accurate WQI predictions (R2test = 0.98) | Lam Tsuen River, Tai Po city | Asadollah et al. (2021) |
| India/SW | 2016–2020 | WQI/DO, T, pH, TH, Cl, TC, EC, SO4, Na, PO4, K, BOD, F, NO3 | MLP, RF, SVM, NB, DT | MLP regressor and MLP classifier outperform other models | Bhavani River, Kerala and Tamil Nadu, 10,560 data samples, 31 attributes | Nair & Vijaya (2022) |
| Ireland/CW | 2020 | WQM-WQI/t, TON, NH3, NH4, DO, pH, SAL, MRP, BOD, TR, CHL | RF, DT, KNN, XGBoost, ExT, SVM, LR, GNB | DT, ExT, and XGBoost could provide accurate and robust results in predicting WQIs | Cork Harbour 29 monitoring locations | Uddin et al. (2022) |
| Mexico/SW | 2009–2021 | SGR-WQI/Cd, Cr, BOD5, DO, FC, F, FOG, Hg, NH3, NO3, Pb, pH, TSS, S, TDS, T, Zn | SLR, MLR, LRR, GAM, LR, LDA | GAM was better for WQI, LR for WQC | Santiago-Guadalajara River | Fernández Del Castillo et al. (2022) |
| Iraq/SW WWTP | 2015–2019 | NPI/BOD, COD, TSS, Cl, pH, SO4, NO3, PO4 | ANN, FA | A successful ANN model was built based on NPI where the R2 was 0.965 | North Rustumiyia WWTP, Diyala River | Mohammed & Al-Obaidi (2021) |
| Vietnam/SW | 2010–2017 | T, pH, DO, BOD, COD, TU, TSS, TC, NH4, PO4, turbidity (TUR) | AdaBoost, GB, HGBM, LGB, XGBoost, DT, ET, RF, MLP, RBF, DFFNN, CNN | All 12 models were good at WQI prediction, XGBoost was best (R2 = 0.989) | Four monitoring stations on La Buong River | Khoi et al. (2022) |
| India/DW | 2005–2014 | DO, pH, EC, BOD, NO3, FC, TC | SVM, RF, LR, DT, CATBoost, XGBoost, MLP | CATBoost model was the most accurate classifier with 94.51% | 1,679 samples, various Indian states | Nasir et al. (2022) |
| Malaysia/SW | 2012–2016 | WQI-JAS/DO, BOD, COD, SS, AN, pH | ANN, DT, SVM | SVM was the best performing model for predicting WQI | Langat River Basin 560 records, 14 monitoring stations | Shamsuddin et al. (2022) |
| Iran/GW | Not specified | WQI + FAHP-WQI/EC, SAR, PI, MAR, KR, PS, SO4, Cl, Na, Mg, Ca, HCO3, TDS, pH, FC | GEP, M5P, MARS | MARS is slightly more accurate than the M5P model for estimating WQI (R = 1 training, R = 0.999 test) | 96 deep wells in the Yazd-Ardakan Plain | Goodarzi et al. (2023) |
| Pakistan/GW | 2020–2021 | PKWQI/pH, DO, TDS, ES, SAL, Cl, TH, SO4, NO3 | RT, RF, M5P, REPT, BA, CVPS, RFC | The RT-ANN algorithm outperformed all other algorithms in terms of accuracy, with the highest RSQ value of 0.951 | 39 locations; COD and BOD were not considered | Aslam et al. (2022) |
| India/SW | 2005–2014 | DO, pH, EC, BOD, NO3, FC, TC | RF, XGBoost, GB, AdaBoost for WQC. KNN, DT, SVR, MLP for WQI | GB gave best results, with a classification accuracy of 99.50%. MLP was best for WQI R2 = 99.8% | Lakes and rivers in India | Shams et al. (2023) |

^a Water types: surface water (SW), groundwater (GW), coastal water (CW), irrigation water (IW), and drinking water (DW).
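As a reading aid for the metrics reported in Table 3, the block below recalls the usual textbook definitions of the coefficient of determination, the Pearson correlation coefficient, and classification accuracy. It is a reference sketch only: the symbols $y_i$ (observed value), $\hat{y}_i$ (predicted value), and $n$ (number of samples) are notation introduced here, and individual studies may compute or report these metrics differently (for example, on training versus testing subsets).

```latex
\begin{align}
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
  \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
R &= \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}
          {\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}\,\sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}},
  \qquad \bar{\hat{y}} = \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i \\
\text{accuracy} &= \frac{\text{number of correctly classified samples}}{n}
\end{align}
```

Because $R^2$ and $R$ apply to regression (WQI prediction) and accuracy applies to classification (WQC), values in different rows of the table are not always directly comparable.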

There is a wide variety of AI tools available to support WQI and WQC prediction, and some of them have been demonstrated to have high accuracy and reliability. Choosing the appropriate AI tool for WQI modelling depends on the specific requirements of the problem, the nature of the dataset, and the desired performance metrics. Ensemble methods like random forest and XGBoost, as well as neural network-based approaches like ANN and LSTM, are particularly effective for complex water quality assessment tasks. Combining these models or using hybrid approaches can further enhance prediction accuracy and robustness (Shams et al. 2023).

It should be noted, however, that the reviewed studies cover a wide variety of datasets and locations, and that not all results may be easily replicated. Although some short-term studies may be justified by the lack of long-term monitoring data, the methods should be validated before they are applied to different datasets.

AI and ML tools offer advantages such as the ability to process vast amounts of information and handle complex, non-linear relationships in water quality data, deal with partial or missing data, and provide real-time or near-real-time predictions, which can be crucial for water quality monitoring and management.

To make use of these recently developed tools, water managers may develop gradual approaches to reduce costs and improve the efficiency of monitoring, including:

  • Analysing historical datasets to identify critical parameters, pollution hotspots, and water quality trends;

  • Integrating the spatial dimension (GIS) and real-time monitoring systems (sensors);

  • Setting alerts that indicate when key parameters exceed threshold values, enabling quick intervention;

  • Developing new models for data integration and/or WQC based on individual parameters' inputs.
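The alerting step in the list above can be illustrated with a minimal Python sketch. It assumes each sensor reading arrives as a dictionary of parameter values; the parameter names, the `Limit` class, the `check_reading` function, and the numeric limits are all hypothetical and chosen for illustration only. Real limits would have to come from the applicable regulation or site-specific targets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Limit:
    """Acceptable range for one parameter; None means no bound on that side."""
    low: Optional[float] = None
    high: Optional[float] = None


# Hypothetical limits for illustration only, not regulatory values.
LIMITS: Dict[str, Limit] = {
    "pH": Limit(low=6.5, high=8.5),
    "DO_mg_L": Limit(low=5.0),
    "turbidity_NTU": Limit(high=5.0),
}


def check_reading(reading: Dict[str, float],
                  limits: Dict[str, Limit] = LIMITS) -> List[str]:
    """Return one alert message per parameter outside its acceptable range."""
    alerts = []
    for name, value in reading.items():
        limit = limits.get(name)
        if limit is None:
            continue  # no limit configured for this parameter
        if limit.low is not None and value < limit.low:
            alerts.append(f"{name}={value} below minimum {limit.low}")
        elif limit.high is not None and value > limit.high:
            alerts.append(f"{name}={value} above maximum {limit.high}")
    return alerts


if __name__ == "__main__":
    sample = {"pH": 7.2, "DO_mg_L": 3.8, "turbidity_NTU": 12.0}
    for message in check_reading(sample):
        print("ALERT:", message)
```

A rule-based check of this kind is deliberately simple. In practice it could run alongside an ML model, so that alerts are raised either when a single parameter leaves its range or when the predicted WQI class deteriorates.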

Some AI tools that can be applied include linear and logistic regression, decision trees, random forests, support vector machines, neural networks, etc.

Open access to water quality monitoring data has allowed data scientists to make fast and significant progress in developing new tools for extracting relevant information from the data, which can further support authorities to identify pollution sources, design remediation measures, and assess their efficiency in improving water quality.

In general, a model trained on data specific to a certain region can only be expected to produce accurate results for that region. If more monitoring data become available, further research could explore how models developed in one part of the world perform in other regions, and could even contribute to a global model. Combining traditional monitoring with real-time data from in-situ probes and remote sensing could enable quick identification of pollution sources and early interventions for remediation. In addition, incorporating climate change considerations could improve the accuracy of the models.
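The regional-specificity point can be checked with a simple transfer test: fit a model on one region and score it on another. The sketch below is a minimal illustration using a one-variable least-squares fit in plain Python. The two datasets (`REGION_A`, `REGION_B`) are synthetic values invented for this example, and the function names are hypothetical; none of this comes from the reviewed studies, which use far richer models and data.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; it can be negative for a poor model."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def transfer_scores(train, test) -> Tuple[float, float]:
    """Fit on `train`, return (R2 on the training region, R2 on the test region)."""
    slope, intercept = fit_line(train[0], train[1])

    def predict(xs):
        return [slope * xi + intercept for xi in xs]

    return (r_squared(train[1], predict(train[0])),
            r_squared(test[1], predict(test[0])))


# Synthetic (dissolved oxygen in mg/L, WQI) pairs, invented for illustration.
REGION_A = ([4, 5, 6, 7, 8, 9], [42, 50, 61, 69, 80, 88])
REGION_B = ([4, 5, 6, 7, 8, 9], [60, 63, 67, 70, 74, 77])

if __name__ == "__main__":
    in_region, out_of_region = transfer_scores(REGION_A, REGION_B)
    print(f"R2 in region A (training region): {in_region:.3f}")
    print(f"R2 in region B (unseen region):   {out_of_region:.3f}")
```

With these synthetic values, the fit is close to perfect in the training region but yields a negative R2 in the unseen region, meaning it performs worse there than simply predicting the regional mean. The same train-on-one, test-on-another protocol applies to any of the model families discussed in this review.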

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Abba S. I., Pham Q. B., Saini G., Linh N. T. T., Ahmed A. N., Mohajane M., Khaledian M., Abdulkadir R. A. & Bach Q.-V. (2020) Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index, Environ Sci Pollut Res, 27, 41524–41539. https://doi.org/10.1007/s11356-020-09689-x

Abbas F., Cai Z., Shoaib M., Iqbal J., Ismail M., Arifullah, Alrefaei A. F. & Albeshr M. F. (2024) Machine Learning Models for Water Quality Prediction: A Comprehensive Analysis and Uncertainty Assessment in Mirpurkhas, Sindh, Pakistan, Water, 16, 941. https://doi.org/10.3390/w16070941

Abu El-Magd S. A., Ismael I. S., El-Sabri M. A. S., Abdo M. S. & Farhat H. I. (2023) Integrated machine learning–based model and WQI for groundwater quality assessment: ML, geospatial, and hydro-index approaches, Environ Sci Pollut Res, 30, 53862–53875. https://doi.org/10.1007/s11356-023-25938-1

Ahmed U., Mumtaz R., Anwar H., Shah A. A., Irfan R. & García-Nieto J. (2019) Efficient Water Quality Prediction Using Supervised Machine Learning, Water, 11, 2210. https://doi.org/10.3390/w11112210

Ahmed M., Mumtaz R. & Hassan Zaidi S. M. (2021) Analysis of water quality indices and machine learning techniques for rating water pollution: a case study of Rawal Dam, Pakistan, Water Supply, 21, 3225–3250. https://doi.org/10.2166/ws.2021.082

Asadollah S. B. H. S., Sharafati A., Motta D. & Yaseen Z. M. (2021) River water quality index prediction and uncertainty analysis: A comparative study of machine learning models, Journal of Environmental Chemical Engineering, 9, 104599. https://doi.org/10.1016/j.jece.2020.104599

Aslam B., Maqsoom A., Cheema A. H., Ullah F., Alharbi A. & Imran M. (2022) Water Quality Management Using Hybrid Machine Learning and Data Mining Algorithms: An Indexing Approach, IEEE Access, 10, 119692–119705. https://doi.org/10.1109/ACCESS.2022.3221430

Chawishborwornworng C., Luanwuthi S., Umpuch C. & Puchongkawarin C. (2024) Bootstrap approach for quantifying the uncertainty in modeling of the water quality index using principal component analysis and artificial intelligence, Journal of the Saudi Society of Agricultural Sciences, 23, 17–33. https://doi.org/10.1016/j.jssas.2023.08.004

Chia M. Y., Koo C. H., Huang Y. F., Di Chan W. & Pang J. Y. (2023) Artificial Intelligence Generated Synthetic Datasets as the Remedy for Data Scarcity in Water Quality Index Estimation, Water Resour Manage, 37, 6183–6198. https://doi.org/10.1007/s11269-023-03650-6

Derdour A., Jodar-Abellan A., Pardo, Ghoneim S. S. M. & Hussein E. E. (2022) Designing Efficient and Sustainable Predictions of Water Quality Indexes at the Regional Scale Using Machine Learning Algorithms, Water, 14, 2801. https://doi.org/10.3390/w14182801

Ding F., Zhang W., Cao S., Hao S., Chen L., Xie X., Li W. & Jiang M. (2023) Optimization of water quality index models using machine learning approaches, Water Research, 243, 120337. https://doi.org/10.1016/j.watres.2023.120337

Ejaz U., Khan S. M., Jehangir S., Ahmad Z., Abdullah A., Iqbal M., Khalid N., Nazir A. & Svenning J.-C. (2024) Monitoring the Industrial waste polluted stream - Integrated analytics and machine learning for water quality index assessment, Journal of Cleaner Production, 450, 141877. https://doi.org/10.1016/j.jclepro.2024.141877

El Bilali A., Taleb A. & Brouziyne Y. (2021) Groundwater quality forecasting using machine learning algorithms for irrigation purposes, Agricultural Water Management, 245, 106625. https://doi.org/10.1016/j.agwat.2020.106625

Fernández Del Castillo A., Yebra-Montes C., Verduzco Garibay M., De Anda J., Garcia-Gonzalez A. & Gradilla-Hernández M. S. (2022) Simple Prediction of an Ecosystem-Specific Water Quality Index and the Water Quality Classification of a Highly Polluted River through Supervised Machine Learning, Water, 14, 1235. https://doi.org/10.3390/w14081235

Goodarzi M. R., Niknam A. R., Barzkar A., Niazkar M., Zare Mehrjerdi Y., Abedi M. J. & Heydari Pour M. (2023) Water Quality Index Estimations Using Machine Learning Algorithms: A Case Study of Yazd-Ardakan Plain, Iran, Water, 15, 1876. https://doi.org/10.3390/w15101876

Gupta P., Samui P. & Quaff A. R. (2023) Estimation of Water Quality Index using modern-day machine learning algorithms. https://doi.org/10.21203/rs.3.rs-3305153/v1

Hafeez S., Wong M., Ho H., Nazeer M., Nichol J., Abbas S., Tang D., Lee K. & Pun L. (2019) Comparison of Machine Learning Algorithms for Retrieval of Water Quality Indicators in Case-II Waters: A Case Study of Hong Kong, Remote Sensing, 11, 617. https://doi.org/10.3390/rs11060617

Haghiabi A. H., Nasrolahi A. H. & Parsaie A. (2018) Water quality prediction using machine learning methods, Water Quality Research Journal, 53, 3–13. https://doi.org/10.2166/wqrj.2018.025

Hameed M., Sharqi S. S., Yaseen Z. M., Afan H. A., Hussain A. & Elshafie A. (2017) Application of artificial intelligence (AI) techniques in water quality index prediction: a case study in tropical region, Malaysia
,
Neural Comput & Applic
,
28
,
893
905
.
https://doi.org/10.1007/s00521-016-2404-7
.
Hassan
M. M.
,
Hassan
M. M.
,
Akter
L.
,
Rahman
M. M.
,
Zaman
S.
,
Hasib
K. M.
,
Jahan
N.
,
Smrity
R. N.
,
Farhana
J.
,
Raihan
M.
&
Mollick
S.
(
2021
)
Efficient Prediction of Water Quality Index (WQI) Using Machine Learning Algorithms
,
Human-Centric Intelligent Systems
,
1
,
86
.
https://doi.org/10.2991/hcis.k.211203.001
.
Hmoud Al-Adhaileh
M.
&
Waselallah Alsaade
F.
(
2021
)
Modelling and Prediction of Water Quality by Using Artificial Intelligence
,
Sustainability
,
13
,
4259
.
https://doi.org/10.3390/su13084259
.
Hussein
E. E.
,
Jat Baloch
M. Y.
,
Nigar
A.
,
Abualkhair
H. F.
,
Aldawood
F. K.
&
Tageldin
E.
(
2023
)
Machine Learning Algorithms for Predicting the Water Quality Index
,
Water
,
15
,
3540
.
https://doi.org/10.3390/w15203540
.
Ibrahim
H.
,
Yaseen
Z. M.
,
Scholz
M.
,
Ali
M.
,
Gad
M.
,
Elsayed
S.
,
Khadr
M.
,
Hussein
H.
,
Ibrahim
H. H.
,
Eid
M. H.
,
Kovács
A.
,
Péter
S.
&
Khalifa
M. M.
(
2023
)
Evaluation and Prediction of Groundwater Quality for Irrigation Using an Integrated Water Quality Indices, Machine Learning Models and GIS Approaches: A Representative Case Study
,
Water
,
15
,
694
.
https://doi.org/10.3390/w15040694
.
Ilić
M.
,
Srdjević
Z.
&
Srdjević
B.
(
2022
)
Water quality prediction based on Naïve Bayes algorithm
,
Water Science and Technology
,
85
,
1027
1039
.
https://doi.org/10.2166/wst.2022.006
.
Jha
M. K.
,
Shekhar
A.
&
Jenifer
M. A.
(
2020
)
Assessing groundwater quality for drinking water supply using hybrid fuzzy-GIS-based water quality index
,
Water Research
,
179
,
115867
.
https://doi.org/10.1016/j.watres.2020.115867
.
Kadam
A. K.
,
Wagh
V. M.
,
Muley
A. A.
,
Umrikar
B. N.
&
Sankhua
R. N.
(
2019
)
Prediction of water quality index using artificial neural network and multiple linear regression modelling approach in Shivganga River basin, India
,
Model. Earth Syst. Environ.
,
5
,
951
962
.
https://doi.org/10.1007/s40808-019-00581-3
.
Khoi
D. N.
,
Quan
N. T.
,
Linh
D. Q.
,
Nhi
P. T. T.
&
Thuy
N. T. D.
(
2022
)
Using Machine Learning Models for Predicting the Water Quality Index in the La Buong River, Vietnam
,
Water
,
14
,
1552
.
https://doi.org/10.3390/w14101552
.
Kim
H. I.
,
Kim
D.
,
Mahdian
M.
,
Salamattalab
M. M.
,
Bateni
S. M.
&
Noori
R.
(
2024
)
Incorporation of water quality index models with machine learning-based techniques for real-time assessment of aquatic ecosystems
,
Environmental Pollution
,
355
,
124242
.
https://doi.org/10.1016/j.envpol.2024.124242
.
Kouadri
S.
,
Elbeltagi
A.
,
Islam
A. R. M. T.
&
Kateb
S.
(
2021
)
Performance of machine learning methods in predicting water quality index based on irregular data set: application on Illizi region (Algerian southeast)
,
Appl Water Sci
,
11
,
190
.
https://doi.org/10.1007/s13201-021-01528-9
.
Lap
B. Q.
,
Phan
T.-T.-H.
,
Nguyen
H. D.
,
Quang
L. X.
,
Hang
P. T.
,
Phi
N. Q.
,
Hoang
V. T.
,
Linh
P. G.
&
Hang
B. T. T.
(
2023
)
Predicting Water Quality Index (WQI) by feature selection and machine learning: A case study of An Kim Hai irrigation system
,
Ecological Informatics
,
74
,
101991
.
https://doi.org/10.1016/j.ecoinf.2023.101991
.
Masood
A.
,
Niazkar
M.
,
Zakwan
M.
&
Piraei
R.
(
2023
)
A Machine Learning-Based Framework for Water Quality Index Estimation in the Southern Bug River
,
Water
,
15
,
3543
.
https://doi.org/10.3390/w15203543
.
Mohammed
R.
&
Al-Obaidi
B.
(
2021
)
Treatability influence of municipal sewage effluent on surface water quality assessment based on Nemerow pollution index using an artificial neural network
,
IOP Conf. Ser.: Earth Environ. Sci.
,
877
,
012008
.
https://doi.org/10.1088/1755-1315/877/1/012008
.
Mohseni
U.
,
Pande
C. B.
,
Chandra Pal
S.
&
Alshehri
F.
(
2024
)
Prediction of weighted arithmetic water quality index for urban water quality using ensemble machine learning model
,
Chemosphere
,
352
,
141393
.
https://doi.org/10.1016/j.chemosphere.2024.141393
.
Mokhtar
A.
,
Elbeltagi
A.
,
Gyasi-Agyei
Y.
,
Al-Ansari
N.
&
Abdel-Fattah
M. K.
(
2022
)
Prediction of irrigation water quality indices based on machine learning and regression models
,
Appl Water Sci
,
12
,
76
.
https://doi.org/10.1007/s13201-022-01590-x
.
Nadiri
A. A.
,
Barzegar
R.
,
Sadeghfam
S.
&
Rostami
A. A.
(
2022
)
Developing a Data-Fused Water Quality Index Based on Artificial Intelligence Models to Mitigate Conflicts between GQI and GWQI
,
Water
,
14
,
3185
.
https://doi.org/10.3390/w14193185
.
Nair
J. P.
&
Vijaya
M. S.
(
2022
)
River Water Quality Prediction and index classification using Machine Learning
,
J. Phys.: Conf. Ser.
,
2325
,
012011
.
https://doi.org/10.1088/1742-6596/2325/1/012011
.
Najah Ahmed
A.
,
Binti Othman
F.
,
Abdulmohsin Afan
H.
,
Khaleel Ibrahim
R.
,
Ming Fai
C.
,
Shabbir Hossain
M.
,
Ehteram
M.
&
Elshafie
A.
(
2019
)
Machine learning methods for better water quality prediction
,
Journal of Hydrology
,
578
,
124084
.
https://doi.org/10.1016/j.jhydrol.2019.124084
.
Nallakaruppan
M. K.
,
Gangadevi
E.
,
Shri
M. L.
,
Balusamy
B.
,
Bhattacharya
S.
&
Selvarajan
S.
(
2024
)
Reliable water quality prediction and parametric analysis using explainable AI models
,
Sci Rep
,
14
,
7520
.
https://doi.org/10.1038/s41598-024-56775-y
.
Nasir
N.
,
Kansal
A.
,
Alshaltone
O.
,
Barneih
F.
,
Sameer
M.
,
Shanableh
A.
&
Al-Shamma'a
A.
(
2022
)
Water quality classification using machine learning algorithms
,
Journal of Water Process Engineering
,
48
,
102920
.
https://doi.org/10.1016/j.jwpe.2022.102920
.
Nguyen
D. P.
,
Ha
H. D.
,
Trinh
N. T.
&
Nguyen
M. T.
(
2023
)
Application of artificial intelligence for forecasting surface quality index of irrigation systems in the Red River Delta, Vietnam
,
Environ Syst Res
,
12
,
24
.
https://doi.org/10.1186/s40068-023-00307-6
.
Oliveira
M. D. D.
,
Rezende
O. L. T. D.
,
Fonseca
J. F. R. D.
&
Libânio
M.
(
2019
)
Evaluating the surface Water quality index fuzzy and its influence on water treatment
,
Journal of Water Process Engineering
,
32
,
100890
.
https://doi.org/10.1016/j.jwpe.2019.100890
.
Raheja
H.
,
Goel
A.
&
Pal
M.
(
2022
)
Prediction of groundwater quality indices using machine learning algorithms
,
Water Practice and Technology
,
17
,
336
351
.
https://doi.org/10.2166/wpt.2021.120
.
Rana
R.
,
Kalia
A.
,
Boora
A.
,
Alfaisal
F. M.
,
Alharbi
R. S.
,
Berwal
P.
,
Alam
S.
,
Khan
M. A.
&
Qamar
O.
(
2023
)
Artificial Intelligence for Surface Water Quality Evaluation, Monitoring and Assessment
,
Water
,
15
,
3919
.
https://doi.org/10.3390/w15223919
.
Sajib
A. M.
,
Diganta
M. T. M.
,
Rahman
A.
,
Dabrowski
T.
,
Olbert
A. I.
&
Uddin
M. G.
(
2023
)
Developing a novel tool for assessing the groundwater incorporating water quality index and machine learning approach
,
Groundwater for Sustainable Development
,
23
,
101049
.
https://doi.org/10.1016/j.gsd.2023.101049
.
Shams
M. Y.
,
Elshewey
A. M.
,
El-kenawy
E.-S. M.
,
Ibrahim
A.
,
Talaat
F. M.
&
Tarek
Z.
(
2023
)
Water quality prediction using machine learning models based on grid search method
,
Multimed Tools Appl
,
83
,
35307
35334
.
https://doi.org/10.1007/s11042-023-16737-4
.
Shamsuddin
I. I. S.
,
Othman
Z.
&
Sani
N. S.
(
2022
)
Water Quality Index Classification Based on Machine Learning: A Case from the Langat River Basin Model
,
Water
,
14
,
2939
.
https://doi.org/10.3390/w14192939
.
Solangi
G. S.
,
Ali
Z.
,
Bilal
M.
,
Junaid
M.
,
Panhwar
S.
,
Keerio
H. A.
,
Sohu
I. H.
,
Shahani
S. G.
&
Zaman
N.
(
2024
)
Machine learning, Water Quality Index, and GIS-based analysis of groundwater quality
,
Water Practice & Technology
,
19
,
384
400
.
https://doi.org/10.2166/wpt.2024.014
.
Suwadi
N. A.
,
Derbali
M.
,
Sani
N. S.
,
Lam
M. C.
,
Arshad
H.
,
Khan
I.
&
Kim
K.-I.
(
2022
)
An Optimized Approach for Predicting Water Quality Features Based on Machine Learning
,
Wireless Communications and Mobile Computing
,
2022
,
1
20
.
https://doi.org/10.1155/2022/3397972
.
Tiwari
S.
,
Babbar
R.
&
Kaur
G.
(
2018
)
Performance Evaluation of Two ANFIS Models for Predicting Water Quality Index of River Satluj (India)
,
Advances in Civil Engineering
,
2018
,
1
10
.
https://doi.org/10.1155/2018/8971079
.
Uddin
M. G.
,
Nash
S.
,
Mahammad Diganta
M. T.
,
Rahman
A.
&
Olbert
A. I.
(
2022
)
Robust machine learning algorithms for predicting coastal water quality index
,
Journal of Environmental Management
,
321
,
115923
.
https://doi.org/10.1016/j.jenvman.2022.115923
.
Uddin
M. G.
,
Diganta
M. T. M.
,
Sajib
A. M.
,
Rahman
A.
,
Nash
S.
,
Dabrowski
T.
,
Ahmadian
R.
,
Hartnett
M.
&
Olbert
A. I.
(
2023a
)
Assessing the impact of COVID-19 lockdown on surface water quality in Ireland using advanced Irish water quality index (IEWQI) model
,
Environmental Pollution
,
336
,
122456
.
https://doi.org/10.1016/j.envpol.2023.122456
.
Uddin
M. G.
,
Nash
S.
,
Rahman
A.
&
Olbert
A. I.
(
2023b
)
Performance analysis of the water quality index model for predicting water state using machine learning techniques
,
Process Safety and Environmental Protection
,
169
,
808
828
.
https://doi.org/10.1016/j.psep.2022.11.073
.
Uddin
M. G.
,
Nash
S.
,
Rahman
A.
&
Olbert
A. I.
(
2023c
)
A novel approach for estimating and predicting uncertainty in water quality index model using machine learning approaches
,
Water Research
,
229
,
119422
.
https://doi.org/10.1016/j.watres.2022.119422
.
Uddin
M. G.
,
Rahman
A.
,
Rosa Taghikhah
F.
&
Olbert
A. I.
(
2024
)
Data-driven evolution of water quality models: An in-depth investigation of innovative outlier detection approaches-A case study of Irish Water Quality Index (IEWQI) model
,
Water Research
,
255
,
121499
.
https://doi.org/10.1016/j.watres.2024.121499
.
Zamri
N.
,
Pairan
M. A.
,
Azman
W. N. A. W.
,
Abas
S. S.
,
Abdullah
L.
,
Naim
S.
,
Tarmudi
Z.
&
Gao
M.
(
2022
)
River quality classification using different distances in k-nearest neighbors algorithm
,
Procedia Computer Science
,
204
,
180
186
.
https://doi.org/10.1016/j.procs.2022.08.022
.
Zhao
X.
,
Liu
X.
,
Xing
Y.
,
Wang
L.
&
Wang
Y.
(
2022
)
Evaluation of water quality using a Takagi-Sugeno fuzzy neural network and determination of heavy metal pollution index in a typical site upstream of the Yellow River
,
Environmental Research
,
211
,
113058
.
https://doi.org/10.1016/j.envres.2022.113058
.
Zheng
H.
,
Hou
S.
,
Liu
J.
,
Xiong
Y.
&
Wang
Y.
(
2024
)
Advanced Machine Learning and Water Quality Index (WQI) Assessment: Evaluating Groundwater Quality at the Yopurga Landfill
,
Water
,
16
,
1666
.
https://doi.org/10.3390/w16121666
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).