Rapid urbanization and industrialization have significantly contributed to the pollution and degradation of water bodies through urban runoff, industrial discharge, and inadequate wastewater treatment infrastructure. This study explores the role of machine learning (ML) as a decision-support tool for addressing water quality challenges across diverse environments, including surface water, groundwater, seawater, and wastewater. We evaluated the performance of seven ML models for predicting water quality parameters using six years of data (2014–2019) from the Melbourne Eastern wastewater treatment plant, encompassing energy consumption, climate variables, and wastewater characteristics. Among the models tested, the gradient boosting model demonstrated the highest predictive accuracy, achieving a coefficient of determination (R2) of 0.75. Key findings highlight the importance of integrating climate and water quality data to improve prediction accuracy and identify critical water parameters for enhancing future models. This review provides insights into the applicability of ML techniques in water quality management and identifies potential avenues for further research in predictive modeling.

  • Multi-environment analysis: Unique review of ML applications across surface water, groundwater, seawater, and wastewater for comprehensive water quality insights.

  • Longitudinal evaluation: Utilizes a 6-year dataset for robust, long-term artificial intelligence model assessment.

  • Key parameter insights: Highlights critical water parameters for enhanced model predictions.

WWTP

wastewater treatment plant

RE

nutrient removal efficiency

ML

machine learning

AI

artificial intelligence

BOD

biochemical oxygen demand

COD

chemical oxygen demand

DO

dissolved oxygen

Lat

sampling location latitude

Lng

sampling location longitude

Y

year

M

month

D

day

H

hour

SAD

site actual depth

SS

sea state, degree of turbulence at sea

WS

wind speed

WDO

Winkler method dissolved oxygen

C

sample temperature

SAL

salinity

TC

total coliform

SD

water light penetration

ACA

active chlorophyll-A

PCB

polychlorinated biphenyl plate count

NN

nitrate/nitrite

OP

ortho-phosphorus

AMN

ammonium

TP

total phosphorus

PH

potential of hydrogen

TSS

total suspended solid

CDT

conductivity

SD

sample depth

WD

water density

TSP

transparency of water

NH3-N

ammonia-nitrogen

TOC

total organic carbon

DTP

dissolved total phosphorus

PO4-P

phosphate

TN

total nitrogen

DTN

dissolved total nitrogen

NO3-N

nitrate nitrogen

EC

electroconductivity

DMF

deep matrix factorization

DNN

deep neural network

GB

gradient boosting

RF

random forest

WQI

water quality index

ANFIS

adaptive neuro-fuzzy inference system

CNN

convolutional neural networks

ANN

artificial neural network

LSTM

long short-term memory

IA-LSTM

input-attention LSTM

FPN

feature pyramid network

PSPNet

pyramid scene parsing network

ResNet

residual networks

PSO

particle swarm optimization

NBC

naive bayes classifier

SVM

support vector machine

GCWs

groundwater circulation wells

MLR

multiple linear regression

NGR

natural gamma ray

TH

total hardness

HCO3

bicarbonate

Cl

chloride

SO42−

sulfate

NO3−

nitrate

Ca2+

calcium

Mg2+

magnesium

F

iron

Na+

sodium

K+

potassium

KV

vertical hydraulic conductivity

KH

horizontal conductivity

M

aquifer thickness

μ

specific yield

KH/KV

vertical heterogeneity

N

porosity

I

hydraulic gradient

Pr

ratio of particle recovery

R

radius of influence

TWI

topographic wetness index

TPI

topographic position index

FA

factor analysis

ORP

oxidation-reduction potential

SWRO

seawater reverse osmosis

PET

polyethylene terephthalate

PS

polystyrene

PP

polypropylene

PE

polyethylene

PA

polyamide

PUR

polyurethane

PC

polycarbonate

PVC

polyvinyl chloride

PMMA

poly (methyl methacrylate)

CA

cellulose acetate

Tp

permeate temperature

T

time

pf

pressure

Qf

permeate flow

Condf

feed conductivity

Condp

permeate conductivity

Tb

brine temperature

R

flow recovery

BB

boron permeability coefficient

LMT

logistic model tree

CMOS

complementary metal-oxide-semiconductor

BiGRU

bidirectional gated recurrent unit

IoU

intersection over union

AAO

anaerobic–anoxic–oxic

Qr

internal recirculation flow rate

Qsr

sludge internal recycle flow rate

YOLOv5

You Only Look Once

VGG

intense convolutional networks

GRUs

gated recurrent neural network units

AP

atmospheric pressure

WSavg

average wind speed

Pr

total rainfall or snow melt

VIS

average visibility

WSmax

maximum wind speed

Power1h

power consumption per hour

F1run

fan one running status

F2run

fan two running status

MLSS

mixed liquor suspended solids

TColi

total coliform

FColi

fecal coliform

P

precipitation

Chl-a

chlorophyll-a

Dd

drainage density

RH

humidity

Water is essential for sustaining life, ecosystems, and economic activities (Silva 2023). Water quality is critical for sustainable development, public health, and environmental balance. However, predicting and managing water quality remains challenging, particularly across diverse aquatic environments such as rivers, lakes, reservoirs, coastal waters, and groundwater systems (Oyedotun & Ally 2021). These environments are influenced by a range of natural and anthropogenic factors, including industrial discharge, agricultural runoff, urbanization, and climate variability (Datta et al. 2021; Urban Wastewater Scenario in India 2022). In this context, conventional water quality monitoring techniques that depend on field sampling and laboratory analysis are indispensable but limited: they are often time-consuming and expensive, and their spatial-temporal coverage is constrained (Korostynska et al. 2013). Consequently, these methodologies struggle to deliver the timely and comprehensive data analysis that is vital for efficient water resource management.

Machine learning (ML) offers a promising solution to these challenges (Sheng et al. 2020). ML techniques excel at analyzing large, complex, and diverse datasets and at uncovering patterns to generate accurate predictions. By integrating data sources such as remote sensing, sensor networks, and historical records, ML models provide near real-time insights into water quality dynamics (Sagan et al. 2020; Mucheye et al. 2022). These capabilities complement traditional monitoring techniques, enhancing predictive accuracy and scalability across multiple aquatic environments. This paper provides an overview of the potential of ML in addressing water quality challenges, focusing on heterogeneous water systems. The key areas of investigation include:

  • 1. Complexity of diverse water environments: Examining how varying hydrological, geological, and ecological conditions influence water quality dynamics.

  • 2. The efficiency of ML techniques: Evaluating the performance of ML and ensemble models in predicting water quality parameters.

  • 3. Data integration: Leveraging spatial, temporal, and environmental datasets to improve model accuracy and address data limitations.

  • 4. Real-world applications: Demonstrating the role of ML-driven insights in supporting decision-making, policy formulation, and sustainable water resource management.

This paper attempts to bridge the gap between data-driven technologies and water quality management. The study demonstrates how ML can revolutionize monitoring practices by facilitating proactive and informed interventions to protect water resources in the face of increasing environmental challenges.

Literature search process

The comprehensive literature search process utilized in this study is outlined in Figure 1. The search covered three major databases: Scopus, Web of Science, and Google Scholar. A variety of search terms, including ‘water treatment,’ ‘machine learning,’ ‘deep learning,’ and ‘artificial intelligence,’ are used to ensure accurate and comprehensive results. For Google Scholar, search queries combined terms such as ‘water treatment,’ ‘machine learning,’ or ‘deep learning,’ with quotation marks ensuring exact phrase matches. Water treatment terms, such as ‘dissolved oxygen,’ ‘pH,’ and ‘chemical oxygen demand,’ are employed to achieve more specific and relevant search results. Publication year filters restrict the results to the most recent research, and sorting by relevance further prioritizes articles aligned with the study's objectives. The search yielded 26 articles from Scopus, 17 from Web of Science, and 7 from Google Scholar.
Figure 1

Comprehensive literature search flowchart.


A four-step process is employed to filter the literature systematically:

  • i. Identification: Initially, articles are identified by selecting subject areas such as ‘Engineering’ and ‘Computer Science.’

  • ii. Screening: Exclusion criteria are applied to remove systematic reviews, meta-analyses, and conference proceedings. Only articles published in English during the previous 5 years are included.

  • iii. Eligibility: Titles and abstracts are carefully assessed to select articles relevant to the research scope.

  • iv. Inclusion: Full-text articles deemed relevant and available are included in the final dataset. Key details, including input parameters, methodologies, datasets, and target parameters, are analyzed and summarized.

Publications network analysis from Scopus and Web of Science Database

Network analysis is applied to examine the relationships between keywords and themes within the selected publications. Tools such as VOSviewer are utilized to generate co-occurrence networks based on data retrieved from 2020 to 2024. Keywords included ‘water treatment,’ ‘machine learning,’ ‘deep learning,’ and ‘artificial intelligence.’

Co-occurrence network of author keywords in Scopus Database

Figure 2 depicts the co-occurrence network of author keywords from Scopus. This analysis highlights the relationships and grouped concepts within the research domain. The size of the circles corresponds to the frequency of keyword usage, while links between circles indicate the strength of their association. A total of 1,117 keywords are identified, with 76 occurring at least thrice.
Figure 2

Co-occurrence network of author keywords in Scopus Database.
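
To make the construction of such a network concrete, the following minimal sketch counts keyword frequencies (which determine circle size) and pairwise co-occurrences (which determine link strength). It is not VOSviewer's implementation: the function name `cooccurrence`, the per-document counting rule, and the example keyword lists are illustrative assumptions, while the threshold of three occurrences mirrors the ‘at least thrice’ criterion mentioned above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_occurrences=3):
    """Count keyword frequencies and pairwise co-occurrences across documents."""
    # Node size: number of documents in which each keyword appears.
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = {kw for kw, n in freq.items() if n >= min_occurrences}
    # Link strength: number of documents in which two kept keywords appear together.
    links = Counter()
    for kws in keyword_lists:
        present = sorted(set(kws) & kept)
        links.update(combinations(present, 2))
    return {kw: freq[kw] for kw in kept}, dict(links)


# Hypothetical author-keyword lists, one list per article.
docs = [
    ["machine learning", "water treatment", "deep learning"],
    ["machine learning", "water treatment"],
    ["machine learning", "artificial intelligence", "water treatment"],
    ["deep learning", "machine learning"],
]
nodes, links = cooccurrence(docs, min_occurrences=3)
print(nodes)
print(links)
```

With these hypothetical lists, only ‘machine learning’ and ‘water treatment’ pass the threshold, and the single link between them has a strength of three.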


Co-occurrence network of all keywords in Web of Science Database

Figure 3 presents the keyword co-occurrence network analysis from the Web of Science database. A total of 173 keywords are extracted from the analyzed documents, of which 13 occurred more than three times. The network comprises four clusters and 34 links, representing the relationships between various topics in water treatment. The cluster analysis points to the central place of ‘machine learning’ and ‘artificial intelligence,’ both of which relate to the prediction of water quality and are strongly connected to domains such as wastewater treatment and drinking water quality.
Figure 3

Co-occurrence network of authors keywords in the Web of Science Database.


Figures 2 and 3 collectively illustrate the importance of ML and artificial intelligence (AI) in enhancing water quality forecasting and management. Both are directly associated with the main research streams, which include predictive modeling, wastewater treatment, and environmental monitoring, highlighting significant trends and directions for further studies.

Literature review

Figure 4 depicts the ML applications addressing critical water treatment and monitoring challenges in various aquatic environments. These applications include prediction, real-time monitoring, tracking contamination sources, estimating contaminant concentrations, allocating water resources, and optimizing water treatment technologies. The following subsections review recent applications of ML algorithms to the treatment and management of water quality parameters in different environments: surface water, groundwater, seawater, and wastewater. The acronyms used in this paper are listed in the supplementary information section at the end.
Figure 4

ML applications on water environments.


Surface water treatment using ML

Surface water refers to water found in streams, rivers, and lakes and represents a crucial freshwater resource. However, its quality is increasingly compromised by sewage discharge, industrial effluents, and agricultural runoff. These pollutants pose significant challenges to maintaining surface water quality in both urban and rural areas (Oyedotun & Ally 2021). Geological and anthropogenic factors strongly affect riverine zones worldwide, influencing the natural environment (Hoang et al. 2020; Islam et al. 2020). Over the last 10 years, attention to water quality degradation in different regions of the world has increased (Setia et al. 2020; Uddin et al. 2021). Table 1 highlights the various algorithms utilized in surface water research for predicting water quality parameters. Measuring and recording parameters such as pH, dissolved oxygen (DO), and temperature is relatively straightforward and can often be done quickly using handheld detectors (Zhi et al. 2021). However, more complex analyses, such as detecting pesticides, nutrients, metals, or bacterial/algal concentrations, require more resources, laboratory facilities, and time-intensive efforts (Ooi et al. 2022). Traditional statistical methods are generally ineffective in addressing the temporal inefficiencies associated with biochemical oxygen demand (BOD) prediction, and they perform poorly on sparse and incomplete datasets, which may arise from sampling errors or facility constraints. Ma et al. (2020) addressed this issue by integrating deep matrix factorization (DMF) and deep neural network (DNN) data-driven models, significantly improving the accuracy of BOD prediction. The algorithms and datasets used in water quality models play a crucial role in determining the performance of prediction systems. For example, Chen et al. (2020) identified key input features by comparing ten learning models, including seven traditional ML models and three ensemble models. Mohd Zebaral Hoque et al. (2022) applied the water quality index (WQI) to measure surface water quality. Sidek et al. (2024) improved WQI monitoring by integrating random forest (RF) and gradient boosting (GB) models, offering more reliable and accurate ML predictions with minimal effort.

Table 1

ML applications on different water environments

| Water environment | Task | Dataset | Algorithm | Independent variables | Target variables |
| --- | --- | --- | --- | --- | --- |
| Surface water | BOD prediction (Ma et al. 2020) | 32,323 samples from New York City Open Data up to 2018 | DMF and DNN | Lat, Lng, Y, M, D, H, SAD, Sea state, SS, WS, WDO, C, SAL, TC, SD, ACA, PCB, NN, OP, AMN, TP, PH, TSS, CDT, SD, WD, and TSP | BOD5 prediction |
| | Water quality prediction (Chen et al. 2020) | 2012–2018 data from the CNEMC stations | Three ensemble and seven traditional ML models | DO, NH3-N, COD, and pH | COD, NH3-N, and DO prediction |
| | WQI prediction (Sidek et al. 2024) | 1,637 samples from 2008 to 2018 in Johor River, Malaysia | RF and GB | BOD, COD, and the DO (%) | WQI prediction |
| | Algal bloom prediction (Ly et al. 2021) | 2011–2020 samples in Han River, China | ANFIS | NH3-N, NO3-N, COD, TOC, BOD, TSS, TP, PO4-P, DTP, TN, DTN, Chl-a, P, Q, °C, DO, EC, pH, FColi, and TColi | Chlorophyll-a prediction |
| Groundwater | WQI (Agrawal et al. 2021) | One-year samples from hand pumps and bore wells in the Pindarwan tank | NBC, PSO, and SVM | pH, total dissolved solids (TDS), EC, TH, alkalinity, HCO3, Cl, SO42−, NO3−, Ca2+, Mg2+, F, Na+, and K+ | Predict water quality index |
| | Optimal design of groundwater circulation wells (GCWs) (Fang et al. 2024) | 3,000 samples of media types, aquifer thicknesses, and hydrogeological parameters in Xi'an City, Shaanxi | MLR, ANN, and SVM | KV, KH, M, μ, KH/KV, n, and I | Ratio of Pr and R |
| | Groundwater potential zones (Sarkar et al. 2024) | 200 samples of groundwater and meteorological data from Bangladesh Water Development (BWD) | LMT, ANN, and LR | Soil types, Dd, curvature, rainfall, slope, °C, geology, RH, roughness, lineament density, land use and land cover, geomorphology, TWI, and TPI | Groundwater potential zones prediction |
| | Predicting groundwater availability (Hussein et al. 2020) | 174 monthly groundwater satellite images from 2002–2019 | MLP, extreme GB, MLR, RF, and SVM | Local and global spatiotemporal features | Groundwater prediction |
| Seawater | Trapping and identifying small-sized microplastics (Gong et al. 2023) | Dataset using microfluidics and Raman spectroscopy devices | SVM, RF, CNN, and ResNet34 | Eleven plastic types included PET, PS, PP, PE, PA, Nylon, polyester, PUR, PC, PVC, PMMA, and CA | Identify the type of plastic |
| | Forecasting boron coefficient values (Ajali-Hernández et al. 2024) | Data from 18 coastal wells in Spain with depths of 50 m and 100 m | Ensemble-based ML | Tp, t, pf, Qf, Condf, Condp, Tb, and R | Forecast BB |
| | Classify saline particles (Alshehri et al. 2021) | The saline solution is observed using a CMOS camera | CNN with transfer learning | A Raspberry Pi device captured images scattered across various salt salinity concentrations | Classification for 10 salt concentration particles |
| | Marine fauna detection (Colefax et al. 2023) | Drone-based fauna Red Green Blue (RGB) images from 2021–2022 | RetinaNet single-shot detector with ResNet-50 | Dolphin groups video data from nine flight captures | Detecting submerged fauna |
| Wastewater | Energy consumption prediction (Harrou et al. 2023; Silva 2023) | 2014–2019 energy consumption, climate, and wastewater data from Melbourne's eastern WWTP | LSTM and BiGRU | TN, COD, BOD, °C, AP, WSavg, Pr, VIS, WSmax, year, month, and day | Energy consumption prediction |
| | Predict water quality parameters (Wei et al. 2023) | 3,912 h of data from February to June 2022 at a WWTP in southern China | LSTM with MLP network | COD, pH, NH3-N, Power1h, °C, instantaneous flow rate at the effluent outflow, DO, F1run, and F2run | COD, pH, NH3-N prediction |
| | Real-time control of AAO (Liu et al. 2023b) | 2019–2021 data from WWTP | LSTM model predictive control | DO, ORP, NH4+-N, , MLSS, Qin, COD, NH4+-N, TN, TP, pH, SS, °C, Qr, and Qsr | Aeration volumes, Qr, and Qsr controlling |
| | Blockage detection (Patil et al. 2023) | 14,765 sewer blockage frames ‘S-BIRD’ dataset | Transfer learning and YOLOv5 | Blockage images | Blockage detection |

Harmful algal blooms (HAB) present a significant challenge in surface water due to their anoxic potential and toxin release (Barrientos-Espillco et al. 2023). Baek et al. (2022) explored deep-learning techniques to predict HAB in surface water. The researchers detected HAB using remote sensing and in situ monitoring; however, this approach relies entirely on algal biomass. Weather conditions can also influence physicochemical factors at spatial and temporal scales, with chlorophyll-a serving as a significant indicator of phytoplankton abundance. Ly et al. (2021) accounted for the impact of weather conditions to enhance the accuracy of HAB prediction, enabling proactive water management strategies.

Groundwater treatment using ML

According to a US EPA (2025) report on ‘Climate Change Impacts on Freshwater Resources,’ population growth and changing weather patterns are increasing the demand for freshwater resources. Groundwater accounts for 99% of the Earth's liquid freshwater reserves (Lall et al. 2020). ML algorithms applied to groundwater treatment and quality prediction are summarized in Table 1.

The WQI aggregates various water quality parameters and is used to evaluate groundwater quality. The traditional approach to WQI computation relies on complex mathematical models, is time-intensive, and struggles to handle large datasets. Advances in ML techniques, such as support vector machines (SVM), naive Bayes classifiers (NBC), and particle swarm optimization (PSO), offer more efficient and accurate predictive models, enhancing water resource management (Agrawal et al. 2021). Groundwater circulation wells (GCWs) are a popular water purification technique. Fang et al. (2024) applied artificial neural networks (ANN), multiple linear regression (MLR), and SVM to predict the influence radius and particle recovery ratio for optimum GCW design. Sarkar et al. (2024) predicted groundwater potential zones in Bangladesh from climatic data using ANN, logistic model tree, and linear regression models. Similarly, Hussein et al. (2020) advanced groundwater availability prediction by applying GB, LR, multilayer perceptron (MLP), RF, and support vector regression (SVR) to satellite imagery.

Seawater treatment using ML

Seawater, which constitutes 97% of Earth's water, is unsuitable for domestic use due to its high salinity (National Ocean Services 2024). Desalination has traditionally been used to convert seawater into freshwater (Qasim et al. 2019; Zapata-Sierra et al. 2021). Recently, ML models have emerged as promising tools for enhancing the efficiency of desalination processes, as detailed in Table 1.

Microplastics in seawater

The increasing contamination of seawater with microplastics poses a critical threat to marine ecosystems, aquatic life, and human health, which underscores the importance of their identification, sampling, and contamination control (Cutroneo et al. 2020). Gong et al. (2023) proposed a data-driven model using SVM, RF, and deep-learning models, such as convolutional neural networks (CNN) and residual networks (ResNet34), for classifying and detecting 11 types of microplastics. The researchers used Raman spectroscopy to analyze microplastic samples captured through microfluidics, facilitating high-precision identification.

Desalination technologies

The seawater reverse osmosis (SWRO) technique is one of the most reliable desalination methods, though removing boron ions remains challenging (Abba et al. 2023). Ajali-Hernández et al. (2024) addressed this issue with ensemble-based ML models that predict the boron permeability coefficient from over 1,500 days of operational data collected at a large-scale SWRO plant. Deep learning has also been applied to improve desalination efficiency. Alshehri et al. (2021) presented a CNN model with transfer learning to classify salt particle concentrations into ten categories from images captured with a Raspberry Pi device.

Marine animal monitoring

Monitoring marine fauna is essential to understanding and mitigating the negative impacts of human activities on aquatic ecosystems. The spectral filtering ML technique has been used to detect and track marine animals. Colefax et al. (2023) have applied this approach to monitor dolphin populations in aquatic systems, demonstrating the potential of ML for ecological conservation.

Wastewater treatment using ML

Wastewater treatment involves the collection and processing of wastewater for reuse or safe disposal into natural water bodies. The treatment process typically involves three stages: primary, secondary, and tertiary (Adrados et al. 2014). A treatment plant typically includes the following steps (Garg 2012):

  • (a) Screening: Passing sewage through different types of screens to trap and remove the floating matter.

  • (b) Grit removal: Removal of non-putrescible material such as sand, bones, eggshells.

  • (c) Sedimentation: Particles in the water settle out of the liquid due to gravity, followed by processes like detention and flocculation.

  • (d) Aeration/activated sludge: Aeration tank of long detention, where the activated sludge mixed with settled sewage is agitated and aerated.

  • (e) Secondary settling: Coagulated suspended mass settles down by gravity.

  • (f) Filtration: Removal of suspended and dissolved solids through physical and chemical processes.

  • (g) Disinfection: Removal of pathogens in effluents for safe disposal in water bodies.

In the past few years, ML has optimized processes, enabled predictive capabilities, and thus revolutionized wastewater treatment, as summarized in Table 1. Most ML-based studies have concentrated on filtration, a critical, energy-intensive step in wastewater treatment. Harrou et al. (2023) developed a predictive model using long short-term memory (LSTM) and bidirectional gated recurrent unit (BiGRU) algorithms to estimate energy consumption from datasets that include energy usage, climate data, and wastewater properties. Wei et al. (2023) presented a deep-learning algorithm to optimize energy consumption using sensor data from wastewater treatment plants (WWTPs).

One widely used technique in WWTPs for nitrogen and phosphorus removal is the anaerobic–anoxic–oxic (AAO) treatment (Sang et al. 2021). The AAO system contains three interconnected tanks: anaerobic, anoxic, and oxic (aerobic). Liu et al. (2023a) demonstrated the use of an ML model to optimize the AAO process by controlling the internal recirculation flow rate (Qr), sludge internal recycle flow rate (Qsr), and aeration volumes. This approach significantly improves energy efficiency, enhances water quality, and ensures effective process regulation.

WWTP performance is significantly impacted by non-biodegradable products in the sewer system. These cause blockages that lead to overflows and degrade water quality, affecting human health and safety. Deep-learning techniques offer promising solutions to improve wastewater management processes (Alvi et al. 2023), and integrating ML methods can help prevent flooding caused by sewer system defects. Patil et al. (2023) addressed sewer blockage detection with image processing techniques, applying the You Only Look Once version 5 (YOLOv5) deep-learning algorithm and transfer learning to the ‘S-BIRD’ image dataset.

As shown in Figure 5, building the prediction model is a multi-step process. The first step is data collection and integration from different sources. The dataset is split into training (80%) and testing (20%) subsets. Data engineering techniques like cleaning, encoding, and outlier detection are applied to preprocess the dataset. Feature selection (FS) methods are used to determine the most relevant subsets of variables, while ML algorithms are applied to train the models. Finally, the performance of the ML models is measured using standard metrics.
Figure 5

The prediction model architecture.
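
As a minimal sketch of the split, train, and evaluate steps described above, the following Python example fits a small gradient boosting regressor on synthetic data and reports R2 on a held-out 20% test set. The single input feature, the decision-stump base learners, the squared-error loss, the learning rate, the number of boosting rounds, and the synthetic data are all illustrative assumptions rather than the configuration used in this study; in practice, a library implementation such as scikit-learn's `GradientBoostingRegressor` would normally be preferred.

```python
import random


def fit_stump(x, residuals):
    """Find the threshold on x that best splits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the maximum so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(x, y, n_rounds=60, learning_rate=0.2):
    """Boost decision stumps on the residuals (negative gradient of squared loss)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        pred = [p + learning_rate * (left_mean if xi <= threshold else right_mean)
                for xi, p in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return [base + learning_rate * sum(lm if xi <= t else rm for t, lm, rm in stumps)
            for xi in x]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Synthetic, hypothetical data: one feature with a nonlinear effect on the target.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(100)]
y = [xi ** 2 + random.gauss(0, 2) for xi in x]

# 80% training / 20% testing split after shuffling the sample indices.
indices = list(range(len(x)))
random.shuffle(indices)
cut = int(0.8 * len(indices))
train, test = indices[:cut], indices[cut:]

model = fit_gradient_boosting([x[i] for i in train], [y[i] for i in train])
r2 = r2_score([y[i] for i in test], predict(model, [x[i] for i in test]))
print(f"Test R2: {r2:.2f}")
```

The preprocessing and FS steps of Figure 5 are omitted here to keep the sketch short.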


Case study information

There are two major WWTPs in Melbourne City run by Melbourne Water: the Eastern Treatment Plant (ETP) and the Western Treatment Plant (WTP). The ETP relieves congestion at the WTP and serves Melbourne's expanding population in its southeastern areas. The records used in this study are openly available from Melbourne Water and the Melbourne Airport Weather Station. The hydraulic, biological, climate, and energy consumption data are merged into a unified dataset with a daily time resolution (Table 2).
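
The merging step can be illustrated with a small sketch that averages sub-daily readings per calendar day and joins several daily series on their shared dates. The helper names `daily_mean` and `merge_daily`, the use of a daily mean, the inner join, and the sample values are assumptions made for illustration; the source does not specify how the merge was performed.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_mean(records):
    """Average (timestamp, value) readings per calendar day."""
    by_day = defaultdict(list)
    for ts, value in records:
        by_day[ts.date()].append(value)
    return {day: mean(vals) for day, vals in by_day.items()}


def merge_daily(**sources):
    """Inner-join several {date: value} series on the dates they all share."""
    shared = set.intersection(*(set(s) for s in sources.values()))
    return [{"date": d, **{name: s[d] for name, s in sources.items()}}
            for d in sorted(shared)]


# Hypothetical inflow readings (m3/s) and daily average temperatures (°C).
inflow = [
    (datetime(2014, 1, 1, 6), 4.0),
    (datetime(2014, 1, 1, 18), 5.0),
    (datetime(2014, 1, 2, 6), 4.4),
]
tavg = {datetime(2014, 1, 1).date(): 21.5, datetime(2014, 1, 3).date(): 19.0}

rows = merge_daily(Qin=daily_mean(inflow), Tavg=tavg)
print(rows)
```

Only 1 January 2014 appears in both hypothetical series, so the merged dataset contains a single row for that day.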

Table 2

Summary of values of measured water quality parameters

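| Parameter (Abbreviation) | Unit | Max | Min | Mean |
| --- | --- | --- | --- | --- |
| Average inflow (Qin) | m3/s | 19 | 2.6 | 4.5 |
| Average outflow (Qout) | m3/s | 7.9 | 0.1 | 3.9 |
| Energy consumption (EC) | MWh | 398 | 116 | 275 |
| Ammonium (NH4-N) | mg/L | 93 | 13 | 39 |
| BOD | mg/L | 850 | 140 | 382 |
| Chemical oxygen demand (COD) | mg/L | 1,700 | 360 | 846 |
| Total nitrogen (TN) | mg/L | 92 | 40 | 62 |
| Average temperature (Tavg) | °C | 35.5 | | 15 |
| Maximum temperature (Tmax) | °C | 43.5 | | 20.5 |
| Minimum temperature (Tmin) | °C | 28 | −2 | 10 |
| Atmospheric pressure (AP) | hPa | 10.22 | | 3.7 |
| Average humidity (H) | % | 97 | | 63 |
| Total rainfall or snowmelt (Pr) | mm | 18 | | 0.2 |
| Average visibility (VIS) | km | 512 | | |
| Average wind speed (WSavg) | km/h | 49 | | 19 |
| Maximum wind speed (WSmax) | km/h | 83.5 | | 35.4 |
| Year (year) | | 2019 | 2014 | |
| Month (month) | | 12 | | |
| Day (day) | | 31 | | |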

Data collection

The Melbourne Water dataset covers 2014 to 2019 and is available under Victoria's open data policy (Bagherzadeh et al. 2021b). Wastewater characteristics come from sampling and sensor readings, and climate data come from the Melbourne Airport Weather Station (Melbourne Airport Weather Station 2021). Table 2 summarizes the ranges of the measured parameters: average inflow (Qin), average outflow (Qout), energy consumption (EC), ammonium (NH4-N), BOD, chemical oxygen demand (COD), total nitrogen (TN), average temperature (Tavg), maximum temperature (Tmax), minimum temperature (Tmin), atmospheric pressure (AP), average humidity (H), total rainfall or snowmelt (Pr), average visibility (VIS), average wind speed (WSavg), maximum wind speed (WSmax), year (year), month (month), and day (day).

Data engineering comprises cleaning, encoding, and outlier detection to improve the quality of the dataset. Missing values are filled by linear interpolation. Categorical data are converted to numerical form using encoding techniques, while Z-score outlier detection flags data points that lie far from the mean of their variable, measured in standard deviations (Aggarwal et al. 2019).
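The interpolation and Z-score steps can be sketched as follows. The series is made up for illustration, and the threshold of 2.5 is an assumption chosen for this ten-point sample (a threshold of 3 is a common choice on larger datasets); the study does not state which threshold was used.

```python
import numpy as np
import pandas as pd

# Illustrative daily series with one gap and one obviously anomalous reading.
cod = pd.Series([800.0, 820.0, np.nan, 860.0, 840.0, 830.0, 810.0, 850.0, 2500.0, 825.0])

# Linear interpolation fills the gap from its two neighbours: (820 + 860) / 2 = 840.
cod_filled = cod.interpolate(method="linear")

# Z-score: distance from the mean in units of the (population) standard deviation.
z = (cod_filled - cod_filled.mean()) / cod_filled.std(ddof=0)

# Assumed threshold; points beyond it are treated as outliers and dropped.
outlier_mask = z.abs() > 2.5
cleaned = cod_filled[~outlier_mask]

print(cod_filled.tolist())
print(cleaned.tolist())
```

With only ten points, a single value can never reach a Z-score of 3 when the population standard deviation is used, which is why the sketch uses a lower threshold.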

Feature selection

FS methods are employed to evaluate and identify the most relevant features for modeling based on specific criteria. This research uses mutual information (MI) as the primary FS method because it captures both linear and nonlinear relationships between features and the target variable (Zegaar et al. 2024). Alternatives such as LASSO regression and recursive feature elimination are also commonly used to select model features (Zhang et al. 2021; Julian et al. 2023). MI is adopted for its simplicity and effectiveness in predicting water quality, particularly in addressing the nonlinear and complex interactions commonly observed between environmental and water quality factors (Moeinzadeh et al. 2023).
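As an illustration of why MI suits nonlinear relationships, the sketch below scores three synthetic features with scikit-learn's `mutual_info_regression`. The data, feature names, and the use of scikit-learn are assumptions for this example; the study does not state which implementation was used.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for predictors (not the Melbourne data).
x_linear = rng.normal(size=n)                # enters the target linearly
x_squared = rng.uniform(-2.0, 2.0, size=n)   # enters the target through its square
x_noise = rng.normal(size=n)                 # unrelated to the target

y = x_linear + x_squared**2 + 0.1 * rng.normal(size=n)

X = np.column_stack([x_linear, x_squared, x_noise])
feature_names = ["x_linear", "x_squared", "x_noise"]

# MI is estimated from nearest-neighbour distances, so no functional form is assumed.
mi = mutual_info_regression(X, y, random_state=0)
scores = dict(zip(feature_names, mi))

# Pearson correlation for comparison: it is close to zero for the squared feature.
pearson_squared = float(np.corrcoef(x_squared, y)[0, 1])

for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: MI = {value:.3f}")
print(f"Pearson r for x_squared: {pearson_squared:.3f}")
```

A purely correlation-based filter would tend to discard `x_squared`, whereas MI ranks it above the unrelated feature.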

Modeling approaches

ML models, including linear regression (LR), K-nearest neighbor regression (KNN), support vector regression (SVR), and decision tree regression (DT), are applied to predict water quality. In addition, ensemble techniques, which combine multiple models and often predict more accurately than any single constituent model, are employed to improve predictive performance: random forest (RF), gradient boosting (GB) regression, and AdaBoost regression (AR).

LR predicts target values by fitting a linear function of the input features. This method is straightforward for linear patterns but limited for complex, nonlinear relationships. RF is an ensemble learning method that builds multiple decision trees and aggregates their outputs. It is appropriate for large-scale classification and regression tasks because aggregation improves accuracy and reduces overfitting. SVR extends support vector machines (SVMs) to regression tasks. It fits a function that keeps as many training points as possible within a tolerance margin (ε) around the prediction while penalizing the errors of points that fall outside that margin.

KNN predicts target values by averaging the outcomes of the k closest points in the feature space. This method works on the assumption that similar inputs produce similar outputs. DT models divide data into subsets by evaluating feature values. Each node in the tree serves as a decision point, informed by these features, forming branches that lead to leaf nodes where the model makes the final prediction. Although decision trees are straightforward to interpret and visualize, they are particularly susceptible to overfitting, especially when dealing with noisy data. Pruning techniques are frequently applied to enhance generalization and reduce overfitting.

GB is an ensemble technique that builds models sequentially, with each new model correcting the errors of the previous ones. It iteratively enhances prediction accuracy by minimizing a specified loss function. While powerful and capable of modeling complex relationships, GB is prone to overfitting and requires careful tuning of parameters like the learning rate and tree depth. AR combines multiple weak learners, typically shallow decision trees, by focusing on the data points that are hardest to predict. It assigns higher weights to instances with larger prediction errors, iteratively refining the model. The final prediction is a weighted combination of these weak learners. AdaBoost effectively boosts model performance, especially with weak base models, though it can be sensitive to noisy data.
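The seven regressors described above can be trained and compared in a few lines. This is a sketch under stated assumptions: scikit-learn as the library, default hyperparameters, a random 80/20 split, and synthetic data in place of the Melbourne dataset. None of these settings are confirmed by the study.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic nonlinear regression problem (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(600, 4))
y = 3 * np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=600)

# 80% training, 20% testing, as in the described pipeline.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "LR": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "SVR": SVR(),
    "DT": DecisionTreeRegressor(random_state=42),
    "RF": RandomForestRegressor(n_estimators=100, random_state=42),
    "GB": GradientBoostingRegressor(random_state=42),
    "AR": AdaBoostRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": float(r2_score(y_test, pred)),
    }

for name, s in scores.items():
    print(f"{name}: RMSE = {s['RMSE']:.2f}, R2 = {s['R2']:.2f}")
```

On this synthetic target, which is deliberately nonlinear, the linear model is expected to trail the tree ensembles; the ranking on real plant data depends on the data themselves.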

Model performance evaluation

For this study, root mean squared error (RMSE) (Equation (2)) and the coefficient of determination R² (Equation (3)) are used for model performance evaluations (Harrou et al. 2023). RMSE is the square root of the mean of the squared differences between actual and predicted values, and the closer the value of RMSE is to 0, the better the model's performance. The coefficient of determination evaluates the goodness of fit of a model by measuring the proportion of the variance of the target variable that the model explains. A model with more accurate predictions will have a higher R² value.
(1)
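$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \tag{3}$$
$$\mathrm{RSS} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{4}$$
$$\mathrm{TSS} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \tag{5}$$
where $i = 1, 2, \ldots, n$ indexes the observations and $n$ is the total number of records. Considering $\hat{y}_i$ as the model prediction, $y_i$ as the real value, and $\bar{y}$ as the mean of the actual values, the sum of squared differences between the observed and predicted dependent variables is the residual sum of squares (RSS) (Equation (4)). The sum of squared differences between the observed dependent variables and the overall mean is the total sum of squares (TSS) (Equation (5)).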
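As a worked example, both metrics can be computed directly from their definitions: RMSE as the root of the mean squared error, and R² as one minus the ratio of the residual to the total sum of squares. The four observations below are invented for the arithmetic and the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root of the mean squared difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """One minus residual sum of squares over total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - rss / tss)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

# Residuals are 1, 0, -1, 0, so RSS = 2 and the mean squared error is 0.5.
# The mean of y_true is 6, so TSS = 9 + 1 + 1 + 9 = 20.
print(rmse(y_true, y_pred))       # sqrt(0.5), about 0.707
print(r_squared(y_true, y_pred))  # 1 - 2/20 = 0.9
```

Because RSS equals n times the squared RMSE, the two metrics rank models identically whenever they are computed on the same test set.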

Importance of NH4-N, BOD, TN, and COD

Predicting water quality parameters like NH4-N, BOD, TN, and COD is vital for environmental monitoring and management.

  • (a) NH4-N (Ammonium): Excess NH4-N in water can lead to eutrophication, causing algal blooms that deplete oxygen levels and harm aquatic life.

  • (b) BOD: BOD measures the amount of biologically active organic matter in sewage. When sufficient oxygen is available, aerobic bacteria decompose this matter until the oxidation is complete, and the amount of oxygen consumed in the process is the BOD (Garg 2012). A high BOD level indicates organic pollution, which reduces the oxygen available to fish and other aquatic organisms (Ooi et al. 2022).

  • (c) TN: Total nitrogen includes all forms of nitrogen (ammonium, nitrate, nitrite, and organic nitrogen). A high level of TN indicates nutrient pollution in water bodies, worsening eutrophication and leading to ‘dead zones’ (Bagherzadeh et al. 2021a).

  • (d) COD: COD is determined in a laboratory test that oxidizes the wastewater sample with a strong oxidant such as dichromate solution. Theoretical COD can be computed only for water solutions prepared in a laboratory with a known amount of a specific organic compound, which allows the theoretical and test results to be compared (Garg 2012).

These parameters are crucial for understanding and addressing water quality problems, enabling targeted interventions to preserve aquatic ecosystems.

Statistical details of data

Table 2 summarizes the basic statistical properties of all features in the combined dataset. This research focuses on predicting four water quality parameters, including NH4-N, BOD, TN, and COD, individually. Each iteration uses one parameter as the dependent or target variable; the rest are independent variables for prediction. For example, in the first iteration, NH4-N is predicted, and all other features are predictors. This process is repeated for BOD, TN, and COD. This approach iteratively analyzes the relationships between variables and each water quality parameter.
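The one-target-at-a-time setup can be sketched as a loop that rotates the dependent variable. The frame below is a tiny placeholder with hypothetical column names and values; `split_features_target` is a helper introduced only for this example.

```python
import pandas as pd

TARGETS = ["NH4-N", "BOD", "TN", "COD"]

# Tiny illustrative frame; column names and values are placeholders, not the study data.
df = pd.DataFrame({
    "Qin": [4.1, 4.6, 4.3],
    "Tavg": [14.0, 18.5, 21.0],
    "NH4-N": [38.0, 41.0, 36.0],
    "BOD": [370.0, 395.0, 380.0],
    "TN": [61.0, 64.0, 60.0],
    "COD": [830.0, 870.0, 845.0],
})

def split_features_target(frame, target):
    """Use `target` as the dependent variable and all remaining columns as predictors."""
    return frame.drop(columns=[target]), frame[target]

predictor_sets = {}
for target in TARGETS:
    X, y = split_features_target(df, target)
    predictor_sets[target] = list(X.columns)
    # A regressor would be fitted on (X, y) here, once per target.

for target, columns in predictor_sets.items():
    print(target, "<-", columns)
```

Note that under this scheme the other three water quality parameters act as predictors for the current target, which matters when the model is deployed: those measurements must be available at prediction time.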

Performance measure of water quality prediction model

Table 3 shows the performance metrics of the ML models for predicting NH4-N, BOD, TN, and COD. The models are evaluated using two key metrics: R² and RMSE. By RMSE, the ensemble learning models GB and RF outperformed the individual models for every target, while AR was less consistent.

  • The GB model achieves the highest accuracy for BOD, COD, and TN. Its best result is for COD, with an RMSE value of 61.39 and an R² value of 0.75, which shows its efficiency in dealing with complex relationships.

  • The RF model has the lowest RMSE for NH4-N (5.75) and is a close second to GB for BOD, COD, and TN.

  • The lower RMSE values of GB and RF indicate their robustness in predicting water quality parameters.

Table 3

Summary of performance of ML models for prediction of water quality parameters

| Output | LR RMSE | LR R² | RF RMSE | RF R² | SVR RMSE | SVR R² | KNN RMSE | KNN R² | DT RMSE | DT R² | GB RMSE | GB R² | AR RMSE | AR R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NH4-N | 6.32 | 0.17 | 5.75 | 0.32 | 6.15 | 0.22 | 6.48 | 0.13 | 8.13 | 0.36 | 5.90 | 0.28 | 6.17 | 0.21 |
| BOD | 57.01 | 0.45 | 51.06 | 0.56 | 71.31 | 0.15 | 58.05 | 0.43 | 74.16 | 0.08 | 50.40 | 0.57 | 61.77 | 0.36 |
| COD | 73.32 | 0.64 | 61.75 | 0.74 | 106.18 | 0.24 | 78.62 | 0.58 | 99.75 | 0.33 | 61.39 | 0.75 | 67.31 | 0.70 |
| TN | 1.70 | 0.62 | 1.50 | 0.70 | 1.84 | 0.56 | 2.03 | 0.47 | 2.37 | 0.28 | 1.49 | 0.71 | 2.20 | 0.38 |

These results show that ensemble learning has the potential to predict water quality, providing reliable environmental monitoring and management tools.
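The ranking in Table 3 can be checked mechanically. The sketch below transcribes the (RMSE, R²) pairs from the table and picks, for each target, the model with the lowest RMSE and the model with the highest R²; the variable names are chosen for this example only.

```python
# (RMSE, R2) pairs transcribed from Table 3.
table3 = {
    "NH4-N": {"LR": (6.32, 0.17), "RF": (5.75, 0.32), "SVR": (6.15, 0.22), "KNN": (6.48, 0.13),
              "DT": (8.13, 0.36), "GB": (5.90, 0.28), "AR": (6.17, 0.21)},
    "BOD": {"LR": (57.01, 0.45), "RF": (51.06, 0.56), "SVR": (71.31, 0.15), "KNN": (58.05, 0.43),
            "DT": (74.16, 0.08), "GB": (50.40, 0.57), "AR": (61.77, 0.36)},
    "COD": {"LR": (73.32, 0.64), "RF": (61.75, 0.74), "SVR": (106.18, 0.24), "KNN": (78.62, 0.58),
            "DT": (99.75, 0.33), "GB": (61.39, 0.75), "AR": (67.31, 0.70)},
    "TN": {"LR": (1.70, 0.62), "RF": (1.50, 0.70), "SVR": (1.84, 0.56), "KNN": (2.03, 0.47),
           "DT": (2.37, 0.28), "GB": (1.49, 0.71), "AR": (2.20, 0.38)},
}

# Best model per target under each criterion.
best_by_rmse = {t: min(m, key=lambda k: m[k][0]) for t, m in table3.items()}
best_by_r2 = {t: max(m, key=lambda k: m[k][1]) for t, m in table3.items()}

print(best_by_rmse)
print(best_by_r2)
```

The two criteria agree for BOD, COD, and TN (GB in each case). For NH4-N they disagree: RF has the lowest RMSE, while the DT entry pairs the highest RMSE with the highest R². Since both metrics should rank models identically on a shared test set, that DT entry may be worth re-checking against the original results.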

GB is a highly effective technique in water quality prediction because it can capture complex and nonlinear relationships between input features and target variables. As an ensemble technique, it combines several weak learners into one robust predictive model, with each iteration correcting the errors of the models built before it. With suitable regularization, GB copes well with the noisy, heterogeneous data that are common in water quality datasets. Effective hyperparameter optimization improves the generalization capability of GB, enhancing its robustness against overfitting. This makes GB particularly suitable for environmental prediction tasks, where input–output relationships are often complex, nonlinear, and subject to high variability.
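One common way to carry out such hyperparameter optimization is a cross-validated grid search. The sketch below assumes scikit-learn, synthetic data, and a small illustrative grid over the learning rate, tree depth, and number of trees; the grid actually used in the study is not reported.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic nonlinear data (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)

# Assumed search space; real studies usually explore a wider range.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [100, 200],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid,
    cv=3,                                   # 3-fold cross-validation
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated RMSE:", -search.best_score_)
```

Scoring on held-out folds rather than on the training data is what guards against choosing settings that merely overfit.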

As shown in Table 3, ensemble models such as GB predict WWTP water quality parameters better than the individual ML methods. Their lower RMSE and higher R² values suggest that the ensemble methods capture temporal dependencies and nonlinearity in the data more effectively. These capabilities are valuable in WWTP management because they support informed decisions and proper allocation of resources.

Figure 6 depicts the predictions of seven ML models for water quality parameters on the testing dataset. The plots, with the red lines showing the predicted values, indicate how well the models can predict future trends in water quality. Such performance demonstrates that the models have learned to capture underlying patterns and temporal dependencies in time-series data, producing reliable and actionable predictions.
Figure 6

Scatter plots illustrating the predicted water parameters: (a) TN, (b) BOD, (c) COD, and (d) NH4-N for the seven ML models using testing data.


Table 4 compares the proposed model with an existing model. The R² value of the proposed model (0.75) is substantially higher than that of the existing model (0.53), indicating that the proposed water quality prediction model provides a better overall fit.

Table 4

Comparison of proposed and existing water quality prediction models

| Algorithm | Performance metrics | References |
| --- | --- | --- |
| GBM | R² = 0.53 | Bagherzadeh et al. (2021b) |
| GB regression | R² = 0.75 | The present analysis |

The proposed gradient boosting regression (GBR) model outperforms the traditional gradient boosting machine (GBM) model on the Melbourne WWTP dataset because of its enhanced ability to address inherent complexities, such as high-dimensional data, missing values, and noisy measurements. The GBR framework is designed to capture nonlinear relationships and feature interactions critical for wastewater treatment processes. Furthermore, modern boosting techniques natively handle missing data, manage categorical variables efficiently, and incorporate robust loss functions such as Huber loss to minimize the impact of outliers and noise.

Although the proposed model has high predictive power, its generalization to other regions and treatment plants is challenging. The key limitations are as follows:

  • 1. Regional and Environmental Variability

Differences in climate, hydrology, land use, and pollution sources between regions may influence the model's performance when applied to datasets from other sites. Lack of representation can reduce the universality of the model.

  • 2. Variability in Water Treatment Plant Operations

Treatment technologies, operating protocols, and pollutant characteristics differ between water treatment plants, which poses further challenges. For instance, differences in sampling frequencies, types of equipment, maintenance schedules, and the nature of treatment (biological versus chemical) change the input data and make the model harder to adapt.

The study's findings have practical implications for water quality management by enabling more proactive, efficient, and data-driven decision-making. Predictive models enhance real-time monitoring and provide early warnings of water quality issues, allowing managers to address pollution events or treatment inefficiencies before they escalate. This helps optimize operations in water treatment plants and reduce costs while staying within regulatory standards. By identifying trends in key parameters such as NH4-N, BOD, TN, and COD, these tools can point to pollution sources, guide targeted interventions, and support sustainable agricultural and industrial practices. The ability to predict water quality under changing conditions, whether seasonal or extreme, strengthens the resilience of water systems against climate variability. Water resource managers can use these tools to create flexible, scalable, and transparent frameworks that protect ecosystems, safeguard public health, and secure water resources.

This paper reviews the recent advances in ML for water monitoring, such as forecasting water quality, optimizing resource allocation, and managing water shortages. ML algorithms are promising, but challenges persist in handling data complexity, ensuring model interpretability, and adapting algorithms to diverse geographic and environmental contexts.

This study assessed seven ML algorithms to predict water quality parameters and demonstrated their potential in supporting decision-making and enhancing energy efficiency in WWTP operations. Key findings of this study are:

  • The average, maximum, and minimum temperatures and humidity significantly influence the water quality parameters at Melbourne East WWTP.

  • The GBR algorithm demonstrated superior predictive performance compared to other algorithms, effectively capturing nonlinear and irregular patterns in the data.

ML models have demonstrated their efficacy in water quality prediction, with accuracy rates often exceeding 70% in various studies, enabling more precise and timely interventions to safeguard water resources.

The future scope of this work involves enhancing the model by integrating real-time data feeds from sensor-coupled systems. This will ensure robust model performance, significantly improving wastewater quality management through accurate, continuous monitoring and proactive intervention. Additionally, explainable AI techniques can enhance the model's transparency and interpretability. In the context of water quality monitoring and prediction, ML algorithms can be optimized to improve practicality, scalability, and reliability.

We extend our sincere gratitude to Dr Vaishnavi Dabir, Principal Consultant at Green Cube LLC, USA, for her invaluable technical assistance and expert guidance.

This work was supported by the Research Support Fund (RSF) of Symbiosis International (Deemed University), Pune, India.

S.P. conceptualized the process, developed the methodology, supported the formal analysis, and wrote the original draft. P.K. conceptualized the process, developed the methodology, wrote, reviewed, and edited the article, provided resources, and supervised the work.

All relevant data are available from an online repository or repositories: https://data.mendeley.com/datasets/pprkvz3vbd/1.

The authors declare there is no conflict.

Abba, S. I., Usman, J., Abdulazeez, I., Lawal, D. U., Baig, N., Usman, A. G. & Aljundi, I. H. (2023) Integrated modeling of hybrid nanofiltration/reverse osmosis desalination plant using deep learning-based crow search optimization algorithm, Water (Switzerland), 15 (19), 3515. https://doi.org/10.3390/w15193515.
Adrados, B., Sánchez, O., Arias, C. A., Becares, E., Garrido, L., Mas, J., Brix, H. & Morató, J. (2014) Microbial communities from different types of natural wastewater treatment systems: vertical and horizontal flow constructed wetlands and biofilters, Water Research, 55, 304–312. https://doi.org/10.1016/J.WATRES.2014.02.011.
Aggarwal, V., Gupta, V., Singh, P., Sharma, K. & Sharma, N. (2019) Detection of spatial outlier by using improved Z-score test, 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 788–790. https://doi.org/10.1109/ICOEI.2019.8862582.
Aggarwal, P., Sinha, A., Kumar, S., Agarwal, A., Banerjee, A., Villuri, V. G. K., Annavarapu, C. S. R., Dwivedi, R., Dera, V. V. R., Sinha, J. & Pasupuleti, S. (2021) Exploring artificial intelligence techniques for groundwater quality assessment, Water (Switzerland), 13 (9), 1172. https://doi.org/10.3390/w13091172.
Ajali-Hernández, N. I., Ruiz-García, A. & Travieso-González, C. M. (2024) ANN based-model for estimating the boron permeability coefficient as boric acid in SWRO desalination plants using ensemble-based machine learning, Desalination, 573, 117180. https://doi.org/10.1016/j.desal.2023.117180.
Alshehri, M., Kumar, M., Bhardwaj, A., Mishra, S. & Gyani, J. (2021) Deep learning based approach to classify saline particles in sea water, Water (Switzerland), 13 (9), 1251. https://doi.org/10.3390/w13091251.
Alvi, M., Batstone, D., Mbamba, C. K., Keymer, P., French, T., Ward, A., Dwyer, J. & Cardell-Oliver, R. (2023) Deep learning in wastewater treatment: a critical review, Water Research, 245, 120518. https://doi.org/10.1016/J.WATRES.2023.120518.
Baek, S., Abbas, A. & Cho, K. H. (2022) Deep learning-based algorithms for long-term prediction of chlorophyll-a in catchment streams, Journal of Hydrology, 626, 130240. https://doi.org/10.21203/rs.3.rs-1643745/v1.
Bagherzadeh, F., Mehrani, M. J., Basirifard, M. & Roostaei, J. (2021a) Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance, Journal of Water Process Engineering, 41, 102033. https://doi.org/10.1016/j.jwpe.2021.102033.
Bagherzadeh, F., Nouri, A. S., Mehrani, M. J. & Thennadil, S. (2021b) Prediction of energy consumption and evaluation of affecting factors in a full-scale WWTP using a machine learning approach, Process Safety and Environmental Protection, 154, 458–466. https://doi.org/10.1016/j.psep.2021.08.040.
Barrientos-Espillco, F., Gascó, E., López-González, C. I., Gómez-Silva, M. J. & Pajares, G. (2023) Semantic segmentation based on deep learning for the detection of cyanobacterial harmful algal blooms (CyanoHABs) using synthetic images, Applied Soft Computing, 141, 110315. https://doi.org/10.1016/j.asoc.2023.110315.
Chen, K., Chen, H., Zhou, C., Huang, Y., Qi, X., Shen, R., Liu, F., Zuo, M., Zou, X., Wang, J., Zhang, Y., Chen, D., Chen, X., Deng, Y. & Ren, H. (2020) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data, Water Research, 171, 115454. https://doi.org/10.1016/j.watres.2019.115454.
Colefax, A. P., Walsh, A. J., Purcell, C. R. & Butcher, P. (2023) Utility of spectral filtering to improve the reliability of marine fauna detections from drone-based monitoring, Sensors, 23 (22), 9193. https://doi.org/10.3390/s23229193.
Cutroneo, L., Reboa, A., Besio, G., Borgogno, F., Canesi, L., Canuto, S., Dara, M., Enrile, F., Forioso, I., Greco, G., Lenoble, V., Malatesta, A., Mounier, S., Petrillo, M., Rovetta, R., Stocchino, A., Tesan, J., Vagge, G. & Capello, M. (2020) Microplastics in seawater: sampling strategies, laboratory methodologies, and identification techniques applied to port environment, Environmental Science and Pollution Research, 27, 8938–8952. https://doi.org/10.1007/s11356-020-07783-8.
Datta, A., Maharaj, S., Prabhu, G. N., Bhowmik, D., Marino, A., Akbari, V., Rupavatharam, S., Sujeetha, J. A. R. P., Anantrao, G. G., Poduvattil, V. K., Kumar, S. & Kleczkowski, A. (2021) Monitoring the spread of water hyacinth (Pontederia crassipes): challenges and future developments, Frontiers in Ecology and Evolution, 9, 631338. https://doi.org/10.3389/fevo.2021.631338.
Fang, Z., Ke, H., Ma, Y., Zhao, S., Zhou, R., Ma, Z. & Liu, Z. (2024) Design optimization of groundwater circulation well based on numerical simulation and machine learning, Scientific Reports, 14 (1), 11506. https://doi.org/10.1038/s41598-024-62545-7.
Garg, S. (2012) Sewage Disposal and Air Pollution Engineering. 24th edn. New Delhi: Khana Publisher.
Gong, L., Martinez, O., Mesquita, P., Kurtz, K., Xu, Y. & Lin, Y. (2023) A microfluidic approach for label-free identification of small-sized microplastics in seawater, Scientific Reports, 13 (1), 11011. https://doi.org/10.1038/s41598-023-37900-9.
Harrou, F., Dairi, A., Dorbane, A. & Sun, Y. (2023) Energy consumption prediction in water treatment plants using deep learning with data augmentation, Results in Engineering, 20, 101428. https://doi.org/10.1016/j.rineng.2023.101428.
Hoang, H. G., Lin, C., Tran, H. T., Chiang, C. F., Bui, X. T., Cheruiyot, N. K., Shern, C. C. & Lee, C. W. (2020) Heavy metal contamination trends in surface water and sediments of a river in a highly-industrialized region, Environmental Technology & Innovation, 20, 101043. https://doi.org/10.1016/J.ETI.2020.101043.
Hussein, E. A. et al. (2020) Groundwater prediction using machine-learning tools, Algorithms, 13 (11), 300. https://doi.org/10.3390/a13110300.
Islam, M. A., Das, B., Quraishi, S. B., Khan, R., Naher, K., Hossain, S. M., Karmaker, S., Latif, S. A. & Hossen, M. B. (2020) Heavy metal contamination and ecological risk assessment in water and sediments of the Halda River, Bangladesh: a natural fish breeding ground, Marine Pollution Bulletin, 160, 111649. https://doi.org/10.1016/J.MARPOLBUL.2020.111649.
Julian, J., Dewantara, A. B. & Wahyuni, F. (2023) Design of machine learning-based water quality prediction system with recursive feature elimination cross-validation, Jurnal Infotel, 15 (3), 249–255. https://doi.org/10.20895/infotel.v15i3.977.
Korostynska, O., Mason, A. & Al-Shamma, A. (2013) Monitoring pollutants in wastewater: traditional lab based versus modern real-time approaches. In: Mukhopadhyay, S. & Mason, A. (eds) Smart Sensors for Real-Time Water Quality Monitoring. Berlin, Heidelberg: Springer, pp. 1–24. https://doi.org/10.1007/978-3-642-37006-9_1.
Lall, U., Josset, L. & Russo, T. (2020) A snapshot of the world's groundwater challenges, Annual Review of Environment and Resources, 45, 171–194. https://doi.org/10.1146/annurev-environ-102017.
Liu, W., Liu, T., Liu, Z., Luo, H. & Pei, H. (2023a) A novel deep learning ensemble model based on two-stage feature selection and intelligent optimization for water quality prediction, Environmental Research, 224, 115560. https://doi.org/10.1016/J.ENVRES.2023.115560.
Liu, Y., Tian, W., Xie, J., Huang, W. & Xin, K. (2023b) LSTM-based model-predictive control with rationality verification for bioreactors in wastewater treatment, Water (Switzerland), 15 (9), 1779. https://doi.org/10.3390/w15091779.
Ly, Q. V., Nguyen, X. C., N. C., Truong, T. D., Hoang, T. H. T., Park, T. J., Maqbool, T., Pyo, J. C., Cho, K. H., Lee, K. S. & Hur, J. (2021) Application of machine learning for eutrophication analysis and algal bloom prediction in an urban river: a 10-year study of the Han River, South Korea, Science of the Total Environment, 797, 149040. https://doi.org/10.1016/j.scitotenv.2021.149040.
Ma, J., Ding, Y., Cheng, J. C. P., Jiang, F. & Xu, Z. (2020) Soft detection of 5-day BOD with sparse matrix in city harbor water using deep learning techniques, Water Research, 170, 115350. https://doi.org/10.1016/j.watres.2019.115350.
Melbourne Airport Weather Station (2021). Available at: https://en.tutiempo.net/ (Accessed: 14 August 2024).
Moeinzadeh, H., Jegakumaran, P., Yong, K. T. & Withana, A. (2023) Efficient water quality prediction by synthesizing seven heavy metal parameters using deep neural network, Journal of Water Process Engineering, 56, 104349. https://doi.org/10.1016/j.jwpe.2023.104349.
Mohd Zebaral Hoque, J., Nor, N. A., Alelyani, S., Mohana, M. & Hosain, M. (2022) Improving water quality index prediction using regression learning models, International Journal of Environmental Research and Public Health, 19 (20), 13702. https://doi.org/10.3390/ijerph192013702.
Mucheye, T., Haro, S., Papaspyrou, S. & Caballero, I. (2022) Water quality and water hyacinth monitoring with the Sentinel-2A/B satellites in Lake Tana (Ethiopia), Remote Sensing, 14 (19), 4921. https://doi.org/10.3390/rs14194921.
National Ocean Services (2024). Available at: https://oceanservice.noaa.gov/facts/oceanwater.html (Accessed: 6 July 2024).
Ooi, K. S., Chen, Z. Y., Poh, P. E. & Cui, J. (2022) BOD5 prediction using machine learning methods, Water Supply, 22 (1), 1168–1182. https://doi.org/10.2166/ws.2021.202.
Oyedotun, T. D. T. & Ally, N. (2021) Environmental issues and challenges confronting surface waters in South America: a review, Environmental Challenges, 3, 100049. https://doi.org/10.1016/j.envc.2021.100049.
Patil, R. R., Calay, R. K., Mustafa, M. Y. & Ansari, S. M. (2023) AI-driven high-precision model for blockage detection in urban wastewater systems, Electronics (Switzerland), 12 (17), 3606. https://doi.org/10.3390/electronics12173606.
Qasim, M., Badrelzaman, M., Darwish, N. N., Darwish, N. A. & Hilal, N. (2019) Reverse osmosis desalination: a state-of-the-art review, Desalination, 459, 59–104. https://doi.org/10.1016/J.DESAL.2019.02.008.
Sagan, V., Peterson, K. T., Maimaitijiang, M., Sidike, P., Sloan, J., Greeling, B. A., Maalouf, S. & Adams, C. (2020) Monitoring inland water quality using remote sensing: potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing, Earth-Science Reviews, 205, 103187. https://doi.org/10.1016/j.earscirev.2020.103187.
Sang, W., Li, D., He, Y., Zhan, C., Zhang, Q., Li, C. & Singh, R. P. (2021) Sludge reduction and pollutants removal in anaerobic-anoxic-oxic reactor with 2450MHz electromagnetic wave loading on returned sludge: performance and mechanism, Process Safety and Environmental Protection, 147, 68–79. https://doi.org/10.1016/J.PSEP.2020.09.027.
Sarkar, S. K., Rudra, R. R., Talukdar, S., Das, P. C., Nur, M. S., Alam, E., Islam, M. K. & Islam, A. R. M. T. (2024) Future groundwater potential mapping using machine learning algorithms and climate change scenarios in Bangladesh, Scientific Reports, 14 (1), 10328. https://doi.org/10.1038/s41598-024-60560-2.
Setia, R., Dhaliwal, S. S., Kumar, V., Singh, R., Kukal, S. S. & Pateriya, B. (2020) Impact assessment of metal contamination in surface water of Sutlej River (India) on human health risks, Environmental Pollution, 265, 114907. https://doi.org/10.1016/J.ENVPOL.2020.114907.
Sheng, L., Zhou, J., Li, X., Pan, Y. & Liu, L. (2020) Water quality prediction method based on preferred classification, IET Cyber-Physical Systems: Theory and Applications, 5 (2), 176–180. https://doi.org/10.1049/iet-cps.2019.0062.
Sidek, L. M., Mohiyaden, H. A., Marufuzzaman, M., Noh, N. S. M., Heddam, S., Ehteram, M., Kisi, O. & Sammen, S. S. (2024) Developing an ensembled machine learning model for predicting water quality index in Johor River Basin, Environmental Sciences Europe, 36 (1), 67. https://doi.org/10.1186/s12302-024-00897-7.
Silva, J. A. (2023) Wastewater treatment and reuse for sustainable water resources management: a systematic literature review, Sustainability (Switzerland), 15, 10940. https://doi.org/10.3390/su151410940.
Uddin, M. G., Nash, S. & Olbert, A. I. (2021) A review of water quality index models and their use for assessing surface water quality, Ecological Indicators, 122, 107218. https://doi.org/10.1016/J.ECOLIND.2020.107218.
Urban Wastewater Scenario in India (2022).
US EPA (2025) Available at: https://www.epa.gov/climateimpacts/climate-changeimpacts-freshwater-resources (Accessed: 7 January 2025).
Wei, Z., Wu, N., Zou, Q., Zou, H., Zhu, L., Wei, J. & Huang, H. (2023) Data modeling of sewage treatment plant based on long short-term memory with multilayer perceptron network, Water (Switzerland), 15 (8), 1472. https://doi.org/10.3390/w15081472.
Zapata-Sierra, A., Cascajares, M., Alcayde, A. & Manzano-Agugliaro, F. (2021) Worldwide research trends on desalination, Desalination, 519, 115305. https://doi.org/10.1016/J.DESAL.2021.115305.
Zegaar, A., Ounoki, S. & Telli, A. (2024) Machine learning for groundwater quality classification: a step towards economic and sustainable groundwater quality assessment process, Water Resources Management, 38 (2), 621–637. https://doi.org/10.1007/s11269-023-03690-y.
Zhang, S., Jin, Y., Chen, W., Wang, J., Wang, Y. & Ren, H. (2021) A temporal LASSO regression model for the emergency forecasting of the suspended sediment concentrations in coastal oceans: accuracy and interpretability, Engineering Applications of Artificial Intelligence, 100, 104206. https://doi.org/10.1016/J.ENGAPPAI.2021.104206.
Zhi, W., Feng, D., Tsai, W. P., Sterle, G., Harpold, A., Shen, C. & Li, L. (2021) From hydrometeorology to river water quality: can a deep learning model predict dissolved oxygen at the continental scale?, Environmental Science and Technology, 55 (4), 2357–2368. https://doi.org/10.1021/acs.est.0c06783.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC 4.0), which permits copying, adaptation and redistribution for non-commercial purposes, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc/4.0/).