Abstract
Coagulation is an important step in a water treatment plant (WTP). Jar tests are performed to determine the required coagulant dose; however, these tests are slow and cannot respond in real time to the abrupt changes in raw water quality that occur during the day. To overcome this limitation, this research developed artificial neural network (ANN) models, calibrated with full-scale WTP data, to predict the coagulant dosage from raw water quality while complying with the treated water quality parameters. The best model predicted the coagulant dosage with a mean squared error of 0.016 and a correlation coefficient of 0.872. These results support the automation of coagulant dosing in WTPs, indicating that ANN models allow a faster response in dosage definition and reduce the need for human intervention in the process.
HIGHLIGHTS
Artificial neural network models consider water quality parameters of raw water and treated water to predict the best coagulant dosage, considering the operation cost and water quality.
The water quality parameters ‘pH’ and ‘turbidity’ were the most accurately predicted by the models.
The parameters ‘residual fluoride’ and ‘residual chlorine’ had the worst performance among all water quality parameters studied.
INTRODUCTION
Water treatment is a fundamental process to ensure the supply of quality drinking water to the population. One of the main challenges of this process is the precise control of coagulant dosage, which plays a key role in the effective removal of impurities and particles present in raw water. Traditionally, the control of coagulant dosage in water treatment plants (WTPs) depends on a manual method called the jar test, which may take 30–45 min to produce the required result (Loc et al. 2020). Consequently, jar tests cannot respond to rapid changes in water quality (Jayaweera & Aziz 2018). Optimizing this dosage is essential to improve coagulation performance in WTPs and to bring benefits such as greater operational efficiency and better treated water quality.
In this context, numerous studies have explored innovative predictive control and modeling approaches that allow real-time adjustment of the coagulant dosage under various scenarios and uncertainties. Artificial intelligence techniques, especially artificial neural networks (ANNs) and other machine learning methods, have emerged as promising tools to improve the coagulation process in WTPs.
Among the relevant studies, Bello et al. (2014) proposed an advanced predictive control with multiple models to optimize coagulant dosing in real-time. Likewise, Kim & Parnichkun (2017) presented a hybrid model that combines the k-means clustering algorithm and the adaptive neuro-fuzzy inference system to predict the turbidity of treated water and determine the ideal coagulant for drinkable water.
Another interesting approach is described by Bobadilla et al. (2019), which uses the multiple response surface methodology to determine the main operational parameters in coagulation, while Heddam et al. (2011a, 2011b) performed a comparative study between radial basis function (RBF) neural networks and generalized regression neural networks to model the coagulant dosage.
In addition, a study by Jayaweera & Aziz (2018) developed and compared models of neural networks, extreme learning machine (ELM), and multi-layer perceptron (MLP), to predict the ideal dosage of a coagulant in WTPs, while Jayaweera & Aziz (2021) presented an efficient neural network model to assist the coagulation process in WTPs, optimizing coagulant dosage and improving efficiency.
Wu & Lo (2008) proposed a method to predict real-time coagulant dosage using ANNs and the adaptive network-based fuzzy inference system (ANFIS). Boumezbeur et al. (2023) developed a hybrid machine learning model (ELM-Bat) to determine the optimal coagulant dosage in drinking water treatment plants, reporting greater accuracy.
Furthermore, Heddam et al. (2011a, 2011b) applied an ANFIS-based model for the coagulant dosage in a WTP. Maier (2004) describes the use of ANNs to predict optimal doses of aluminum sulfate and treated water quality parameters, contributing to more efficient treatment and better quality of treated water.
Another relevant study is the work of Jayaweera et al. (2019), which describes an approach using an ELM with RBF to improve the predictive capacity of the coagulation process. Shi et al. (2022) explore the use of real-time UV–Vis spectra of raw water to determine dosages of coagulants in the treatment process. Finally, Haghiri et al. (2018) employ ANNs to predict the ideal dosage of coagulant in jar test experiments, increasing the efficiency of the coagulation process in WTPs.
It is noteworthy that artificial intelligence has been increasingly explored by researchers around the world to improve the control and efficiency of water treatment in WTPs. The cited studies demonstrate the potential of these techniques to optimize the coagulation process, ensuring high-quality treated water and meeting the growing demand for safe water resources.
In the Brazilian context, few similar studies have been carried out in water treatment plants; examples include Gomes et al. (2015) and Menezes et al. (2018). Therefore, this study aims to adapt and apply ANNs to the coagulation process of the José Pedro Horstmann Water Treatment Station, located in Santa Catarina. Using the data measured at this plant, the predictive neural network model is expected to improve the control of coagulant dosage and to predict the quality of treated water.
MATERIALS AND METHODS
Case study
Extraction point 1 – Raw water, sampled before the coagulation stage; pH, turbidity, and coagulant dosage are analyzed.
Extraction point 2 – Sampled between coagulation and decanting/flocculation; pH, color, turbidity, chlorine dosage, fluorosilicic acid dosage, and geocalcium dosage are analyzed.
Extraction point 3 – Sampled right after the filtration step; pH, color, turbidity, and geocalcium dosage are analyzed.
Extraction point 4 – Sampled at the end of all treatment steps, before the treated water is distributed; pH, color, turbidity, residual chlorine, and residual fluoride are analyzed.
The parameters recorded at WTP José Pedro Horstmann are presented in Table 1. The original database contained 8,040 data sets of water quality parameters, covering 670 days of records made every 2 h, from 01 January 2019 to 31 October 2020.
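Table 1. Water quality parameters
Extraction point | Parameter | Acronym |
---|---|---|
Raw water (Extraction point 1) | pH | RW-pH |
Raw water (Extraction point 1) | Color (uH) | RW-col |
Raw water (Extraction point 1) | Turbidity (NTU) | RW-turb |
Raw water (Extraction point 1) | Coagulant dosage (mg/L) | RW-dos |
Decanted water (Extraction point 2) | pH | DW-pH |
Decanted water (Extraction point 2) | Color (uH) | DW-col |
Decanted water (Extraction point 2) | Turbidity (NTU) | DW-turb |
Decanted water (Extraction point 2) | Geocalcium dosage (mg/L) | DW-geoc |
Filtered water (Extraction point 3) | pH | FW-pH |
Filtered water (Extraction point 3) | Color (uH) | FW-col |
Filtered water (Extraction point 3) | Turbidity (NTU) | FW-turb |
Filtered water (Extraction point 3) | Chlorine dosage (mg/L) | FW-chlo |
Filtered water (Extraction point 3) | Fluorosilicic acid dosage (mg/L) | FW-fluor |
Filtered water (Extraction point 3) | Geocalcium dosage (mg/L) | FW-geoc |
Treated water (Extraction point 4) | pH | TW-pH |
Treated water (Extraction point 4) | Color (uH) | TW-col |
Treated water (Extraction point 4) | Turbidity (NTU) | TW-turb |
Treated water (Extraction point 4) | Residual chlorine (mg/L) | TW-chlo |
Treated water (Extraction point 4) | Residual fluoride (mg/L) | TW-fluor |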
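The size of the original database can be checked from the recording period and interval stated above. The short sketch below only reproduces that arithmetic; the variable names are illustrative and are not part of the original study.

```python
from datetime import date

# Recording period and interval as stated in the text.
FIRST_DAY = date(2019, 1, 1)
LAST_DAY = date(2020, 10, 31)
HOURS_BETWEEN_RECORDS = 2

# Both end dates are part of the period, hence the +1.
n_days = (LAST_DAY - FIRST_DAY).days + 1
records_per_day = 24 // HOURS_BETWEEN_RECORDS
n_records = n_days * records_per_day

print(n_days, records_per_day, n_records)
```

The count assumes that no 2 h slot is missing from the record; slots with null entries still count as records at this stage.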
It is important to highlight that the type of coagulant used by the company to clarify raw water changed during the period studied. From 01 January 2019 to 21 December 2019, aluminum sulfate was used as the coagulant. From 22 December 2019 to 31 October 2020, the final date of the data provided by the company, the coagulant used was polyaluminum chloride (PAC). Table 2 presents the statistical characteristics of the database before the data selection step.
Table 2
Extraction point | Parameter | Average | Standard deviation | Coef. variation | Min. value | Max. value |
---|---|---|---|---|---|---|
Raw water (Extraction point 1) | RW-pH | 6.641 | 0.484 | 0.073 | 0.000 | 8.920 |
Raw water (Extraction point 1) | RW-col | 66.012 | 63.838 | 0.967 | 0.000 | 4,074.000 |
Raw water (Extraction point 1) | RW-turb | 15.188 | 92.249 | 6.074 | 0.000 | 8,046.000 |
Raw water (Extraction point 1) | RW-dos | 16.289 | 8.386 | 0.515 | 0.000 | 252.910 |
Decanted water (Extraction point 2) | DW-pH | 6.454 | 0.812 | 0.126 | 0.000 | 9.780 |
Decanted water (Extraction point 2) | DW-col | 6.805 | 6.966 | 1.024 | 0.000 | 85.000 |
Decanted water (Extraction point 2) | DW-turb | 2.300 | 2.518 | 1.095 | 0.000 | 48.300 |
Decanted water (Extraction point 2) | DW-geoc | 3.170 | 2.429 | 0.766 | 0.000 | 53.330 |
Filtered water (Extraction point 3) | FW-pH | 6.459 | 0.724 | 0.112 | 0.000 | 9.090 |
Filtered water (Extraction point 3) | FW-col | 2.806 | 1.840 | 0.656 | 0.000 | 40.000 |
Filtered water (Extraction point 3) | FW-turb | 0.503 | 1.172 | 2.330 | 0.000 | 86.000 |
Filtered water (Extraction point 3) | FW-chlo | 6.326 | 3.269 | 0.517 | 0.000 | 162.040 |
Filtered water (Extraction point 3) | FW-fluor | 0.671 | 0.213 | 0.317 | 0.000 | 7.480 |
Filtered water (Extraction point 3) | FW-geoc | 2.574 | 1.938 | 0.753 | 0.000 | 12.000 |
Treated water (Extraction point 4) | TW-pH | 6.700 | 0.347 | 0.052 | 0.000 | 8.150 |
Treated water (Extraction point 4) | TW-col | 3.052 | 2.013 | 0.660 | 0.000 | 50.000 |
Treated water (Extraction point 4) | TW-turb | 0.899 | 0.812 | 0.904 | 0.000 | 22.700 |
Treated water (Extraction point 4) | TW-chlo | 3.679 | 0.448 | 0.122 | 0.000 | 5.300 |
Treated water (Extraction point 4) | TW-fluor | 0.800 | 0.094 | 0.118 | 0.000 | 1.370 |
Data selection and transformation
As the data provided by the WTP were entered manually into the system by the operators, some entries contain null values or values far outside the range of neighboring observations, caused by typing errors. It was therefore necessary to screen the data to avoid inconsistent or distorted neural network training results.
Thus, data sets showing possible registration errors, missing records, and/or anomalous water quality parameter values were removed from the database. The removal criteria were defined considering the values with physical significance for each parameter, as shown in Table 3. The result of this step was a subset of the database without the data sets that met the defined criteria.
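Table 3
Extraction point | Parameter | Criteria |
---|---|---|
Raw water (Extraction point 1) | RW-pH | Values less than 4 |
Raw water (Extraction point 1) | RW-col | Values greater than 400; null values |
Raw water (Extraction point 1) | RW-turb | Values greater than 250; null values |
Raw water (Extraction point 1) | RW-dos | Values greater than 100; null values |
Decanted water (Extraction point 2) | DW-pH | Values less than 4 |
Decanted water (Extraction point 2) | DW-col | Null values |
Decanted water (Extraction point 2) | DW-turb | Values greater than 30; null values |
Decanted water (Extraction point 2) | DW-geoc | Values greater than 20; null values |
Filtered water (Extraction point 3) | FW-pH | Values less than 4 |
Filtered water (Extraction point 3) | FW-col | Null values |
Filtered water (Extraction point 3) | FW-turb | Values greater than 30; null values |
Filtered water (Extraction point 3) | FW-chlo | Values greater than 30; null values |
Filtered water (Extraction point 3) | FW-fluor | Null values |
Filtered water (Extraction point 3) | FW-geoc | Null values |
Treated water (Extraction point 4) | TW-pH | Null values |
Treated water (Extraction point 4) | TW-col | Null values |
Treated water (Extraction point 4) | TW-turb | Null values |
Treated water (Extraction point 4) | TW-chlo | Null values |
Treated water (Extraction point 4) | TW-fluor | Null values |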
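The exclusion criteria of Table 3 can be expressed as a simple record filter. The sketch below is illustrative only and is not the authors' implementation: it assumes each data set is a dictionary keyed by the acronyms of Table 1, and that a null value is stored either as `None` or as `0` (the zero minima in Table 2 suggest the latter, but the paper does not state it). The function name `is_valid` is hypothetical.

```python
# Upper limits taken from Table 3; parameters not listed have no upper limit.
MAX_VALUE = {
    "RW-col": 400, "RW-turb": 250, "RW-dos": 100,
    "DW-turb": 30, "DW-geoc": 20,
    "FW-turb": 30, "FW-chlo": 30,
}
# pH readings below 4 are excluded at extraction points 1 to 3.
MIN_VALUE = {"RW-pH": 4, "DW-pH": 4, "FW-pH": 4}
# Parameters for which Table 3 lists "null values" as an exclusion criterion.
NULL_CHECKED = {
    "RW-col", "RW-turb", "RW-dos",
    "DW-col", "DW-turb", "DW-geoc",
    "FW-col", "FW-turb", "FW-chlo", "FW-fluor", "FW-geoc",
    "TW-pH", "TW-col", "TW-turb", "TW-chlo", "TW-fluor",
}


def is_valid(record):
    """Return True if no parameter in the record meets an exclusion criterion."""
    for name, value in record.items():
        # Assumption: a null entry is stored as None or as 0.
        if name in NULL_CHECKED and (value is None or value == 0):
            return False
        if name in MIN_VALUE and (value is None or value < MIN_VALUE[name]):
            return False
        if name in MAX_VALUE and value is not None and value > MAX_VALUE[name]:
            return False
    return True


example = {"RW-pH": 6.6, "RW-col": 66.0, "RW-turb": 15.2, "RW-dos": 16.3}
kept = [r for r in [example, {**example, "RW-turb": 8046.0}] if is_valid(r)]
```

A whole data set is discarded as soon as one of its parameters fails, which matches the description of removing sets rather than individual values.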
In the context of ANN training, normalization rescales the values of each parameter to a common numerical range, so that parameters with large magnitudes do not dominate those with small ones. This technique facilitates the detection of the relative importance of each parameter by the ANN model (Wu & Lo 2010).
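The paper does not state which normalization scheme was applied. As one common possibility, the sketch below shows min-max scaling of a single parameter to the interval [0, 1], which is compatible with the output range of a log-sigmoid transfer function. The scheme and the function names are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map values in [0, 1] back to the original range [lo, hi]."""
    return [lo + s * (hi - lo) for s in scaled]


turbidity = [4.0, 6.0, 8.0]          # illustrative values, not plant data
scaled = min_max_normalize(turbidity)
restored = denormalize(scaled, min(turbidity), max(turbidity))
```

In practice the minimum and maximum would be taken from the training subset only and then reused for the validation and test subsets, so that no information leaks from the held-out data.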
Data processing
Parameters used for the ANN
Gaya et al. (2017) state that a single hidden layer of neurons is sufficient to solve most real-world problems. In this paper, three hidden layers were used, because this small increase in the number of layers did not require more computational power in the training of the proposed ANN models.
The metrics used to compare the trained models are the mean squared error (MSE) and the correlation coefficient (r). As the objective of ANN training is to obtain an adequate mapping of the relationship between input and output data, a well-trained network is fundamental to the success of the model. For this, it is common to divide the database that feeds the ANN into three groups: one for training, one for validation, and one for testing (Basheer 2000). In this research, 60% of the data were used to train the model, 20% to validate it, and 20% to test it.
The tool used to implement the ANN models was NNTool, from Matlab®. Table 4 lists the implementation parameters of the models, while Table 5 lists the training parameters of the ANNs used in this paper.
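For reference, the two comparison metrics can be computed as in the sketch below. It uses the standard definitions of the mean squared error and of the Pearson correlation coefficient; the paper does not spell out its formulas, so reading r as the Pearson coefficient is an assumption, and the function names are illustrative.

```python
import math


def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


observed = [0.20, 0.35, 0.50, 0.65]   # illustrative normalized dosages
predicted = [0.25, 0.30, 0.55, 0.60]
error = mse(observed, predicted)
correlation = pearson_r(observed, predicted)
```

A low MSE indicates small prediction errors on the scale of the (normalized) data, while r close to 1 indicates that predictions rise and fall together with the observations; the two metrics are complementary and neither implies the other.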
Table 4. ANN models implementation parameters
Parameter | Value |
---|---|
Network type | Feed-forward backpropagation |
Training function | Levenberg–Marquardt |
Performance function | Mean square error |
Number of layers | 3 |
Number of neurons per layer | 29 |
Transfer function | Log-sigmoid |
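To make the architecture of Table 4 concrete, the sketch below performs one forward pass through a feed-forward network with three hidden layers of 29 log-sigmoid neurons. It is a plain-Python illustration, not the Matlab NNTool implementation used in the study: the weights are random and untrained, Levenberg–Marquardt training is not shown, and applying the log-sigmoid function to the output layer as well is an assumption, since Table 4 names a single transfer function.

```python
import math
import random

HIDDEN_LAYERS = [29, 29, 29]  # three hidden layers of 29 neurons (Table 4)


def logsig(x):
    """Log-sigmoid transfer function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def build_network(n_inputs, n_outputs=1, seed=0):
    """Return one (weights, biases) pair per layer, with random initial values."""
    rng = random.Random(seed)
    sizes = [n_inputs] + HIDDEN_LAYERS + [n_outputs]
    network = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
        network.append((weights, biases))
    return network


def forward(inputs, network):
    """Propagate normalized inputs through every layer of the network."""
    activations = list(inputs)
    for weights, biases in network:
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        # Assumption: log-sigmoid is applied in all layers, including the output.
        activations = [logsig(s) for s in sums]
    return activations


net = build_network(n_inputs=6)        # e.g. six normalized input parameters
dosage = forward([0.5] * 6, net)       # one normalized output value
```

Because the output passes through a log-sigmoid, predictions lie in (0, 1) and would have to be mapped back to mg/L with the inverse of whatever normalization was applied to the dosage.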
Table 5. ANN models training parameters
Parameter | Value |
---|---|
Epochs | 1,000 |
Time | Infinite |
Performance goal | 1.00 × 10⁻⁴ |
Minimum performance gradient | 1.00 × 10⁻⁷ |
Maximum validation failures | 25 |
Training data | 60% |
Validation data | 20% |
Test data | 20% |
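The training parameters of Table 5 act as stopping conditions and as a data division rule. The sketch below shows one way to express both in plain Python. It is not the NNTool code: the random assignment of records to subsets is an assumption (the paper gives only the proportions), "validation failures" is read here as the number of consecutive epochs without improvement of the validation error, and the function names are hypothetical.

```python
import random

MAX_EPOCHS = 1000
PERFORMANCE_GOAL = 1.0e-4
MIN_GRADIENT = 1.0e-7
MAX_VALIDATION_FAILURES = 25


def split_indices(n_records, seed=0):
    """Shuffle record indices and divide them 60/20/20 (train/validation/test)."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = int(0.6 * n_records)
    n_val = int(0.2 * n_records)
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # the remainder, about 20%
    return train, validation, test


def stop_reason(epoch, train_mse, gradient, validation_failures):
    """Return why training should stop, or None if it should continue."""
    if train_mse <= PERFORMANCE_GOAL:
        return "performance goal met"
    if gradient <= MIN_GRADIENT:
        return "minimum gradient reached"
    if validation_failures >= MAX_VALIDATION_FAILURES:
        return "validation stop"
    if epoch >= MAX_EPOCHS:
        return "maximum epochs reached"
    return None


train, validation, test = split_indices(6861)
```

The validation subset is used only to decide when to stop, and the test subset is held back entirely, so the reported MSE and r are not influenced by the stopping decision.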
Proposed models
This research was carried out in two phases. In the first phase, the ANN model's purpose was to predict the coagulant dosage applied in the water treatment, considering the parameters of water quality as inputs. For this, three ANN models were proposed, each with distinct input parameters. Table 6 shows which parameters are considered in each of the models. The output parameter of all three models was the ‘coagulant dosage’.
Table 6
Data | Parameter | Model 1 | Model 2 | Model 3 |
---|---|---|---|---|
Input | RW-pH | X | X | X |
Input | RW-col | X | X | X |
Input | RW-turb | X | X | X |
Input | DW-pH |  |  | X |
Input | DW-col |  |  | X |
Input | DW-turb |  |  | X |
Input | DW-geoc |  |  | X |
Input | FW-pH |  |  | X |
Input | FW-col |  |  | X |
Input | FW-turb |  |  | X |
Input | FW-chlo |  | X | X |
Input | FW-fluor |  | X | X |
Input | FW-geoc |  | X | X |
Input | TW-pH | X | X | X |
Input | TW-col | X | X | X |
Input | TW-turb | X | X | X |
Input | TW-chlo |  | X | X |
Input | TW-fluor |  | X | X |
Output | RW-dos | X | X | X |
In the second phase of the research, the capacity of the ANN models to predict each water quality parameter individually was evaluated. The three ANN models implemented previously were reused at this stage; however, the parameter ‘coagulant dosage’ was treated as input data instead of output data, and the treated water quality parameters (Extraction point 4) were used as the output data. Table 7 shows which parameters were considered in each of the proposed models.
Table 7
Data | Parameter | Model 1 | Model 2 | Model 3 |
---|---|---|---|---|
Input | RW-pH | X | X | X |
Input | RW-col | X | X | X |
Input | RW-turb | X | X | X |
Input | RW-dos | X | X | X |
Input | DW-pH |  |  | X |
Input | DW-col |  |  | X |
Input | DW-turb |  |  | X |
Input | DW-geoc |  |  | X |
Input | FW-pH |  |  | X |
Input | FW-col |  |  | X |
Input | FW-turb |  |  | X |
Input | FW-chlo |  | X | X |
Input | FW-fluor |  | X | X |
Input | FW-geoc |  | X | X |
Output | TW-pH | X | X | X |
Output | TW-col | X | X | X |
Output | TW-turb | X | X | X |
Output | TW-chlo |  | X | X |
Output | TW-fluor |  | X | X |
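The relationship between the two phases can be summarized as a small data structure: the second phase moves the coagulant dosage from the output to the input side and moves the treated water parameters from the input to the output side. The sketch below encodes the parameter sets as they are read from Tables 6 and 7; rows marked for a single model are taken to belong to Model 3, which is an interpretation of the tables rather than something the text states row by row. All names are illustrative.

```python
RAW_WATER = ["RW-pH", "RW-col", "RW-turb"]
DOSAGE = "RW-dos"

# Intermediate parameters (extraction points 2 and 3) used by each model.
INTERMEDIATE = {
    1: [],
    2: ["FW-chlo", "FW-fluor", "FW-geoc"],
    3: ["DW-pH", "DW-col", "DW-turb", "DW-geoc",
        "FW-pH", "FW-col", "FW-turb", "FW-chlo", "FW-fluor", "FW-geoc"],
}

# Treated water parameters (extraction point 4) used by each model.
TREATED = {
    1: ["TW-pH", "TW-col", "TW-turb"],
    2: ["TW-pH", "TW-col", "TW-turb", "TW-chlo", "TW-fluor"],
    3: ["TW-pH", "TW-col", "TW-turb", "TW-chlo", "TW-fluor"],
}


def phase1(model):
    """Phase 1: predict the coagulant dosage from water quality parameters."""
    inputs = RAW_WATER + INTERMEDIATE[model] + TREATED[model]
    return inputs, [DOSAGE]


def phase2(model):
    """Phase 2: predict treated water quality, with the dosage as an input."""
    inputs = RAW_WATER + [DOSAGE] + INTERMEDIATE[model]
    return inputs, TREATED[model]


for model in (1, 2, 3):
    n_in1, n_out1 = (len(x) for x in phase1(model))
    n_in2, n_out2 = (len(x) for x in phase2(model))
    print(f"Model {model}: phase 1 {n_in1}->{n_out1}, phase 2 {n_in2}->{n_out2}")
```

Keeping the parameter sets in one place makes it easy to check that each phase uses every parameter exactly once, either as an input or as an output.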
RESULTS AND DISCUSSION
The many null entries in the original data set and the high dispersion of the data (indicated by the high coefficients of variation in Table 2) suggest failures in the recording of the data collected by the WTP operators in the period studied. At the end of the data selection stage, after applying the exclusion criteria presented in Table 3, the number of valid data sets for the period in which aluminum sulfate was the coagulant decreased by 27.72%, while the number of valid data sets for PAC decreased by 6.27%. Overall, the final data set decreased from 8,040 to 6,861 observations (14.66%), as summarized in Table 8.
Table 8
Category | No. of original data sets | No. of data sets after selection | % Reduction |
---|---|---|---|
Alum sulfate | 3,145 | 2,273 | 27.72 |
PAC | 4,895 | 4,588 | 6.27 |
Total | 8,040 | 6,861 | 14.66 |
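The reductions reported in Table 8 follow directly from the record counts. The sketch below recomputes them as a check on the arithmetic; the function name is illustrative, and the computed values agree with the table to within 0.01 percentage points.

```python
# (original, after selection) record counts from Table 8
COUNTS = {
    "Alum sulfate": (3145, 2273),
    "PAC": (4895, 4588),
}


def reduction_percent(before, after):
    """Percentage of records removed by the data selection step."""
    return 100.0 * (before - after) / before


total_before = sum(before for before, _ in COUNTS.values())
total_after = sum(after for _, after in COUNTS.values())

for name, (before, after) in COUNTS.items():
    print(f"{name}: {reduction_percent(before, after):.2f}%")
print(f"Total: {reduction_percent(total_before, total_after):.2f}%")
```

The much larger loss in the aluminum sulfate period means that the two coagulant data sets differ not only in size but also in how heavily they were filtered, which is worth keeping in mind when comparing model performance between them.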
The ability of the ANN models proposed in the first phase of this work to predict the coagulant dosage was evaluated using the MSE and the correlation coefficient (r). Table 9 presents the predictive statistics of each proposed model.
Table 9
Model | Alum sulfate MSE | Alum sulfate r | PAC MSE | PAC r |
---|---|---|---|---|
Model 1 | 0.026 | 0.859 | 0.042 | 0.587 |
Model 2 | 0.018 | 0.885 | 0.023 | 0.836 |
Model 3 | 0.015 | 0.933 | 0.016 | 0.872 |
Model 1, which used only the water quality parameters of extraction points 1 and 4 as input data, performed slightly worse than the other models when the coagulant used in the treatment was aluminum sulfate. For the observations in which the coagulant was PAC, the performance of model 1 was significantly below that of the other models.
Model 2 adopted the same input data as model 1, plus the parameters ‘geocalcium dosage’ (extraction points 2 and 3), ‘chlorine dosage’, ‘fluorosilicic acid dosage’ (both from extraction point 3), ‘residual chlorine’, and ‘residual fluoride’ (both from extraction point 4). The model performed well, with low MSE and r close to 1 for both types of coagulant.
In summary, all three models gave good results when the coagulant was aluminum sulfate, with low MSE and r values close to 1, so they could be satisfactorily used to explain the coagulant dosage in the studied WTP. For PAC, on the other hand, only models 2 and 3 gave good results, while model 1 failed to explain the phenomenon, with a low r value (0.587).
The results achieved in this phase of the research are consistent with those of similar studies. Table 10 shows the same statistical metrics for models implemented in comparable studies on the prediction of coagulant dosage using ANNs.
Table 10
Paper | Model input parameters | Coagulant | MSE | r |
---|---|---|---|---|
Yu et al. (2000) | Turbidity; conductivity; pH; treated water's turbidity | Alum sulfate | 0.00194 | 0.985 |
Wu & Lo (2008) | Turbidity; coagulant dosage from the previous day | PAC | 0.0000127 | 0.962 |
Haghiri et al. (2018) | pH; alkalinity; turbidity; temperature | Alum sulfate | 0.12 | 0.975 |
Jayaweera & Aziz (2018) | pH; turbidity; color; dissolved solids; alkalinity | Alum sulfate | 0.000483 | 0.987 |
In the second phase of this study, the evaluation of each output parameter of the proposed models showed that the parameters color and turbidity were predicted with similar accuracy, slightly better than the parameter pH, for both aluminum sulfate and PAC.
Table 11
Model | Parameter | Alum sulfate MSE | Alum sulfate r | PAC MSE | PAC r |
---|---|---|---|---|---|
Model 1 | pH | 0.034 | 0.668 | 0.011 | 0.574 |
Model 1 | Color | 0.036 | 0.439 | 0.018 | 0.594 |
Model 1 | Turbidity | 0.047 | 0.748 | 0.006 | 0.593 |
Model 2 | pH | 0.034 | 0.751 | 0.035 | 0.734 |
Model 2 | Color | 0.008 | 0.603 | 0.012 | 0.629 |
Model 2 | Turbidity | 0.04 | 0.798 | 0.009 | 0.652 |
Model 2 | Residual chlorine | 0.033 | 0.577 | 0.026 | 0.376 |
Model 2 | Residual fluoride | 0.205 | 0.031 | 0.05 | 0.234 |
Model 3 | pH | 0.027 | 0.732 | 0.01 | 0.752 |
Model 3 | Color | 0.014 | 0.825 | 0.006 | 0.872 |
Model 3 | Turbidity | 0.025 | 0.85 | 0.003 | 0.861 |
Model 3 | Residual chlorine | 0.052 | 0.208 | 0.041 | 0.345 |
Model 3 | Residual fluoride | 0.203 | 0.313 | 0.045 | 0.24 |
The parameters pH, color, and turbidity all had their best predictive performance in model 3 with the data set referring to the use of PAC. For pH, the MSE and r were 0.010 and 0.752, respectively. For color, the MSE was 0.006 and r was 0.872. Finally, for turbidity, the MSE and r were 0.003 and 0.861, respectively.
The parameters ‘residual chlorine’ and ‘residual fluoride’ both obtained very low statistical indices in all models, for both aluminum sulfate and PAC; it can therefore be concluded that the models proposed in this research were not able to predict the values of these parameters with the necessary precision.
Furthermore, regarding the type of coagulant, in the first phase of the research all models performed better with aluminum sulfate, even though the number of valid observations for aluminum sulfate was approximately 50% lower than for PAC. However, in the second phase, which considered the individual parameters of treated water as the output data of the proposed models, the MSE of the predictions for aluminum sulfate was generally higher (meaning worse performance) than for PAC. The discrepancy was especially large for the parameter ‘residual fluoride’ (MSE of about 0.20 with aluminum sulfate against about 0.05 with PAC).
CONCLUSION
In this research, ANN models were implemented to predict the dosage of coagulant applied in water treatment, considering the parameters of raw water and treated water of WTP José Pedro Horstmann (Palhoça, Santa Catarina, Brazil). The same ANN models were also applied to predict the quality parameters of treated water, considering the coagulant dosage applied in the treatment. In general, the implemented models performed satisfactorily, with MSE and r values close to those of similar studies. These results support research that seeks to promote automation of coagulant dosing in WTPs, indicating that ANN models allow a faster response in dosage definition and reduce the need for human intervention in the process.
In the prediction of the coagulant dosage, the ANN model with the best performance was the one that considers the quality parameters of raw water, decanted water, filtered water, and treated water, together with the dosages of geocalcium, fluorosilicic acid, and chlorine applied during water treatment (Model 3). With PAC as the coagulant, this model reached MSE = 0.016 and r = 0.872; with aluminum sulfate, it reached MSE = 0.015 and r = 0.933, even though fewer observations were available for aluminum sulfate than for PAC.
In the prediction of the individual water quality parameters based on the coagulant dosage, the same model again had the best performance among the proposed models. The parameters ‘pH’ and ‘turbidity’ were the most accurately predicted (data with PAC as the coagulant gave slightly better results). On the other hand, the parameters ‘residual fluoride’ and ‘residual chlorine’ had the worst performance among all water quality parameters studied, with either PAC or aluminum sulfate as the coagulant. The proposed ANN models are therefore not adequate for predicting these parameters, and other types of mathematical modeling should be studied to obtain a satisfactory result. Moreover, the predictions for observations in which the coagulant was PAC were better than those for observations in which it was aluminum sulfate.
The limitations of this study lie in the few water quality parameters available in the database and in the use of only one machine learning technique. For future studies in this line of research, it is recommended to use water quality parameters that could not be obtained in this research, such as the temperature and alkalinity of raw water. It is also suggested to apply these models in parallel with the jar test, to evaluate their efficiency in a WTP in operation. Another interesting suggestion is to use other machine learning techniques to assess predictive efficiency with the same database as this paper.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.