Water is one of the most critical resources for sustaining life. Although it covers about 70% of the Earth’s surface, only a small fraction of it is usable. Because water is used for a variety of purposes, its quality must be determined before use. The rapid growth of the world’s population has had a significant influence on the environment, particularly on water quality, which has deteriorated in recent years due to various pollutants. To control water pollution, modeling and predicting water quality have become crucial. In this work, we propose a machine learning (ML)-based model to predict and classify water quality. The results from six different ML models are analyzed using accuracy, precision, recall, and F1-score as performance measures. The proposed approach is validated using a benchmark dataset. The results show that the Decision Tree model is distinctly superior to the other classifiers, with an accuracy of 97.53%, precision of 87.66%, recall of 74.59%, and F1-score of 80.60%. This will support better water quality analysis for aquatic systems.

  • Water Quality Prediction: Our proposed machine learning model accurately predicts water quality, enabling timely interventions to maintain or improve water standards in various settings such as drinking water sources, industrial processes, or natural environments.

  • Novel ML Model: Our novel ML approach represents a breakthrough in water quality prediction with high accuracy.

  • Enhanced Efficiency: Our approach significantly streamlines the process of water quality prediction, enabling swift and reliable assessments that facilitate proactive interventions and resource allocation.

Water is a vital resource and plays a crucial role in the survival of humans, animals, birds, and aquatic life. Humans rely on water for various daily activities, with drinking being the most essential. Access to safe and clean water is therefore essential for human survival. However, many parts of the world struggle to ensure the quality of available water sources (Ahmed et al. 2019). Water sources near agricultural areas and industrial plants are particularly vulnerable to contamination. These sources may contain contaminants such as lead, aluminium, and nitrates that, when present at elevated levels, pose risks to human health, particularly in children (Haghiabi et al. 2018). These risks include acute diseases such as gastrointestinal illness, developmental effects such as learning disorders, endocrine disruption, and even cancer (Environments & Contaminants - Drinking Water Contaminants n.d). Furthermore, precise water quality projections can serve as a foundation for policymakers and give environmental management departments an ‘early warning’. Therefore, there is a need for a system that can predict the quality of water accurately and efficiently.

Water quality is presently determined by time-consuming and costly laboratory and statistical analyses that require sample collection, transportation to laboratories, and a substantial amount of time and effort (Ahmed et al. 2019). These manual methods are also error-prone. Several water testing laboratories face technical difficulties, e.g., a shortage of specialized testing equipment, failure to preschedule calibration and maintenance of analytical instruments, and difficulties enforcing standard operating procedures (SOPs) (Chen et al. 2020). Because inaccurate results can allow contaminated water to go undetected, a faster and less expensive solution is needed.

In recent years, the flexibility of machine learning (ML) has demonstrated its potential as a tool in environmental science and engineering. ML methods have been widely used to assess river water quality, including the computation of the Water Quality Index (WQI) (Shamsuddin et al. 2022). Despite the difficulties of employing ML for water quality study and evaluation, it can produce more reliable evaluation results (Lu & Ma 2020).

Existing studies (Fahd Abd & Hasan 2020; Lu & Ma 2020) have proposed algorithms for WQI and water quality prediction. By developing ML approaches, these studies have indirectly contributed to better water quality management. The approaches offer a variety of classification and forecasting methods to meet the individual demands of policymakers, environmental professionals, and the general public (Shamsuddin et al. 2022). ML therefore plays a crucial role in water quality classification by providing accurate and efficient methods for predicting and analyzing water quality parameters.

In this study, an ML-based model was developed to detect and predict the contamination of drinking water. The specific objectives of this paper are to analyze and compare the performance of six different state-of-the-art algorithms and to determine the most significant features contributing to water quality classification. In summary, the following key contributions were made:

  1. New System Design: A new ML-based system was designed for accurate and quick prediction of water quality.

  2. State-of-the-art Algorithms Comparative Study: Six different ML-based state-of-the-art algorithms were compared for water quality prediction.

  3. Dataset: The proposed approach was tested on benchmark data instances taken from Kaggle (DataSet n.d).

The rest of the paper is structured into five sections. Section 2 explains the background and related works followed by proposed methodology in Section 3. Section 4 explains the experimental analysis. The limitations of this study are explained in Section 5. Finally, this study concludes in Section 6 with future developments.

Background

Water is the most vital resource, critical to all forms of life; however, it is continuously contaminated by life itself. Water quality has deteriorated alarmingly as a result of rapid industrial growth. Poor water quality, caused by a variety of contaminants (Figure 1), has been identified as one of the key contributors to the spread of life-threatening diseases. In this context, the efficiency of six distinct ML algorithms in forecasting water quality was analyzed. These algorithms were chosen because they are easy to implement and, for the most part, open source.
Figure 1

Common contaminants in drinking water.


Random Forest (Biau & Scornet 2016) is a widely used supervised learning technique that is adaptable, easy to use, and often performs well with little hyperparameter tuning. It builds a collection of decision trees and classifies each sample by aggregating the predictions of the individual trees. The technique is highly effective for classification provided that a sufficient number of trees is constructed; this minimum number is data-dependent and is influenced by the number of features available for splitting.

A decision tree (Myles et al. 2004) is a tree-like structure in which a node represents a feature or attribute, a link denotes a decision or rule, and a leaf represents an outcome, which is either a categorical or a continuous value, as illustrated in Figure 2. Because decision trees imitate human-level reasoning, their results are easy to interpret and yield useful insights. The aim is to construct a tree from all the data that produces a single outcome at each leaf (Patel & Prajapati 2018).
Figure 2

Decision Tree basic structure.
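As a minimal illustration of the node, link, and leaf structure described above, the following sketch encodes a two-level tree as nested dictionaries and walks it for a single sample. The feature names are contaminants mentioned in this paper, but the thresholds, the tree itself, and the helper name `predict` are hypothetical; this is not the model trained in this study, and the thresholds are not regulatory limits.

```python
def predict(node, sample):
    """Walk from the root to a leaf, following one link per decision."""
    while "label" not in node:
        # Each internal node tests one feature against a threshold.
        branch = "low" if sample[node["feature"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["label"]

# Hypothetical tree: 1 = drinkable, 0 = non-drinkable.
toy_tree = {
    "feature": "lead", "threshold": 0.01,
    "high": {"label": 0},
    "low": {
        "feature": "nitrates", "threshold": 10.0,
        "low": {"label": 1},
        "high": {"label": 0},
    },
}

clean_sample = {"lead": 0.005, "nitrates": 4.0}
polluted_sample = {"lead": 0.05, "nitrates": 4.0}
print(predict(toy_tree, clean_sample), predict(toy_tree, polluted_sample))
```

Each leaf holds exactly one outcome, which is what makes the path from root to leaf readable as a rule.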


The KNN algorithm (Latha et al. 2022) is a supervised learning algorithm that uses proximity to classify or predict the grouping of an individual data point. Although it can be used for both classification and regression problems, it is most commonly used for classification, since it relies on the idea that similar points are found close to one another.
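The proximity idea can be sketched in a few lines of plain Python. The function name `knn_predict`, the two-feature samples, and the choice of Euclidean distance with k = 3 are assumptions made for illustration only; the study itself relied on a library implementation.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical samples: 1 = drinkable, 0 = non-drinkable.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
labels = [1, 1, 1, 0, 0, 0]

prediction = knn_predict(points, labels, query=(0.2, 0.2), k=3)
print(prediction)
```

Because the prediction depends on raw distances, features on very different scales can dominate the vote, so scaling the inputs is a common precaution.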

Naïve Bayes (Abu Amra & Maghari 2017) is a supervised ML algorithm commonly used for classification problems, e.g., text classification. It belongs to the family of generative learning algorithms, which model the distribution of the inputs for a given class or category. Unlike discriminative classifiers such as logistic regression, it does not learn which features are most important for distinguishing between classes.

Ensemble methods are learning algorithms that build a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions (Dietterich 2000). Voting is arguably the simplest ensemble method and is frequently quite successful. It can be applied to both regression and classification problems. Two or more sub-models are created, and every sub-model provides predictions that are combined in some fashion, such as by calculating the mean or the mode of the predictions, giving each sub-model a say in the final result (Kabari & Onwuka 2019).
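For classification, combining sub-model predictions by their mode amounts to a simple majority vote. The sketch below assumes three hypothetical sub-models whose predictions are already available as lists; the function name `hard_vote` and the prediction values are illustrative only.

```python
from collections import Counter

def hard_vote(predictions_per_model):
    """Combine class predictions from several models by taking the mode per sample."""
    combined = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes.
    for sample_votes in zip(*predictions_per_model):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three sub-models for four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 0, 0, 1]
model_c = [0, 0, 1, 1]

final = hard_vote([model_a, model_b, model_c])
print(final)
```

Using an odd number of sub-models for a two-class problem avoids tied votes.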

Related work

Water quality prediction using classification algorithms in the ML domain has gained immense popularity due to its ability to provide accurate and rapid detection of water contaminants. Several studies have explored and compared various ML algorithms for water quality prediction. For instance, Fahd Abd & Hasan (2020) proposed advanced artificial intelligence (AI) algorithms for predicting the water quality index and classifying water quality. The authors used ML algorithms such as Support Vector Machine (SVM), KNN, and Naïve Bayes for water quality classification.

Similarly, Malek et al. (2022) employed seven classification models to predict the water quality classification of the Kelantan River Basin. The Gradient Boosting algorithm with a learning rate of 0.1 exhibited the best prediction performance. Chen et al. (2021) proposed the applications of cost-sensitive learning models and ensemble learning models in predicting the quality of drinking water.

Chou et al. (2021) and Uddin et al. (2023) used a multi-classification approach to predict the quality of water reservoirs, urban land cover, and coastal water quality. The studies utilized the Naïve Bayes, SVM, KNN, and XGBoost algorithms for multi-class prediction. Suwadi et al. (2022) concluded that the ML classification approach shows promising results in the rapid detection and prediction of water quality. Other studies, such as Yurtsever & Murat (2023) and Nitharshni et al. (2023), used ML algorithms to classify potable water quality and accurately predict the WQI.

However, despite the success of ML algorithms, some studies, such as Hu et al. (2022), suggested that continuous models of water quality need more predictive power, which calls for larger datasets and additional predictors. In addition, Dimple et al. (2022) reported that SVM was the most accurate algorithm for water quality classification, with the highest accuracy of 97.01%. These studies are summarized in Table 1.

Table 1

Comparative analysis of existing studies

| Study | Dataset | Methodology | Results | Classes |
|---|---|---|---|---|
| Fahd Abd & Hasan (2020) | Indian water quality data | SVM, KNN, Naïve Bayes | SVM: accuracy 97.01% and F1 score 98.54% | |
| Malek et al. (2022) | Kelantan River Basin | KNN, SVM, ANN, Decision Tree, Random Forest, Gradient Boosting (GB) | GB: accuracy 94.90% | |
| Chen et al. (2021) | Data obtained from the GECCO Conference | LR, KNN, SVM, Decision Tree, GBDT, Random Forest, DCF | KNN: F1 score 92% | |
| Chou et al. (2021) | Quality of water in reservoir | Naïve Bayes, LR, LSSVM, LibSVM, SMO | LSSVM: accuracy 93.65% | N/A |
| Uddin et al. (2023) | Water quality monitoring data in year 2019 | Naïve Bayes, KNN, XGBoost, DTSVM, ANN, Decision tree | XGBoost: accuracy 00% | |
| Suwadi et al. (2022) | Langat Basin in Selangor dataset | ANN, SVM, Random Forest, Naïve Bayes | RF: accuracy 95.63% | N/A |
| Yurtsever & Murat (2023) | Kaggle (DataSet n.d) | RF, AdaBoost, SXH | Random Forest: accuracy 90.24% | |
| Proposed work | Kaggle (DataSet n.d) | KNN, Decision Tree, Random Forest, Naïve Bayes, Stacking Ensemble and Voting Ensemble | Decision Tree: accuracy 97.53% and F1-score 80.5% | |

In summary, the application of ML algorithms in WQI and water quality classification prediction has been widely explored, providing accurate and rapid detection of water contaminants. The studies have used an array of algorithms such as SVM, KNN, Naive Bayes, Gradient Boosting, ensemble learning models, and cost-sensitive learning models to achieve high prediction performance. Nonetheless, there is a need for more investigation and optimization of these approaches to account for more predictors and larger datasets to improve the models’ predictive power.

The methodology of the proposed work is explained in this section. The workflow is shown in Figure 3, and its steps are explained below.
Figure 3

The workflow of proposed methodology.


Data pre-processing

In this step, the data were pre-processed. The initial dataset comprises 7,999 rows and 21 columns and does not contain any null values. To ensure data quality, a few rows containing incorrect input were identified and removed. The refined dataset contains 7,996 rows and 21 columns.
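One plausible way to carry out this kind of cleaning is sketched below. It assumes that incorrect input shows up as non-numeric text in otherwise numeric columns; the paper does not state how the invalid rows were identified. The helper name `drop_invalid_rows`, the column name `is_safe`, and the sample values (including the placeholder strings `bad` and `#NUM!`) are hypothetical.

```python
import pandas as pd

def drop_invalid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce every column to numeric and drop rows where any value fails to parse."""
    # Values that cannot be parsed as numbers become NaN.
    numeric = df.apply(pd.to_numeric, errors="coerce")
    # Remove every row that now contains at least one NaN.
    return numeric.dropna().reset_index(drop=True)

# Small in-memory stand-in for the real file (hypothetical values).
raw = pd.DataFrame({
    "aluminium": ["1.65", "2.32", "0.04", "bad"],
    "lead": ["0.054", "0.100", "0.078", "0.016"],
    "is_safe": ["1", "0", "#NUM!", "0"],
})

clean = drop_invalid_rows(raw)
print(raw.shape, "->", clean.shape)
```

On the real data, the same function would be applied to the frame returned by `pd.read_csv`, and the row counts before and after can be compared against the figures reported above.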

Data analysis

In this step, the dataset was analyzed by plotting a histogram for each feature and creating a heatmap of the entire dataset. The histograms provide a visual representation of the distribution of values for each individual feature (Figure 4). A few features follow an approximately uniform distribution, whereas many are non-uniform, skewed either to the left or to the right. Additionally, the heatmap (Figure 5) provides a comprehensive overview of the interrelationships among all the features in the dataset. The heatmap shows that the majority of the features exhibit low or negligible correlation with one another. This low correlation suggests that the features carry largely non-redundant information rather than duplicating one another.
Figure 4

Histogram of different contaminants.

Figure 5

Heat map of features.
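The distribution and correlation checks described in this step can also be expressed numerically. The sketch below uses synthetic columns, not the study's data, to show how skewness separates roughly uniform features from skewed ones and how a Pearson correlation matrix, the quantity a heatmap visualizes, is obtained. The column names and the random generator seed are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for feature columns (not the study's data).
df = pd.DataFrame({
    "uniform_like": rng.uniform(0, 1, 1000),
    "right_skewed": rng.exponential(1.0, 1000),
    "left_skewed": -rng.exponential(1.0, 1000),  # mirrored, so the tail points left
})

# Skewness near 0 indicates a symmetric distribution; the sign gives the tail direction.
skewness = df.skew()

# Pairwise Pearson correlations; this matrix is what a heatmap would display.
corr = df.corr()

print(skewness.round(2))
print(corr.round(2))
```

Calling `df.hist()` or passing `corr` to a heatmap function such as `seaborn.heatmap` would produce plots comparable to Figures 4 and 5 for whichever data is loaded.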


Hyperparameter optimization

In this step, hyperparameter optimization was applied first, and the different classification algorithms were then fitted to the dataset. Hyperparameter tuning is an essential part of constructing an efficient and robust ML model. Different ML algorithms require different tuning mechanisms because their hyperparameters are of different types, such as discrete, categorical, and continuous. However, choosing the best hyperparameters for a model is generally time-consuming and complex, because it entails defining the necessary algorithm and determining the optimum model by modifying its hyperparameter values (Probst et al. 2019). The process is expensive because a large number of possible combinations must be evaluated, each of which consumes computational resources.

To optimize the performance of each model, the Grid Search technique (Liashchynskyi & Liashchynskyi 2019) was applied. This approach systematically searches through different combinations of hyperparameters for each model. It discretizes the target range of each hyperparameter of interest into a set of candidate values and then trains and tests a model for every combination of these values. This enabled us to identify the best hyperparameter values for each model, ensuring that the most suitable settings were selected to enhance the models’ predictive capabilities.
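A minimal sketch of such an exhaustive search with scikit-learn's `GridSearchCV` is given below for a decision tree. The candidate values in `param_grid`, the five-fold cross-validation, and the synthetic data from `make_classification` are assumptions for illustration; the paper does not report the search ranges that were actually used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the water quality features.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Each hyperparameter is discretized into a small set of candidate values
# (hypothetical ranges, not the ones used in the study).
param_grid = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    "max_depth": [3, 5, 10],
}

# Every combination (2 x 2 x 3 = 12) is trained and scored with 5-fold cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The cost grows multiplicatively with each added hyperparameter, which is why the grids are usually kept small.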

Six different ML algorithms were used for training, i.e., Decision Tree, KNN, Naïve Bayes, Stacking, Voting Ensemble, and Random Forest. Before training, the Grid Search technique was applied to every model to find its best hyperparameters. Hyperparameter tuning has been shown to have a statistically significant, positive influence on the prediction capability of a model (Belete & Huchaiah 2021). In total, 80% of the data was used for training and 20% for testing.
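The training setup can be sketched with scikit-learn as follows. The 80/20 split, the six model families, the entropy criterion, and the 300 trees follow the paper; everything else is an assumption, including the synthetic data, the stratified split, the default KNN settings, the composition of the two ensembles (the helper `base_learners`), and the logistic-regression meta-learner used for stacking.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the water quality dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 80% of the samples for training, 20% for testing (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def base_learners():
    """Hypothetical sub-models shared by the two ensembles."""
    return [
        ("dt", DecisionTreeClassifier(criterion="entropy", random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ]

models = {
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "Voting": VotingClassifier(estimators=base_learners(), voting="hard"),
    "Stacking": StackingClassifier(
        estimators=base_learners(), final_estimator=LogisticRegression()
    ),
}

# Fit each model on the training split and record its test accuracy.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

The scores printed here describe the synthetic data only and are not comparable to the results reported for the water quality dataset.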

Experimental setup

Dataset

In this study, the dataset obtained from Kaggle (DataSet n.d) comprises 7,999 instances. It contains 20 features and one target column. The features represent different chemical contaminants present in water. The target variable has two classes, 0 (non-drinkable) and 1 (drinkable), as also shown in Table 2.

Table 2

Dataset description

| Dataset | No. of attributes | No. of instances | No. of classes |
|---|---|---|---|
| Water quality | 21 | 7,999 | 2 |

Parameter settings

The hyperparameter settings of the Decision Tree, KNN, and Random Forest models are presented in Table 3.

Table 3

Hyperparameter settings of ML models

| Model | Parameter | Value |
|---|---|---|
| Decision Tree | criterion | entropy |
| Decision Tree | max_depth | |
| Decision Tree | splitter | best |
| KNN | n_neighbors | |
| Random Forest | n_estimators | 300 |

Performance measures

To assess the effectiveness of the classifiers, commonly used quantitative metrics are presented in this section. These metrics apply to classification problems in which the results are divided into two categories: drinkable water and non-drinkable water, referred to as the positive class and the negative class, respectively. Each prediction is either true or false, meaning correct or incorrect, respectively. Consequently, every prediction falls into one of the following four states, which together form the confusion matrix (Visa et al. 2011). The two error states correspond to the Type-I and Type-II errors listed after them.

  • True positive (TP): A positive-class sample correctly predicted as positive.

  • True negative (TN): A negative-class sample correctly predicted as negative.

  • False positive (FP): A negative-class sample incorrectly predicted as positive.

  • False negative (FN): A positive-class sample incorrectly predicted as negative.

  • Type-I error: This is also referred to as a False Positive in the confusion matrix. It arises from the mistaken rejection of the Null Hypothesis (H0). In this study, it occurs when a non-drinkable sample of water is predicted as drinkable.

  • Type-II error: This is also referred to as a False Negative in the confusion matrix. It occurs when a Null Hypothesis that is actually false is accepted. In this study, it occurs when a sample of drinkable water is predicted as non-drinkable.
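The four states can be counted directly from paired true and predicted labels. The sketch below assumes the encoding used in this study (1 = drinkable as the positive class, 0 = non-drinkable); the function name `confusion_counts` and the label lists are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # drinkable, predicted drinkable
            else:
                fp += 1  # Type-I error: non-drinkable predicted as drinkable
        else:
            if actual == positive:
                fn += 1  # Type-II error: drinkable predicted as non-drinkable
            else:
                tn += 1  # non-drinkable, predicted non-drinkable
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels for eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

In a drinking-water setting the two errors are not equally costly, since a Type-I error labels unsafe water as safe.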

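To evaluate the performance of the ML models, accuracy, precision, recall, and F1-score were used as metrics, computed as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$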
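As a worked example of the four metrics under their standard definitions, the sketch below computes accuracy, precision, recall, and F1-score from confusion-matrix counts. The counts are hypothetical and are not taken from the experiments; the function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a test set of 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 85/100, precision is 40/45, recall is 40/50, and the F1-score is their harmonic mean, 16/19.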

Experimental environment

The experiments were conducted on a personal computer equipped with an Intel Core i7 2.80 GHz CPU and 16 GB of RAM. The implementation used the Python programming language with the NumPy, Pandas, SciPy, Matplotlib, Seaborn, and Scikit-learn libraries.

Results and discussion

In this section, six state-of-the-art ML algorithms are compared for water quality prediction. The results presented in Table 4 show that Decision Tree performs better than its peers: its accuracy of 97.53%, recall of 74.59%, and F1-score of 80.60% are the highest among the evaluated models, and its precision is 87.66%. Notably, Random Forest achieved the highest precision of all algorithms, at 91.91%. Conversely, Naïve Bayes (Gaussian NB) exhibited the lowest accuracy (83.87%), and KNN the lowest F1-score (22.64%). These values are also visualized in Figure 6.
Table 4

Performance results of ML models for water quality prediction

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Stacking | 0.94875 | 0.792899 | 0.740331 | 0.765714 |
| Random Forest | 0.958125 | 0.919118 | 0.690608 | 0.788644 |
| Voting | 0.944375 | 0.848485 | 0.618785 | 0.715655 |
| Naïve Bayes | 0.83875 | 0.371237 | 0.61326 | 0.4625 |
| KNN | 0.871875 | 0.357143 | 0.165746 | 0.226415 |
| Decision Tree | 0.975375 | 0.876623 | 0.745856 | 0.80597 |

Figure 6

Comparative analysis of evaluation metrics for different ML classifiers.

When it comes to predicting whether water is drinkable or non-drinkable, different machine learning (ML) algorithms show varying levels of performance. The Random Forest demonstrates good performance in classifying non-drinkable water correctly, followed by the decision tree classifier. Naive Bayes, on the other hand, exhibits poor performance in this regard. In the case of predicting drinkable water accurately, the Decision Tree algorithm performs well, followed by the Stacking Ensemble method. KNN shows poor performance in this task. Regarding type-I errors (false positives), the Decision Tree algorithm demonstrates a low rate of such errors, whereas KNN has a high rate of type-I errors. For type-II errors (false negatives), Random Forest performs well with a low rate, while Naive Bayes exhibits a high rate of type-II errors. The visual representation of these values is also given in Figure 7. These findings shed light on the comparative strengths and weaknesses of the different algorithms in predicting the water quality.
Figure 7

Confusion matrix of all models.


This study aims to predict water quality accurately. However, the work has some limitations. First, a real-world dataset was not used in our experiments; the use of real-world data may affect the prediction accuracy and scalability of the proposed approach. Second, the size of the training data could affect the results: with a larger training dataset, better performance and more accurate predictions might be achieved. Finally, the reliability of the solution is important: ideally, the predictions generated by our approach should fully agree with the results of laboratory water testing.

One of the most vital resources for survival is water. Rapid development has resulted in an alarming degradation of water quality. Poor water quality has been identified as one of the key contributors to the spread of critical diseases. Traditionally, determining the purity of water requires a costly and time-consuming laboratory analysis. In this study, an ML-based model was proposed to predict water quality in a cost-effective and accurate manner. Six different ML approaches were investigated to forecast water quality, namely Random Forest, Decision Tree, KNN, Naïve Bayes, Voting, and Model Stacking. The results indicate that Decision Tree shows a distinct superiority over its peer algorithms, with an accuracy of 95.94%, precision of 87.66%, recall of 74.59%, and F1-score of 80.60%.

In the future, our work will be integrated with the Internet of Things (IoT), so that the system predicts water quality from real-time data supplied by IoT devices.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

A.T.A. contributed to the development of this study. N.N. worked on the design, the proposed methodology, and the writing of the original manuscript draft. H.M.F. worked on the proposed methodology and analyzed the results. M.K.S. reviewed the article.

Abu Amra, I. A. & Maghari, A. Y. A. 2017 Students performance prediction using KNN and Naïve Bayesian. In: 2017 8th International Conference on Information Technology (ICIT), pp. 909–913.
Ahmed, U., Mumtaz, R., Anwar, H., Shah, A. A., Irfan, R. & García-Nieto, J. 2019 Efficient water quality prediction using supervised machine learning. Water 11 (11), 2210.
Belete, D. & Huchaiah, M. D. 2021 Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications 44, 1–12. https://doi.org/10.1080/1206212X.2021.1974663.
Biau, G. & Scornet, E. 2016 A random forest guided tour. Test 25, 197–227.
Dietterich, T. G. 2000 Ensemble methods in machine learning. In: Multiple Classifier Systems: First International Workshop, MCS 2000 Cagliari, Italy, June 21–23, 2000 Proceedings 1. Springer, pp. 1–15.
Dimple, D., Rajput, J., Al-Ansari, N. & Elbeltagi, A. 2022 Predicting irrigation water quality indices based on data-driven algorithms: Case study in semiarid environment. Chemistry 2022, 1–17. https://doi.org/10.1155/2022/4488446.
Environments and Contaminants - Drinking Water Contaminants. Available from: https://www.epa.gov/americaschildrenenvironment/environments-and-contaminants-drinking-water//-contaminants.
Fahd Abd, T. H. H. & Hasan, M. M. 2020 Water quality prediction using artificial intelligence algorithms. https://doi.org/10.1155/2020/6659314.
Haghiabi, A. H., Nasrolahi, A. H. & Parsaie, A. 2018 Water quality prediction using machine learning methods. Water Quality Research Journal 53 (1), 3–13.
Hu, X., Dai, M., Sun, J. & Sunderland, E. 2022 The utility of machine learning models for predicting chemical contaminants in drinking water: Promise, challenges, and opportunities. Current Environmental Health Reports 10, 45–60. https://doi.org/10.1007/s40572-022-00389-x.
Kabari, L. & Onwuka, U. 2019 Comparison of bagging and voting ensemble machine learning algorithm as a classifier. International Journal of Computer Science and Software Engineering 9, 19–23.
Latha, R. S., Sreekanth, G. R., Suganthe, R. C., Geetha, M., Selvaraj, R. E., Balaji, S., Harini, K. R. & Ponnusamy, P. P. 2022 Stock movement prediction using KNN machine learning algorithm. In: 2022 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–5.
Liashchynskyi, P. & Liashchynskyi, P. 2019 Grid search, random search, genetic algorithm: a big comparison for NAS. arXiv preprint arXiv:191206059.
Malek, N. H. A., Wan Yaacob, W. F., Md Nasir, S. A. & Shaadan, N. 2022 Prediction of water quality classification of the Kelantan River Basin, Malaysia, using machine learning techniques. Water 14 (7), 1067. https://doi.org/10.3390/w14071067.
Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A. & Brown, S. D. 2004 An introduction to decision tree modeling. Journal of Chemometrics: A Journal of the Chemometrics Society 18 (6), 275–285.
Nitharshni, J. M., Nilasruthy, R., Shakthi Akshaiya, K. R. & Rajavel, M. 2023 Quality check of water for human consumption using machine learning. In: Proceedings: IoT, Cloud and Data Science. vol. 124 of Advances in Science and Technology. Trans Tech Publications Ltd. pp. 574–589.
Patel, H. & Prajapati, P. 2018 Study and analysis of decision tree based classification algorithms. International Journal of Computer Sciences and Engineering 6, 74–78. https://doi.org/10.26438/ijcse/v6i10.7478.
Probst, P., Boulesteix, A. L. & Bischl, B. 2019 Tunability: Importance of hyperparameters of machine learning algorithms. The Journal of Machine Learning Research 20 (1), 1934–1965.
Shamsuddin, I. I. S., Othman, Z. & Sani, N. S. 2022 Water quality index classification based on machine learning: A case from the Langat River Basin model. Water 14 (19), 2397. https://doi.org/10.3390/w14192939.
Suwadi, N. A., Derbali, M., Sani, N. S., Lam, M. C., Arshad, H., Khan, I. & Kim, K. I. 2022 An optimized approach for predicting water quality features based on machine learning. Wireless Communications & Mobile Computing (Online) 2022, 1–20.
Uddin, M. G., Nash, S., Rahman, A. & Olbert, A. I. 2023 Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Safety and Environmental Protection 169, 808–828.
Visa, S., Ramsay, B., Ralescu, A. & Knaap, E. 2011 Confusion matrix-based feature selection. vol. 710. pp. 120–127.
Yurtsever, M. & Murat, E. 2023 Potable water quality prediction using artificial intelligence and machine learning algorithms for better sustainability. Ege Academic Review 23 (2), 265–278.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).