The integrated solar and hydraulic jump-enhanced waste stabilization pond (ISHJEWSP) has been proposed as a solution to enhance performance of the conventional WSP. Despite the better performance of the ISHJEWSP, there is seemingly no previous study that has deployed machine learning (ML) methods in modelling the ISHJEWSP. This study is aimed at determining the relationships between the ISHJEWSP effluent parameters as well as comparing the performance of extra trees (ET), random forest (RF), decision tree (DT), light gradient boosting machine (LightGBM), gradient boosting (GB), and extreme gradient boosting (XGBoost) methods in predicting the effluent biochemical oxygen demand (BOD5) in the ISHJEWSP. The feature importance technique indicated that the most important parameters were pH, temperature, solar radiation, dissolved oxygen (DO), and total suspended solids. These selected features yielded strong correlations with the dependent variable except DO, which had a moderate correlation. With respect to coefficient of determination and root mean square error (RMSE), the XGBoost performed better than the other models [coefficient of determination (R2) = 0.807, mean absolute error (MAE) = 4.3453, RMSE = 6.2934, root mean squared logarithmic error (RMSLE) = 0.1096]. Gradient boosting, XGBoost, and RF correspondingly yielded the least MAE, RMSE, and RMSLE of 3.9044, 6.2934, and 0.1059, respectively. The study demonstrates effectiveness of ML in predicting the effluent BOD5 in the ISHJEWSP.

  • BOD5 prediction models were developed using different ML methods.

  • The performance of various gradient boosting machines was evaluated.

  • Extreme gradient boosting proved to be better than other models.

  • Feature importance indicated the most important variables.

  • The study demonstrated the effectiveness of ML in predicting BOD5.

The global quest for improved living standards, through a varied range of activities, continually impacts water quality. Wastewater constitutes a risk factor to human health and serves as a contaminant source to the environment. The increasing generation of wastewater, as a result of rapid population growth and industrialization, further exacerbates the scale of the problem. Therefore, wastewater treatment plays a vital role in safeguarding public health and the sustainability of the environment. The development of proper wastewater treatment is crucial for the prevention of many types of transmittable diseases (Olukanni & Ducoste 2011). The conventional wastewater treatment process train consists of various unit operations. The technical expertise required to operate these units, together with the costs of construction and operation, renders them less attractive to most developing countries. Hence, it is necessary to develop a rapid, simple, eco-friendly, effective, and efficient method of wastewater treatment (Gedda et al. 2021).

A waste stabilization pond (WSP) is a natural treatment process and is therefore economically feasible compared to other treatment technologies with respect to its maintenance cost and energy requirement (Mahapatra et al. 2022). The simplicity of construction and low cost make it desirable in developing countries (Ho et al. 2017). However, the applicability of the WSP is limited by its large area requirement (Mara et al. 1983) as well as the problem of land availability. Consequently, various studies have been carried out to improve the performance of the conventional WSP via solar enhancement (Utsev & Agunwamba 2020). The high solar irradiance resulting from solar enhancement improved the treatment efficiencies and hence reduced the land area requirement (Utsev & Agunwamba 2020). Previously, the effects of pond depth (Hosetti & Patil 1987; Oragui et al. 1987; Silva et al. 1987), tapering (Agunwamba 2001), and baffles (Shilton & Mara 2005) have been reported. The integrated solar and hydraulic jump-enhanced waste stabilization pond (ISHJEWSP) incorporated solar reflectors and hydraulic jump in the conventional WSP (Ogarekpe & Agunwamba 2016b). Besides the physical modification of the conventional WSP, several tools have been utilized for the analysis, interpretation, and representation of results emanating from these studies. One such analytical tool is machine learning (ML).

ML models use algorithms to recognize patterns in data: one data subset is used for training, and a separate data subset is used to verify the prediction accuracy in a procedure known as testing (Hammed et al. 2021). ML has been described as an appropriate solution for difficult and elusive problems involving input and output variables, where mathematical equations are hard to apply, assumptions are needed to interpret the equations, and the outcomes are hard to explain (Yang et al. 2024). Previously, ML algorithms have been found useful in predicting uncertain treatment performances involving the symbiotic relationship between algae and bacteria (Sundui et al. 2021). A better understanding of WSP performance was achieved using multiple regression and neural network modelling (Khodadadi et al. 2016). Nitrogen removal in WSP has been modelled using optimal parameterization, local sensitivity analysis, and global uncertainty analysis (Mukhtar et al. 2017). Khatri et al. (2023) used artificial neural network (ANN)-based models for predicting the performance of a combined upflow anaerobic sludge blanket and facultative pond. Superior effluent quality predictive performance of back propagation neural networks (BPNN) over traditional mathematical models has been reported (Gao et al. 2023).

Biochemical oxygen demand (BOD5) is an important water quality indicator (Ooi et al. 2022). However, measuring BOD5 is time-consuming and could delay important and appropriate decisions, plans, and actions. The indirect estimation of major wastewater quality parameters using ML has been advocated (Ooi et al. 2022). Modelling BOD5 would save the time and cost associated with actual measurements, as it can be estimated from the predictor variables in the model. ML methods have been developed for an accurate, reliable, and cost-effective prediction of BOD (Nafsin & Li 2022). Random forest (RF), support vector regression (SVR), and multilayer perceptron (MLP) have been used for the prediction of BOD5 in water based on other physicochemical characteristics of water (Ooi et al. 2022). A good model performance, for BOD measurement of the effluent quality, was obtained using optimized extreme ML (Yu et al. 2019). Granata et al. (2017) predicted BOD5 and other wastewater quality indicators using SVR and regression trees (RT) techniques. BOD5 has been predicted using ANN, support vector machine (SVM), RF, gradient boosting machine (GBM), and other hybrid algorithms (Nafsin & Li 2022).

Previous studies have shown that the ISHJEWSP yields better treatment efficiencies than the conventional WSP (Ogarekpe & Agunwamba 2016a). The precedence of ambient climatic factors and the state of the sewage over other variables, in relation to the performance of the ISHJEWSP, has been highlighted (Ogarekpe et al. 2022). The effect of rate constant models on the performance of the ISHJEWSP model was evaluated (Ogarekpe 2018). Despite the better performance of the ISHJEWSP, there is seemingly no previous study that has deployed ML methods in modelling the ISHJEWSP. This study is aimed at comparing the predictive performance of extra trees (ET), RF, decision tree (DT), light gradient boosting machine, GBM, and extreme gradient boosting (XGBoost) methods in modelling the effluent BOD5 in the ISHJEWSP. The specific objectives of this paper are to determine the relationships between ISHJEWSP effluent parameters and to develop ML models for the prediction of the effluent BOD5 in the ISHJEWSP.

Study area

The study area is located at the University of Nigeria, Nsukka (Figure 1). Nsukka is a town and Local Government Area in Enugu State, South-East Nigeria. The laboratory-scale experiment was set up adjacent to the Imhoff tank at the wastewater treatment plant in the University of Nigeria, Nsukka. The treatment plant consists of a screen, two Imhoff tanks, two facultative waste stabilization ponds as well as drying beds.
Figure 1

Location of experimental setup located in Nsukka, Enugu State, Nigeria.


Experimental setup and sample collection

Three sets of experimental ponds were constructed with varying locations of the points of initiation of hydraulic jump. Influent was supplied to the experimental ponds from the wastewater storage tank. An overhead wastewater tank was constructed for the purpose of maintaining constant head in the wastewater storage (Figure 2). The two tanks were filled to supply the ISHJEWSP with sewage from the Imhoff tank. Wastewater samples collected from the inlet and outlet for varying inlet velocities and varying locations of point of initiation of hydraulic jump were examined for physicochemical and bacteriological characteristics for a period of 9 months. Some of the parameters examined include temperature, pH, dissolved oxygen (DO), total suspended solids (TSS), and BOD. The influent samples for the laboratory analysis were obtained from the storage tank immediately after being filled. Also, the experimental ponds were immediately filled and samples collected at the outlets after 2 days. All the water analyses were carried out in accordance with the standard methods for the examination of water and wastewater (APHA 1998) while the statistical analyses were carried out in R version 4.2.2 and Python.
Figure 2

Schematic diagram of laboratory-scale ISHJEWSP.


Prediction models

The following algorithms were utilized for the analysis of the ISHJEWSP effluent data: DT, RF, ET, light gradient boosting machine (LightGBM), gradient boosting (GB), and XGBoost.

Decision tree

A DT model consists of nodes and branches (Song & Ying 2015; Charbuty & Abdulazeez 2021), and utilizes the important steps of splitting, stopping, and pruning in building the model (Song & Ying 2015). A tree consists of root nodes, non-terminal nodes, and terminal nodes (Swain & Hauska 1977). A node represents a test of an attribute and a leaf node provides a classification, while the branches from the selected node are the possible values (Gavankar & Sawarkar 2017). These tests are filtered down through the tree to get the right output for the input pattern (Navada et al. 2011). DT can simultaneously handle numerical and categorical input variables, is robust to outliers, and can efficiently deal with missing input data (Touzani et al. 2018). In spite of these advantages, DTs are prone to overfitting (James et al. 2013).

Random forest

An RF is a tree-based ensemble method (Ahmad et al. 2018). RF uses randomization to create a large number of DTs (Rigatti 2017). Bootstrapped data subsets, for the training, are grown to unpruned regression (or classification) trees (Ahmad et al. 2018). The trees are created by drawing each new training set, with replacement, from the original training set and using random feature selection (Breiman 2001). The various randomized DTs are combined by averaging their predictions (Biau & Scornet 2016). The out-of-bag samples are then used for testing the performance of the resulting RF model (Breiman 2001). RF makes few assumptions about the relationships between the variables and is extremely flexible (Langsetmo et al. 2023).
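The bootstrap step behind RF can be illustrated with a short standard-library sketch. It only shows how one in-bag sample and its out-of-bag complement might be drawn; the function name, the seed, and the sample size of 100 are illustrative assumptions, not values from the study.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement.

    Returns the in-bag indices (duplicates allowed) and the
    out-of-bag indices (rows that were never drawn).
    """
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in chosen]
    return in_bag, out_of_bag


# Hypothetical dataset of 100 rows; the seed is arbitrary and only
# makes the sketch repeatable.
rng = random.Random(42)
in_bag, out_of_bag = bootstrap_sample(100, rng)
oob_fraction = len(out_of_bag) / 100
print(f"out-of-bag fraction: {oob_fraction:.2f}")
```

For large samples the out-of-bag fraction tends towards 1/e (about 0.37), which is why each tree in the forest leaves a sizeable share of rows available for testing.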

Extra tree

The ET method proposed by Geurts et al. (2006) belongs to the class of DT-based ensemble learning methods (John et al. 2016). DT-based ensemble methods utilize multiple DTs to perform classification and regression tasks (Gall et al. 2011). ET adds another layer of randomness to decision forests and utilizes an approach that reduces the search space, hence resulting in faster training (Maier et al. 2015). ET employs the same principle as RF (John et al. 2016) and is less likely to overfit a dataset (Hammed et al. 2021). In addition, the use of the ET ensemble technique for selection of the optimal feature importance has been reported (Arya et al. 2022).

Gradient boosting machine

Boosting models iteratively combine several simple models to obtain improved prediction accuracy (Touzani et al. 2018). Boosting has been utilized for classification problems (Freund 1995). Friedman (2001) introduced the GBM by extending boosting to regression. Gradient boosting is a way to gradually reduce error (Ayyadevara 2018). The GBM method can be considered a numerical optimization algorithm that aims at finding an additive model that minimizes the error function (Touzani et al. 2018). GBM adds base models sequentially, each correcting the mistakes of the previous base model (Zhang & Haghani 2015). The gradient boosting algorithms family has been extended with proposals that are centred around speed and accuracy (Bentéjac et al. 2021). The extensions of the gradient boosting algorithms have been highlighted to include XGBoost, LightGBM, and CatBoost (Bentéjac et al. 2021). XGBoost is a scalable ensemble technique while LightGBM uses selective sampling of high gradient instances to provide extremely fast training performance (Bentéjac et al. 2021).
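The idea of building an additive model by repeatedly correcting the previous model's mistakes can be sketched in plain Python. The example below is a minimal illustration of gradient boosting for squared error with one-split trees (stumps) on a single feature; it is not the GBM, XGBoost, or LightGBM implementation used in the study, and the toy data, number of rounds, and learning rate are assumptions chosen for illustration.

```python
def fit_stump(x, residuals):
    """Least-squares one-split regressor on a single feature.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model: mean of y plus shrunken stumps fitted to residuals."""
    base = sum(y) / len(y)
    stumps = []
    preds = [base] * len(y)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data (not from the study).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 2.5, 3.1, 3.9, 8.2, 8.8, 9.1, 9.9]

model = gradient_boost(x, y)
mse_base = mse(y, [sum(y) / len(y)] * len(y))
mse_boost = mse(y, [model(xi) for xi in x])
print(f"baseline MSE: {mse_base:.3f}, boosted MSE: {mse_boost:.3f}")
```

Each round lowers the training error a little, which is the gradual error reduction described above; production libraries add regularization, subsampling, and faster split finding on top of this scheme.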

Model training, testing, and evaluation

The proposed models used for this study consisted of training and testing phases. The datasets were split into two parts. Approximately 70% of the data were used for training and the rest, for testing the models. The performance indicator at testing was determined by the coefficient of determination as shown in Equations (1) and (2). Previously, standard theories of regression analysis have been discussed (Sen & Srivastava 2012).
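$$R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - S_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{1}$$
The coefficient of determination is also given as
$$R^2 = \left[ \frac{\sum_{i=1}^{n} (O_i - \bar{O})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2} \, \sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}} \right]^2 \tag{2}$$
where $O_i$ and $S_i$ are the observed and simulated variable at the ith time step, respectively; $\bar{O}$ is the average of the observed variable; $\bar{S}$ is the average of the simulated variable; and n is the total number of observations.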
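Two widely used forms of the coefficient of determination, one based on residual and total sums of squares and one equal to the squared Pearson correlation between observed and simulated values, can be sketched as follows. Whether these are exactly the forms the authors used is an assumption; the function names and sample values are illustrative only.

```python
def r2_residual_form(observed, simulated):
    """1 minus the ratio of residual to total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def r2_correlation_form(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical values: the simulation is offset from the observations by +2.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 22.0, 32.0, 42.0]

print(r2_residual_form(observed, simulated))     # penalizes the constant bias
print(r2_correlation_form(observed, simulated))  # insensitive to the bias
```

The two forms agree for a least-squares linear fit evaluated on its own training data but can differ on a test set, so it is worth stating which one is reported.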
The performance error metrics utilized for the evaluation of the models at testing included the mean absolute error (MAE), root mean squared error (RMSE), and root mean squared logarithmic error (RMSLE). The evaluation measured the model's ability or inability to match the important features of the observed data. MAE is the average absolute difference between the observed values in the data and the predictions for the same data (Schneider & Xhafa 2022). The MAE, RMSE (Hodson 2022), and RMSLE were computed using the following equations, respectively.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{3}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} \tag{4}$$
$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(\hat{y}_i + 1) - \log(y_i + 1) \right)^2} \tag{5}$$
where $\hat{y}_i$ is the prediction, $y_i$ is the true value or observation, and n is the number of observations in the data.
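The three error metrics can be computed with a few lines of standard-library Python. This is a sketch using the conventional definitions of MAE, RMSE, and RMSLE (with the natural logarithm of one plus the value); the function names and sample numbers are illustrative and are not taken from the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (values must be > -1)."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical effluent BOD5 values in mg/L.
y_true = [30.0, 42.0, 55.0, 61.0]
y_pred = [33.0, 40.0, 50.0, 66.0]

print(f"MAE   = {mae(y_true, y_pred):.4f}")
print(f"RMSE  = {rmse(y_true, y_pred):.4f}")
print(f"RMSLE = {rmsle(y_true, y_pred):.4f}")
```

RMSE weights large errors more heavily than MAE, while RMSLE measures relative rather than absolute deviation, which is one reason different models can rank first on different metrics.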

Data preprocessing and statistical analyses

The box-and-whisker plots were utilized to summarize the ISHJEWSP raw data considering the following components: the 25th, 50th, and 75th percentiles, the whiskers (showing the range of the variable), and the outliers. For the variables under review, an outlier was detected only for solar radiation (Figure 3). Prior to the correlation analysis and ML, the raw data were cleaned so that only data between the upper and lower whisker ranges were utilized, hence excluding outliers and ensuring that no misrepresentations were introduced into the proposed model.
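The cleaning step can be sketched as a whisker-based filter. The sketch assumes the common Tukey convention, in which the whiskers extend 1.5 times the interquartile range beyond the quartiles; the text does not state which whisker rule was used, and the solar radiation readings below are hypothetical.

```python
import statistics


def whisker_bounds(values, k=1.5):
    """Lower and upper whisker limits under the (assumed) 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie within the whisker limits."""
    low, high = whisker_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical solar radiation readings with one extreme value.
solar = [410, 425, 430, 440, 455, 460, 470, 900]
cleaned = drop_outliers(solar)
print(whisker_bounds(solar))
print(cleaned)
```

In a multivariate dataset the same filter would normally be applied row-wise, so that a row is dropped for every variable when any one of its values falls outside the limits.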
Figure 3

Box-and-whiskers plot of raw ISHJEWSP data.

A normality test was conducted using the Shapiro–Wilk test. The test revealed that DO and solar radiation are normally distributed (p > 0.05) while the remaining variables are not normally distributed (p < 0.05). Although the normality results differed between features, Pearson's correlation analysis was used to determine the relationships between the variables (Figure 4). Pearson's correlation coefficient varies from −1 to +1, with a negative or positive value indicating a negative or positive correlation, respectively.
Figure 4

Correlation matrix of ISHJEWSP parameters.


The strength of the relationship between the variables was described using the following indices based on the absolute value of r: |r| < 0.500 (weak relationship), 0.500 ≤ |r| < 0.699 (moderate relationship), |r| ≥ 0.700 (strong relationship). Prior to the correlation analysis, the data were cleaned in order to get rid of the outlier in the solar radiation data. The cleaning entailed the utilization of the datasets within the upper and lower ranges of the box-and-whiskers plot (Figure 3). The statistical analyses were carried out in R version 4.2.2 while ML was implemented using Python.
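The correlation coefficient and the strength bands can be expressed compactly in code. The sketch below computes Pearson's r from its definition and labels the result using the thresholds given above; values between 0.699 and 0.700, which the stated bands leave unassigned, are treated here as moderate, and that choice is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient from its definition."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strength(r):
    """Label |r| with the weak / moderate / strong bands."""
    magnitude = abs(r)
    if magnitude >= 0.700:
        return "strong"
    if magnitude >= 0.500:
        return "moderate"
    return "weak"


# Hypothetical paired readings (not the study's data).
ph = [7.2, 8.1, 9.0, 9.8, 10.5]
bod5 = [62.0, 55.0, 41.0, 35.0, 28.0]

r = pearson_r(ph, bod5)
sign = "negative" if r < 0 else "positive"
print(f"r = {r:.3f}: {strength(r)} {sign} relationship")
```

Applied to a coefficient such as r = −0.880, the labelling function returns "strong", with the sign handled separately from the magnitude.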

Relationship between ISHJEWSP parameters

In order to demonstrate the relationships between the variables, Pearson's correlation analysis was carried out for the ISHJEWSP parameters. From the results, a strong positive relationship was observed between temperature and pH (r = 0.878), TSS and BOD5 (r = 0.836), solar radiation and pH (r = 0.888), and solar radiation and temperature (r = 0.889). Conversely, a strong negative relationship was observed between BOD5 and pH (r = −0.880), solar radiation and BOD (r = −0.833), TSS and pH (r = −0.843), TSS and temperature (r = −0.840), and TSS and solar radiation (r = −0.864). The remaining parameters have either a weak positive or negative relationship between the respective variables (Figure 4).

The relationships reported in this study were compared with trends from previous studies. In the past, a strong positive relationship was obtained between algae and temperature for the ISHJEWSP under review (Ogarekpe et al. 2016). Increases in temperature and algae concentration result in an increase in photosynthetic activities. The rapid consumption of CO2, during photosynthesis, faster than it can be replaced by bacterial respiration, results in the occurrence of high pH values above 9 in ponds (Mara & Pearson 1998). Consequently, algae concentration and solar radiation (which provides energy for photosynthesis) play a vital role in enhancing the strong positive relationships between pH and temperature, as well as solar radiation and pH. Elevated temperature enhanced pH in WSP in Portugal (Pearson et al. 1987). A strong positive relationship between solar radiation and surface temperature has been reported (Daut et al. 2012). TSS and BOD are related. A portion of the TSS, the volatile suspended solids, exerts an oxygen demand in a facultative lagoon (Gerardi 2015).

The pH of the ISHJEWSP effluent ranged between 7.2 and 11.2. The high pH values, perhaps, inhibited the oxidation of organic matter, hence, the strong negative relationship between the pH and BOD. Based on extrapolations from the results of activated sludge, Pipes (1962) stated that for a pH above 9.0, the oxidation of organic matter in a stabilization pond is severely inhibited. Most bacteria can grow in a pH range between 5 and 9; pH values below 5 or above 8.5 affect the growth and survival of aquatic microorganisms (Pearson et al. 1987). Microbial activity is slow below 5 °C, reaches a maximum between 25 and 30 °C, and thereafter decreases to a minimum at about 65 °C (Skiba 2008). The decrease or die-off of microorganisms at elevated temperature, perhaps, inhibited the degradation of the organic matter component of the TSS, hence, the negative relationship between temperature and TSS as well as temperature and BOD.

The role of solar radiation in photosynthesis, and subsequently on the pH of the ISHJEWSP, plays a vital role in enhancing the strong negative relationships between solar radiation and BOD as well as solar radiation and TSS. The relationships are consequent upon the range of pH values obtained, hence, the inhibition of the oxidation of organic matter. The abatement of suspended solids, in oxidation pond effluents, at pH of 10.2 and above has been reported (Elmaleh et al. 1996). The high pH range from the study accounts for the abatement of the TSS. Hence, it justifies the strong negative relationship between the pH and TSS.

Conventionally, the dissolution of oxygen in pure water decreases with increase in temperature (Xing et al. 2014). Weak negative relationships existed between DO and temperature, as well as DO and solar radiation. The DO as well as ensuing relationships were influenced by algae concentration, solar radiation, as well as the photosynthetic activities in the pond. Oxygen, provided by the algal population, principally accounts for the oxygenation in pond systems (Mara & Pearson 1998). The diurnal variation of DO, in WSP systems, in response to the photosynthetic activities of algae has been reported (Ho et al. 2018). The timing and magnitude of solar radiation, as well as the variability in the algal concentration, perhaps confer additional complexities on the relationships between DO and the other parameters.

Comparison between GBMs, RF, and ET models

A comparison of the outcomes of various ML models, for effluent BOD5 prediction, is presented in Table 1 considering the metrics of R2, MAE, RMSE, and RMSLE. The ML models under review included ET, RF, light gradient boosting machine (LightGBM), DT, gradient boosting (GB), and XGBoost.

Table 1

Performance metrics for the evaluation of different machine learning algorithms

Model                              MAE      RMSE     RMSLE
Tree-based models
Extra trees regressor              3.9379   6.6906   0.1065
Random forest regressor            4.0974   6.5025   0.1059
Light gradient boosting machine    4.8543   6.9004   0.1255
Decision tree regressor            5.0714   7.1975   0.1237
Gradient boosting regressor        3.9044   6.8201   0.1071
Extreme gradient boosting          4.3453   6.2934   0.1096

Figure 5 shows the coefficient of determination of the different BOD5 ML prediction models. The results depict the performance of the ML models at testing. With respect to the coefficient of determination, XGBoost performed better than the other algorithms (Figure 5(f)). The results revealed that different models yielded the least error values for the different performance error metrics under review. No model yielded the least error value more than once. This highlights the competence of the other models in predicting the BOD5 of the ISHJEWSP (Table 1). However, the XGBoost algorithm stands out as the only ML method that yielded both a least error value (for RMSE) and the highest coefficient of determination. Previously, XGBoost has been reported to be faster than other algorithms, with the capacity of accelerated learning, allowing for faster model discovery (Mesut et al. 2023). The results show how well the ML models performed by comparing the predicted and observed effluent BOD5. A minimal difference between the predicted and observed values, resulting in a tight clustering around the diagonal line, indicates a high performance and accurate prediction (Zohair & Mahmoud 2019). The gradient boosting regressor yielded the least MAE of 3.9044, while XGBoost yielded the least RMSE of 6.2934. RF yielded the least RMSLE of 0.1059 (Table 1). XGBoost was the best performing algorithm for RMSE and was in the middle range of the remaining performance metrics alongside the tree-based models. In no case was XGBoost the worst performing algorithm.
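The ranking statements above can be checked directly against the values in Table 1. The short sketch below stores the reported metrics in a dictionary and finds the best and worst model for each error metric; the variable names are illustrative, and the numbers are copied from Table 1.

```python
# Error metrics at testing, as reported in Table 1.
table_1 = {
    "Extra trees regressor": {"MAE": 3.9379, "RMSE": 6.6906, "RMSLE": 0.1065},
    "Random forest regressor": {"MAE": 4.0974, "RMSE": 6.5025, "RMSLE": 0.1059},
    "Light gradient boosting machine": {"MAE": 4.8543, "RMSE": 6.9004, "RMSLE": 0.1255},
    "Decision tree regressor": {"MAE": 5.0714, "RMSE": 7.1975, "RMSLE": 0.1237},
    "Gradient boosting regressor": {"MAE": 3.9044, "RMSE": 6.8201, "RMSLE": 0.1071},
    "Extreme gradient boosting": {"MAE": 4.3453, "RMSE": 6.2934, "RMSLE": 0.1096},
}
metrics = ("MAE", "RMSE", "RMSLE")

# Lower is better for all three error metrics.
best = {m: min(table_1, key=lambda name: table_1[name][m]) for m in metrics}
worst = {m: max(table_1, key=lambda name: table_1[name][m]) for m in metrics}

for m in metrics:
    print(f"{m}: best = {best[m]}, worst = {worst[m]}")
```

The output confirms that a different model attains the lowest value for each metric and that XGBoost is never the worst performer.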
Figure 5

Comparison of the different BOD5 prediction models using the coefficient of determination: (a) extra trees regressor; (b) random forest; (c) light gradient boosting; (d) decision trees; (e) gradient boosting regressor; and (f) extreme gradient boosting.


In the past, a combination of the R2 score and error metrics has been used to determine, among a select set of ML models, the best model for BOD5 prediction (Ooi et al. 2022). Different ML models have been reported to perform better for certain variables than others (Khodadadi et al. 2016). BOD removal has been predicted satisfactorily using ANN (Akratos et al. 2008). The use of hybrid intelligent systems and ANN has yielded high predictive accuracies of water treatment efficiencies for BOD, chemical oxygen demand (COD), heavy metals, and organics (Malviya & Jaspal 2021).

The most important variables were identified using the BorutaShap algorithm. Dimensionality reduction, in ML, is effective in removing irrelevant and redundant data, increasing learning accuracy, and improving result comprehensibility (Khalid et al. 2014). Based on the results of the BorutaShap algorithm, the most important input features were pH, temperature, solar radiation, DO, and TSS (Figure 6). The feature importance results obtained are directly and indirectly comparable to the results obtained via principal component analysis (PCA) (Ogarekpe et al. 2022). The most important input features yielded strong correlations with the dependent variable except DO, which had a moderate correlation. The maximization of the prediction model accuracy by using attributes with high correlation to the output class has been reported (Alfian et al. 2022). BorutaShap as well as other interpretable ML models have been utilized for the identification of crucial features in disease diagnosis (Ejiyi et al. 2024). PCA, RF, and other algorithms can be utilized for the extraction of relevant features, hence, improving the model accuracy and efficiency during prediction (Kotsiantis et al. 2006).
Figure 6

Feature importance of selected ISHJEWSP variables.


This study compared the predictive performance of ET, RF, DT, light gradient boosting machine (LightGBM), gradient boosting (GB), and XGBoost methods in predicting the effluent BOD5 in the ISHJEWSP. With respect to the error evaluation metrics, the performance of the ML models varied, with different models yielding the least error values for different metrics. The relationships between the parameters of the ISHJEWSP were found to range from strong positive/negative to weak positive/negative relationships. The feature importance indicated that the most important features were pH, temperature, solar radiation, DO, and TSS. These selected features yielded strong correlations with the dependent variable except DO, which had a moderate correlation. With respect to coefficient of determination and RMSE, XGBoost performed better than the other models (R2 = 0.807, MAE = 4.3453, RMSE = 6.2934, RMSLE = 0.1096). XGBoost was the best performing algorithm for RMSE and was in the middle range of the remaining performance metrics alongside the tree-based models. The study demonstrates the effectiveness of ML in predicting BOD5 in the ISHJEWSP. This can result in significant time savings relative to the determination of BOD5 via experimental procedures, especially when pollution mitigation works are required without delay.

NO initiated the study and wrote the first draft. The data collection was carried out by NO. IT wrote some sections of the paper as well as carried out the machine learning analysis. JA, OU, and AC revised the paper.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Akratos C. S., Papaspyros J. N. & Tsihrintzis V. A. (2008) An artificial neural network model and design equations for BOD and COD removal prediction in horizontal subsurface flow constructed wetlands, Chemical Engineering Journal, 143 (1–3), 96–110.
Alfian G., Syafrudin M., Fahrurrozi I., Fitriyani N. L., Atmaji F. T. D., Widodo T., Bahiyah N., Benes F. & Rhee J. (2022) Predicting breast cancer from risk factors using SVM and extra-trees-based feature selection method, Computers, 11 (9), 136.
APHA (1998) Standard Methods for the Examination of Water and Wastewater. 20th Edition. American Public Health Association, American Water Works Association and Water Environmental Federation, Washington, DC.
Arya M., Sastry G H., Motwani A., Kumar S. & Zaguia A. (2022) A novel extra tree ensemble optimized DL framework (ETEODL) for early detection of diabetes, Frontiers in Public Health, 9, 797877.
Ayyadevara V. K. (2018) Gradient Boosting Machine. In: Pro Machine Learning Algorithms. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3564-5_6.
Bentéjac C., Csörgő A. & Martínez-Muñoz G. (2021) A comparative analysis of gradient boosting algorithms, Artificial Intelligence Review, 54, 1937–1967.
Biau G. & Scornet E. (2016) A random forest guided tour, Test, 25, 197–227.
Breiman L. (2001) Random forests, Machine Learning, 45, 5–32.
Charbuty B. & Abdulazeez A. (2021) Classification based on decision tree algorithm for machine learning, Journal of Applied Science and Technology Trends, 2 (1), 20–28.
Daut I., Yusoff M. I., Ibrahim S., Irwanto M. & Nsurface G. (2012) Relationship between the solar radiation and surface temperature in Perlis, Advanced Materials Research, 512, 143–147.
Ejiyi C. J., Qin Z., Ukwuoma C. C., Nneji G. U., Monday H. N., Ejiyi M. B., Ejiyi T. U., Okechukwu U. & Bamisile O. O. (2024) Comparative Performance Analysis of Boruta, SHAP, and Borutashap for Disease Diagnosis: A Study with Multiple Machine Learning Algorithms, Network: Computation in Neural Systems, pp. 1–38.
Elmaleh S., Yahi H. & Coma J. (1996) Suspended solids abatement by pH increase – upgrading of an oxidation pond effluent, Water Research, 30 (10), 2357–2362.
Freund Y. (1995) Boosting a weak learning algorithm by majority, Information and Computation, 121 (2), 256–285.
Friedman J. H. (2001) Greedy function approximation: a gradient boosting machine, Annals of Statistics, 29 (5), 1189–1232.
Gall J., Yao A., Razavi N., Van Gool L. & Lempitsky V. (2011) Hough forests for object detection, tracking, and action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (11), 2188–2202.
Gavankar S. S. & Sawarkar S. D. (2017) ‘Eager decision tree’, 2nd International Conference for Convergence in Technology (I2CT), Mumbai, India, 2017, pp. 837–840. doi: 10.1109/I2CT.2017.8226246.
Gedda G., Balakrishnan K., Devi R. U., Shah K. J., Gandhi V., Gandh V. & Shah K. (2021) Introduction to conventional wastewater treatment technologies: Limitations and recent advances, Mater. Res. Found, 91, 1–36.
Gerardi M. H. (2015) The biology and troubleshooting of facultative lagoons. New Jersey: John Wiley & Sons.
Geurts P., Ernst D. & Wehenkel L. (2006) Extremely randomized trees, Machine Learning, 63, 3–42.
Granata F., Papirio S., Esposito G., Gargano R. & De Marinis G. (2017) Machine learning algorithms for the forecasting of wastewater quality indicators, Water, 9 (2), 105.
Hammed M. M., AlOmar M. K., Khaleel F. & Al-Ansari N. (2021) An extra tree regression model for discharge coefficient prediction: Novel, practical applications in the hydraulic sector and future research directions, Mathematical Problems in Engineering, 2021, 1–19.
Ho L. T., Van Echelpoel W. & Goethals P. L. (2017) Design of waste stabilization pond systems: A review, Water Research, 123, 236–248.
Ho L., Pham D. T., Van Echelpoel W., Muchene L., Shkedy Z., Alvarado A., Espinoza-Palacios J., Arevalo-Durazno M., Thas O. & Goethals P. (2018) A closer look on spatiotemporal variations of dissolved oxygen in waste stabilization ponds using mixed models, Water, 10 (2), 201.
Hodson T. O. (2022) Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not, Geoscientific Model Development Discussions, 2022, 1–10.
Hosetti B. B. & Patil H. S. (1987) Performance of wastewater stabilization ponds at different depths, Water Air Soil Pollut, 34, 191–198. https://doi.org/10.1007/BF00184760.
James G., Witten D., Hastie T. & Tibshirani R. (2013) An Introduction to Statistical Learning (Vol. 112). Springer.
John V., Liu Z., Guo C., Mita S. & Kidono K. (2016) ‘Real-time lane estimation using deep features and extra trees regression’, Image and Video Technology: 7th Pacific-Rim Symposium, PSIVT 2015, Auckland, New Zealand, November 25–27, 2015, Revised Selected Papers 7.
Khalid S., Khalil T. & Nasreen S. (2014) ‘A survey of feature selection and feature extraction techniques in machine learning’, 2014 Science and Information Conference.
Khodadadi M., Mesdaghinia A., Nasseri S., Ghaneian M. T., Ehrampoush M. H. & Hadi M. (2016) Prediction of the waste stabilization pond performance using linear multiple regression and multi-layer perceptron neural network: a case study of Birjand, Iran, Environmental Health Engineering and Management Journal, 3 (2), 81–89.
Kotsiantis S. B., Kanellopoulos D. & Pintelas P. E. (2006) Data preprocessing for supervised leaning, International Journal of Computer Science, 1 (2), 111–117.
Langsetmo L., Schousboe J. T., Taylor B. C., Cauley J. A., Fink H. A., Cawthon P. M., Kado D. M., Ensrud K. E. & Osteoporotic Fractures in Men (MrOS) Research Group (2023) Advantages and disadvantages of random forest models for prediction of hip fracture risk versus mortality risk in the oldest old, JBMR Plus, 7 (8), e10757.
Mahapatra S., Samal K. & Dash R. R. (2022) Waste stabilization pond (WSP) for wastewater treatment: A review on factors, modelling and cost analysis, Journal of Environmental Management, 308, 114668.
Maier O., Wilms M., von der Gablentz J., Krämer U. M., Münte T. F. & Handels H. (2015) Extra tree forests for sub-acute ischemic stroke lesion segmentation in MR sequences, Journal of Neuroscience Methods, 240, 89–100.
Malviya A. & Jaspal D. (2021) Artificial intelligence as an upcoming technology in wastewater treatment: A comprehensive review, Environmental Technology Reviews, 10 (1), 177–187.
Mara D. & Pearson H. (1998) Design manual for waste stabilization ponds in Mediterranean countries. Leeds: Lagoon Technology International.
Mara D., Pearson H. & Silva S. A. (1983) Brazilian stabilization pond research suggests low-cost urban applications, World Water, 6 (7), 20–24.
Mesut B., Başkor A. & Aksu N. B. (2023) Role of artificial intelligence in quality profiling and optimization of drug products. In Philips A., Shahiwala A., Rashid M. & Faiyazuddin Md. (Eds.) A Handbook of Artificial Intelligence in Drug Delivery (pp. 35–54). Academic Press, Elsevier.
Mukhtar H., Lin Y.-P., Shipin O. V. & Petway J. R. (2017) Modeling nitrogen dynamics in a waste stabilization pond system using flexible modeling environment with MCMC, International Journal of Environmental Research and Public Health, 14 (7), 765.
Navada A., Ansari A. N., Patil S. & Sonkamble B. A. (2011) ‘Overview of use of decision tree algorithms in machine learning’, 2011 IEEE Control and System Graduate Research Colloquium, Shah Alam, Malaysia, pp. 37–42. doi: 10.1109/ICSGRC.2011.5991826.
Ogarekpe N. & Agunwamba J. (2016a) Effect of geometry on the performance of integrated solar and hydraulic jump enhanced waste stabilization pond, Desalination and Water Treatment, 57 (52), 24946–24959.
Ogarekpe N. M., Agunwamba J. C. & Ekpenyong M. G. (2022) Dimensionality reduction analysis of the integrated solar and hydraulic jump enhanced waste stabilization pond model parameters, International Journal of Engineering Research in Africa, 58, 95–106.
Ooi K. S., Chen Z., Poh P. E. & Cui J. (2022) BOD5 prediction using machine learning methods, Water Supply, 22 (1), 1168–1183.
Oragui J., Curtis T., Silva S. & Mara D. (1987) The removal of excreted bacteria and viruses in deep waste stabilization ponds in northeast Brazil, Water Science and Technology, 19 (3–4), 569–573.
Pearson H., Mara D., Mills S. & Smallman D. (1987) Physico-chemical parameters influencing faecal bacterial survival in waste stabilization ponds, Water Science and Technology, 19 (12), 145–152.
Pipes W. O. (1962) pH variation and BOD removal in stabilization ponds, Journal (Water Pollution Control Federation), 00000, 1140–1150.
Rigatti S. J. (2017) Random forest, Journal of Insurance Medicine, 47 (1), 31–39.
Schneider P. & Xhafa F. (2022) Anomaly Detection and Complex Event Processing Over IoT Data Streams: With Application to EHealth and Patient Data Monitoring. London: Academic Press.
Sen A. & Srivastava M. (2012) Regression analysis: theory, methods, and applications. Chicago, USA: Springer Science & Business Media.
Silva S., Mara D. & De Oliveira R. (1987) The performance of a series of five deep waste stabilization ponds in northeast Brazil, Water Science and Technology, 19 (12), 61–64.
Skiba U. (2008) Denitrification.
Song Y.-Y. & Ying L. (2015) Decision tree methods: Applications for classification and prediction, Shanghai Archives of Psychiatry, 27 (2), 130.
Sundui B., Ramirez Calderon O. A., Abdeldayem O. M., Lázaro-Gil J., Rene E. R. & Sambuu U. (2021) Applications of machine learning algorithms for biological wastewater treatment: Updates and perspectives, Clean Technologies and Environmental Policy, 23, 127–143.
Swain P. H. & Hauska H. (1977) The decision tree classifier: Design and potential, IEEE Transactions on Geoscience Electronics, 15 (3), 142–147.
Touzani S., Granderson J. & Fernandes S. (2018) Gradient boosting machine for modeling the energy consumption of commercial buildings, Energy and Buildings, 158, 1533–1543.
Utsev J. & Agunwamba J. (2020) Modelling solar enhanced waste stabilization pond, Water Practice & Technology, 15 (2), 282–294.
Xing W., Yin M., Lv Q., Hu Y., Liu C. & Zhang J. (2014) Oxygen solubility, diffusion coefficient, and solution viscosity. In Xing W., Yin G. & Zhang J. (Eds.) Rotating electrode methods and oxygen reduction electrocatalysts (pp. 1–31). Elsevier.
Zhang Y. & Haghani A. (2015) A gradient boosting method to improve travel time prediction, Transportation Research Part C: Emerging Technologies, 58, 308–324.
Zohair A. & Mahmoud L. (2019) Prediction of Student's performance by modelling small dataset size, International Journal of Educational Technology in Higher Education, 16 (1), 1–18.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).