Abstract
In this research, general regression neural network (GRNN), Hammerstein-Wiener (HW), non-linear autoregressive with exogenous input (NARX) neural network, and least square support vector machine (LSSVM) models were employed for multi-parametric modeling (hardness (mg/L), turbidity (Turb) (μs/cm), pH, and suspended solids (SS) (mg/L)) of the Tamburawa water treatment plant (TWTP) in Kano, Nigeria. Weekly data from the TWTP were used, and the predictive models were evaluated using several numerical indicators. Four non-linear ensemble techniques (GRNN-E, HW-E, NARX-E, and LSSVM-E) were subsequently employed. Comparison of the modeling results showed that HW was the best model for simulating Hardness, Turb, and SS, while the NARX model showed high capability in predicting pH. Overall, HW and NARX, as system identification techniques, attained the best predictive performance among the four modeling approaches. The HW model therefore offers a reliable and intelligent approach for predicting treated Hardness, Turb, and SS and should be explored as a predictive tool at the TWTP. Among the non-linear ensemble models, GRNN-E performed best and increased the accuracy of the best single models by up to 30% for Hardness and Turb, 34% for pH, and 37% for SS.
INTRODUCTION
Water is essential to sustain life, and an affordable and adequate supply of water must be available (World Health Organization (WHO) and United Nation Children Fund (UNICEF) 2012). Municipal or drinking water treatment plants (MWTP) are operated to remove bacteria, solids, micro-organisms, and other contaminants from untreated water through various approaches such as coagulation and filtration (Ogwueleka & Ogwueleka 2009). A satisfactory water treatment plant (WTP) is vital to avoid the discharge of pollutants and to meet the standards required by law. The multi-parametric combination of physical, chemical, and biological characteristics is often the major factor affecting the operation and control of WTPs (Nourani et al. 2018).
Improper control of the standard limits of different parameters in WTPs may lead to serious environmental and human health problems. Proper control may be achieved by introducing a robust tool for modeling WTP performance based on the importance of various key parameters. However, the WTP process is complex in nature because of the dilute mixture of numerous components and the varying quality and characteristics of the system, which make WTP parameters difficult to model (Belia et al. 2009). In addition, the biochemical and physical processes in WTPs exhibit non-linear phenomena that are too complicated to simulate with simple linear principles or mathematical models (Gaya et al. 2017). In recent decades, different linear models have been widely presented for managing the overall performance of WTPs, but most of them have limitations and cannot adequately model non-linear systems (Falah Nezhad et al. 2016). On the other hand, non-linear artificial intelligence (AI) models such as the artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM) have shown merit in modeling the non-linear systems of WTPs (Mashaly & Alazba 2017; Rahimi et al. 2019).
For instance, Al-Baidhani & Alameedee (2017) developed an ANN model to predict the effluent pH and turbidity using various measured input parameters such as pH, temperature, and dose turbidity. The results showed the suitability of ANN for modeling the WTP parameters. Wu & Lo (2008) used ANN and ANFIS models to compute the real-time coagulant dosage in WTPs from the measured turbidity, pH, and color. The outcomes demonstrated that the ANFIS model predicted the coagulant dosage more accurately than the ANN model. Zhang & Stanley (1999) employed an ANN model with process control for predicting the treated turbidity. Maleki et al. (2018) predicted the influent parameters in a WTP using auto-regressive integrated moving average (ARIMA) and neural network auto-regression (NNAR) models; although the ARIMA model performed acceptably, NNAR gave better predictions. Gaya et al. (2017) demonstrated the application of ANN and Hammerstein-Wiener (HW) models for forecasting the influent turbidity in WTPs using different input parameters. The simulated results indicated that ANN could outperform the HW model and may serve as an acceptable tool for modeling the turbidity of WTPs.
The literature reviewed above shows that several studies using AI models have been conducted with promising performance, each model for a specific case. Although no single model is consistently superior to the others, ensemble techniques could lead to more promising outcomes. Ensemble techniques have been used in various fields of engineering (Sharghi et al. 2018). Nourani et al. (2018) combined different AI-based single models in an ensemble modeling framework to predict the performance of a wastewater treatment plant. The obtained results showed that the ensemble techniques outperformed the single AI-based models. Kim & Seo (2015) developed ensemble methods using data mining techniques to simulate biological oxygen demand (BOD) and various water quality (WQ) parameters. However, to the best of the authors’ knowledge, since the introduction of the non-linear ensemble (NLE) approach no published research has applied different AI-based models integrated with several non-linear ensemble techniques (NETs) in WTPs. Hence, the present work focuses on the application of four types of AI-based models (i.e., general regression neural network (GRNN), HW, non-linear autoregressive with exogenous input (NARX) neural network, and least square support vector machine (LSSVM)) for multi-parametric modeling of the Tamburawa WTP, Kano, Nigeria. For this purpose, various combinations of the input parameters, selected according to their significance, are fed into the models. To improve the overall prediction performance, NLE modeling is also employed using the outputs of the single models. In this way, four different types of black-box ensemble kernels (GRNN-E, LSSVM-E, HW-E, and NARX-E) are used and their results are compared.
Other feasible single models, such as genetic programming and ARIMA models (Olyaie et al. 2015; Nabavi-Pelesaraei et al. 2017), may also be used; for ensembling, wavelet-ANN (Shamshirband et al. 2019) and a non-linear neural network ensemble (Elkiran et al. 2019) have also been employed.
MATERIALS AND METHODS
Plant description and data used
Kano is the most populous state in Nigeria, where drainage river water is conventionally treated in a plant to a quality exceeding the minimum WQ standard of the WHO. The Tamburawa water treatment plant (TWTP) has a capacity of 150 ML of potable water per day and serves communities in Kano city and its surroundings. Raw water from the source is pumped via a pump station into a preliminary treatment unit, where grit and some of the suspended solids are removed to avoid pump wear and pipe deterioration. The water is then clarified as shown in Figure 1(a): the clarifier retains and settles the water to remove settleable organic and floatable solids by gravity. Figure 1(b) shows the schematic flow chart of the operational process, which comprises rapid mix, coagulation/flocculation, sedimentation, filtration, and disinfection, after which the final treated water can be distributed to different users such as domestic, commercial, and institutional consumers (Gaya et al. 2017).
(a) Clarifier, sedimentation tank and pump station, and (b) operational process of the plant.

The weekly data set obtained from the TWTP for the year 2015 comprises measured raw and treated pH (pHR and pHT), turbidity (TurbR and TurbT) (μs/cm), conductivity (CondR and CondT) (mS/cm), total dissolved solids (TDSR and TDST) (mg/L), suspended solids (SSR and SST) (mg/L), hardness (HardnessR and HardnessT) (mg/L), chloride (ClR and ClT) (mg/L), and iron (FeR and FeT) (mg/L). Table 1 shows the descriptive statistics of the data used in this study. The data set was divided into two parts: 75% for calibration and the remaining 25% for verification. Simple random selection of the training and test samples, a common and classic method of data splitting, was used. To obtain a homogeneous data distribution in the training and test sets, the statistical characteristics (mean, standard deviation, etc.) of both subsets should be checked after splitting and should be almost the same, which supports the generalization of the model. To verify that the data splitting strategy does not affect the overall modeling performance, the random splitting is repeated several times, in the manner of cross-validation, and the performance metrics obtained in the different runs are compared; if there are no large differences among them, the model generalizes well and the quantity and quality of the data are appropriate.
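The splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the series is synthetic (52 weekly values is assumed for one year), and the function names `random_split` and `subset_statistics` are hypothetical.

```python
import numpy as np

def random_split(n_samples, calib_fraction=0.75, seed=0):
    """Return shuffled index arrays for the calibration and verification subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_calib = int(round(calib_fraction * n_samples))
    return order[:n_calib], order[n_calib:]

def subset_statistics(values):
    """Mean and sample standard deviation, used to compare the two subsets."""
    return float(np.mean(values)), float(np.std(values, ddof=1))

# Synthetic stand-in for one weekly series (assumed 52 values); NOT the TWTP data.
data_rng = np.random.default_rng(42)
series = data_rng.normal(loc=29.0, scale=5.0, size=52)

# Repeat the random split several times and compare subset statistics per run.
for seed in range(5):
    calib_idx, verif_idx = random_split(len(series), seed=seed)
    calib_mean, calib_sd = subset_statistics(series[calib_idx])
    verif_mean, verif_sd = subset_statistics(series[verif_idx])
    print(f"split {seed}: calibration mean={calib_mean:.2f} sd={calib_sd:.2f} | "
          f"verification mean={verif_mean:.2f} sd={verif_sd:.2f}")
```

If the means and standard deviations of the two subsets differ strongly in some runs, that split is a poor candidate for judging generalization.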
Descriptive statistics of the data
Parameters | Mean | Max | Min | SD | Coef. skewness |
---|---|---|---|---|---|
pHRaw | 7.6813 | 11.5000 | 6.5000 | 0.6649 | 2.1368 |
TurbRaw (mg/L) | 200.7667 | 1,796.0000 | 51.0000 | 220.2917 | 4.7173 |
CondRaw (mS/cm) | 116.5067 | 257.0000 | 53.0000 | 43.7466 | 0.5669 |
TDSRaw (mg/L) | 57.4705 | 106.400 | 17.9000 | 20.7953 | 0.2955 |
HardnessRaw (mg/L) | 36.3028 | 53.8700 | 24.6100 | 6.33222 | 0.8852 |
ClRaw (mg/L) | 13.4560 | 33.5600 | 8.8800 | 4.1255 | 2.7859 |
SSRaw (mg/L) | 154.200 | 1,248.0000 | 34.0000 | 161.4377 | 4.1678 |
FeRaw (mg/L) | 3.8977 | 32.0000 | 1.0400 | 7.2880 | 3.0713 |
pHTreated | 6.4370 | 8.8000 | 5.0000 | 0.6521 | 0.6055 |
TurbTreated (mg/L) | 0.8181 | 3.4100 | 0.2000 | 0.5390 | 2.7709 |
CondTreated (mS/cm) | 160.8732 | 363.0000 | 8.1900 | 60.6630 | 0.8620 |
TDSTreated (mg/L) | 80.1022 | 181.1000 | 4.2000 | 29.8376 | 0.8334 |
HardnessTreated (mg/L) | 29.0826 | 44.8900 | 17.9600 | 5.1921 | 0.5919 |
ClTreated (mg/L) | 37.8045 | 123.0000 | 11.8400 | 18.1758 | 1.7205 |
SSTreated (mg/L) | 0.8222 | 2.0000 | 0.000 | 0.6284 | 0.1481 |
FeTreated (mg/L) | 0.1011 | 1.0100 | 0.0100 | 0.1155 | 5.7892 |
Selection of the dominant inputs is one of the essential parts of AI-based modeling. As such, a suitable combination of inputs is determined (see Table 2). Several validation approaches can be applied, including k-fold cross-validation, holdout, and leave-one-out. In this study the holdout method was applied; it can be regarded as a simple version of k-fold cross-validation in which the data are usually partitioned randomly into two sets, for the training and testing phases (Nourani et al. 2018).
Pearson correlation matrix between raw and treated parameters
Parameters | pHR | TurbR | CondR | TDSR | HardnR | ClR | SSR | FeR | pHT | TurbT | HardnT | SST |
---|---|---|---|---|---|---|---|---|---|---|---|---|
pHR | 1 | |||||||||||
TurbR | 0.032525 | 1 | ||||||||||
CondR | −0.33195 | −0.10229 | 1 | |||||||||
TDSR | −0.29748 | −0.15049 | 0.829811 | 1 | ||||||||
HardnR | 0.001972 | 0.438722 | 0.226554 | 0.179464 | 1 | |||||||
ClR | 0.048784 | 0.321302 | −0.00621 | 0.031401 | −0.01866 | 1 | ||||||
SSR | 0.005969 | 0.970237 | −0.08517 | −0.12697 | 0.365937 | 0.304131 | 1 | |||||
FeR | −0.34289 | 0.097055 | 0.166551 | 0.209041 | −0.09534 | 0.076946 | 0.11259 | 1 | ||||
pHT | 0.461364 | −0.15327 | −0.01035 | −0.02636 | −0.04034 | 0.022556 | −0.1392 | −0.24764 | 1 | |||
TurbT | −0.04921 | 0.448457 | −0.05912 | −0.10834 | 0.169559 | 0.200926 | 0.417396 | 0.092185 | 0.011003 | 1 | ||
HardnT | −0.04240 | 0.464929 | 0.201475 | 0.157669 | 0.581288 | 0.251558 | 0.407611 | 0.138094 | −0.29883 | 0.26419 | 1 | |
SST | −0.07740 | 0.207289 | −0.07127 | 0.015499 | 0.116308 | 0.051853 | 0.187615 | 0.100166 | −0.03585 | 0.648656 | 0.13846 | 1 |
Hardness, turbidity, pH, and suspended solids were selected to simulate the performance of the WTP because these variables are commonly used when assessing the quality of the treated water from the plant. Hardness is a common property of water caused by dissolved compounds of calcium and magnesium and, sometimes, other divalent and trivalent metallic elements. pH expresses the concentration of hydrogen ions, which indicates the acidic or basic level of the solution. It is considered to be of great practical importance as it influences most chemical and biochemical reactions (Elkiran et al. 2018). Measurement of suspended solids (SS) in water is used for the control of various treatment processes and for the examination of wastewater quality. SS is considered to be one of the major pollutants that contribute to the deterioration of WQ, to higher water treatment costs, and to poorer general esthetics of the water. Turbidity mostly provides cover and food for pathogens; if not effectively removed, it can cause outbreaks of waterborne diseases (Gaya et al. 2017).
The stability analysis is an important issue in both data-driven algorithms and numerical solution of initial-value problems. It has been studied extensively in the past for a variety of engineering problems, especially for input selection approach. The motivation behind exploring the stability is to provide evidence that the selected features are relatively robust to slight changes in the data (Křížek et al. 2007). However, different stability selection approaches are applied in an effort to reduce the noise and dimensionality of such data sets. Kalousis et al. (2007) measured the stability in terms of Pearson and Spearman correlation. Nourani & Fard (2012) introduced the stability features selection approach using sensitivity analysis. Similarly, Křížek et al. (2007), Nourani et al. (2019), Nagana et al. (2019), and Meinshausen & Bühlmann (2010) reported different input stability selection approaches. In this paper, both the Pearson and Spearman correlation and non-linear stability of input variables selection approach were assessed prior to model development.
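As a hedged illustration of the correlation-based screening mentioned above, the sketch below computes Pearson and Spearman coefficients with NumPy only. The data are synthetic, the helper names are hypothetical, and ties in the ranking are not averaged, which is assumed adequate for continuous measurements.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])

def ordinal_ranks(values):
    """Ranks 0..n-1; ties are NOT averaged (assumption: continuous data)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ordinal_ranks(x), ordinal_ranks(y))

# Synthetic example: a monotonic but non-linear dependence.
x = np.linspace(0.0, 5.0, 30)
y = np.exp(x)
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")

# Rank hypothetical candidate inputs by absolute Pearson correlation with a target.
rng = np.random.default_rng(0)
target = rng.normal(size=52)
candidates = {
    "input_a": target + rng.normal(scale=0.5, size=52),  # related to the target
    "input_b": rng.normal(size=52),                      # unrelated to the target
}
ranking = sorted(candidates, key=lambda name: abs(pearson(candidates[name], target)), reverse=True)
print(ranking)
```

The first example shows why both coefficients are worth checking: a monotonic non-linear relation gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1.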
Proposed model development
It should be noted that the quality and quantity of the intake water can change drastically over different periods of the year because of climatic, natural, or anthropogenic factors. However, as shown by Equation (2), in this study we tried to learn the pattern between the input and output parameters using the available data (one year only), so that, given the inputs, the model can estimate the output parameters (WQ parameters at the outlet). Therefore, when a severe change occurs in the intake water parameters, the output of the trained models will reflect that change proportionally.
It should be mentioned that a much longer observed data set for training would make such data-driven methods more robust, especially under such severe conditions.
Used data intelligence methods
For this research, the data-driven models employed were GRNN, HW, NARX, and LSSVM. A brief description of the mathematical concept of each model, together with the related citations, is provided in the Appendix (Figures A1, A2, and A3).
Performance assessment of the models
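The four indicators reported in Table 3 (DC, RMSE, MAE, and MAPE) can be sketched as follows, assuming their standard definitions. The exact forms of Equations (3)–(6), any normalization applied to the data, the choice of expressing MAPE as a fraction, and the function names are assumptions made for illustration.

```python
import numpy as np

def determination_coefficient(observed, computed):
    """DC (Nash-Sutcliffe form): 1 - residual sum of squares / total sum of squares."""
    observed = np.asarray(observed, dtype=float)
    computed = np.asarray(computed, dtype=float)
    residual = np.sum((observed - computed) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / total)

def rmse(observed, computed):
    """Root mean square error."""
    diff = np.asarray(observed, dtype=float) - np.asarray(computed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(observed, computed):
    """Mean absolute error."""
    diff = np.asarray(observed, dtype=float) - np.asarray(computed, dtype=float)
    return float(np.mean(np.abs(diff)))

def mape(observed, computed):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    observed = np.asarray(observed, dtype=float)
    computed = np.asarray(computed, dtype=float)
    return float(np.mean(np.abs((observed - computed) / observed)))

# Illustrative values only; NOT measurements from the TWTP.
observed = [29.1, 30.4, 27.8, 31.2]
computed = [28.7, 30.9, 28.1, 30.6]
print(determination_coefficient(observed, computed), rmse(observed, computed),
      mae(observed, computed), mape(observed, computed))
```

Under these definitions a DC close to 1 and RMSE, MAE, and MAPE close to 0 indicate good agreement; MAPE is undefined when an observed value is zero.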

Ensemble learning technique
According to the ‘no free lunch’ theorem, it is well known that no single model proves best for all kinds of data sets. Data properties such as normality, size, and linearity affect the accuracy of any model. In addition, several studies have shown that, even for the same type of data set, performance may vary between models (Nourani et al. 2019). Hence, employing multiple models and combining them has been found to be effective for a variety of problems. The ensemble technique has been used with promising success in several fields, including hydro-environmental sciences, data mining, and statistics, as an approach to improve prediction skill. Its main goal is to enhance the performance of a single model by combining the results of various individual models; this process of combining single models constitutes ensemble modeling (Nourani et al. 2018). In this study, four different NETs (GRNN-E, HW-E, NARX-E, and LSSVM-E) were used to compare and improve the performance of the single models.
Non-linear ensemble techniques
As mentioned above, four black-box models, GRNN-E, HW-E, NARX-E, and LSSVM-E, are employed to perform the NLE modeling. In NLE techniques, the outputs of the individual models form the input layer of the considered NLE, with each output assigned to one neuron in the input layer. Sharghi et al. (2018) and Nourani et al. (2018) used a non-linear ANN ensemble and reported its superiority over linear ensemble techniques. In the modeling procedure for GRNN-E, LSSVM-E, HW-E, and NARX-E, the individual models are first selected and their outputs are then used for the ensembling. Figure 2 shows a flowchart of the techniques employed for the ensemble modeling.
General flow chart of the proposed NET employed for ensemble modeling.
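The idea of feeding single-model outputs into a non-linear ensemble kernel can be sketched with an LSSVM-type combiner. This is a minimal illustration, assuming the standard LSSVM regression formulation with an RBF kernel; the "single-model outputs" are synthetic stand-ins, and all function names and the values of γ and σ are hypothetical choices rather than those of the study.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lssvm_fit(inputs, targets, gamma, sigma):
    """Solve the LSSVM linear system for the bias and the dual weights (alphas)."""
    n = len(targets)
    system = np.zeros((n + 1, n + 1))
    system[0, 1:] = 1.0
    system[1:, 0] = 1.0
    system[1:, 1:] = rbf_kernel(inputs, inputs, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(system, np.concatenate(([0.0], targets)))
    return solution[0], solution[1:]

def lssvm_predict(train_inputs, bias, alphas, query_inputs, sigma):
    """Evaluate the fitted LSSVM at new input rows."""
    return rbf_kernel(query_inputs, train_inputs, sigma) @ alphas + bias

def rmse(observed, computed):
    return float(np.sqrt(np.mean((observed - computed) ** 2)))

# Synthetic stand-ins; NOT the TWTP data or the paper's single-model outputs.
rng = np.random.default_rng(0)
observed = rng.uniform(0.0, 1.0, size=52)
single_outputs = np.column_stack([
    observed + rng.normal(0.0, 0.10, size=52),               # noisy stand-in model
    observed ** 2 + rng.normal(0.0, 0.05, size=52),          # non-linearly distorted stand-in
    0.5 * observed + 0.2 + rng.normal(0.0, 0.05, size=52),   # biased stand-in
])

n_calib = 39  # assumed 75% of 52 samples for calibration
bias, alphas = lssvm_fit(single_outputs[:n_calib], observed[:n_calib], gamma=10.0, sigma=0.5)
ensemble = lssvm_predict(single_outputs[:n_calib], bias, alphas,
                         single_outputs[n_calib:], sigma=0.5)

for column in range(single_outputs.shape[1]):
    print(f"stand-in model {column + 1} verification RMSE:",
          rmse(observed[n_calib:], single_outputs[n_calib:, column]))
print("ensemble verification RMSE:", rmse(observed[n_calib:], ensemble))
```

Each column of `single_outputs` plays the role of one input neuron of the ensemble; swapping the combiner for another kernel (for example a GRNN) leaves the rest of the procedure unchanged.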
RESULTS AND DISCUSSION
The data-driven predictive models used for system identification (i.e., GRNN, LSSVM, HW, and NARX) are evaluated numerically using several accepted performance indicators. Afterwards, NLE approaches (i.e., GRNN-E, LSSVM-E, HW-E, and NARX-E) are used to enhance the accuracy of the individual models. The comparison of the individual models, according to Equations (3)–(6), is presented in Table 3. As defined by Equation (2), three different input combinations (M1, M2, and M3) were examined, based on the number and types of inputs, in order to predict the treated hardness (HardnessT) (mg/L), turbidity (TurbT) (μs/cm), pH (pHT), and suspended solids (SST) (mg/L).
Performance indicators of the single models for modeling hardness, Turb, pH, and SS
Parameter | Input combination | Model | Calibration DC | Calibration RMSE | Calibration MAE | Calibration MAPE | Verification DC | Verification RMSE | Verification MAE | Verification MAPE |
---|---|---|---|---|---|---|---|---|---|---|
Hardness | M1 | GRNN | 0.5789 | 0.1582 | 0.0241 | 0.0904 | 0.5528 | 0.1736 | 0.0305 | 0.2071 |
Hardness | M1 | HW | 0.841 | 0.08 | 0.018 | 0.0674 | 0.8272 | 0.1208 | 0.0276 | 0.6264 |
Hardness | M1 | NARX | 0.3949 | 0.1561 | 0.0098 | 0.0368 | 0.3552 | 0.126 | 0.0237 | 0.8231 |
Hardness | M1 | LSSVM | 0.329 | 0.1691 | 0.0022 | 0.6001 | 0.3125 | 0.1526 | 0.0023 | 0.6907 |
Hardness | M2 | GRNN | 0.6311 | 0.1219 | 0.0127 | 0.0478 | 0.6175 | 0.1317 | 0.0269 | 0.4883 |
Hardness | M2 | HW | 0.8445 | 0.0792 | 0.017 | 0.0639 | 0.8316 | 0.1269 | 0.0228 | 0.3639 |
Hardness | M2 | NARX | 0.4644 | 0.1722 | 0.0661 | 0.2477 | 0.4257 | 0.1324 | 0.0926 | 0.6891 |
Hardness | M2 | LSSVM | 0.5499 | 0.1482 | 0.003 | 0.3001 | 0.5014 | 0.1508 | 0.0021 | 0.3657 |
Hardness | M3 | GRNN | 0.6387 | 0.1207 | 0.0034 | 0.0129 | 0.6181 | 0.1254 | 0.0211 | 0.2946 |
Hardness | M3 | HW | 0.9851 | 0.0245 | 0.0048 | 0.018 | 0.9181 | 0.0254 | 0.0081 | 0.5639 |
Hardness | M3 | NARX | 0.5165 | 0.1396 | 0.0279 | 0.1046 | 0.4994 | 0.1432 | 0.2792 | 0.6885 |
Hardness | M3 | LSSVM | 0.5167 | 0.1396 | 0.1423 | 0.6701 | 0.514 | 0.1384 | 0.1344 | 0.7387 |
pH | M1 | GRNN | 0.6518 | 0.1517 | 0.0027 | 0.0104 | 0.6311 | 0.1079 | 0.0126 | 0.1456 |
pH | M1 | HW | 0.9636 | 0.0362 | 0.003 | 0.0115 | 0.9263 | 0.0483 | 0.0129 | 0.1494 |
pH | M1 | NARX | 0.9948 | 0.122 | 0.0103 | 0.0400 | 0.9937 | 0.0121 | 0.0057 | 0.0656 |
pH | M1 | LSSVM | 0.5392 | 0.1565 | 0.018 | 0.021 | 0.509 | 0.1245 | 0.0192 | 0.2214 |
pH | M2 | GRNN | 0.7519 | 0.1504 | 0.0009 | 0.0034 | 0.7163 | 0.0947 | 0.0074 | 0.085 |
pH | M2 | HW | 0.8395 | 0.0671 | 0.0081 | 0.0312 | 0.8066 | 0.1039 | 0.0265 | 0.3066 |
pH | M2 | NARX | 0.9908 | 0.0027 | 0.0001 | 0.0016 | 0.9902 | 0.0027 | 0.0001 | 0.0022 |
pH | M2 | LSSVM | 0.5421 | 0.1552 | 0.0134 | 0.0142 | 0.4911 | 0.1292 | 0.0127 | 0.1464 |
pH | M3 | GRNN | 0.793 | 0.1399 | 0.0036 | 0.0138 | 0.7786 | 0.0836 | 0.0331 | 0.3828 |
pH | M3 | HW | 0.7703 | 0.1399 | 0.0036 | 0.0138 | 0.7555 | 0.0879 | 0.0068 | 0.0782 |
pH | M3 | NARX | 0.9968 | 0.0025 | 0.0001 | 0.0011 | 0.9948 | 0.0027 | 0.0001 | 0.0016 |
pH | M3 | LSSVM | 0.6033 | 0.1536 | 0.1055 | 1.973 | 0.6043 | 0.254 | 0.1088 | 1.2563 |
Turb | M1 | GRNN | 0.4358 | 0.1542 | 0.0005 | 0.0329 | 0.4144 | 0.0873 | 0.0108 | 0.2385 |
Turb | M1 | HW | 0.7398 | 0.1136 | 0.0037 | 0.2407 | 0.7594 | 0.125 | 0.0206 | 0.4544 |
Turb | M1 | NARX | 0.5822 | 0.2681 | 0.0418 | 2.7325 | 0.5613 | 0.0757 | 0.0518 | 2.9404 |
Turb | M1 | LSSVM | 0.5384 | 0.1651 | 0.0027 | 0.1443 | 0.5478 | 0.0938 | 0.0086 | 0.1888 |
Turb | M2 | GRNN | 0.636 | 0.159 | 0.0175 | 1.1458 | 0.6552 | 0.0548 | 0.0149 | 1.3298 |
Turb | M2 | HW | 0.971 | 0.0012 | 0.0002 | 0.0101 | 0.9601 | 0.0002 | 0.0002 | 0.0043 |
Turb | M2 | NARX | 0.5959 | 0.1203 | 0.0198 | 0.1523 | 0.5693 | 0.0044 | 0.0045 | 0.0099 |
Turb | M2 | LSSVM | 0.7607 | 0.1641 | 0.0063 | 0.1474 | 0.7481 | 0.0999 | 0.0078 | 0.1715 |
Turb | M3 | GRNN | 0.7733 | 0.1644 | 0.0077 | 0.5047 | 0.751 | 0.0564 | 0.0131 | 0.2883 |
Turb | M3 | HW | 0.9166 | 0.0546 | 0.0021 | 0.138 | 0.9077 | 0.0283 | 0.0067 | 0.1472 |
Turb | M3 | NARX | 0.739 | 0.1235 | 0.0191 | 1.2451 | 0.7065 | 0.2911 | 0.1901 | 1.0086 |
Turb | M3 | LSSVM | 0.761 | 0.1627 | 0.2053 | 3.4712 | 0.7239 | 0.4837 | 0.252 | 5.5602 |
SS | M1 | GRNN | 0.5522 | 0.2141 | 0.0086 | 0.0307 | 0.5642 | 0.2131 | 0.0026 | 0.0271 |
SS | M1 | HW | 0.754 | 0.2659 | 0.0362 | 0.0013 | 0.7447 | 0.0958 | 0.0354 | 0.3727 |
SS | M1 | NARX | 0.5478 | 0.3221 | 0.03 | 0.0016 | 0.5011 | 0.1905 | 0.0401 | 0.0084 |
SS | M1 | LSSVM | 0.4513 | 0.3226 | 0.004 | 1.928 | 0.408 | 0.4273 | 0.0529 | 1.5567 |
SS | M2 | GRNN | 0.5294 | 0.3374 | 0.0136 | 0.0485 | 0.508 | 0.2029 | 0.0527 | 0.555 |
SS | M2 | HW | 0.997 | 0.0188 | 0.0004 | 0.0016 | 0.9897 | 0.0192 | 0.002 | 0.0209 |
SS | M2 | NARX | 0.9687 | 0.0606 | 0.0074 | 0.0263 | 0.9546 | 0.0171 | 0.0095 | 0.0005 |
SS | M2 | LSSVM | 0.672 | 0.3219 | 0.0832 | 0.44 | 0.649 | 0.2673 | 0.0526 | 0.5539 |
SS | M3 | GRNN | 0.8176 | 0.3411 | 0.0868 | 0.0031 | 0.8167 | 0.173 | 0.0304 | 0.3203 |
SS | M3 | HW | 0.944 | 0.081 | 0.0103 | 0.0676 | 0.9355 | 0.0275 | 0.0004 | 0.0913 |
SS | M3 | NARX | 0.9169 | 0.0607 | 0.0076 | 0.0127 | 0.9072 | 0.0317 | 0.0003 | 0.0166 |
SS | M3 | LSSVM | 0.7126 | 0.3203 | 0.0047 | 2.0024 | 0.7793 | 0.5987 | 0.306 | 3.2214 |
Results of single models (GRNN, LSSVM, HW, and NARX)
It is essential to determine an appropriate range for the smoothing factor in any GRNN because of its significant effect on the simulation ability of the model. The smoothing factor should take a moderate value because extremely large or small values degrade the regression result, so it was considered within the range 0.01–1 (Ji et al. 2017). The LSSVM model with a non-linear radial basis function (RBF) kernel was created for different conditions; different γ and σ values were tried through a grid search to obtain the best modeling result (Nourani et al. 2017). HW and NARX were built using the System Identification Toolbox of MATLAB. For the HW model, the configuration was based on piecewise linear functions (range 10–20) for the input and output non-linearity predictors. For the NARX model, the specified delay and the number of terms in the standard regressors were 1 and 4–8, respectively.
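The effect of the GRNN smoothing factor can be illustrated with a minimal sketch, assuming the usual GRNN form (a Gaussian-kernel weighted average of the training targets). The data are synthetic, the candidate smoothing values are arbitrary points inside the 0.01–1 range, and the function name `grnn_predict` is hypothetical.

```python
import numpy as np

def grnn_predict(train_inputs, train_targets, query_inputs, smoothing):
    """GRNN output: Gaussian-kernel weighted average of the training targets."""
    sq_dist = np.sum((query_inputs[:, None, :] - train_inputs[None, :, :]) ** 2, axis=2)
    log_weights = -sq_dist / (2.0 * smoothing ** 2)
    # Subtract the row maximum before exponentiating to avoid numerical underflow;
    # this rescales all weights of a query equally and leaves the average unchanged.
    log_weights -= log_weights.max(axis=1, keepdims=True)
    weights = np.exp(log_weights)
    return weights @ train_targets / weights.sum(axis=1)

# Synthetic one-input example; NOT the TWTP data.
rng = np.random.default_rng(1)
inputs = rng.uniform(0.0, 1.0, size=(52, 1))
targets = np.sin(2.0 * np.pi * inputs[:, 0]) + rng.normal(0.0, 0.1, size=52)
calib, held_out = slice(0, 39), slice(39, 52)

# Scan a few smoothing factors within the 0.01-1 range and keep the held-out RMSE.
scores = {}
for smoothing in (0.01, 0.05, 0.1, 0.5, 1.0):
    computed = grnn_predict(inputs[calib], targets[calib], inputs[held_out], smoothing)
    scores[smoothing] = float(np.sqrt(np.mean((targets[held_out] - computed) ** 2)))

best_smoothing = min(scores, key=scores.get)
print(scores)
print("smoothing factor with the lowest held-out RMSE:", best_smoothing)
```

A very small smoothing factor makes the prediction follow the nearest training sample, while a very large one flattens it toward the mean of the training targets, which is why a moderate value is preferred.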
A direct comparison of the results for all models shows that the HW model outperformed the other models in the prediction of hardness (see Table 3). For the HW model, the best results were obtained with the third input combination (M3), as supported by the verification DC values (0.8272, 0.8316, and 0.9181 for M1, M2, and M3, respectively). According to the obtained DCs, the addition of hardness, Cl, and SS increased the prediction performance, as indicated in the third combination. This finding corresponds with that identified by Dan'azumi & Bichi (2010).
In the verification phase, M1, M2, and M3 of the HW model produced RMSE values of 0.1208, 0.1269, and 0.0254 and MAE values of 0.0276, 0.0228, and 0.0081, respectively (Table 3). Another important indicator is MAPE, which provides a better way to discriminate between predictive models (Gaya et al. 2017): the smaller the MAPE value, the more accurate the estimation model. Table 3 shows that the HW model has smaller MAPE values than the LSSVM and NARX models, with the exception of GRNN. It is noteworthy that several evaluation metrics indicate the merits of HW over the other models. To examine the estimating models further, scatter plots for the three input combinations of the HW model are shown in Figure 3(a). The plots show that M3 obtained the best agreement between the observed and computed hardness values. An assessment based on the time series of estimated hardness provides a deeper and more comprehensive analysis; Figure 4(a) shows the best computed hardness values of the four models for the verification phase. A visual examination of the time series shows that the computed values of HW were closer to the observed hardness than those of the GRNN, LSSVM, and NARX models.
Figure 3. Observed vs computed scatter plots for the best single models for (a) hardness, (b) pH, (c) Turb, and (d) SS of treated water using the three sets of input combinations (M1, M2, and M3).
Figure 4. Observed vs computed time series plots obtained by the best single models for (a) hardness, (b) pH, (c) Turb, and (d) SS of treated water.
Table 3 also presents the results for pH computed by GRNN, LSSVM, HW, and NARX using the three input combinations (M1, M2, and M3); the best model was NARX with the third input combination (M3). As shown in Table 3, M3 has the lowest RMSE and MAE and the highest DC values in the verification phase, while the other two input combinations also achieved suitable accuracy. Among the four models, HW was the second-best for predicting pH, followed by GRNN and lastly LSSVM. Further examination shows that the accuracy of pH modeling is higher when all the variables are used as inputs. Figures 3(b) and 4(b) show the scatter and time series plots for the computed pH, respectively.
Turbidity (Turb), as a measure of WQ, was also simulated, and the results are presented in Table 3. The HW model outperformed all other models with reasonable accuracy in both the calibration and verification steps. According to the results, M2, with the combination of pH, Turb, Cond, TDS, hardness, and Cl, demonstrated the best accuracy and therefore proved to be a reliable model for predicting turbidity in TWTP. HW-M2 increases the prediction accuracy by up to 20% and 5% relative to HW-M1 and HW-M3, respectively. In terms of RMSE and MAE, the models rank for Turb prediction in the following order: HW > NARX > LSSVM > GRNN. Figures 3(c) and 4(c) show the scatter and time series plots for the Turb parameter in the verification phase, respectively.
SS is considered one of the major pollutants contributing to the deterioration of WQ, so the results for this variable are also presented in Table 3. The HW model is the most satisfactory for predicting SS in TWTP, as supported by the values of DC, RMSE, MAE, and MAPE. Among the input combinations, M2 was found to be the best, and the simulated results were satisfactory in both the calibration and verification steps. Figures 3(d) and 4(d) show the scatter and time series plots for the SS parameter in the verification phase, respectively. According to the computed DC, HW-M2 increases the prediction accuracy by up to 24% and 19% relative to HW-M1 and HW-M3, respectively. In terms of RMSE and MAE, the models rank for SS prediction in the following order: HW > NARX > GRNN > LSSVM.
The assessment of the different modeling paradigms (GRNN, HW, NARX, and LSSVM) with different input combinations (i.e., M1, M2, and M3) illustrates that prediction accuracy depends on the model used for a given input data set.
The comparison of multi-parametric simulation in this study showed that HW could serve as the best model for the simulation of hardness, Turb, and SS, while the NARX model demonstrated high capability in predicting pH. The system identification models (HW and NARX) showed better predictive capability than the GRNN and LSSVM models. Thus, the HW model clearly indicates the extent of effluent removal efficiency and plant performance in TWTP. As mentioned above and illustrated by Figure 4, no single model delivers the best estimation performance for every parameter. The best sets of model inputs were not the same for each of the explored modeling techniques, demonstrating that the model types respond differently to different input variable sets and to the patterns and characteristics of the historical input data. Therefore, combining the outputs of these models in an ensemble could increase the generalization capability of the modeling by capturing the target more completely, and hence improve the prediction performance. As such, four ensemble techniques (GRNN-E, LSSVM-E, HW-E, and NARX-E) were employed to improve the predictive performance by combining the outputs of the single models.
Results of NLE models (GRNN-E, HW-E, NARX-E, and LSSVM-E)
As reported by Yaseen et al. (2016), no single model works best for all data sets, and data characteristics such as size, normality, linearity, and correlation all affect model performance considerably. Nourani et al. (2019) also reported that, even for the same data set, different models may excel in different aspects. Hence, combining multiple models in an ensemble approach has been found to be effective for a variety of problems. The performance of the NLEs (GRNN-E, LSSVM-E, HW-E, and NARX-E) is shown in Figure 5, and the NLE results are presented in Table 4.
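The idea of a non-linear ensemble can be sketched as follows: the outputs of the single models for each time step form an input vector, and a non-linear combiner is calibrated to map these vectors to the observed values. The sketch uses a textbook GRNN (Gaussian-kernel weighted average) as the combiner, in the spirit of GRNN-E. The numbers, the smoothing factor of 0.1, and the function names are hypothetical and do not reproduce the configuration or data of this study.

```python
import math

def grnn_predict(features, targets, query, sigma):
    """Gaussian-kernel weighted average of targets (textbook GRNN form)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, query)) / (2.0 * sigma ** 2))
        for row in features
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("sigma too small: all kernel weights underflowed")
    return sum(w * t for w, t in zip(weights, targets)) / total

def ensemble_predict(single_outputs_cal, observed_cal, single_outputs_new, sigma=0.1):
    """Combine single-model outputs with a GRNN calibrated on past outputs."""
    return [
        grnn_predict(single_outputs_cal, observed_cal, row, sigma)
        for row in single_outputs_new
    ]

# Hypothetical normalized outputs; columns: GRNN, HW, NARX, LSSVM.
cal_outputs = [
    [0.10, 0.12, 0.08, 0.15],
    [0.30, 0.28, 0.35, 0.25],
    [0.52, 0.50, 0.47, 0.55],
    [0.70, 0.74, 0.69, 0.65],
    [0.90, 0.88, 0.93, 0.85],
]
cal_observed = [0.10, 0.30, 0.50, 0.72, 0.90]  # hypothetical observed values

new_outputs = [
    [0.31, 0.29, 0.33, 0.27],
    [0.71, 0.72, 0.70, 0.66],
]
ensemble = ensemble_predict(cal_outputs, cal_observed, new_outputs)
print([round(v, 3) for v in ensemble])
```

Because the combiner is a weighted average of calibration targets, its output always stays within the range of the observed calibration values, which is one reason such a combiner behaves stably on small samples.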
Table 4. Results of non-linear ensemble models for estimation of hardness, Turb, pH, and SS
Parameter | Ensemble model | DC (calibration) | RMSE^a (calibration) | MAE (calibration) | MAPE (calibration) | DC (verification) | RMSE^a (verification) | MAE (verification) | MAPE (verification)
---|---|---|---|---|---|---|---|---|---
HardnessT | GRNN-E | 0.9991 | 0.0091 | 0.0002 | 0.0007 | 0.9969 | 0.0085 | 0.0011 | 0.0093
HardnessT | HW-E | 0.6843 | 0.1576 | 0.0011 | 0.0041 | 0.6973 | 0.1365 | 0.0893 | 0.7884
HardnessT | NARX-E | 0.5179 | 0.4362 | 0.3873 | 1.4706 | 0.5195 | 0.5148 | 0.3975 | 3.5081
HardnessT | LSSVM-E | 0.8687 | 0.2821 | 0.5555 | 1.8235 | 0.8345 | 0.6771 | 0.0342 | 2.8374
pHT | GRNN-E | 0.9989 | 0.0439 | 0.0042 | 0.0164 | 0.9989 | 0.0663 | 0.0167 | 0.1927
pHT | HW-E | 0.6301 | 0.1537 | 0.0325 | 0.1255 | 0.6008 | 0.1721 | 0.0378 | 0.4369
pHT | NARX-E | 0.6779 | 0.3073 | 0.0464 | 0.1794 | 0.6258 | 0.2167 | 0.0109 | 0.1257
pHT | LSSVM-E | 0.8878 | 0.4091 | 0.1734 | 0.4301 | 0.7315 | 0.197 | 0.0238 | 0.1413
TurbT | GRNN-E | 0.9838 | 0.0241 | 0.0037 | 0.0281 | 0.9991 | 0.0008 | 0.0002 | 0.0052
TurbT | HW-E | 0.9359 | 0.1923 | 0.0004 | 0.0029 | 0.9288 | 0.0596 | 0.0113 | 0.2483
TurbT | NARX-E | 0.6022 | 0.2691 | 0.1913 | 1.4707 | 0.5349 | 1.6377 | 0.3308 | 7.2979
TurbT | LSSVM-E | 0.8007 | 0.2529 | 0.2494 | 1.6101 | 0.6203 | 1.3691 | 0.5132 | 5.7450
SST | GRNN-E | 0.9998 | 0.0008 | 0.0015 | 0.0053 | 0.9988 | 0.0002 | 0.0017 | 0.0043
SST | HW-E | 0.5592 | 0.2995 | 0.0153 | 0.0546 | 0.5433 | 0.1797 | 0.0486 | 0.5112
SST | NARX-E | 0.7292 | 0.5355 | 0.4118 | 1.4706 | 0.6837 | 0.5316 | 0.4336 | 4.5646
SST | LSSVM-E | 0.6819 | 0.3609 | 0.0588 | 0.0924 | 0.6326 | 0.0341 | 0.1739 | 0.4645
^a Since all data are normalized, the RMSE has no dimension.
Figure 5. Radar chart of MAPE for hardness, Turb, pH, and SS for both calibration and verification phases.
Table 4 demonstrates that, among the NLE models, GRNN-E outperformed the other models with considerable accuracy, owing to its robustness in handling complex interactions and its ability to predict without requiring large samples. The accuracy of GRNN-E, as a type of ANN, can also be attributed to the promising predictive skill of ANN models in general. Peng & Kumar (2005) and Nourani et al. (2019) reported that in a few cases single models can be superior to the ensemble model, which was also observed by comparing Tables 3 and 4.
Figure 5 presents a radar chart of the MAPE values. It is clear from Figure 5 that the MAPE value is less than 10% for all the best models, which demonstrates the high accuracy of these models. Similarly, in terms of DC, GRNN-E increased the accuracy of the modeling by up to 30% for hardness and Turb, 34% for pH, and 37% for SS relative to the single models.
In Figure 6, a Taylor diagram, a widely recommended tool for accuracy comparison (Kim et al. 2018; Zhu et al. 2019), was used. The performances of the GRNN-E, HW-E, NARX-E, and LSSVM-E models were compared for the verification phase using the correlation coefficient (R) and the standard deviation (SD). Figure 6 shows that GRNN-E outperformed the other models, because its computed point lies closest to the observed point. This is also supported by the high value of R obtained for GRNN-E. Generally, an SD of the computed values higher than the SD of the observed values indicates overestimation, and vice versa.
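The quantities that position a model on a Taylor diagram can be computed as in the following minimal sketch. It assumes population (divide-by-N) standard deviations and also reports the centred root mean square difference, which the diagram encodes as the distance between the computed and observed points. The function name and the toy series are illustrative assumptions, not values from this study.

```python
import math

def taylor_statistics(obs, comp):
    """Return the SDs, correlation R, and centred RMS difference for one model."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_c = sum(comp) / n
    dev_o = [o - mean_o for o in obs]
    dev_c = [c - mean_c for c in comp]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_c = math.sqrt(sum(d * d for d in dev_c) / n)
    r = sum(a * b for a, b in zip(dev_o, dev_c)) / (n * sd_o * sd_c)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_c)) / n)
    return {"sd_obs": sd_o, "sd_comp": sd_c, "R": r, "crmsd": crmsd}

# Toy normalized series (hypothetical values for illustration).
observed = [0.2, 0.4, 0.6, 0.8]
computed = [0.25, 0.35, 0.65, 0.75]

stats = taylor_statistics(observed, computed)
print({k: round(v, 4) for k, v in stats.items()})

# Geometric relation behind the diagram (law of cosines):
# crmsd^2 = sd_obs^2 + sd_comp^2 - 2 * sd_obs * sd_comp * R
lhs = stats["crmsd"] ** 2
rhs = (stats["sd_obs"] ** 2 + stats["sd_comp"] ** 2
       - 2 * stats["sd_obs"] * stats["sd_comp"] * stats["R"])
```

In this toy case the SD of the computed series is smaller than that of the observed series, which on the diagram would place the model inside the arc of the observed SD.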
Figure 6. Taylor diagram depicting the ensemble performance of GRNN-E, HW-E, NARX-E, and LSSVM-E models for (a) hardness, (b) pH, (c) Turb, and (d) SS during the verification phase.
CONCLUSIONS
In this study, GRNN, HW, NARX, and LSSVM were applied to model the treated hardness, turbidity, pH, and suspended solids at Tamburawa WTP, Kano, Nigeria. Subsequently, NLE techniques were employed using the outputs of the single models to enhance the overall accuracy of the prediction. Weekly data from the treatment plant were used, and various statistical performance indicators were applied to explore the predictive capability of the modeling techniques. The comparison of multi-parametric simulation results showed that HW served as the best model for the simulation of HardnessT, TurbT, and SST, while the NARX model demonstrated high capability in predicting pHT. Overall, HW and NARX, as system identification techniques, attained the best predictive performance among the four single modeling approaches. Among the NLE models, GRNN-E proved of high merit and increased the accuracy of the best single models by up to 30% for HardnessT and TurbT, 34% for pHT, and 37% for SST. Such a model can serve as a basis for, and form part of, the portfolio of tools used by water engineers in decision-making at TWTP. The results also suggest that other AI techniques, such as extreme learning machine (ELM), wavelet-based, and hybrid models, may be incorporated into the proposed ensemble approach to create a new model that could yield higher precision than the single models used.
As with any data-driven method, a limitation of this study is that the efficiency of the proposed model depends markedly on the quality and quantity of the available data, which is often a challenge in developing countries. Therefore, the data used should be checked, and the models should be updated as more data are gathered in the future.
SUPPLEMENTARY DATA
The Supplementary Data for this paper is available online at http://dx.doi.org/10.2166/aqua.2019.078.