Abstract
The measurement of the wastewater BOD5 level requires five days, so a prediction model that estimates BOD5 saves time and enables the adoption of an online control system. This study investigates the application of artificial neural networks (ANNs) in predicting the influent BOD5 concentration and the performance of WWTPs. The WWTP performance was defined in terms of the COD, BOD, and TSS concentrations in the effluent. Sensitivity analysis was performed to identify the best-performing ANN network structure and configuration. The results showed that the ANN model developed to predict the BOD concentration performed the best among the three outputs. The top-performing ANN models yielded R2 values of 0.752, 0.612, and 0.631 for the prediction of the BOD, COD, and TSS concentrations, respectively. The best-performing models used three inputs and one output; the influent temperature and conductivity appeared as inputs in all of them, which indicates that these parameters greatly affect the WWTP performance. The developed prediction model for the influent BOD5 concentration attained a high accuracy, i.e., R2 = 0.754, which implies that the model is viable as a soft sensor for online control and management systems for WWTPs. Overall, the ANN model provides a simple approach for the prediction of the complex processes of WWTPs.
HIGHLIGHTS
ANN model provides an assessment tool for WWTP design and performance.
Increasing the number of model inputs beyond three inputs was not beneficial.
Influent BOD and conductivity have the highest effect on the WWTP effluent.
COD input parameter had the highest impact on BOD5 prediction model.
BOD5 soft-sensor development is viable using ANN model.
INTRODUCTION
Rapid urban development leads to an increase in wastewater flow rates, which requires a high pace of advancement in treatment methods. Wastewater treatment plant (WWTP) effluent quality is a key factor in environmental and health concerns (Hamed et al. 2004). Thus, it is important to be able to model and simulate WWTPs to identify optimal conditions and to avoid future operation failures. However, the characteristics and variables involved in WWTPs are numerous and exhibit a high level of complexity, which leads to difficulties in modeling through linear regression models (Hamed et al. 2004). Moreover, physical and chemical simulations through software can be performed to simulate WWTPs. Ferrer et al. (2008) applied Design and Simulation of Activated Sludge Systems (DESASS) software to model and optimize WWTP performance in the steady or transient state. Other software includes anaerobic digestion models (Batstone et al. 2002) and the Activated Sludge Model (ASM) (Henze et al. 1987, 1995, 1999). However, these software programs require a large amount of input data regarding the process at each stage of the operation.
Data-driven simulation and modeling is an alternative approach to the modeling of complex systems (Elmolla et al. 2010). Such methods include fuzzy logic and artificial neural networks (ANNs), and they rely on advancements in computing and artificial intelligence (AI). ANNs can be applied to predict the performance of WWTPs, and, owing to their high prediction accuracy, they have been used to model the water treatment process (Manu & Thalla 2017; Newhart et al. 2019; Alver & Kazan 2020). ANN modeling relies heavily on the quality of historical data: poor data quality can lead to poor modeling performance. However, ANN modeling requires a relatively small amount of data to provide acceptable prediction results. The use of neural network models to simulate wastewater plants provides a framework for plant operation monitoring (Mjalli et al. 2007). This monitoring framework helps minimize the cost of operation and determine the quality of environmental stability. Mjalli et al. (2007) collected data over a span of one year (measurements were conducted every five days) and used the outputs of the secondary treatment effluent (STE) as the inputs of their model. The overall data set was divided at a ratio of 4:2:1 into training, validation, and testing data sets. Mjalli et al. (2007) employed a feed-forward neural network (FFNN) developed in MATLAB. In all single-input networks, the network structure consisted of three layers (input, hidden, and output layers). In multi-input networks, the network structure consisted of four layers (two hidden layers). The optimal network structure was found to be 1–40–1 with the COD as the input variable (R2 = 0.987 and MSE = 0.021). The training output function was based on the mean square error (MSE) between the network prediction and observed values.
Nasr et al. (2012) applied an ANN to predict the performance of wastewater treatment plants in Alexandria. The plant performance was studied in terms of the total suspended solids (TSS), biological oxygen demand (BOD), and chemical oxygen demand (COD) over a one-year period. The gathered data were classified into four groups, where each group represents three months of the year. The feedforward network was developed and trained using backpropagation. The predicted values were highly correlated with the measured values (R2 = 0.90317).
Guo et al. (2015) applied machine learning through a feedforward ANN and a support vector machine (SVM) to predict the total nitrogen (T-N) concentration in the effluent of a WWTP in Ulsan, Korea. The inputs of the models were the total nitrogen, total phosphorus, and TSS of the influent flow. Based on the coefficient of determination (R2), Nash–Sutcliffe efficiency (NSE), and relative efficiency criteria, the results showed that both models were effective in prediction (R2 = 0.55, NSE = 0.56, and relative efficiency = 0.8). Even though the SVM model attained a higher prediction efficiency, the ANN model consisting of three layers (input, hidden, and output layers) was more effective in correlating the input values to the T-N concentrations. Nezhad et al. (2016) employed the MATLAB ANN toolbox to apply machine learning techniques to predict the effluent quality index (EQI) for wastewater treatment plants in Tehran. A feedforward back-propagated neural network was developed that consisted of three hidden layers. The inputs of the model included BOD, TDS, TSS, FC, PO4, NH4, and pH. The results indicated that the optimal network structure was 8–7–1, which led to a high EQI prediction efficiency (R = 0.96 and MSE = 0.1).
Moreover, Abba & Elkiran (2017) predicted the COD of the effluent of wastewater treatment plants using a feed-forward neural network (FFNN). The COD of the effluent is considered a major parameter to assess the performance of WWTPs. An FFNN was applied to predict the COD, BOD, pH, T-P, T-N, TSS, and conductivity of the effluent at WWTP inlets. Several structures and input combinations were considered; however, the FFNN model with an 8–8–1 network structure, utilizing all four input parameters, yielded good performance and the highest accuracy in effluent COD prediction (R2 = 0.7, and RMSE = 0.0108).
Moreover, previous studies did not present the method used for determining the ANN configuration. Thus, this study provides a simple approach to identify the optimal ANN configuration for WWTP performance prediction.
Moreover, several regression models and artificial neural networks were developed to predict the five-day biological oxygen demand (BOD5) (Kasem et al. 2018; Baki et al. 2019; Najafzadeh & Ghaemi 2019). The importance of BOD5 modeling stems from the extensive laboratory procedures performed to measure the BOD5 concentration, as these tests require approximately five days. A regression model based on the relation between the characteristics of wastewater and BOD5 was developed and proved to attain a high accuracy (R2 up to 0.7966) (Baki et al. 2019). Furthermore, Kasem et al. (2018) developed a software sensor to monitor the BOD5 concentration in the Sefidrood River in Iran using a feedforward ANN as a function of the dissolved oxygen level. The developed ANN performed suitably, generating R2 values of up to 0.89. In WWTPs, deep neural networks and genetic algorithms were also used to design a BOD5 soft sensor (Qiu et al. 2016). The developed sensor was tested under three conditions, i.e., dry, rainy, and stormy weather conditions on the BSM1 simulation platform, and the results revealed good performance under extreme weather conditions. Although a few studies have developed soft sensors for BOD monitoring, there is a lack of analysis of the input parameters and their effect on predicting the inlet BOD concentration; this analysis is performed in this study.
Most of the literature utilized TSS, COD, and BOD to predict the performance of WWTPs using ANN-based models. These parameters are considered the controlling parameters of the effluent quality from WWTPs (Abba & Elkiran 2017). Furthermore, WWTPs commonly consist of a series of complex processes that cannot be modeled using simple regression techniques due to the large number of input parameters and data points required to capture the plant performance. Thus, the capacity of ANN modeling makes it a reliable solution in modeling the performance of wastewater treatment plants. The objective of this paper is to investigate the application of ANNs in WWTP modeling and to provide a simple algorithm to determine the optimal ANN configuration capturing the complex behavior of treatment plants. In contrast to most of the previous literature, this study aims to provide a simple determination method for the optimal configuration, which is defined in this study as a simple and general configuration that can be extrapolated, in future studies, to multiple treatment plants with similar processes. Additionally, the generated model could be utilized in the design of control systems and the monitoring of plant performance and water quality parameters. This study also aims to develop a viable ANN to act as a soft sensor of the influent BOD5 in WWTPs, as there is a lack of this specific approach in the previous literature. The above soft sensor is expected to decrease the five-day period of the BOD5 measurement to several hours, which will allow the use of online control systems.
METHODOLOGY AND MODEL DEVELOPMENT
Data collection and preparation
The data analyzed in this study were collected from the Kabd WWTP, which is located in Kuwait, over a seven-year period (2013 to 2019). A total of six wastewater parameters were adopted as model inputs: the influent temperature, pH, conductivity, and the TSS, COD, and BOD concentrations. These input parameters sufficiently describe the WWTP performance (Mjalli et al. 2007; Nasr et al. 2012). They were employed to predict the WWTP performance, which is represented by the effluent BOD, COD, and TSS concentrations. The influent temperature, pH, conductivity, TSS, and COD were used to predict the influent BOD5 concentration. The data set contained 2,397 data points, some of which included missing values. Thus, IBM SPSS Statistics 26 was applied to conduct missing value analysis along with descriptive analysis on the data set, as indicated in Table 1. The BOD5 concentration had the most missing values of all the parameters, which could be explained by worker negligence in performing the tedious five-day experimental procedure to estimate the BOD5 value. Thus, a BOD5 prediction model is a valuable approach to estimate these missing values. Listwise deletion of missing data points was conducted using a developed MATLAB code, which located the missing values of any parameter and deleted the whole data row. After removing the data points with missing values for any of the parameters, the total number of data points was 1,032.
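The listwise deletion step can be illustrated with a minimal Python sketch. This is not the MATLAB code used in the study; the field names and the three sample records are hypothetical, and missing measurements are assumed to be stored as `None` or NaN.

```python
import math


def listwise_delete(rows):
    """Drop every row that has a missing value in any column."""
    def is_missing(value):
        # Assumption: missing entries are encoded as None or float NaN.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(v) for v in row.values())]


# Hypothetical records for illustration only (not data from the study).
records = [
    {"temp": 28.1, "ph": 7.0, "cond": 1500.0, "tss": 170.0, "cod": 610.0, "bod5": 280.0},
    {"temp": 29.4, "ph": 7.1, "cond": 1480.0, "tss": 165.0, "cod": 598.0, "bod5": None},
    {"temp": None, "ph": 6.9, "cond": 1510.0, "tss": 181.0, "cod": 640.0, "bod5": 301.0},
]

complete = listwise_delete(records)
print(len(complete))  # 1: only the first record has every parameter
```

Because a row is discarded as soon as one parameter is missing, the parameter with the most gaps (here BOD5) largely determines how many rows survive.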
Descriptive and missing value analysis results on the data set used in this study
Parameter | N | Minimum | Maximum | Mean | Standard deviation | Missing (count) | Missing (%) | Extremes, low (a) | Extremes, high (a) |
---|---|---|---|---|---|---|---|---|---|
Temperature (°C) | 1,992 | 16.6 | 36.9 | 28.8 | 4.3 | 405 | 16.9 | 1 | 0 |
pH | 1,973 | 6.48 | 10.12 | 7.0 | 0.2 | 424 | 17.7 | 4 | 31 |
Conductivity (μS/cm) | 1,991 | 728 | 2,100 | 1,504.1 | 194.1 | 406 | 16.9 | 7 | 1 |
TSS (mg/L) | 1,990 | 72 | 1,124 | 173.4 | 52.6 | 407 | 17.0 | 10 | 86 |
COD (mg/L) | 1,989 | 196 | 1,570 | 613.9 | 143.8 | 408 | 17.0 | 16 | 126 |
BOD5 (mg/L) | 1,365 | 17 | 541 | 279.6 | 88.2 | 1,032 | 43.1 | 21 | 14 |
TSS effluent (eff.) (mg/L) | 1,989 | 0 | 92 | 7.1 | 5.1 | 408 | 17.0 | 3 | 50 |
COD eff. (mg/L) | 1,988 | 0 | 217 | 25.3 | 12.3 | 409 | 17.1 | 2 | 70 |
BOD5 eff. (mg/L) | 1,365 | 1 | 41 | 6.2 | 3.4 | 1,032 | 43.1 | 0 | 14 |
(a) Number of cases outside the range (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR).
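The extreme-value rule in the table footnote can be sketched as follows. The quartile definition is an assumption: Python's `statistics.quantiles` (default "exclusive" method) is used here, and SPSS may compute quartiles slightly differently, so counts near the fences can differ. The sample values are hypothetical.

```python
import statistics


def count_extremes(values):
    """Count values below Q1 - 1.5*IQR and above Q3 + 1.5*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    low = sum(1 for v in values if v < low_fence)
    high = sum(1 for v in values if v > high_fence)
    return low, high


# Hypothetical sample with one low and one high extreme.
sample = [1, 20, 21, 22, 23, 24, 25, 26, 27, 28, 60]
print(count_extremes(sample))  # (1, 1)
```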


Artificial neural network
The idea of strong computing methods, known as neural networks, as an equivalent to the human brain, was developed in the late 1800s (Lippmann 1988). Artificial neural networks consist of artificial neurons in a connected layered structure to provide the desired output. The network is trained through the continuous addition of data. Then, generalization of the network is realized through the introduction of new unseen data. The principal advantages of ANNs lie in (a) their high learning speed and data processing and (b) their capacity to represent highly nonlinear systems. However, the major drawback is the black box aspect of ANNs (Lippmann 1988).





Input combinations used in this study
Input | Parameter combination |
---|---|
input 1 | Temperature |
input 2 | pH |
input 3 | Conductivity_inf |
input 4 | TSS_inf |
input 5 | COD_inf |
input 6 | BOD5_inf |
input 7 | TSS_inf + COD_inf + BOD5_inf |
input 8 | Conductivity_inf + COD_inf + BOD5_inf |
input 9 | Conductivity_inf + TSS_inf + BOD5_inf |
input 10 | Conductivity_inf + TSS_inf + COD_inf |
input 11 | pH_inf + COD_inf + BOD5_inf |
input 12 | pH_inf + TSS_inf + BOD5_inf |
input 13 | pH_inf + TSS_inf + COD_inf |
input 14 | pH_inf + Conductivity_inf + BOD5_inf |
input 15 | pH_inf + Conductivity_inf + COD_inf |
input 16 | pH_inf + Conductivity_inf + TSS_inf |
input 17 | Temp_inf + COD_inf + BOD5_inf |
input 18 | Temp_inf + TSS_inf + BOD5_inf |
input 19 | Temp_inf + TSS_inf + COD_inf |
input 20 | Temp_inf + Conductivity_inf + BOD5_inf |
input 21 | Temp_inf + Conductivity_inf + COD_inf |
input 22 | Temp_inf + Conductivity_inf + TSS_inf |
input 23 | Temp_inf + pH_inf + BOD5_inf |
input 24 | Temp_inf + pH_inf + COD_inf |
input 25 | Temp_inf + pH_inf + TSS_inf |
input 26 | Temp_inf + pH_inf + Conductivity_inf |
input 27 | Temp_inf + Conductivity + pH + BOD5_inf + COD_inf + TSS_inf |
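The 27 input sets in the table above correspond to the six single parameters, every three-parameter combination, and the full six-parameter set. A short sketch reproduces that count; the ordering generated here is not claimed to match the table's numbering, and the function name is illustrative.

```python
from itertools import combinations

PARAMETERS = ["Temp_inf", "pH_inf", "Conductivity_inf", "TSS_inf", "COD_inf", "BOD5_inf"]


def build_input_sets(parameters):
    """Single inputs, all three-input combinations, and the full input set."""
    singles = [(p,) for p in parameters]
    triples = list(combinations(parameters, 3))
    full = [tuple(parameters)]
    return singles + triples + full


input_sets = build_input_sets(PARAMETERS)
print(len(input_sets))  # 27 = 6 singles + 20 triples + 1 full set
```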
Input . | Parameter combination . |
---|---|
input 1 | Temperature |
input 2 | pH |
input 3 | Conductivity_inf |
input 4 | TSS_inf |
input 5 | COD_inf |
input 6 | BOD5_inf |
input 7 | TSS_inf + COD_inf + BOD5_inf |
input 8 | Conductivity_inf + COD_inf + BOD5_inf |
input 9 | Conductivity_inf + TSS_inf + BOD5_inf |
input 10 | Conductivity_inf + TSS_inf + COD_inf |
input 11 | pH_inf + COD_inf + BOD5_inf |
input 12 | pH_inf + TSS_inf + BOD5_inf |
input 13 | pH_inf + TSS_inf + COD_inf |
input 14 | pH_inf + Conductivity_inf + BOD5_inf |
input 15 | pH_inf + Conductivity_inf + COD_inf |
input 16 | pH_inf + Conductivity_inf + TSS_inf |
input 17 | Temp_inf + COD_inf + BOD5_inf |
input 18 | Temp_inf + TSS_inf + BOD5_inf |
input 19 | Temp_inf + TSS_inf + COD_inf |
input 20 | Temp_inf + Conductivity_inf + BOD5_inf |
input 21 | Temp_inf + Conductivity_inf + COD_inf |
input 22 | Temp_inf + Conductivity_inf + TSS_inf |
input 23 | Temp_inf + pH_inf + BOD5_inf |
input 24 | Temp_inf + pH_inf + COD_inf |
input 25 | Temp_inf + pH_inf + TSS_inf |
input 26 | Temp_inf + pH_inf + Conductivity_inf |
input 27 | Temp_inf + Conductivity + pH + BOD5_inf + COD_inf + TSS_inf |
At present, to determine the optimal number of hidden layers and neurons in each layer, several network structures should be tested, and the performance of each network should be evaluated. Although a small number of hidden layers and neurons could lead to a suitable generalization, it may also lead to an underfitted network. In contrast, a large number of hidden layers could lower the training error but could increase network overfitting (Geman et al. 1992). Various configurations of the network structure were tested, and the optimal structure for each output was determined through trial and error. The configuration parameters included the number of inputs, hidden layers, and neurons in the hidden layer.
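The trial-and-error search over hidden layers and neurons can be sketched as an exhaustive loop. This is only an outline under stated assumptions: every hidden layer has the same width, the candidate ranges (one to five layers, 5 to 25 neurons) are illustrative, and `toy_score` is a placeholder for "train the network and return its validation R2", not a result from the study.

```python
from itertools import product


def search_structures(score_fn, layer_counts, neuron_counts):
    """Return (best_structure, best_score) over equal-width hidden layouts."""
    best_structure, best_score = None, float("-inf")
    for n_layers, n_neurons in product(layer_counts, neuron_counts):
        structure = (n_neurons,) * n_layers  # e.g. (17, 17, 17)
        score = score_fn(structure)
        if score > best_score:
            best_structure, best_score = structure, score
    return best_structure, best_score


def toy_score(structure):
    # Placeholder scorer: peaks at three layers of 17 neurons by construction.
    return 1.0 - 0.05 * abs(len(structure) - 3) - 0.01 * abs(structure[0] - 17)


best, score = search_structures(toy_score, range(1, 6), range(5, 26))
print(best)  # (17, 17, 17)
```

In practice `score_fn` would wrap the training routine, and scoring on the validation set rather than the training set is what lets the search penalize overfitted structures.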
RESULTS AND DISCUSSION
Correlation matrix
The developed correlation matrix, summarized in Table 3, indicates that the linear relationships among the parameters are weak: the largest coefficient is 0.497, between the influent and effluent BOD5. This does not rule out other types of relationships; rather, the small correlation coefficients indicate that conventional linear regression methods are unsuitable for predicting such a complex system. Moreover, the input parameters are only weakly correlated with each other (the largest coefficient, 0.422, is between temperature and conductivity), which is desirable for training a reliable ANN, because correlated input parameters bias the ANN towards the effect of these parameters.
Correlation matrix of the parameters in this study
Parameter | Temperature | pH | Conductivity | TSS influent (inf.) | COD inf. | BOD5 inf. | TSS eff. | COD eff. | BOD5 eff. |
---|---|---|---|---|---|---|---|---|---|
Temperature | 1.000 | | | | | | | | |
pH | 0.365 | 1.000 | | | | | | | |
Conductivity | 0.422 | 0.190 | 1.000 | | | | | | |
TSS inf. | 0.018 | −0.019 | 0.012 | 1.000 | | | | | |
COD inf. | 0.009 | 0.010 | −0.006 | 0.146 | 1.000 | | | | |
BOD5 inf. | 0.051 | −0.235 | −0.010 | 0.176 | 0.177 | 1.000 | | | |
TSS eff. | 0.118 | −0.146 | 0.207 | 0.182 | 0.007 | 0.189 | 1.000 | | |
COD eff. | −0.056 | −0.108 | 0.109 | 0.025 | 0.041 | −0.017 | 0.353 | 1.000 | |
BOD5 eff. | 0.197 | −0.192 | 0.149 | 0.209 | 0.111 | 0.497 | 0.435 | 0.228 | 1.000 |
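For readers who wish to build a matrix of this kind, a minimal sketch of the Pearson correlation coefficient is given below. It assumes the reported coefficients are Pearson coefficients, which the text does not state explicitly; the two sample columns are hypothetical and do not come from the study's data set.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every (name_a, name_b) pair to its correlation coefficient."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns for illustration only.
columns = {
    "COD_inf": [600.0, 620.0, 580.0, 650.0, 610.0],
    "BOD5_inf": [280.0, 300.0, 250.0, 310.0, 270.0],
}
matrix = correlation_matrix(columns)
print(round(matrix[("COD_inf", "BOD5_inf")], 3))
```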
ANN modeling and prediction
The development of the ANN model was realized using various structures to obtain the ANN with the optimal performance. First, each of the influent parameters was applied as input to predict each of the effluent parameters (one input to one output). Then, the number of inputs was gradually increased until all the inputs (six inputs) were applied to predict each output. The number of hidden layers and the number of neurons were also scanned for the optimal network parameters. The data set was divided into training, validation, and test sets at a ratio of 70:15:15.
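A 70:15:15 division can be sketched as below. Random shuffling with a fixed seed is an assumption made for the example; the text does not say how rows were assigned to each subset. The sample size of 1,032 is the number of complete rows reported after listwise deletion.

```python
import random


def split_indices(n_samples, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle row indices and cut them into training, validation, and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = round(ratios[0] * n_samples)
    n_val = round(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1032)
print(len(train), len(val), len(test))  # 722 155 155
```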
The results of developing one-input, one-output ANNs for the estimation of the effluent TSS, BOD, and COD concentrations from the WWTP are summarized in Table 4. The low R2 values indicate the complexity of the system, as it cannot be modeled using only one input parameter. In comparison, Mjalli et al. (2007) reported high R2 values while using a one-input, one-output ANN structure. However, that high performance occurred due to the large number of neurons used in their model, which led to overfitting and reduced model generalization.
One input and one output ANN model performance for BOD, COD, and TSS prediction
Input parameter | Effluent BOD: ANN structure | Effluent BOD: R2 | Effluent COD: ANN structure | Effluent COD: R2 | Effluent TSS: ANN structure | Effluent TSS: R2 |
---|---|---|---|---|---|---|
Temperature | 1–25–1 | 0.2082 | 1–25–1 | 0.1052 | 1–25–1 | 0.1636 |
pH | 1–25–1 | 0.2228 | 1–24–1 | 0.1258 | 1–24–1 | 0.1495 |
Conductivity | 1–24–1 | 0.1013 | 1–24–1 | 0.1406 | 1–25–1 | 0.1544 |
TSS | 1–24–1 | 0.1310 | 1–24–1 | 0.0657 | 1–25–1 | 0.1562 |
COD | 1–25–1 | 0.1547 | 1–24–1 | 0.1001 | 1–25–1 | 0.1482 |
BOD | 1–25–1 | 0.3440 | 1–25–1 | 0.0998 | 1–25–1 | 0.1733 |
The best input performances in predicting the effluent BOD were obtained for the influent BOD, pH, and temperature. The top three COD predictors were the temperature, pH, and conductivity. The effluent TSS was best predicted by the temperature, influent BOD, and influent TSS. The effect of the initial pH on the BOD rate and wastewater treatment process has been examined in previous literature (Mukherjee et al. 1968; Liu et al. 2007). The removal efficiency of COD and TSS is known to increase with the temperature (Ahsan et al. 2005). The next step was to train three input and one output ANN models. It should be noted that every possible combination of three input parameters was considered with different numbers of hidden layers.
Effluent BOD prediction
Figure 2 shows the R2 values for predicting the effluent BOD for each input combination using various numbers of hidden layers. The input-20 combination (temperature–conductivity–influent BOD) outperforms the other input combinations. The results also show that, for most of the input combinations, the prediction accuracy peaks when using three hidden layers. This is consistent with underfitting at a small number of hidden layers and overfitting when more than three hidden layers are used.
BOD model prediction accuracy for all three-input combinations using various numbers of hidden layers.
The input-20 combination was trained for various network structures, and the optimal ANN structure (R2 = 0.752 and d = 0.928) contained three hidden layers with 17 neurons for each layer (1–17–17–17–1), as shown in Figure 3. Figures 4 and 5 show the performance of the three-input ANN model as a parity plot between the measured and predicted values and as a comparison of the predicted and measured values along the data sequence, respectively. Although the R2 value may be deemed inadequate, the modeled values match the measured values closely except for some extreme values. These extreme values might be caused by the noisy data set.
BOD prediction values against measured values for all data sequences.
The resulting model attains a lower performance than that of Mjalli et al. (2007); however, they adopted a much smaller data set (80 data points) than the data set used in this study (1,365 data points). In addition, Mjalli et al. employed the secondary treatment effluent as the input of their model, which is strongly correlated to the effluent of the plant. Thus, their neural network only predicts the tertiary treatment process, while in our study, the artificial neural network is designed to predict the whole treatment process.
Effluent COD prediction
The accuracy results for each input combination are shown in Figure 6 using an ANN with different numbers of hidden layers. The input-26 combination (temperature–conductivity–pH) outperformed the other input combinations. It was also observed that the combinations containing temperature + conductivity, or temperature + pH (inputs 20 to 26) resulted in a higher prediction performance than that of the other combinations. The results also revealed performance spikes for the other combinations; however, these were not considered due to the lack of consistency.
COD model prediction accuracy for all three-input combinations using various numbers of hidden layers.
The temperature–conductivity–pH combination was trained for various network structures, and the optimal ANN structure (R2 = 0.6115 and d = 0.877) contained three hidden layers with 13 neurons for each layer (1–13–13–13–1), as shown in Figure 7. Figures 8 and 9 show the performance of the three-input ANN model as a parity plot between the measured and predicted values and as a comparison of the predicted and measured values along the data sequence, respectively. Figure 9 indicates that the prediction accuracy is low at the extreme values. Except for these extreme values, the predicted values are nearly identical to the measured values.
COD prediction values against measured values for all data sequences.
Additionally, Mjalli et al. (2007) generated a better-performing ANN model with far fewer data points and different input parameters. Moreover, the performance of the three-input ANN (R2 = 0.6115) is comparable to that of the ANN model developed with six inputs (R2 = 0.7034) (Abba & Elkiran 2017). This supports the reliability of our procedure for determining a simple, accurate ANN prediction model.
Effluent TSS prediction
For the TSS ANN model, the input-20 combination (temperature–conductivity–BOD) yielded better results than the other combinations, as shown in Figure 10. Moreover, the input combinations containing the conductivity attained a higher prediction accuracy than the other combinations. In contrast to BOD and COD, optimum TSS modeling occurred with four hidden layers rather than with three hidden layers. This indicates the amount of complexity involved in TSS modeling over BOD and COD modeling.
TSS model prediction accuracy for all three-input combinations using various numbers of hidden layers.
The temperature–conductivity–BOD combination was trained for various network structures, and the optimal ANN structure (R2 = 0.6308 and d = 0.884) consisted of four hidden layers with 11 neurons for each layer (1–11–11–11–11–1), as shown in Figure 11. Figures 12 and 13 show the performance of the three-input ANN model as a parity plot between the measured and predicted values and as a comparison of the prediction and measured values along the data sequence, respectively. TSS was found to be the most challenging parameter to model. However, the developed ANN managed to predict TSS with a sufficient accuracy in regard to the measured values. As with the BOD and COD predictions, with a smaller data set and a less generalized model, Mjalli et al. (2007) developed an ANN model with a high performance.
TSS prediction values against measured values for all data sequences.
Furthermore, ANN models with six inputs were also trained for BOD, COD, and TSS; however, they yielded a decreased performance (R2 = 0.649, 1–15–15–15–1; R2 = 0.47, 1–8–8–8–1; and R2 = 0.6974, 1–11–11–11–1; respectively). It is evident here that increasing the number of inputs is not always a benefit (Zare et al. 2011; Hamada et al. 2018). For all outputs in this study, using six inputs to develop the ANN failed to provide a higher performance (Wei 2013). However, the R2 values are comparable to those reported by Abba & Elkiran (2017) when predicting the effluent COD concentration.
To study the cause of the low value of the performance indicator for the models developed in this paper, residual plots were generated, which depict the goodness of fit of the created ANN models (Guo et al. 2015). Figure 14 shows the residual plots for each model. The low relationship between the residual and predicted data indicates that the low model performance is mainly due to data noise. Hence, the ANN model would yield a higher prediction accuracy using smoother data. Moreover, ANN models provide sufficient accuracy for the prediction of the effluent concentration.
Residual plots of (a) the effluent BOD, (b) effluent COD, and (c) effluent TSS ANN models.
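The quantities behind a residual plot can be computed with a few lines of Python. This sketch assumes the conventional definition R2 = 1 − SS_res/SS_tot and residuals taken as measured minus predicted; the text does not spell out either definition, and the five sample values are hypothetical.

```python
def r_squared(measured, predicted):
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def residuals(measured, predicted):
    """Residuals (measured minus predicted), plotted against the predictions."""
    return [m - p for m, p in zip(measured, predicted)]


# Hypothetical effluent values in mg/L, for illustration only.
measured = [6.0, 5.0, 8.0, 7.0, 4.0]
predicted = [5.5, 5.5, 7.0, 7.5, 4.5]

print(r_squared(measured, predicted))   # 0.8
print(residuals(measured, predicted))   # [0.5, -0.5, 1.0, -0.5, -0.5]
```

A residual cloud with no visible trend against the predicted values suggests that the remaining error is unstructured, which is the basis for attributing it to noise rather than to a missing model term.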
Influent BOD5 prediction
Several ANN configurations (i.e., various input combinations, layers, and neurons) were developed and tested for the prediction of the BOD5 concentration in the influent wastewater treatment plant. Figure 15 shows the best prediction results of the test data set for each ANN configuration with respect to the coefficient of determination (R2). The difference between underfitting and overfitting is clearly demonstrated in Figure 15, where the R2 values are lower using either a small (one layer) or large (five layers) number of layers, and the maximum R2 value occurs between these extremes.
Values of R2 of the best of each ANN configuration for influent BOD5 prediction.
ANN configurations with the highest influent BOD5 prediction accuracy for the different numbers of input parameters
Model | Input parameters | ANN configuration | R2 |
---|---|---|---|
ANN–V | Temperature–pH–Conductivity–TSS–COD | (1–9–9–9–9–1) | 0.754 |
ANN–IV | Temperature–Conductivity–TSS–COD | (1–20–20–1) | 0.737 |
ANN–III | Temperature–Conductivity–COD | (1–13–13–13–1) | 0.702 |
ANN–II | Temperature–COD | (1–10–10–10–1) | 0.464 |
ANN–I | COD | (1–19–1) | 0.176 |
Furthermore, the introduction of the conductivity parameter greatly increased the prediction accuracy. Thus, for an ANN to sufficiently predict the influent BOD5, at least the influent COD, temperature, and conductivity are required as input parameters. This reduces the time needed to estimate BOD5 from five days to several hours, i.e., the time required to measure the COD concentration of a sample. From the different configurations listed in Table 5, ANN–V was adopted to predict BOD5, and the results are shown in Figure 16. The results demonstrate the high accuracy of the ANN in predicting BOD5 with the exception of a few data points. The spikes deviating from the measured values occur due to noise in the training data set, as with the previously developed ANNs.
Influent BOD prediction values against measured values for all data sequences using ANN–V.
The R2 values from the literature and previous models designed to predict BOD5 in WWTPs are listed in Table 6. The previous regression models, in the literature, attain a slightly higher performance than the ANN developed in this study. However, regression models tend to require a large amount of data and rely on a large number of input parameters. For instance, Tables S1 to S6 (Supplementary Material) list the results of linear regression for the data used in this study and using the input parameters of the optimal ANN developed. The results reveal low R2 values for the linear regression model. Moreover, the soft sensor of Qiu et al. (2016) yields a comparable or better performance in predicting BOD5 than the model in this study. However, the model developed in this study is notably simpler, as it does not require the use of a genetic algorithm.
The R2 values in previous studies
References | Model type | R2 |
---|---|---|
Baki et al. (2019) | Regression models | 0.7966 |
Ebrahimi et al. (2017) | Stepwise Multivariate Regression Analysis | 0.82–0.83 |
Qiu et al. (2016) | FFNN and Genetic Algorithm | 0.7–0.96 |
CONCLUSION
The developed ANN models suitably predicted the WWTP performance, which was defined based on the effluent TSS, BOD, and COD concentrations in the treatment plant, with a high degree of reliability. A low model performance mainly occurred due to noise in the data used to develop the ANN.
The results demonstrated that preliminary data analysis and preparation are essential for ANN training. It is recommended to gradually increase the complexity (number of inputs, hidden layers, and neurons) of the network configuration until no further improvement is recorded. In this study, a three input and one output configuration was sufficient, and a further increase in inputs resulted in overfitting of the system. The results also indicated that increasing the number of inputs was not always beneficial. Another finding from the developed models is the importance of the input parameters in the wastewater treatment process. The parameter significance could be interpreted from the model input combination that yielded the highest prediction accuracy. For instance, the influent temperature and conductivity greatly affected the WWTP performance as they were used as inputs in all models. However, the influent BOD concentration was important in the treatment process regarding the effluent BOD and TSS concentrations, while the effluent COD concentration was indirectly dependent on the influent pH.
In this study, the developed prediction model for the influent BOD5 concentration worked well and attained a high performance accuracy, namely, R2 = 0.754. This result demonstrates the viability of using this model as a soft sensor for online control and management systems for WWTPs. A further recommendation is to implement a more robust control system entailing frequent measurements to further tune the model with data highly representative of real-time operation conditions.
FUNDING
This research did not receive a specific grant from any funding agencies in the public, commercial, or not-for-profit sectors.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.