Predicting the coagulant dosage is especially crucial to the purification process in water treatment plants, directly affecting the quality of the purified water. Several mathematical methods have been adopted for the purification process, but their predictive precision and speed still need to be improved. This study applies a novel neural network called the extreme learning machine (ELM) to predict the coagulant dosage based on certain significant factors of the raw water. The performances of ELM and back-propagation neural networks are compared in this paper. The results show that both neural network algorithms perform well in this application and that ELM can realize online prediction due to its short time consumption.

## LIST OF PARAMETERS

- *M*: the number of training data
- MAPE: mean absolute percent error
- *N*: the dimension of input features
- *Ñ*: the number of hidden neurons
- *O*_{k}: the *k*th actual output of the neural network corresponding to the *k*th input
- *R*: the correlation coefficient
- *std(x)*: the standard deviation of *x*
- *x*_{k}: the *k*th training data (input)
- *y*_{k}: the *k*th standard value corresponding to the *k*th input

## INTRODUCTION

The coagulation process is a very prominent step in water treatment. The precision of controlling the coagulant dosage is closely related to the subsequent processes and the quality of the purified water. Nowadays there are several conventional methods to predict the coagulant dosage in practice, such as the jar test (Wu & Lo 2010), the operator's experience, and regression methods (Deng *et al.* 2010). Because of the complicated and nonlinear relationships between operational parameters in water treatment processes, some methods cannot capture the underlying relationships (Tashaouie *et al.* 2012) and can hardly adjust the dosage in time (Wu & Lo 2010). An accurate prediction method can reduce coagulant consumption, facilitate the purification process and improve the quality of the purified water. Artificial neural networks (ANNs) have been increasingly applied in this area, as they can successfully capture the system's nonlinearity.

An ANN is a nonlinear system, first developed by McCulloch & Pitts (1943), that simulates biological nervous systems and can process large amounts of information. ANNs can therefore be used to solve many complicated problems, and several previous studies have applied them to improve water treatment systems. Mirsepassi *et al.* (1995) used back-propagation (BP) networks to determine the alum and polymer dosing in a water treatment plant on the central coast of NSW. Valentin *et al.* (1999) developed a hybrid system based on the Kohonen self-organizing feature map and used a multilayer perceptron to predict the coagulant dosing. Yu *et al.* (2000) improved their prediction model using an ANN to control the coagulant dosing at a water treatment plant in Taipei, Taiwan. Karama *et al.* (2005) used an ANN model for online prediction of the optimal coagulant dosing and, to avoid over-fitting, adopted the Levenberg–Marquardt method in combination with weight decay regularization.

In all the studies above, the ANN models extract all the significant features from the raw water as their inputs and take the actual coagulant dosing as their outputs. Then the neural networks can be trained through the Levenberg–Marquardt method to determine the optimal coagulant dosage. However, the BP model is still confronted with bottlenecks such as time-consuming training, local minima and necessary stopping methods.

The extreme learning machine (ELM) is a single hidden layer feed-forward neural network (SLFN) proposed by Huang *et al.* (2004, 2006, 2012), and it has been applied to many regression and multiclass classification problems. ELM only needs the number of hidden layer nodes to be designated and an appropriate activation function to be chosen. With its input weights (linking the input layer to the hidden layer) and hidden layer bias randomly assigned, the ELM can analytically determine the output weights (linking the hidden layer to the output layer) through the Moore–Penrose generalized inverse, whereas plenty of iterations are required to train BP neural networks. Furthermore, the ELM solution is the smallest-norm least-squares solution, which tends to yield good generalization performance (Bartlett 1998). In this paper, we apply this novel neural network to enhance the accuracy of the prediction system, and focus on the analysis and comparison of two neural networks for solving the coagulant dosage prediction problem. Specifically, the main contributions of this paper are as follows.

- (1)
Two neural networks: back-propagation neural networks (BPNNs) and ELM are exploited, analyzed and compared when they are applied to predict the coagulant dosage. To the best of the authors' knowledge, this is the first time that ELM has been applied in this application.

- (2)
Computer simulations and experiments were conducted to verify the effectiveness of the two neural networks.

- (3)
Both BPNNs and ELM obtain good performance, but ELM is considerably faster when solving the coagulant dosage prediction problem.

The remainder of this paper is organized as follows. The second section briefly introduces the methodology of the two artificial neural networks, ELM and BP. The simulation performances of both networks and further discussion of the distinctions between ELM and BPNNs are presented in the third section. Conclusions are given in the final section.

## METHODS

In this section, the matrix form of ELM is presented, and we then apply this neural network to model the coagulant dosage prediction.

### Matrix form of ELM

Suppose that we have *M* training data pairs (*x*_{k}, *y*_{k}), *k* = 1, … , *M*, where *x*_{k} ∈ ℝ^{*N*} and *y*_{k} ∈ ℝ. Here *N* is the dimension of the input features and the neuron number of the input layer. The input weight matrix is *W* = [*w*_{1}, *w*_{2}, … , *w*_{Ñ}], where *Ñ* is the number of hidden neurons and *w*_{i} = [*w*_{i1}, *w*_{i2}, … , *w*_{iN}]^{T}, with *w*_{ij} denoting the weight that connects the *i*th hidden neuron and the *j*th input neuron, *i* = 1, … , *Ñ* and *j* = 1, … , *N*.

With hidden layer biases *b*_{i} and activation function *g*(·), the output of the SLFN for the *k*th input is (Huang *et al.* 2004, 2006, 2012):

*o*_{k} = Σ_{i=1}^{Ñ} *β*_{i} *g*(*w*_{i} · *x*_{k} + *b*_{i}), *k* = 1, … , *M*,

which can be written in matrix form as *Hβ* = *Y*, where *H* ∈ ℝ^{M×Ñ} is the hidden layer output matrix with entries *H*_{ki} = *g*(*w*_{i} · *x*_{k} + *b*_{i}). With *W* and the biases randomly assigned, the output weights are determined analytically as *β* = *H*^{†}*Y*, where *H*^{†} denotes the Moore–Penrose generalized inverse of *H*.
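This analytic training step, with random input weights and biases followed by a pseudoinverse solve for the output weights, can be sketched as follows. This is a minimal illustration under our own naming (`elm_train` and `elm_predict` are hypothetical helpers, not the authors' MATLAB implementation):

```python
import numpy as np

def elm_train(X, y, n_hidden, rng=None):
    """Train a single-hidden-layer ELM: input weights and biases are
    drawn uniformly from [-1, 1]; output weights are the smallest-norm
    least-squares solution via the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))  # input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases
    H = np.tanh(X @ W + b)          # hidden-layer output matrix (tansig)
    beta = np.linalg.pinv(H) @ y    # output weights: beta = H^dagger * y
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained ELM."""
    return np.tanh(X @ W + b) @ beta
```

Note that no iterative tuning occurs: the single pseudoinverse solve is the entire training process, which is why ELM training is so much faster than back-propagation.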

### Data preprocessing

According to the modeling above, we now preprocess our data and apply them to this neural network model. The data in this paper were provided by a water treatment plant in Foshan, China, and were recorded during 2008. The data set contains 5,100 samples; the neural network takes the raw-water turbidity, pH value, flux rate and turbidity of the water to be filtered as inputs, and the coagulant dosage as output.

We removed unreasonable data to eliminate burst noise, leaving 5,017 valid samples. We then randomly divided the valid data set into a training set (2/3) and a testing set (1/3).
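The random 2/3 versus 1/3 partition can be sketched as below (a hypothetical helper, `split_train_test`, written by us for illustration rather than taken from the paper):

```python
import numpy as np

def split_train_test(data, train_frac=2 / 3, rng=None):
    """Randomly partition a sample array into training and testing sets,
    mirroring the 2/3 / 1/3 split used for the 5,017 valid samples."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(data))          # shuffle sample indices
    cut = int(len(data) * train_frac)         # size of the training set
    return data[idx[:cut]], data[idx[cut:]]
```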

### Normalization
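The body of this section did not survive in this copy. A standard normalization consistent with the *std(x)* entry in the list of parameters (an assumption on our part, not necessarily the authors' exact formula) is the z-score:

```latex
x' = \frac{x - \mathrm{mean}(x)}{std(x)}
```

which rescales each input feature to zero mean and unit standard deviation before training, so that features with different physical units contribute comparably.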

### Cost function

To compare ELM and BP fairly, both are evaluated with the same cost function, which calculates the MAPE as the training error and testing error.
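The MAPE formula itself was lost from this copy, but MAPE is conventionally defined as the mean of |*o*_{k} − *y*_{k}|/*y*_{k} over all samples, expressed as a percentage. A minimal sketch (the function name `mape` is ours):

```python
import numpy as np

def mape(outputs, targets):
    """Mean absolute percent error between network outputs o_k and
    standard values y_k: 100/M * sum(|o_k - y_k| / |y_k|)."""
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return 100.0 * np.mean(np.abs(outputs - targets) / np.abs(targets))
```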

## RESULTS AND DISCUSSION

### Performance evaluation

| Activation function | Neuron number | Average iteration number | Average training time (s) | Average training error (%) | Average testing error (%) | Smallest testing error (%) |
|---|---|---|---|---|---|---|
| Tansig | 6 | 591 | 51.4039 | 7.09 | 7.21 | 6.63 |
| Tansig | 8 | 708 | 57.7563 | 6.60 | 6.88 | 6.28 |
| Tansig | 10 | 844 | 75.7478 | 5.98 | 6.29 | 5.86 |
| Tansig | 12 | 951 | 92.8658 | 5.81 | 6.19 | 5.57 |
| Tansig | 14 | 988 | 98.0263 | 5.42 | 5.72 | 5.32 |
| Tansig | 16 | 1,000 | 111.6640 | 5.09 | 5.54 | 5.20 |
| Tansig | 18 | 1,000 | 117.8120 | 4.85 | 5.23 | 4.84 |
| Tansig | 20 | 1,000 | 127.2032 | 4.84 | 5.69 | 4.73 |
| Logsig | 6 | 534 | 42.7224 | 7.16 | 7.22 | 6.84 |
| Logsig | 8 | 806 | 66.8199 | 6.50 | 6.70 | 6.12 |
| Logsig | 10 | 897 | 81.2812 | 6.27 | 6.52 | 5.90 |
| Logsig | 12 | 986 | 97.8064 | 5.71 | 5.96 | 5.62 |
| Logsig | 14 | 982 | 97.2682 | 5.47 | 5.88 | 5.44 |
| Logsig | 16 | 1,000 | 106.4332 | 5.18 | 5.80 | 5.37 |
| Logsig | 18 | 1,000 | 113.0742 | 5.07 | 5.46 | 5.07 |
| Logsig | 20 | 1,000 | 128.5261 | 4.89 | 5.38 | 5.03 |


For the ELM algorithm, the input weights and hidden layer bias are randomly assigned within the interval [−1, 1]. To compare the capacity of ELM with BPNNs, we adopt the same hidden layer activation functions (tansig and logsig) and vary the hidden layer neuron number from 10 to 500 in steps of 10. Each set-up is trained 50 times and the average performances are shown in Figure 3.

Since the ELM achieves good performance over a wide range of neuron numbers, we list only the set-up with the smallest average testing error. Each set-up of the ELM algorithm is trained 50 times and the average performances are presented in Table 2.

| Activation function | Neuron number | Average training time (s) | Average training error (%) | Average testing error (%) | Smallest testing error (%) |
|---|---|---|---|---|---|
| Tansig | 120 | 0.1732 | 5.53 | 6.23 | 5.53 |
| Logsig | 80 | 0.1011 | 6.55 | 6.99 | 6.44 |


Training BPNNs requires an appropriate stopping method (*et al.* 2008) to avoid over-fitting or under-fitting. Here we terminate the training process when the correlation coefficient *R* approaches 0.98 (shown in Figure 4), which indicates that the neural networks approximate both the training data and the testing data well. The larger the neuron number, the faster the BPNNs converge and the fewer iterations they require; however, when the neuron number grows much larger, the stopping method becomes hard to control and the training time lengthens. Consequently the neuron number was set in the interval [50, 200].

| Activation function | Neuron number | Average iteration number | Average training time (s) | Average training error (%) | Average testing error (%) | Smallest testing error (%) |
|---|---|---|---|---|---|---|
| Tribas | 50 | 60 | 17.1195 | 4.60 | 5.11 | 4.83 |
| Tribas | 100 | 40 | 23.7122 | 4.00 | 4.76 | 4.55 |
| Tribas | 150 | 16 | 17.2210 | 3.94 | 4.83 | 4.60 |
| Tribas | 200 | 10 | 17.2912 | 3.98 | 4.93 | 4.61 |
| Radbas | 50 | 60 | 17.7935 | 4.12 | 4.82 | 4.58 |
| Radbas | 100 | 20 | 14.8497 | 4.09 | 4.80 | 4.54 |
| Radbas | 150 | 10 | 17.1336 | 4.23 | 4.77 | 4.59 |
| Radbas | 200 | 9 | 30.7603 | 3.99 | 4.74 | 4.55 |
| Satlin | 50 | 110 | 29.8102 | 4.71 | 5.08 | 4.77 |
| Satlin | 100 | 100 | 56.9560 | 4.24 | 4.77 | 4.52 |
| Satlin | 150 | 30 | 31.6885 | 3.85 | 4.51 | 4.41 |
| Satlin | 200 | 11 | 19.3847 | 4.02 | 4.70 | 4.48 |


| Activation function | Neuron number | Average training time (s) | Average training error (%) | Average testing error (%) | Smallest testing error (%) |
|---|---|---|---|---|---|
| Tribas | 400 | 0.8263 | 3.98 | 4.96 | 4.65 |
| Radbas | 160 | 0.2150 | 4.98 | 6.10 | 5.34 |
| Satlin | 360 | 0.7107 | 4.13 | 4.98 | 4.60 |


As shown in Figure 5, as the neuron number increases, the training time of both algorithms increases while their training error and testing error decrease. Radbas produces a large testing error once its neuron number exceeds 250, whereas tribas and satlin perform well over a wide range of neuron numbers. In general, both algorithms with tribas or satlin and a proper neuron number obtain better performance than the tansig and logsig set-ups listed above, which means that our predictive model has been improved.

In the first subplot, the solid curve is the standard dosing corresponding to the inputs, and the dotted curve is the actual output of the neural networks. In the second subplot, the curve denotes the relative error on each of the testing data, i.e. |*o*_{k} − *y*_{k}|/*y*_{k}. Both algorithms perform similarly well; however, the ELM runs more than 50 times faster than the BPNNs in this application.

All simulations are conducted in MATLAB on a personal computer with a dual-core 1.80 GHz CPU and 4 GB of memory.

## DISCUSSION

In this section, several similarities and distinctions between BP and ELM will be discussed for this application.

(1) As shown in Figures 6 and 7, both the BPNNs and the ELM can be used to predict the coagulant dosing, and their performances are quite good and very similar. With the proper hidden layer activation function and neuron number, the training error of the ELM is 4.33% and the testing error can reach as low as 4.68%, i.e. the accuracy exceeds 95.32%.

(2) The data sets we used include only raw-water turbidity, pH, flux rate and the turbidity of water to be filtered as inputs, but the factors that influence the coagulant dosing are not limited to these; temperature and microorganism content, for example, also play a role. Taking more factors into account may help to improve the prediction accuracy in actual practice.

(3) From the results, the training time of the BPNNs is about 60 seconds, or even 120 seconds. In actual practice this is unacceptable because the water treatment process changes quickly, whereas the ELM trains in less than one second. This speed advantage means that in most practical engineering applications the ELM can quickly work out the accurate coagulant dosing corresponding to changing input factors, which makes it well suited to online prediction.

(4) Both algorithms take a bounded nonlinear function as the hidden layer activation function. Their training time increases as the neuron number grows, while their training error and testing error decrease. Both also face the same issues of over-fitting and under-fitting.

(5) To obtain better performance, the BPNNs algorithm requires plenty of iterations and time to tune all the input weights, hidden layer bias and output weights, while the ELM algorithm only needs to randomly designate the input weights and hidden layer bias, and then mathematically determines the output weights through the Moore–Penrose generalized inverse.

(6) The BPNNs algorithm needs to ﬁx many parameters, such as hidden layer activation function, learning rate and hidden layer neuron number. Sometimes it requires an appropriate stopping method to avoid over-ﬁtting and under-ﬁtting. The ELM only needs to designate the activation function and hidden layer neuron number. Thus it is much easier to train the ELM for good performance than to train BPNNs.

(7) BPNNs and ELM with different activation functions have different performances. For both algorithms to obtain good performance with the same activation function, the ELM generally requires a larger neuron number.

## CONCLUSION

In this paper, we apply the ELM to model the coagulant dosage prediction system and compare its performance with BPNNs. This paper has demonstrated that both BPNNs and ELM can perform well when used to predict the coagulant dosing, while ELM is much more prominent in speed. Thus in most engineering applications, the ELM can be very significant for online prediction. The ELM algorithm also avoids converging at a local minimum and does not require a stopping method to avoid over-training. Therefore, the ELM is adopted in this application. An interesting line of future research would be adding a regularization term to the ELM, which can reduce over-fitting effects and help to select the neuron number.

## ACKNOWLEDGEMENT

This work was supported in part by the Foshan Cancheng Technology program of China (2008B1034).