With the growing demand for water quality monitoring, soft measurement sensors have drawn increasing attention because they can overcome the high cost and long measurement time of traditional methods. In this study, a machine learning-based soft sensor was developed to simultaneously monitor four water quality indicators: COD, NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P. Firstly, specialized experimental equipment and calibration methods were developed to generate a matching dataset of over 94,000 data points. Secondly, five models, namely Multiple Linear Regression, Ridge Regression, AdaBoost, Decision Tree Regression, and Bagging Regression, were constructed and compared. The learning accuracy of the models, measured by the correlation coefficient, ranged from 0.8860 to 0.9999, and the predictions of Bagging Regression fit the true values most closely. Subsequently, the fuzzy grade method was adopted to reduce the prediction error and strike a balance between efficiency and accuracy. Finally, the designed soft sensor was used for real-time monitoring at three monitoring points in Changzhou, China from September to October 2020, and the results demonstrated the feasibility of the soft sensor in practical application. This study provides a fast and accurate method for water quality measurement, which is of great significance for the management of rural sewage treatment facilities.

  • The soft sensor based on machine learning is applied to the water quality monitoring of rural sewage treatment facilities.

  • Laboratory-scale devices are designed to obtain datasets for model training.

  • A fuzzy classification method is introduced to analyze and process data, reducing errors and yielding comprehensive water quality evaluation results.

In recent years, with the construction of new sewage treatment plants and the upgrading of discharge standards, China's sewage treatment rate has been gradually increasing. However, statistics show that the treatment of rural wastewater lags far behind that of urban areas (Wang & Gong 2018; Yizhou & Liming 2019; Huang et al. 2020). Rural wastewater and its treatment facilities differ markedly from their urban counterparts. Rural wastewater is characterized by small scale, complex composition, wide variation, relatively low pollutant concentration, and its biochemical properties (Liang et al. 2011; Liu & Shen 2015; Yizhou & Liming 2019). Given these water quality characteristics, decentralized treatment is better suited to reducing cost and improving treatment efficiency, but it also brings more complex management needs (Massoud et al. 2009; Liu & Shen 2015) and requires rapid and accurate measurement of water quality indicators. However, the online and offline water quality detection methods in common use can hardly provide real-time feedback on water quality and are not suitable for process control and management systems.

With the development of computer technology, soft measurement has emerged as a major research hotspot in the process control field due to its dynamic response, low cost, and flexibility (Qing & Yu 2005; Kadlec et al. 2011; Haimi et al. 2013; Zhu et al. 2020; Ching et al. 2021; Paepae et al. 2021). The soft measurement technique builds a correlation between the target water quality index and other, more easily measured water quality indices, and infers the target value from their numeric values. Soft sensors are iterated and optimized to balance the accuracy and speed of real-time online measurement, exploiting the relationships between water quality indicators through computational prediction (Paepae et al. 2021). According to the principle and structure of the internal model (Haimi et al. 2013), soft sensors can be divided into three categories: mechanistic model sensors, data-driven model sensors, and composite model sensors. The most typical mechanistic models are the ASM series (Henze et al. 1987). With ASM models, Spérandio et al. predicted ammonia and nitrate from DO and ORP values (Sperandio & Queinnec 2004), and Grau et al. inverted the component concentrations and mass fractions from the measured parameters (Grau et al. 2007). Because mechanistic models face many constraints, data-driven models are receiving increasing attention. A data-driven model is not built on known physical or chemical knowledge but depends entirely on the collected data, which can reflect the real process state; this makes it particularly suitable for wastewater quality monitoring (Dürrenmatt & Gujer 2012). The composite model combines biochemical mechanisms with data-driven black-box features, offering high accuracy and interpretability (Sheng-guang et al. 2008; Cong et al. 2015) at the price of a higher computational demand (Nair et al. 2022). Compared with mechanistic and composite models, data-driven models strike a compromise between accuracy and prediction efficiency, can achieve higher practical value after purposeful development and optimization, and are better suited to the complex and changeable rural sewage application scenarios.

In the development of soft sensors, improving measurement accuracy requires selecting suitable model algorithms. Previous studies have reported on algorithm selection and optimization of data-driven models in different application scenarios, implementing methods such as an artificial neural network (ANN) with the back-propagation (BP) training algorithm (Liping & Boshnakov 2010), a radial basis function (RBF) neural network optimized by a genetic algorithm (Li & Yang 2009), a time difference-based multi-kernel relevance vector machine (MRVM) (Wu et al. 2020), and the eXtreme Gradient Boosting (XGBoost) machine (Ching et al. 2022). Although existing studies have made breakthroughs in algorithm design, relatively few address the prediction of water quality indicators in rural sewage treatment facilities. Most existing prediction models are single-objective measurement methods, which makes it difficult to cover the several indicators that sewage treatment must meet. Moreover, artificial neural networks place high demands on datasets and hardware, which does not match the conditions of rural sewage treatment facilities. In contrast, machine learning algorithms have shown good performance with respect to datasets, training durations, and discrete points. Therefore, this study chose machine learning algorithms for soft sensor development.

In addition, model algorithms can be continuously optimized for different application scenarios and data histories, and model performance is also affected by the characteristics of the dataset (Wu et al. 2019; Ching et al. 2022; Fox et al. 2022; Nair et al. 2022). Therefore, it is necessary to design a matching dataset specifically to improve prediction accuracy. Owing to the lack of matching datasets, the design of soft sensors for rural sewage monitoring in China currently faces a series of problems. Firstly, the characteristics of sewage quality in rural China differ from those in urban areas and in other countries, so existing datasets cannot be used directly to train soft sensors; specific datasets need to be constructed based on actual water quality characteristics. Secondly, algorithm selection and parameter optimization need to be based on matched datasets and calibrated with actual water samples to improve accuracy. Finally, compared with laboratory measurement methods, soft sensors built on machine learning models inevitably fall short in accuracy. Therefore, the needs of actual management must be taken into account to strike a balance between accuracy and timeliness.

In response to the above issues, this study focused on data-driven models built with machine learning algorithms and designed a soft sensor suitable for real-time monitoring in rural sewage treatment facilities in China. Firstly, a laboratory pilot device was designed to obtain a dataset that matches the scenario of rural sewage treatment facilities. Then, based on this dataset, multiple regression models and machine learning models were compared, and the algorithm adopted for the soft sensor was determined. Next, the parameters of the soft sensor model were optimized using actual water quality data from a sewage treatment plant, verifying its prediction accuracy and general applicability. Finally, the calibrated soft sensors were deployed at three points in the Wujingang watershed in Changzhou City to analyze their performance in practical application.

Development framework for soft sensors

The framework for developing a machine learning-based soft sensor is illustrated in Figure 1. The construction of a soft sensor model includes obtaining a matching dataset and constructing, comparing, and verifying the model. The optimal model obtained was applied in practical engineering to monitor multiple water quality indicators, and the results were evaluated through fuzzy grade. In this study, data cleaning was carried out on the open-source Spyder platform (Python language). The algorithm calls, performance comparison, and prediction of the models were completed using the scikit-learn toolkit in Python. Descriptive statistical analysis and fuzzy grade were implemented in the R language.
Figure 1

A framework of developing a machine learning-based soft sensor.


Design of laboratory experimental devices and data generation

The use of machine learning to design soft sensors requires a large amount of experimental data to train and validate the model. In order to facilitate data collection and analysis, a small device was constructed in the laboratory to simulate the operation of the wastewater treatment plant (WWTP), and a corresponding experimental scheme was designed to collect the data. A schematic of the small device is shown in Figure 2.
Figure 2

Illustration of a simulation device for a small sewage treatment plant in a laboratory.


In the experimental process, sensor probe measurement and mathematical calculation were combined: starting from exact measurements, the equation describing the change over the time series was fitted, and a corresponding additional dataset was obtained from it. The measured dataset was combined with the additional data to form the large dataset needed for model design. Physical indicators (pH, ORP, COND, TURB) were determined directly by sensor probes. Changes in the concentration of the target biochemical indicators (COD, NH₄⁺-N, NO₃⁻-N, PO₄³⁻-P) were calculated using absorbance and initial concentration data.

The data were cleaned and pretreated before fitting and during calculation. The main purpose is to remove noise and outliers and reflect different background states of sewage. Details of the specific dataset generation and cleaning steps are provided in Supplementary material, Appendix 1.

Modelling and comparison

Five algorithms, including traditional regression and machine learning, were modeled in this study and compared by their model output. The five algorithms are the Multiple Linear Regression (MLR) model, the Ridge Regression (RR) model (Hoerl & Kennard 1970a, 1970b), the AdaBoost model (Freund & Schapire 1997), the Decision Tree Regression (DTR) model (Quinlan 1996), and the Bagging Regression (BagR) model (Breiman 1996). Three MLR models are used: the general multiple regression model, the generalized MLR model with cross-multiplied terms, and the generalized MLR model with cross-terms and quadratic terms, abbreviated as LR, LR2, and LR3 for convenience. The RR model adds a restriction on the size of the coefficients to the multivariate linear model, which prevents overfitting and gives the model stronger generalization capability. The AdaBoost model (abbreviated as AdaR) integrates a series of linear regression models to obtain a better-performing model with stronger generalization ability. When relatively simple and interpretable models are integrated, high accuracy can be achieved. The DTR model divides the interval of the continuous independent variables and replaces the 0–1 loss function with a continuous loss function, such that the regression problem is turned into a classification problem. In this study, the CART algorithm is adopted for the DTR model. CART is a greedy algorithm that judges the purity of a classification based on the GINI index, ensuring the greatest decrease in the GINI coefficient with each split. In the BagR algorithm, modeling and training were conducted in multiple rounds. Each round used several training samples selected at random from the initial training set. After all rounds had been trained, their results were averaged.
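The bagging procedure described above (repeated rounds on randomly drawn training samples, then averaging) can be illustrated with a minimal, dependency-free sketch. It is not the study's scikit-learn configuration: the base learner here is a hypothetical one-split regression stump chosen by squared error, and `fit_stump`, `fit_bagging`, `n_rounds`, and the toy data are assumptions made only for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression stump by minimising the squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:
        # No valid split (all rows identical): predict the sample mean.
        fallback = mean(y)
        return lambda row: fallback
    _, j, threshold, lm, rm = best
    return lambda row: lm if row[j] <= threshold else rm


def fit_bagging(X, y, n_rounds=25, seed=0):
    """Train one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_rounds):
        # Bootstrap: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: mean(m(row) for m in models)


# Toy data (hypothetical): one input feature, step-shaped target.
X = [[float(i)] for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
model = fit_bagging(X, y)
print(model([2.0]), model([17.0]))
```

Averaging over bootstrap rounds is what reduces the variance of the individual learners; in practice a full regression tree would replace the stump.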

The model algorithms were first compared on the COD soft sensor using COD data. The soft sensors for the remaining indicators were then built with the relatively optimal model chosen in this comparison, and their effect was verified. The root mean squared error (RMSE), mean absolute error (MAE), mean relative error (MRE), and the correlation coefficient (Corr) were used to compare the model algorithms.
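$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} \tag{1}$$
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{2}$$
$$\mathrm{MRE}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i-\hat{y}_i\right|}{y_i} \tag{3}$$
$$\mathrm{Corr}=\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(\hat{y}_i-\bar{\hat{y}}\right)^2}} \tag{4}$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ and $\bar{\hat{y}}$ are their respective means, and $n$ is the number of samples.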
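As a sketch of how the four comparison metrics can be computed, the functions below assume the standard textbook definitions of RMSE, MAE, MRE, and the Pearson correlation coefficient; the function names and the sample values are illustrative assumptions, not taken from the study's code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mre(y_true, y_pred):
    """Mean relative error; assumes every true value is non-zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def corr(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


# Hypothetical measured and predicted concentrations (mg/L).
measured = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
print(rmse(measured, predicted), mae(measured, predicted),
      mre(measured, predicted), corr(measured, predicted))
```

Because the relative error divides by the true value, samples with zero concentration would need to be excluded or handled separately before computing MRE.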

Applications in actual scenarios

The best-performing model algorithm after screening needs to be validated and calibrated with data from different sewage backgrounds to demonstrate the accuracy and robustness of its predictions. To validate the performance of the model, a total of 5 datasets with different water quality backgrounds were obtained through small experiments and field measurements. The backgrounds are: pure water, tap water, SBR influent, SBR effluent (before filtration), and SBR effluent (after filtration). The SBR data come from the actual operation data of Changzhou WWTP. As the SBR-related data lack NO₃⁻-N, a total of 17 application scenarios can be constructed (5 backgrounds × 4 indicators, minus the 3 SBR backgrounds without NO₃⁻-N). In each application scenario, the parameters of the model need to be tuned sufficiently to obtain a relatively optimal prediction within an acceptable range. The comprehensive performance and stability of the model are analyzed from the statistics of the optimal prediction results.
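The count of 17 scenarios can be checked with a short enumeration. This sketch assumes that the indicator missing from the SBR data is nitrate nitrogen, as the empty SBR cells in Table 3 suggest; the label strings are illustrative.

```python
# Five water quality backgrounds and four target indicators.
backgrounds = [
    "pure water",
    "tap water",
    "SBR influent",
    "SBR effluent (before filtration)",
    "SBR effluent (after filtration)",
]
indicators = ["COD", "NH4+-N", "NO3--N", "PO43--P"]

# Assumption: nitrate nitrogen is unavailable for every SBR background.
scenarios = [
    (background, indicator)
    for background in backgrounds
    for indicator in indicators
    if not (indicator == "NO3--N" and background.startswith("SBR"))
]
print(len(scenarios))
```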

To verify the monitoring effect in practice, the designed soft sensor was applied to rural wastewater treatment facilities in Changzhou. The equipment was set up at the outlets of three sewage treatment facilities in the Wujingang watershed of Changzhou City, in the Luodong, Qingduntou, and Xiejiatou areas. The locations of these three facilities are shown in Supplementary material, Figure A2.1, and all three facilities use the airlift circulation MBR process. Monitoring probes for pH, ORP, COND, TURB, SS, and other parameters were installed at these points and read at 3-min intervals. The resulting data were fed to the soft sensors to predict the target indicators. The study examined operational data from September 1, 2020 to October 15, 2020 and performed a follow-up analysis based on the data from this period.

Fuzzy grade method

In monitoring rural sewage treatment facilities, the objective is to obtain the trends of the chemical indicators measured by the soft sensors and the comprehensive pollution level at each monitoring site. This paper therefore analyses the actual water quality monitoring and management requirements and adopts the fuzzy grade method to classify the degree of water pollution. The fuzzy grade method compensates for the small deficiency in absolute accuracy of the machine learning soft sensor while reducing time consumption and measurement cost.

The comprehensive water quality index covers COD, SS, TN, NH₄⁺-N, and TP. For each indicator, a single water quality index is calculated according to the formula below. The comprehensive water quality index is the maximum of the five single water quality indexes. The formula for the single water quality index is:
formula
(5)
The standard values for the five indicators COD, SS, TN, NH₄⁺-N, and TP are selected based on the locally implemented Grade I-A standard of the ‘Urban Wastewater Discharge Standard’ (GB18918-2002), and are 50, 10, 15, 5, and 0.5 mg/L, respectively. The concentration of each indicator was obtained by field measurement or soft sensor prediction. COD and NH₄⁺-N values were obtained directly from the soft sensors, SS values were estimated from turbidity values determined by field probes, and TN and TP values were estimated from the NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P values given by the soft sensors. SS, TN, and TP were estimated using empirical equations obtained from our previous work, as follows:
formula
(6)
formula
(7)
formula
(8)

Here SS, TN, and TP refer to the calculated concentrations of suspended solids, total nitrogen, and total phosphorus; TURB refers to the turbidity measurement; and NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P refer to the ammonia nitrogen, nitrate, and phosphate predictions obtained by the soft sensor, respectively.

Once the comprehensive water quality index is obtained, it is graded on a fuzzy scale according to its value. By construction, the index lies between 0 and 100 when the water meets the standard, so an index greater than 100 is considered to exceed the standard. The range from 0 to 100 is divided into five equal intervals, which correspond to excellent, good, medium, poor, and bad. The specific grading criteria are shown in Table 1. Finally, the changes and statistics of the comprehensive water quality grade over a given time series are obtained for further analysis.

Table 1

Comprehensive water quality index classification criteria

| Score of water quality | Grade |
| --- | --- |
| 0–20 | Excellent/Grade Ⅰ |
| 20–40 | Good/Grade Ⅱ |
| 40–60 | Medium/Grade Ⅲ |
| 60–80 | Poor/Grade Ⅳ |
| 80–100 | Bad/Grade Ⅴ |
| >100 | Exceeding |
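The grading procedure can be sketched as follows. The form of the single index is an assumption: it is taken here as the concentration divided by the standard value and scaled to 100, which is consistent with the statement that water meeting the standard scores between 0 and 100. The helper names, the treatment of boundary scores (a score of exactly 20 is placed in Grade I), and the sample concentrations are likewise hypothetical.

```python
# Grade I-A standard values (mg/L) quoted in the text.
STANDARDS = {"COD": 50.0, "SS": 10.0, "TN": 15.0, "NH4-N": 5.0, "TP": 0.5}

# Upper bound of each grade interval from Table 1.
GRADES = [
    (20.0, "Excellent/Grade I"),
    (40.0, "Good/Grade II"),
    (60.0, "Medium/Grade III"),
    (80.0, "Poor/Grade IV"),
    (100.0, "Bad/Grade V"),
]


def single_index(concentration, standard):
    """Assumed single index: concentration relative to the standard, times 100."""
    return 100.0 * concentration / standard


def comprehensive_index(concentrations):
    """Comprehensive index: the maximum of the five single indexes."""
    return max(single_index(concentrations[k], STANDARDS[k]) for k in STANDARDS)


def grade(score):
    """Map a comprehensive index score to its grade label."""
    for upper, label in GRADES:
        if score <= upper:
            return label
    return "Exceeding"


# Hypothetical effluent sample (mg/L).
sample = {"COD": 25.0, "SS": 4.0, "TN": 6.0, "NH4-N": 1.5, "TP": 0.15}
score = comprehensive_index(sample)
print(score, grade(score))
```

Taking the maximum means a single poorly performing indicator determines the grade, which suits compliance monitoring where every indicator must meet its limit.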

Dataset generation and descriptive statistics

A total of 94,852 experimental data points were obtained from 23 experimental forms, forming a dataset for training the soft sensor machine learning models. In the dataset, the concentration range is 0–150 mg/L for NH₄⁺-N and NO₃⁻-N, 0–1,000 mg/L for COD, and 0–15 mg/L for PO₄³⁻-P, which allows measurement over a wide range.

A data-driven model presupposes a certain correlation between the explanatory variables and the test variable, so that changes in the test variable are reflected in changes in the explanatory variables. The concentration values of COD, NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P were therefore regressed on the four physical indexes (explanatory variables), and an F-test at the significance level α = 0.001 was used to assess how well the explanatory variables explain the test variables. The descriptive statistics of COD, NH₄⁺-N, NO₃⁻-N, PO₄³⁻-P and their corresponding explanatory variable data are shown in Figure 3. The p-values of the F-tests are all less than α, so the regressions are significant overall. The regression analysis demonstrates that the features can explain most of the variation in the concentrations of COD, NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P, which also supports the use of machine learning models for simulation and prediction.
Figure 3

COD (a), NH₄⁺-N (b), NO₃⁻-N (c), PO₄³⁻-P (d), and their corresponding descriptive statistics for the explanatory variable data (with the probability density function for the corresponding measurement indicators in the diagonal diagram from top left to bottom right, a scatter plot between two matching measurement indicators in the bottom diagonal plot, and the correlation between two matching measurement indicators in the top diagonal plot).


Machine learning model prediction and comparison

The models were tested using COD concentration data. The fitting curve of each algorithm is shown in Supplementary material, Appendix 3, the prediction results are listed in Supplementary material, Table A3.1, and a bar chart of the performance metrics is shown in Figure 4. The first three multiple linear regression models show that raising the order of the features gives a better fit. However, too many variables also introduce multicollinearity; for example, the SBROpre, DO, and ORP variables were excluded from the model for this reason. The RR model did not improve the fit significantly, so nonlinear models were considered in order to achieve a better effect. The AdaBoost experiments show that the number t of integrated models influences the fit. If t is too large, the room for improving Corr is limited while RMSE and MAE increase considerably, which indicates overfitting. As a compromise, the AdaBoost model with t = 18 was chosen, and it improved on the regression models. The fit of the latter two machine learning models was more accurate, with a significant decrease in the error measures RMSE, MRE, and MAE and a correlation coefficient of 0.9999, much higher than for the previous five models, as seen in both Supplementary material, Table A3.1 and Figure 4. More intuitively, the fitting curves in Supplementary material, Appendix 3 show that the predicted curves of both machine learning models essentially overlap with the curve of the real data. This result indicates that machine learning models deliver better accuracy for soft sensors with complex processes and high values of the input variables. Both the Decision Tree Regression model and the BagR model are capable of achieving a high degree of agreement between the predicted and true values. Of the two, the BagR model works better.
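The three linear variants compared above (LR, LR2 with cross-multiplied terms, LR3 with cross-terms and quadratic terms) differ only in their feature sets. The sketch below shows one way to build those feature sets for the four physical inputs; the `expand` helper and the sample reading are hypothetical, and the study itself may have generated the terms differently.

```python
from itertools import combinations


def expand(row, cross=False, quadratic=False):
    """Return the row, optionally extended with pairwise products and squares."""
    features = list(row)
    if cross:
        # Cross-multiplied terms: one product per pair of distinct inputs.
        features += [a * b for a, b in combinations(row, 2)]
    if quadratic:
        # Quadratic terms: the square of each input.
        features += [a * a for a in row]
    return features


# Hypothetical probe reading: pH, ORP, COND, TURB.
reading = [7.0, 150.0, 500.0, 3.0]
lr = expand(reading)                               # LR: 4 features
lr2 = expand(reading, cross=True)                  # LR2: 4 + 6 features
lr3 = expand(reading, cross=True, quadratic=True)  # LR3: 4 + 6 + 4 features
print(len(lr), len(lr2), len(lr3))
```

The growth from 4 to 14 features illustrates why higher-order linear models fit better but become more prone to collinearity among inputs.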
Figure 4

Comparison of RMSE, MAE, MRE, and Corr of different machine learning models.


Fit validation of the remaining indicators on experimental data

To further demonstrate the predictive performance of the algorithms in the NH₄⁺-N, NO₃⁻-N, and PO₄³⁻-P soft sensors, a representative linear model (the multiple regression model) and the machine learning models (Decision Tree Regression and BagR) were selected, based on Section 3.2, for validation on the remaining water quality indicators. The prediction results are shown in Table 2. Fitting diagrams are shown in Supplementary material, Appendix 4.

Table 2

Results of representative models predicting various water quality indicators

| Indicator | Metric | MLR | DTR | BagR |
| --- | --- | --- | --- | --- |
| COD | MRE | 1.1989 | 0.0056 | 0.0038 |
| | MAE | 2.7369 | 0.0425 | 0.0324 |
| | RMSE | 37.7377 | 1.1758 | 0.9739 |
| | Corr | 0.8860 | 0.9999 | 0.9999 |
| NH₄⁺-N | MRE | 0.9751 | 0.0047 | 0.0035 |
| | MAE | 0.4198 | 0.0045 | 0.0033 |
| | RMSE | 5.9182 | 0.1321 | 0.0791 |
| | Corr | 0.8864 | 0.9999 | 0.9999 |
| NO₃⁻-N | MRE | 0.9638 | 0.0044 | 0.0030 |
| | MAE | 0.4713 | 0.0039 | 0.0024 |
| | RMSE | 7.0369 | 0.0938 | 0.0508 |
| | Corr | 0.8422 | 0.9999 | 0.9999 |
| PO₄³⁻-P | MRE | 1.4475 | 0.0165 | 0.0074 |
| | MAE | 0.0312 | 0.0005 | 0.0003 |
| | RMSE | 0.4614 | 0.0073 | 0.0065 |
| | Corr | 0.8636 | 0.9999 | 0.9999 |

The performance metrics in Table 2 show that both machine learning models have low MRE, MAE, and RMSE values and a Corr close to 1, which means that they predict all four water quality indicators well and significantly better than the multiple regression model. The advantage of the machine learning models over the multiple regression model is also evident from the fitting curves in Supplementary material, Appendix 4. Although both machine learning models achieved high accuracy, the BagR model performed slightly better than the Decision Tree Regression model on all four indicators. For this reason, the BagR model was chosen for the subsequent design of the soft sensors.

Validation of model measurement capabilities in different sewage background

To further validate the wider applicability of the machine learning models in the present study, the BagR model was chosen to validate measurement capabilities in different water backgrounds. Table 3 summarizes the prediction performance for each indicator under the different backgrounds. The actual and predicted curves almost overlap, as detailed in Supplementary material, Appendix 5.

Table 3

Forecast effect of the Bagging Regression model in different input water contexts

| Indicator | Metric | Pure water | Tap water | SBR influent | SBR effluent (filtered) | SBR effluent (before filtration) |
| --- | --- | --- | --- | --- | --- | --- |
| COD | MRE | 0.0038 | 0.0041 | 0.0046 | 0.0072 | 0.0044 |
| | MAE | 0.0202 | 0.0311 | 0.0223 | 0.0500 | 0.0178 |
| | RMSE | 0.3960 | 0.5458 | 0.4483 | 1.4965 | 0.3933 |
| | Corr | 0.999988 | 0.999987 | 0.999973 | 0.999844 | 0.999973 |
| NH₄⁺-N | MRE | 0.0035 | 0.0034 | 0.0056 | 0.0054 | 0.0040 |
| | MAE | 0.0046 | 0.0030 | 0.0017 | 0.0028 | 0.0020 |
| | RMSE | 0.1142 | 0.0940 | 0.0240 | 0.0429 | 0.0667 |
| | Corr | 0.999957 | 0.999974 | 0.999995 | 0.999982 | 0.999952 |
| NO₃⁻-N | MRE | 0.0032 | 0.0036 | | | |
| | MAE | 0.0028 | 0.0051 | | | |
| | RMSE | 0.0597 | 0.1128 | | | |
| | Corr | 0.999988 | 0.999966 | | | |
| PO₄³⁻-P | MRE | 0.0069 | 0.0062 | 0.0102 | 0.0054 | 0.0218 |
| | MAE | 0.0005 | 0.0005 | 0.0004 | 0.0005 | 0.0001 |
| | RMSE | 0.0141 | 0.0105 | 0.0087 | 0.0157 | 0.0015 |
| | Corr | 0.999890 | 0.999960 | 0.999943 | 0.999880 | 0.999943 |

All 17 models achieve prediction performance close to or better than that reported in comparable studies, which demonstrates the feasibility of this model for all four water quality indicators. Model training and prediction are also fast because the model logic is built directly from the training data: on a computer with an Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz, model training took 4–10 min and prediction for a single group of data took 10–20 ms.

Practical application effectiveness and analysis

To obtain and analyze data in real-time, and to validate the model's effectiveness, the model obtained by the BagR algorithm was applied to the testing of water quality indicators in actual water samples. Three sampling points in the Luodong area, Qingduntou area and Xiejiatou area were analyzed, and the prediction data of each sampling point were obtained. The above data were collated into 12 images showing the time series projections for each of the four indicators, as detailed in Supplementary material, Appendix 6.

Figure 5 shows the trend of the composite water quality index for the three real sample sites (the curve is somewhat smoothed to indicate only trends and not accurate data) after the monitoring data have been computed using the water quality index. Table 4 shows the number of time points during this period that achieved each of the comprehensive water quality index scores.
Table 4

Distribution of comprehensive water quality index levels in Luodong, Qingduntou, and Xiejiatou

| Site | Excellent/Grade Ⅰ | Good/Grade Ⅱ | Medium/Grade Ⅲ | Poor/Grade Ⅳ | Bad/Grade Ⅴ | Exceeding |
| --- | --- | --- | --- | --- | --- | --- |
| Luodong | | 740 | 2,746 | 341 | 284 | 26 |
| Qingduntou | | 1,929 | 1,628 | 466 | 72 | 42 |
| Xiejiatou | | 1,394 | 192 | 242 | 2,222 | 87 |
Figure 5

Comprehensive water quality index of actual water samples.


The average comprehensive water quality index of Luodong during this period was 50.55, which is at a medium level. Qingduntou had an average comprehensive water quality index of 44.17, which is at a medium level. The average comprehensive water quality index in the Xiejiatou area is 65.56, which is at a poor level, and 87 of the measured values are exceeding.
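Site-level summaries of this kind (a mean index plus a count of time points per grade) can be produced from a series of comprehensive index scores as sketched below. The scores are hypothetical, the grade bounds follow Table 1, and assigning a boundary score to the lower grade is an assumption.

```python
from bisect import bisect_left
from collections import Counter
from statistics import mean

BOUNDS = [20.0, 40.0, 60.0, 80.0, 100.0]
LABELS = ["Excellent", "Good", "Medium", "Poor", "Bad", "Exceeding"]


def grade(score):
    """Return the grade label; a score on a boundary falls in the lower grade."""
    return LABELS[bisect_left(BOUNDS, score)]


# Hypothetical comprehensive index scores at successive time points.
scores = [35.0, 52.5, 48.0, 110.0, 18.0, 75.0]

distribution = Counter(grade(s) for s in scores)
average = mean(scores)
print(dict(distribution), round(average, 2), grade(average))
```

Reporting the distribution alongside the mean matters because a few "Exceeding" points can be hidden by an acceptable average.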

In terms of ratings, the water quality at most of the test time points in Luodong and Qingduntou during the survey period was at Grade Ⅱ or Ⅲ, which suggests that these facilities worked relatively well. In contrast, the test results in the Xiejiatou area showed that the majority of the test time points were of Grade Ⅱ or Ⅳ water quality, indicating that the overall treatment performance of the facility (for effluent only) was inferior to that of the Luodong or Qingduntou facilities. Despite the mixed results at the three sites and the use of adjectives such as ‘poor’ and ‘bad’ in the classification, monitoring results graded Ⅰ to Ⅴ on the comprehensive index meet the baseline requirements for wastewater treatment facilities, and the water quality is in the normal range. The number of ‘exceeding’ results at all three facilities is noteworthy: it shows that all three facilities still require some degree of optimization, and the number of exceedances reflects the scale of the problem.

Figure 5 shows the variation in water quality at the three monitoring points. The data indicating poor water quality are concentrated in certain time periods. Monitoring results from the Xiejiatou facility show that, while its average comprehensive index and its amount of substandard and exceeding data are the highest of the three, most of the severe pollution is concentrated in the early part of the monitoring period, and in the middle of the period its pollution levels are similar to those at the other two monitoring locations. This reflects the volatility of the monitoring data and, indirectly, the instability of rural domestic wastewater and the need for monitoring and management of treatment facilities. On the other hand, more data over a longer time scale need to be monitored and analyzed in order to give a more meaningful reference for optimizing operation and regulation.

Effect of soft sensors based on BagR algorithm

In this study, we compared machine learning algorithms with traditional linear regression models. The linear regression models' predictions followed the trends of the actual data, but many of the single-point predictions were biased, which suggests limitations of linear regression models compared with machine learning models and agrees with the results of other researchers (Viviano et al. 2014; Wang et al. 2015; Ha et al. 2020; Pattanayak et al. 2020; Pattnaik et al. 2021). However, Schilling's research on monitoring points on two rivers in Iowa, United States, showed that the accuracy of the MLR model in predicting TP improved significantly after orthophosphorus (OP) was added as an input variable (Schilling et al. 2017), suggesting that traditional MLR models can still perform well once the inputs have been screened by appropriate primary analysis and other methods. The MLR model is still useful in certain scenarios due to its lower computational demand and faster training (Zhu & Anderson 2019; Pattanayak et al. 2020). These results also show that the optimal model is not necessarily the same across different scenarios. Since most machine learning models can achieve relatively high prediction accuracy after parameter optimization, the characteristics of the training data are a significant factor in model selection. Accordingly, the results of this study can only suggest that the BagR algorithm is more suitable for soft sensors at rural wastewater treatment facilities in the Wujingang watershed, Changzhou City, Jiangsu Province, China.

In this paper, we use the BagR machine learning model and combine it with management requirements to build an effective soft sensor. Besides good prediction performance, the model achieves a relative balance among computation, prediction time and interpretability. Our BagR model takes four physical indicators as inputs, which makes full use of the existing monitoring equipment to reduce costs while avoiding the overfitting that too many input variables can cause. Compared with the models from the earlier studies listed in Table 5, the prediction performance of the soft sensor designed in this study is on par or significantly improved.
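The bagging idea behind BagR (Breiman 1996) can be illustrated with a minimal sketch: several base learners are fitted on bootstrap resamples of the training data and their predictions are averaged. This is not the implementation used in this study. The one-split regression stump used as the base learner, the function names and the default of 25 estimators are all assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression stump (an assumed, deliberately simple base learner).

    Returns (feature_index, threshold, left_mean, right_mean), choosing the split
    that minimises the sum of squared errors on the given sample.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no feature varies in this sample: predict a constant
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Fit n_estimators stumps, each on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, row):
    """Bagged prediction: the average of the base learners' predictions."""
    return mean(predict_stump(m, row) for m in models)
```

Each row of `X` would hold the four physical sensor readings and `y` the laboratory-measured indicator. In practice a library implementation with full decision trees as base learners would likely be preferred over this hand-written stump.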

Table 5

Typical soft sensor-related studies

| Authors | Algorithms/Models | Input variables | Output variables | Sources/Features of data | Effect of models | Year of publication | References |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Spérandio et al. | ASM | DO + ORP | -N/-N | Lab + software simulation | Ammonia relative error = 3.94% | 2004 | Sperandio & Queinnec (2004) |
| Sheng-guang et al. | Combined model of mechanism and data-driven | Soluble refractory organics, soluble degradable organics, soluble oxygen, heterotrophic bacteria | Effluent COD | | Average error = 0.0326; standard deviation = 0.3933; R2 = 0.9685 | 2008 | Sheng-guang et al. (2008) |
| Li and Yang et al. | Radial Basis Function (RBF) neural network + genetic algorithm + gradient descent method | COD + TN + DO + T + HRT | TN | WWTP of Dingshu, Yixing | Relative errors = 1.65%–3.14% | 2009.7 | Li & Yang (2009) |
| Liping and Boshnakov et al. | BP neural network | | Effluent COD/BOD | | Relative errors = 7.5%–12% | 2010 | Liping & Boshnakov (2010) |
| Mulas et al. | Ordinary least squares regression (OLSR); partial least squares regression (PLSR); local linear regression based on k-nearest neighbors (k-NN LLR) | 6 variables after PCA | -N | Viikinmäki WWTP | RMSE = 0.14–0.46 | 2012 | Mulas et al. (2012) |
| Liu et al. | PCA + JIT (just-in-time learning)-ENS (ensemble learning) | 19 variables after PCA | Effluent BOD5 | Barcelona WWTP | RMSE = 0.3825; r = 0.8991 | 2013.4 | Liu et al. (2013) |
| Guo et al. | PSO + Elman neural network | -N, pH, T, MLSS | SVI | A WWTP in Beijing | RMSE = 0.0509–0.1039 | 2014.6 | Guo et al. (2014) |
| Cong et al. | Combined model of mechanism and data-driven | SS, -N, Q, CODinf, DO | Effluent COD | A WWTP in Shenyang | RMSE = 8.31 | 2015 | Cong et al. (2015) |
| Mali and Laskar et al. | Deep learning-based soft sensor (DLSS) | DO | TN | BSM2 | MSE = 0.072–0.0825; r = 0.9852–0.9869 | 2020 | Mali & Laskar (2020) |
| Wu et al. | Lasso Regression + Time Difference-based Multi-kernel Relevance Vector Machine (MRVM) | 20 variables for analysis | BOD | BSM1 model + a real WWTP | RMSE = 7.0301; r = 0.9580 | 2020.5 | Wu et al. (2020) |
| Schneider et al. | Feature detection algorithms | pH/DO | -N | Lab + 3 real WWTPs | Accuracy = 68%–94% | 2019.6; 2020.7 | Schneider et al. (2019); Schneider et al. (2020) |
| Li et al. | Gaussian process regression (GPR) and least squares support vector machine (LSSVM) algorithm, Kalman filter (KF) and moving window function (MW) | Historical data | SS, -N, -N, COD, BOD | BSM1 model + a real WWTP | RMSE = 0.013–89.654; R = 0.796–0.957; RMSSD = 2.984–11.135; RR = 0.697–0.745 | 2021.2 | Li et al. (2021) |
| Ching et al. | XGBoost | Other index of the same side (influent or effluent) | Influent and effluent BOD | UCI Machine Learning Repository, a WWTP in Hong Kong (the slope of the data is high and contains extreme values) | RMSE = 0.92–62.10 | 2022.2 | Ching et al. (2022) |
| Fox et al. | Neural network (NN), Multiple Linear Regression (MLR) | pH + ORP | Effluent -N | Lab | R2 = 0.465–0.769; RMSE = 0.196–0.5 | 2022.3 | Fox et al. (2022) |
| Alvi et al. | Gated Recurrent Neural Network units (GRUs) + Convolution Neural Network (CNN) | pH + DO + turbidity + TSS + ORP | -N | Luggage Point sewage treatment plant in Pinkenba, Queensland | RMSE = 0.04909 ± 0.0106; MAE = 0.01655 ± 0.0022; R2 = 0.9305 ± 0.0318 | 2022.5 | Alvi et al. (2022) |

The soft sensor based on the BagR algorithm achieved very good predictive performance in validation experiments covering several water quality indicators under many different conditions. RMSE is less than 1 in most cases, and the mean relative error (MRE) is less than 1% in nearly all scenarios and less than 0.5% in the majority, giving better predictions than the single-input models (Schneider et al. 2020). Our algorithm is markedly less interpretable than the ASM-based mechanistic model (Sperandio & Queinnec 2004), but its prediction accuracy is comparable, and data-driven models can quickly learn and adapt in complex and variable scenarios. In rural wastewater treatment facilities, complex changes in water quality and quantity and the demand for multi-parameter prediction make the mechanistic model difficult to compute accurately. Therefore, the data-driven model is more feasible in practice.
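For reference, the two accuracy measures quoted above can be computed as in the sketch below. It assumes the conventional definitions (RMSE as the root of the mean squared difference, MRE as the mean of the absolute error divided by the true value, expressed in percent); the exact formulas used in this study may differ, and the function names are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mre_percent(y_true, y_pred):
    """Mean relative error in percent (assumed definition; true values must be non-zero)."""
    n = len(y_true)
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n
```

Because RMSE carries the units of the measured indicator while MRE is dimensionless, RMSE values from studies with different indicators and concentration ranges are not directly comparable.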

Compared with other data-driven soft sensors, the soft sensor designed in this study matches or surpasses some studies in absolute accuracy, but does not reach the highest level among studies of the same type. When RMSE is used as the accuracy indicator, our sensor's values are better than or comparable to the best cases reported by Liu et al., Cong et al., Wu et al., Li et al., Ching et al. and Fox et al. (Liu et al. 2013; Cong et al. 2015; Wu et al. 2020; Li et al. 2021; Ching et al. 2022; Fox et al. 2022). Compared with the studies in Table 5 that use correlation coefficients or coefficients of determination as the key accuracy indicators (Sheng-guang et al. 2008; Liu et al. 2013; Wu et al. 2020; Li et al. 2021), our sensors generally exhibit better prediction performance. Because the datasets and measured indicators differ, our sensor cannot simply be said to surpass previous studies, but the higher accuracy values demonstrate its applicability and accuracy under certain conditions. On the other hand, many of the more algorithm-focused studies have provided soft sensor models with higher accuracy or robustness than the present study (Mulas et al. 2012; Guo et al. 2014; Mali & Laskar 2020; Zhu et al. 2020; Alvi et al. 2022). Complex models combining mechanistic and data-driven methods can also further improve prediction accuracy. However, more accurate and complex models mean more computation, longer training and prediction times, or higher computational costs. For the practical scenario of water quality testing in rural wastewater treatment facilities, a balance should be sought between accuracy and cost; the highest predictive accuracy is not required.
The BagR model in this study uses a moderate amount of data and a moderate number of parameters, and the algorithm itself has low complexity and a low computational load, so the soft sensor achieves a training time of 10 min or less and a prediction time of up to 20 ms, which greatly improves timeliness. Furthermore, applying the fuzzy grading method lowers the accuracy that data monitoring requires to a level the soft sensor can meet. Application under actual conditions also demonstrates that the current accuracy of our sensors meets the requirements for use (this will be discussed in detail later). Therefore, although its accuracy is not the highest, the soft sensor in this study balances computational cost and accuracy and is well suited to real application scenarios.

Recommendations for the preparation of datasets

For model training, this study generated the dataset with a small laboratory test device. This method can quickly generate matching datasets. By contrast, collecting and labelling relevant data during the operation of actual sewage plants carries relatively high time and economic costs. Paepae et al. reported that many studies have used large amounts of time series data to train soft sensors (Paepae et al. 2021). In most cases, the time series spanned months to years, some covered up to 49 years (Sepahvand et al. 2021), and sampling frequencies were as high as once every 10 min (Wang et al. 2015). Although some of these studies used very large datasets over very long periods, many directly used data routinely recorded during wastewater treatment plant operation that were not calibrated in the field. Indeed, if accurate operational monitoring data are available for the scenario in which the soft sensor will be applied, they are the most appropriate basis for a training dataset. However, in the application scenario of the present study, rural sewage treatment facilities in Wujin port, Changzhou, the many small and scattered treatment facilities and the downstream channels involved have not previously been regulated or monitored, so data from the real scenario are completely missing. In such a scenario, labelling field data would require significant time and measurement costs. Moreover, the values of the indicators to be measured, COD, NH4+-N, NO3--N and PO43--P, need to be determined manually, which is a heavy workload and makes it difficult to guarantee the frequency of data collection.
In this study, we trained the model on data generated by the small laboratory test device and ensured the reliability of the soft sensor data by cross-checking them against operational data from surrounding sewage plants, which had already been collected and centralized, thereby avoiding additional field measurement. The soft sensor is effective in predicting SBR influent water quality at the Changzhou Sewage Plant and has been used successfully in Luodong, Qingduntou and Xiejiatou. This approach also shortens the development cycle to some extent.

Necessity and feasibility of integration with management

Since soft sensing is fundamentally based on data-driven black-box models, errors and uncertainties in the estimates are unavoidable. The soft sensor in this study uses the basic BagR algorithm without additional data processing or model optimization algorithms, and its accuracy is not the highest of its kind. However, the fuzzy grading method in the management process effectively lowers the accuracy required of the model and allows it to strike a balance between efficiency and accuracy. The impact of soft sensor errors in this study was reduced through three channels. Firstly, in the calculation of the composite water quality index, the errors made in the estimates by the four soft sensors do not always appear in the results, as a maximum of five individual indicators are used. Secondly, the fuzzy grading method contributes. The method used in the application case of this study divides the comprehensive water quality index into six grades, each of which is wide. Our soft sensor test results showed that the MRE was less than 1% in most scenarios and less than 6% in all scenarios, so, as an approximation, only data errors of about twice the MRE could change the assigned grade. Thirdly, results are presented as overall statistics. In practical applications, management requires the number of occurrences of each comprehensive pollution index grade within a given amount of data (i.e. over time). Thus, even if an individual composite pollution index is misclassified because of a soft sensor misestimate, the effect is diluted in the quantitative analysis of a large amount of data. By reducing the impact of estimation error through these three management methods and data requirements, we designed a simple soft sensor model that can be applied in practice.
In scenarios that require specific and accurate data, researchers will still seek more accurate machine learning algorithms and more efficient prediction models. However, not all application scenarios require the highest model accuracy. Where conditions permit, adopting an appropriate management strategy and designing a matching management system can relax the requirement for data accuracy, making data-driven soft sensing more widely applicable and, to some extent, allowing it to substitute for traditional methods.
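The grading argument in this section can be made concrete with a small sketch. The six-grade thresholds below are hypothetical placeholders, not the values used in this study, and the error model (every index overestimated by a fixed relative error) is a simplifying assumption. The sketch shows that a small relative error changes the grade only for index values lying close to a grade boundary, and that grade tallies over many samples therefore shift only slightly.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical boundaries separating six grades of a composite pollution index.
# An index below 0.2 is grade 1 (best); an index of 2.0 or more is grade 6 (worst).
GRADE_BOUNDS = [0.2, 0.4, 0.7, 1.0, 2.0]


def grade(index):
    """Map a composite index value to a grade from 1 to 6."""
    return bisect_right(GRADE_BOUNDS, index) + 1


def grade_counts(indices):
    """Tally how many index values fall into each grade (the quantity management uses)."""
    return Counter(grade(v) for v in indices)


def count_flipped(indices, rel_error):
    """Count values whose grade changes when overestimated by rel_error (e.g. 0.01 = 1%)."""
    return sum(grade(v) != grade(v * (1 + rel_error)) for v in indices)
```

For example, with these placeholder thresholds, `count_flipped([0.1, 0.5, 0.995, 1.5], 0.01)` counts only the value 0.995, which sits just below the boundary at 1.0.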

This study takes the development of soft sensors as the starting point to solve the problem of real-time monitoring and management of rural sewage treatment facilities, and applies the research results to the management of actual facilities. The article analyses and draws the following conclusions:

  • (1) A laboratory pilot device designed based on the characteristics of actual scenarios can generate datasets that meet the requirements of soft sensor training.

  • (2) In the design of soft sensors, it is recommended to use the BagR model, which combines relatively fast simulation speed with relatively high accuracy and can match actual management needs.

  • (3) The method of fuzzy classification can effectively reduce the error of prediction results. Adjusting management strategies based on practical application needs can reduce the difficulty of developing soft sensors and improve their practicality.

  • (4) Using the soft sensors designed in this study to take actual measurements at three locations in the Wujin Port basin, it was found that the water quality in Luodong and Qingdun was at a moderate level, while the water quality in Xiejiatou was at an inferior level. The results demonstrate the feasibility of soft sensors in practical applications and provide a data reference for local water quality supervision authorities.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Alvi M., French T., Cardell-Oliver R., Keymer P. & Ward A. 2022 Cost effective soft sensing for wastewater treatment facilities. IEEE Access 10, 55694–55708.
Breiman L. 1996 Bagging predictors. Mach Learn 24 (2), 123–140.
Ching P. M. L., So R. H. Y. & Morck T. 2021 Advances in soft sensors for wastewater treatment plants: A systematic review. J Water Process Eng 44, 102367.
Cong Q., Zhang B. & Yuan M. 2015 On-line soft sensor for water quality of wastewater based on synchronous clustering. Computer Engineering and Application 51 (24), 27–33, 66.
Dürrenmatt D. J. & Gujer W. 2012 Data-driven modeling approaches to support waste-water treatment plant operation. Environmental Modelling & Software 30 (5), 47–56.
Grau P., Beltran S., de Gracia M. & Ayesa E. 2007 New mathematical procedure for the automatic estimation of influent characteristics in WWTPs. Water Sci Technol 56 (8), 95–106.
Guo M., Geng Y. N. & Han H. G. 2014 A SVI Soft Sensor Model Based on Improved PSO-Elman Neural Network. In: 2014 11th World Congress on Intelligent Control and Automation (WCICA), pp. 3545–3550.
Haimi H., Mulas M., Corona F. & Vahala R. 2013 Data-derived soft-sensors for biological wastewater treatment plants: An overview. Environ Modell Softw 47, 88–107.
Henze M., Grady C., Gujer W., Marais G. & Matsuo T. 1987 Activated Sludge Model No 1.
Hoerl A. E. & Kennard R. W. 1970a Ridge regression – applications to nonorthogonal problems. Technometrics 12 (1), 69.
Huang J., Wang J. N., Zhao X. L., Zhang X. X. & Wei W. L. 2020 Study on the evaluation method of operation performance for rural domestic wastewater treatment facilities. In 2020 4th International Workshop on Renewable Energy and Development (IWRED 2020), p. 510.
Kadlec P., Grbic R. & Gabrys B. 2011 Review of adaptation mechanisms for data-driven soft sensors. Comput Chem Eng 35 (1), 1–24.
Li M.-h. & Yang M. 2009 Research and application of wastewater TN soft-sensor model. Automation & Instrumentation 24 (9), 1–4.
Liang H., Liu J., Wei Y., Guo X. & Shan B. 2011 Investigation and analysis of rural wastewater discharge characteristics in three typical areas of China. Chinese Journal of Environmental Engineering 5 (9), 2054–2059.
Liping F. & Boshnakov K. 2010 Neural-network-based water quality monitoring for wastewater treatment processes.
Liu P. & Shen Z. 2015 The status and prospect of cost effectiveness analysis of rural wastewater treatment. Journal of Fudan University. Natural Sciences 54 (1), 91–97.
Mali B. & Laskar S. H. 2020 Deep learning based automatic maintenance of soft sensors used in wastewater treatment plants.
Mulas M., Corona F., Haimi H., Sundell L., Heinonen M. & Vahala R. 2012 Nitrate estimation in the denitrifying post-filtration unit of a municipal wastewater treatment plant: The Viikinmaki case. Water Sci Technol 65 (8), 1521–1529.
Nair A., Hykkerud A. & Ratnaweera H. 2022 Estimating phosphorus and COD concentrations using a hybrid soft sensor: A case study in a Norwegian municipal wastewater treatment plant. Water-Sui 14 (3), 332.
Pattanayak A. S., Pattnaik B. S., Udgata S. K. & Panda A. K. 2020 Development of chemical oxygen on demand (COD) soft sensor using edge intelligence. IEEE Sens J 20 (24), 14892–14902.
Pattnaik B. S., Pattanayak A. S., Udgata S. K. & Panda A. K. 2021 Machine learning based soft sensor model for BOD estimation using intelligence at edge. Complex Intell Syst 7 (2), 961–976.
Qing X. & Yu J. 2005 Soft sensors and it's use in wastewater treatment systems. Industrial Water Treatment 25 (3), 13–16.
Quinlan J. R. 1996 Improved use of continuous attributes in C4.5. J Artif Intell Res 4, 77–90.
Schilling K. E., Kim S. W. & Jones C. S. 2017 Use of water quality surrogates to estimate total phosphorus concentrations in Iowa rivers. J Hydrol-Reg Stud 12, 111–121.
Schneider M. Y., Furrer V., Sprenger E., Carbajal J. P., Villez K. & Maurer M. 2020 Benchmarking soft sensors for remote monitoring of on-site wastewater treatment plants. Environ Sci Technol 54 (17), 10840–10849.
Sepahvand A., Singh B., Sihag P., Nazari Samani A., Ahmadi H. & Fiz Nia S. 2021 Assessment of the various soft computing techniques to predict sodium absorption ratio (SAR). ISH Journal of Hydraulic Engineering 27 (S1), 124–135.
Sheng-guang W., Yong-zai L. V. & Peng C. 2008 Studies on hybrid soft sensor modeling and simulation for Wastewater treatment process. Microcomputer Information Integration of Management and Control 24 (30), 150–152.
Viviano G., Salerno F., Manfredi E. C., Polesello S., Valsecchi S. & Tartari G. 2014 Surrogate measures for providing high frequency estimates of total phosphorus concentrations in urban watersheds. Water Res 64, 265–277.
Wang Z. J., Zhao Z., Li D. & Cui L. 2015 Data-driven soft sensor modeling for algal blooms monitoring. IEEE Sens J 15 (1), 579–590.
Wu J., Cheng H. C., Liu Y. Q., Liu B. & Huang D. P. 2019 Modeling of adaptive multi-output soft-sensors with applications in wastewater treatments. IEEE Access 7, 161887–161898.
Yizhou F. & Liming X. 2019 Research on Rural Wastewater Treatment Evaluation System Based on AHP Method. In IOP Conference Series: Earth and Environmental Science, Vol. 300, pp. 032048 (032046 pp.).
Zhu X. L., Rehman K. U., Wang B. & Shahzad M. 2020 Modern soft-sensing modeling methods for fermentation processes. Sensors-Basel 20 (6), 1771.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).