Abstract
To improve the efficiency of urban drinking water safety monitoring and early warning management, a pollution risk early warning model for the urban drinking water supply chain is proposed. First, the current situation of urban drinking water supply and the causes of pollution are analyzed. Then, an autoregressive (AR) model, continuously updated with new monitoring data, is used to predict the time series of multiple water quality indicators and to obtain the prediction residual vector group. The outlier score of each vector group is obtained with the isolated forest algorithm to judge whether the water quality is abnormal, and the fuzzy comprehensive evaluation method is used to grade the abnormal situation and issue the corresponding level of early warning. The experimental results show that the area under the receiver operating characteristic curve reaches 0.919 when the prediction residual vector group of turbidity and conductivity is used to detect numerical changes of water quality parameters in the drinking water supply chain. The model can accurately identify abnormal data and issue early warnings, helping to safeguard the livelihood of urban residents and urban development.
HIGHLIGHTS
Current situation of urban drinking water supply.
The causes of pollution.
AR model (autoregressive model).
Isolated forest algorithm.
Accurately predict the abnormal data.
INTRODUCTION
Water is the lifeblood of a city, and a secure water source is the most basic condition for the survival and development of urban residents. Risk assessment and early warning management of water source security underpin the safety of urban public water supply and the safe operation of the urban economic and social system. However, worsening water pollution has long been an important threat to the safety of water sources in most cities at home and abroad. This pollution includes not only conventional pollution such as industrial sewage, domestic wastewater and agricultural non-point sources, but also sudden water pollution such as terrorism, ship chemical and oil leakage, accidental industrial discharge, storm runoff pollution and poisoning (Bai et al. 2018; Díez-Del-Molino et al. 2018; McComb et al. 2019).
Research abroad on risk assessment and safety early warning management of urban water sources has advanced rapidly since the promulgation of the World Health Organization (WHO) drinking water guidelines in 2004. The WHO has developed guidelines for drinking water quality, which are international reference points for standard setting and drinking water safety. The latest guidelines developed by WHO are those agreed in Geneva in 1993. For certain elements and substances there are no guidelines, either because there is not enough research on their effects on organisms to define a guidance limit, or because the substance cannot reach dangerous concentrations in water owing to its insolubility or scarcity. The purpose of such studies is the monitoring of pollutants, risk assessment and risk warning management (Ahmed et al. 2018; Khatmullina et al. 2018). The purpose of early warning is to ensure water quality safety and sufficient water supply throughout the water supply system, from water source to user faucet, and the emphasis has gradually shifted from ‘emphasizing the importance of building a robust water supply system’ (Diendere et al. 2018) to ‘using specific models to analyze and evaluate its safety’ (Liu et al. 2019). At present, water source risk assessment studies focus on two aspects: one is the relationship between the water supply system and human health; the other is the assessment of the water supply system's own security and vulnerability (Wilson et al. 2018).
In China, current research focuses on water quality health risk assessment, comprehensive index evaluation, construction of emergency management systems, and the development and implementation of risk response measures. Comprehensive risk assessment and early warning management of the whole urban water supply chain has not received enough attention.
Compared with foreign countries, domestic water quality prediction research started relatively late. Chen Yue studied a water quality anomaly detection method based on radial basis function (RBF) neural network prediction and wavelet denoising, and used fuzzy c-means (FCM) clustering for anomaly classification. The RBF neural network predicts the future water quality value from previous water quality observations. The residual time series is obtained by comparing the predicted value with the actual value, and wavelet denoising is then applied to it within a sliding window. If the new residual is greater than a specific threshold/baseline, the water quality is judged abnormal. Taking online monitoring of ammonia nitrogen as the research object, the algorithm was shown to have a lower false alarm rate (FAR) and a higher probability of detection (PD) than the time series increment method, further improving the performance of water quality anomaly detection (Hou et al. 2013). He proposed a water quality anomaly detection algorithm based on multifactor fusion. First, the autoregressive model was used to predict the water quality indicators, and then fuzzy c-means clustering analysis was carried out by fusing the prediction residuals of several sensitive water quality indicators. A comparison showed that the anomaly detection performance of the proposed method was better than that of the traditional autoregressive prediction method and the multidimensional Euclidean distance method (He 2013).
Wang et al. (2019) proposed a supervised learning-based UV–visible spectral water quality anomaly detection method. The method obtains the difference space of normal samples in different data sets, and then uses orthogonal projection to remove the spectral data components in the difference space, achieving baseline correction. Partial least squares discriminant analysis is then used to extract features from the corrected spectra, and the optimal threshold obtained from the training set is used to determine outliers. Finally, sequential Bayesian rolling updating is used to update the abnormal probability at each time step and determine the water quality alarm sequence. In this way, the background difference between batches of water quality spectra is eliminated, the spectral information is used more fully, and the lower detection limit for characteristic pollutants is reduced. Yin et al. (2019) proposed a supervised learning-based water quality anomaly detection method that likewise combines orthogonal-projection baseline correction over the normal-sample difference space, feature extraction by partial least squares discriminant analysis with a threshold obtained from the training set, and sequential Bayesian rolling updates of the abnormal probability to determine the water quality alarm sequence (Liu et al. 2019).
Yang et al. (2018) treated the parameter calibration of the prediction model as a Bayesian estimation problem. The posterior probability density function of the parameters is obtained with the finite difference method and Bayesian inference, and more reasonable parameter values are then obtained by an improved Metropolis-Hastings sampling method. The results show that the new method is more suitable for the prediction of water pollution.
On this basis, an early warning model of pollution risk in urban drinking water supply chain is proposed. The AR model and the isolated forest algorithm are used to analyze the abnormal scores of water quality in the water supply network system of the urban drinking water supply chain (Zuo et al. 2020). The fuzzy comprehensive evaluation method is used to analyze the abnormal grade and make different grade early warning. The effectiveness of the proposed method is verified by an actual case.
ANALYSIS OF CURRENT SITUATION OF URBAN WATER SUPPLY
Drinking water safety is a prominent environmental problem that is directly related to people's life and health. Drinking water pollution accidents seriously disrupt residents' normal access to drinking water and can even lead to outbreaks of water-borne diseases or poisoning. Poor drinking water quality can cause a variety of diseases; studies have shown that 80% of human diseases are related to water (Jindal et al. 2018). At present, there are two major drinking water problems in the world.
Serious shortage of drinking water resources
China's total water resources rank sixth in the world, but China has a large population. If the per capita water resources are calculated, the per capita occupancy is only 2,500 cubic meters, about 1/4 of the world's per capita water resources, ranking 110th in the world. Water resource constraints restrict the scale and growth speed of economic development, and the resource constraints caused by water resource shortage will restrain short-term economic development and often become the ‘bottleneck’ of short-term economic development. With the acceleration of modernization, the sharp increase of world urban population, especially the improvement of people's living standards, the shortage of water resources has become a major problem restricting urban development and affecting people's quality of life (Shi et al. 2018).
Water pollution is increasing
Due to the increasingly serious environmental pollution, urban drinking water sources are polluted to varying degrees. The city is the center of economic life, often with a large amount of sewage discharge. In addition, the level of urban sewage treatment is not high. The situation of urban water environment is very serious. The river section flowing through the city is polluted more seriously, and the water quality of urban lakes is poor. Considerable numbers of urban water sources are seriously polluted, directly threatening the safety of drinking water (Burnett et al. 2014).
Urban drinking water supply mode
Classification of urban water supply chain
The urban water supply chain is mainly divided into three categories. The first is centralized treatment with unified supply (one pipe network). For water quality control, an advanced treatment process is added after the current conventional treatment process so that the produced water meets the standard for direct drinking water and is delivered to households through a high-quality pipe network. The second is centralized treatment with water supply by quality (two pipe networks), also called urban separate-quality water supply. It uses two water supply pipe network systems, with the drinking water system as the main water supply system of the city, and the water supplied can be drunk directly. The third is subquality water supply (also known as ‘pipeline direct drinking water’), which refers to the advanced treatment of urban tap water, or of water that meets the quality standard for drinking water sources, by setting up special water treatment stations in residential areas and using advanced water treatment technologies (Chen et al. 2021). High-quality pipes and an independent circulation network are used to transport the purified water to users for direct drinking, ensuring that the water quality is sanitary, stable and fresh. This system needs advanced treatment for only a small amount of water, and because the water purification station is located within the water supply area, high-quality pipes (and accessories) can deliver the water to users over the shortest distance.
POLLUTION RISK EARLY WARNING MODEL OF URBAN DRINKING WATER SUPPLY CHAIN
Causes and characteristics of abnormal water quality
Time series variation characteristics of abnormal water quality in a drinking water supply chain pipe network.
Abnormal water quality detection process
Water quality time series prediction based on the autoregressive model
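The AR model of order p expresses the current value of a water quality index as a linear combination of its historical observations, as in Equation (1):

$$x_t = c + \sum_{i=1}^{p} \varphi_i x_{t-i} + \varepsilon_t \quad (1)$$

Among them, $x_t$ is the value of the water quality index at time t, $\varphi_i$ is the coefficient of the historical observation value $x_{t-i}$, $c$ is the constant value, and $\varepsilon_t$ is the error value. The residual vector group of each index is obtained from the differences between the monitored values and the predicted values.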
The process of time series prediction of water quality index by AR model is as follows:
- (1) Stationarity test of the time series: the augmented Dickey-Fuller (ADF) test is used to test the stationarity of the time series. When the P-value is less than 0.05 (α = 0.05), the null hypothesis that a unit root exists is rejected, and the time series is considered stationary;
- (2) The Bayesian information criterion (BIC) is used to determine the order p of the model. The AR model is as follows in Equation (2):

- (3) The AR model of order p is established;
- (4) Based on the historical monitoring data, the current water quality data are predicted. The specific process is shown in Figure 4.
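The order selection and prediction steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the AR coefficients are estimated by ordinary least squares, BIC is taken in its Gaussian form n·ln(σ²) + k·ln(n), the stationarity test of step (1) is omitted, and the synthetic series and all function names (`solve`, `fit_ar`, `bic`, `select_order`) are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, p, start=None):
    """Least-squares fit of x_t = c + sum_i phi_i * x_{t-i}.

    Returns ([c, phi_1, ..., phi_p], residuals) for t = start .. end.
    """
    start = p if start is None else start
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(start, len(series))]
    y = series[start:]
    k = p + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(ata, aty)
    residuals = [yt - sum(b * v for b, v in zip(beta, r))
                 for r, yt in zip(rows, y)]
    return beta, residuals


def bic(residuals, p):
    """Gaussian BIC: n * ln(residual variance) + (p + 1) * ln(n)."""
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    return n * math.log(sigma2) + (p + 1) * math.log(n)


def select_order(series, max_p):
    """Pick the order with the smallest BIC, fitting every order on the same span."""
    return min(range(1, max_p + 1),
               key=lambda p: bic(fit_ar(series, p, start=max_p)[1], p))


# Synthetic AR(2) series standing in for a standardized water quality index.
rng = random.Random(42)
series = [0.0, 0.0]
for _ in range(2000):
    series.append(0.5 + 0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0.0, 0.1))

best_p = select_order(series, max_p=5)
coeffs, residuals = fit_ar(series, best_p)
print("selected order:", best_p)
print("coefficients [c, phi_1, ...]:", [round(c, 3) for c in coeffs])
```

The `residuals` list plays the role of the prediction residual series of one index; repeating the fit for each monitored index gives the components of the residual vector group.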
Analysis of water quality outliers based on the isolated forest algorithm

The abnormal score of a water quality data point x is calculated as in Equations (3) and (4):

$$s(x, \delta) = 2^{-\frac{E(h(x))}{C(\delta)}} \quad (3)$$

$$C(\delta) = 2H(\delta - 1) - \frac{2(\delta - 1)}{\delta} \quad (4)$$

Among them, δ is the number of subsamples used to build each isolated tree; h(x) is the path length of x in an isolated tree; H(i) is the harmonic number, which can be estimated by ln(i) + 0.5772156649 (Euler constant); C(δ) is the average path length of a binary search tree (BST), which is used to standardize h(x); and E(h(x)) is the average path length of x over all isolated trees in the isolated forest. If a detected water quality data point reaches an external node that contains multiple data points, the path length is adjusted according to the number of data points to make up for the subtree that is not built below the depth limit of the tree. The evaluation rule based on the abnormal score is as follows: when the abnormal score s(x, δ) is close to 1, the point is regarded as abnormal; if s(x, δ) is less than 0.5, the point can be regarded as normal; if the abnormal scores of all water quality data points are about 0.5, there is no obvious abnormality in the monitored residual vector group of each index. Based on the abnormal score, the most likely abnormal data in the sample can be identified. The training phase returns the structure and split conditions of the isolated trees, which are then used to calculate the abnormal score of out-of-sample residual vector group data, judge whether the data are abnormal, and make the early warning judgment.
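The scoring rule for s(x, δ) and C(δ) can be illustrated with a compact pure-Python isolated forest. This is a sketch under assumptions rather than the authors' code: splits are uniform random and axis-aligned, C(2) is set to 1 as is common in implementations, the training data are synthetic two-dimensional Gaussian "residuals", and every function name is hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average BST path length C(n), with H(i) estimated by ln(i) + Euler constant."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common special case; the log approximation is poor here
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Path length h(x); external nodes holding several points add C(size)."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolated trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(x, forest, size):
    """s(x, size) = 2 ** (-E(h(x)) / C(size))."""
    mean_path = sum(path_length(x, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_path / c_factor(size))


# Synthetic 2-D residual vectors (illustrative only).
data_rng = random.Random(1)
train = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
forest, size = build_forest(train)
print("score near the bulk of the data:", anomaly_score((0.0, 0.0), forest, size))
print("score far from the data:", anomaly_score((8.0, 8.0), forest, size))
```

A point far from the training cloud is isolated after few splits, so its mean path length is short and its score is higher than that of a point in the middle of the data.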
Early warning model based on fuzzy comprehensive evaluation
After the detection and prediction results for the test data are obtained, the fuzzy comprehensive evaluation method is used for grading. The fuzzy relationship between the evaluation indexes and the evaluation levels is described by membership functions, which handles both the inherent fuzziness of the objective data and the subjectivity of the evaluation. The risk assessment of urban drinking water pollution by fuzzy comprehensive evaluation includes the following six steps:
- (1) Determine the evaluation index system. If there are n indexes in the index system, the indexes can be expressed in vector form as u = {u1, u2, …, un}.
- (2) Determine the evaluation grade system. If there are m evaluation levels, the evaluation grades can be expressed in vector form as v = {v1, v2, …, vm}.
- (3) Establish the fuzzy relation matrix. The form of the fuzzy relation matrix R is shown in Equation (5):

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{bmatrix} \quad (5)$$

where $r_{ij}$ is the membership degree of the i-th index to the j-th level. Each row of the matrix R represents the degree to which one evaluation index belongs to each of the m evaluation levels.
- (4) Determine the weight of each evaluation index. The weight vector of the n indexes can be expressed as w = {w1, w2, …, wn}.
- (5) The fuzzy comprehensive evaluation vector is calculated from the membership degrees and the index weights. The specific formula is shown in Equation (6):

$$B = w \circ R = (b_1, b_2, \ldots, b_m) \quad (6)$$

where $\circ$ denotes the fuzzy composition operator.
- (6) According to the principle of maximum membership degree, the evaluation result is the grade corresponding to the maximum value in the fuzzy comprehensive evaluation vector, and it is used as the basis for determining the early warning level.
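The evaluation steps above can be sketched as follows. The text does not state which composition operator or how many warning levels are used, so this example assumes the weighted-average operator and four levels; the weights and membership degrees are invented for illustration, and `fuzzy_evaluate` is a hypothetical name.

```python
def fuzzy_evaluate(weights, relation):
    """Weighted-average composition B = w . R, then maximum-membership grading.

    weights  : n index weights
    relation : n x m fuzzy relation matrix (row i = memberships of index i)
    Returns (normalized evaluation vector B, index of the selected level).
    """
    n_levels = len(relation[0])
    b = [sum(w * row[j] for w, row in zip(weights, relation))
         for j in range(n_levels)]
    total = sum(b)
    b = [value / total for value in b]
    level = max(range(n_levels), key=lambda j: b[j])
    return b, level


# Hypothetical weights for turbidity, conductivity and dissolved oxygen.
weights = [0.5, 0.3, 0.2]
# Hypothetical membership degrees over four warning levels (columns).
relation = [
    [0.1, 0.2, 0.5, 0.2],   # turbidity
    [0.2, 0.5, 0.2, 0.1],   # conductivity
    [0.7, 0.2, 0.1, 0.0],   # dissolved oxygen
]

b, level = fuzzy_evaluate(weights, relation)
print("evaluation vector:", [round(v, 2) for v in b])
print("selected level index:", level)
```

With these illustrative numbers the third level has the largest membership, so it would set the early warning level; a max-min operator could be substituted in `fuzzy_evaluate` if that is the composition intended.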
CASE ANALYSIS
Study area and monitoring data



Basic statistical characteristics of data
Statistical characteristics | Turbidity/(NTU) | Conductivity/(μS/cm) | Dissolved oxygen/(mg/L) |
---|---|---|---|
Maximum | 1.23 | 520 | 14.8 |
Minimum | 0 | 429 | 7.9 |
Average | 0.46 | 492.36 | 11.54 |
Variance | 0.48 | 15.99 | 1.33 |
Water quality time series prediction
BIC corresponding to the AR model with different orders of turbidity, conductivity and dissolved oxygen. (a) Order of turbidity AR model. (b) Order of AR model for conductivity. (c) Order of AR model for dissolved oxygen.
Standardized value, AR model prediction value and residual error of water quality time series monitoring.
The predicted values are compared with the actual monitoring values to obtain the corresponding residual time series. In general, the AR model predicts well during the stable fluctuation stage of the water quality indexes and tracks changes in the water quality data closely. In the later stage, when the water quality anomaly occurs and the indexes change suddenly, the prediction residuals of turbidity and conductivity rise and fall markedly, with absolute residual values of 11.35 and 2.11, respectively. The dissolved oxygen concentration in the river is not affected by the abnormal event: its residual sequence shows no obvious change, and the largest absolute value of its residual is 07. Table 2 shows the residual statistical results of the water quality time series from June 13 to June 17 in the prediction stage. It can be seen from the table that the mean absolute error, mean square error and root mean square error are all small, which indicates that the AR model predicts the water quality time series well.
Prediction and evaluation of standard water quality parameters by AR model
Evaluation indicator | Turbidity/(NTU) | Conductivity/(μS/cm) | Dissolved oxygen/(mg/L) |
---|---|---|---|
Mean absolute error (MAE) | 0.1086 | 0.0453 | 0.0282 |
Mean square error (MSE) | 0.0287 | 0.0069 | 0.0011 |
Root mean square error (RMSE) | 0.1694 | 0.0831 | 0.0332 |
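The three error measures reported in the table can be computed directly from the monitored and predicted series. The short sketch below shows the definitions; the sample values are invented for illustration and `error_metrics` is a hypothetical helper name.

```python
import math


def error_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE) of the prediction errors actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Illustrative values only, not monitoring data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]
mae, mse, rmse = error_metrics(actual, predicted)
print(mae, mse, rmse)
```

Because RMSE is the square root of MSE, the two rows of the table can be cross-checked against each other; for turbidity, the square root of 0.0287 agrees with the reported 0.1694 to four decimal places.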
Abnormal detection of water quality time series
In this paper, outlier detection is carried out with the isolated forest algorithm on the residual vector groups formed by pairwise combination of the residual time series, and abnormal water quality is judged according to the abnormal score. Outlier detection with the isolated forest includes two stages (Yu et al. 2020). The first stage is the training stage. In total, 256 subsamples are randomly selected from the 2,400 residual vector groups in the first 25 days of the time series to construct an isolated tree (iTree) with a maximum depth of 8. In the same way, 100 iTrees are constructed to form an isolated forest (iForest), and the abnormal score of each vector group in the training set is obtained.
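As a small illustration of this setup, the pairwise residual vector groups and the tree height limit can be derived as below. The subsample size and tree count come from the text; the residual values and the helper name `residual_vector_group` are hypothetical.

```python
import itertools
import math

INDEXES = ("turbidity", "conductivity", "dissolved_oxygen")
SUBSAMPLE_SIZE = 256   # subsamples per isolated tree, as stated in the text
N_TREES = 100          # isolated trees per forest, as stated in the text


def residual_vector_group(residuals, names):
    """Zip the residual series of the chosen indexes into one vector per time step."""
    return list(zip(*(residuals[name] for name in names)))


# Pairwise combinations of the three monitored indexes.
pairs = list(itertools.combinations(INDEXES, 2))

# Tree height limit implied by the subsample size: ceil(log2(256)) = 8.
height_limit = math.ceil(math.log2(SUBSAMPLE_SIZE))

# Hypothetical residuals for three time steps, for illustration only.
toy_residuals = {
    "turbidity": [0.02, -0.01, 1.35],
    "conductivity": [0.00, 0.03, -0.90],
    "dissolved_oxygen": [0.01, -0.02, 0.01],
}
groups = {pair: residual_vector_group(toy_residuals, pair) for pair in pairs}
print(height_limit, groups[("turbidity", "conductivity")])
```

The maximum depth of 8 quoted in the text matches ceil(log2(256)), the usual height limit for a subsample of 256 points.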
Water quality time series anomaly detection performance
TP (true positive) means that the actual water quality is abnormal and the detection result is abnormal; FN (false negative) means that the actual water quality is abnormal and the detection result is normal; FP (false positive) means that the actual water quality is normal and the detection result is abnormal; TN (true negative) means that the actual water quality is normal and the detection result is normal. The area under the ROC curve (AUC) is the area enclosed by the ROC curve and the abscissa. The larger the area, the better the performance of the anomaly detection algorithm and the higher the warning accuracy.
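The ROC curve and AUC described here can be computed from abnormal scores and ground-truth labels as in the following sketch. It assumes the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN), sweeps the decision threshold over the observed scores, and integrates with the trapezoidal rule; the scores, labels and function names are illustrative.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FN, FP, TN when scores >= threshold are flagged abnormal."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp, fn, fp, tn


def roc_points(scores, labels):
    """Return (FPR, TPR) points from the strictest to the loosest threshold."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fn, fp, tn = confusion_counts(scores, labels, threshold)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative abnormal scores and labels (1 = abnormal, 0 = normal).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
points = roc_points(scores, labels)
print(points)
print("AUC:", auc(points))
```

An AUC of 0.5 corresponds to random guessing, while a value near 1 means abnormal points are almost always scored above normal ones.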
Performance evaluation of the ROC curve for abnormal detection of different water quality index combinations.
Performance evaluation of single index abnormal detection ROC curve.
Performance evaluation of ROC curve for abnormal detection of three indexes.
The AUC was 0.821, 0.816 and 0.509 when using only the predicted residual of turbidity, conductivity and dissolved oxygen, respectively. The AUC is 0.902 when the three-dimensional vector group composed of the predicted residuals of all three water quality indexes is used for anomaly detection. Adding an index that shows no abnormal change therefore causes only a very small decrease in the AUC and basically does not affect the performance of anomaly detection, whereas adding water quality indexes that respond to the abnormal event increases the AUC significantly. It is therefore worth predicting as many online monitoring indicators as possible in real time and integrating the prediction residuals of all indicators for anomaly detection, to increase the detection rate of abnormal water quality events and improve the early warning efficiency.
In conclusion, the more high-frequency water quality indexes change in response to abnormal water quality, and the greater their change range, the more obvious the detection effect, and the better the early warning of abnormal water quality events.
CONCLUSION
Through the analysis of water quality time series, abnormal water quality events can be identified in time, enabling the supervision of water environment quality and the timely warning and prediction of water pollution events:
- (1) The method can judge the degree of water quality abnormality dynamically, and the fuzzy comprehensive evaluation method can grade the degree of abnormality of surface water quality and realize graded early warning.
- (2) The proposed method is used to detect abnormalities in the turbidity, conductivity and dissolved oxygen time series of urban drinking water. The AR prediction model is established, the prediction residuals are obtained, and outlier detection is carried out with the isolated forest algorithm. The results show that the AUC reaches 0.919 when the prediction residual vector group of turbidity and conductivity is used for isolated forest anomaly detection, which verifies that the method has good anomaly detection performance. For turbidity and conductivity time series with a low degree of anomaly, the AUC decreases to 0.745 when the same method is used.
- (3) The more water quality indexes change in response to a water quality abnormality, and the greater their change range, the higher the detection rate of abnormal events and the stronger the early warning ability.
ACKNOWLEDGEMENTS
The authors are grateful for the support of the 2021 Henan Provincial Water Conservancy Science and Technology Tackle Project (No. 2021063). Special thanks to the reviewers for their constructive comments and suggestions, which improved the quality of this manuscript.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.