Abstract
To enhance the quality of life and ensure sustainability in crowded cities, safe management of drinking water using cutting-edge technologies is a priority. This study developed an intelligent early warning system (EWS) for alarming and controlling risks from bacteria and disinfection byproducts in a drinking water distribution system (DWDS), named BARCS (Bacterial Risk Controlling System). BARCS adopts an artificial intelligence (AI) approach to data-driven prediction and considers total chlorine (TCl) concentration as the pivot indicator for risk identification and control. First, the machine learning-based AI model in BARCS can provide a reliable prediction of TCl concentration in a DWDS, with an average R2 of 0.64 for the validation set, while offering great flexibility for BARCS to adapt to various conditions. Second, TCl concentration was proven to be a good indicator of bacterial risk in a DWDS, as well as a cost-effective surrogate variable to assess disinfection byproduct risk. Third, the robustness analysis demonstrates that with state-of-the-art water quality monitoring technologies, online implementation of BARCS in real-world settings is feasible. Overall, BARCS represents a promising solution to the safe management of drinking water in future smart cities.
HIGHLIGHTS
BARCS predicts and regulates bacterial risk in pipelines.
The TCl prediction module achieves an average R2 of 0.64 for the validation set.
The relationship between TCl and CFU is quantified.
Online implementation of BARCS in real-world settings is feasible.
INTRODUCTION
As reported by the United Nations, 55% of the world's population was urban as of 2018, and 68% is projected to be urban by 2050 (WUP 2018). It is crucial to enhance the quality of citizens’ lives and ensure sustainability in cities with technology developments, and the new concept of smart cities (Albino et al. 2015; Peng et al. 2017; Gascó-Hernandez 2018) serves this purpose nicely. Although no unanimous definition exists for smart cities, it is generally accepted that smart cities intelligently use information and communication technology within an interactive infrastructure to provide advanced and innovative services to citizens, impacting quality of life and sustainable management of natural resources (Shi et al. 2018b; Ismagilova et al. 2019; Jiang et al. 2019). Artificial intelligence (AI) is one of the cornerstones of smart cities. More and more machine learning methods are being applied to research in the field of environmental science. Machine learning methods are data-driven techniques that offer fast computing speed and high accuracy, and they do not require explicit descriptions of complex physical processes. In the past decade, the widely used and discussed machine learning techniques in environmental science have included Random Forest, Convolutional Neural Networks, Long Short-Term Memory Artificial Neural Networks, and Support Vector Machines. Sagan et al. (2020) developed an Artificial Neural Networks model for water quality monitoring using remotely sensed data. Ballesté et al. (2020) used Random Deep Forest to classify the sources of environmental samples and to determine the source of contamination in unknown environmental samples. However, while urban traffic (Hu et al. 2022), energy (Liang et al. 2020), and even surface water pollution (Shi et al. 2018b) systems have been the focus of smart city development, the water supply system, another pillar system of modern cities, has received much less attention.
A blueprint for water supply systems in future smart cities is yet to be drawn.
Improving drinking water safety ought to be a priority in smart city development, as water pollution has been a global threat to drinking water supply (Vörösmarty et al. 2010; Wang et al. 2021). Bacteria have been identified as the most common type of etiologic agent driving drinking water-associated outbreaks (Beer et al. 2015). For example, more than 60% of such outbreaks in the United States over the years have been related to bacteria (Hlavsa et al. 2011; Benedict et al. 2017). Chlorination using disinfectants such as chlorine, chloramine, and chlorine dioxide has been widely implemented to control bacteria in drinking water distribution systems (DWDSs) for decades (Wolfe et al. 1984; Aieta & Berg 1986), and adjusting the dose of disinfectant used during the water treatment process is a common strategy to inhibit bacterial regrowth in DWDSs. However, the disinfectants can react with natural organic matter to form hazardous disinfection byproducts (DBPs) (Wang et al. 2019). At present, more than 600 DBPs have been identified (Li & Mitch 2018), some of which, such as haloacetic acids (HAAs) and halomethane, are classified as carcinogenic, mutagenic, or toxic for reproduction. Therefore, drinking water disinfected by chlorination encounters the dual risks of bacteria and DBPs. An intelligent system for alarming the dual risks and consequently controlling them by adjusting the total chlorine (TCl) concentration level in DWDSs is highly desired, but significant barriers remain, including the practical difficulty in having a sufficient number of monitoring points, as well as the methodological deficiency in continuous and high-frequency monitoring of bacteria and DBPs in DWDSs.
In China's developed cities, for example, the total bacterial number is typically measured once a day at the water treatment plant, and once to twice a month at limited sampling locations in the pipeline, through manual sampling. The analyses of bacteria and DBPs are typically performed in laboratory facilities. Classic culture-based methods, such as heterotrophic plate count analysis and assimilable organic carbon analysis, require labor-intensive and time-consuming processes, with a typical wait time of several days. Molecular techniques such as quantitative real-time polymerase chain reaction analysis and next-generation sequencing have been applied by researchers to study microbial ecology in DWDSs (e.g., Goraj et al. 2020). These techniques are costly and require experienced operators, making them inapplicable in routine analysis by water utilities. In addition, they still need at least several hours to acquire data. Adenosine triphosphate and flow cytometry are emerging techniques for determining the bacterial activity of drinking water (Zhang et al. 2019). The adenosine triphosphate method can be used to determine aquatic bacterial activity in minutes (Delahaye et al. 2003). However, this method is sensitive to all living cells, including bacteria as well as cells from fungi, plants, and animals. Moreover, assays for the adenosine triphosphate method require strict storage temperatures of −20 to −80 °C, a condition that is difficult to achieve for onsite and online analysis (ASTM D7463-18 2018). The flow cytometry method can provide bacterial cell concentrations within 15 min and is considered an optimally available technique to replace heterotrophic plate count for routine assessment by water utilities (Nevel et al. 2017). However, the flow cytometry method has a high cost, preventing it from wide installation in DWDSs.
Similarly, the concentrations of DBPs must be determined by gas chromatography, gas chromatography-mass spectrometry, or high-performance liquid chromatography after a complex pretreatment process. Thus, the existing approaches only enable aftermath assessment and cannot raise a risk alarm before a potential accident.
While the analyses of bacteria and DBPs are complex, other water quality parameters, such as chlorine, pH, water temperature, and turbidity, can be readily monitored using online instrumentation. Currently, monitoring networks with in situ sensors are commonly constructed in developed regions, producing a large amount of high-frequency online water quality data. This new data source enables surrogate monitoring which refers to using high-frequency in situ monitoring parameters to generate high-frequency estimates of other parameters that are not commonly measured in situ or measured with low frequency due to the constraints of the analysis procedure (Jones et al. 2011). For example, Shi et al. (2018b) performed surrogate monitoring on the surface water quality of the Potomac River, using high-frequency monitoring data (recorded at a fixed interval of 15–60 min and transmitted every hour) of water temperature, specific conductance, and turbidity to infer concentrations of total phosphorus via a regression approach. He et al. (2021) built the artificial neural network and logistic regression models to predict microcystins based on 10 regular water quality parameters. However, studies on surrogate monitoring for bacteria in DWDSs are still very rare. Bacterial regrowth in DWDSs depends on the physical, chemical, and environmental states of water as well as the operating conditions (Berry et al. 2006). TCl, assimilable organic carbon, temperature, pH, conductivity, turbidity, distance from the water plant (DIST), pipe properties, and biofilm flaking have all been mentioned in previous studies as predictors for bacterial regrowth (e.g., Zhang et al. 2016; Seo et al. 2019; Baek et al. 2020; Vavourakis et al. 2020; Wu et al. 2020) and therefore may be considered in surrogate monitoring of bacterial risk in DWDSs.
In this study, we proposed a closed-loop monitoring-prediction-warning-advising system for controlling the dual risk of bacteria and DBPs in urban DWDSs, named BARCS (Bacterial Risk Control System). BARCS adopts AI approaches to make data-driven predictions of bacterial risk levels based on water quality parameters that can be readily measured online through in situ water quality sensors, and informs the regulation of the TCl concentration at the water plant outlet (hereafter denoted as TCloutlet), a key variable for risk control. A case study was conducted in a megacity in South China to demonstrate the applicability of BARCS in real-world settings, as well as its potential to safeguard the drinking water supply in densely populated cities. The key questions addressed in this study include the following. (1) Can AI provide a reliable prediction of the bacterial risk in urban DWDSs based on regular water quality parameters? (2) Can state-of-the-art online water quality monitoring technologies enable the implementation of BARCS in real-world settings? Overall, this study presents a promising solution to the safe management of drinking water in future smart cities.
METHODS
Framework of BARCS
Study area and data
Based on routine manual sampling and laboratory analysis by the local water supply utility between 2014 and 2016, we compiled a dataset of 4,382 water samples at 280 locations (see Figure 2) within the DWDS. A total of 197 water quality parameters were measured (not for all the samples), including water temperature (Twater), turbidity, pH, total bacteria (TB) in colony-forming units (CFUs), TCl concentration, nitrate, ammonia, and many others. All the parameters were analyzed in accordance with the Chinese national standard GB/T 5750. Four subdatasets were further retrieved from the original dataset for different study purposes. Subdataset 1 is for identifying key input variables of the TCl prediction module and has 199 samples with measurements of nitrate, ammonia, chemical oxygen demand (COD), chloride, total organic carbon (TOC), TCl, TCloutlet of the respective water plant, pH, water temperature, and turbidity. Subdataset 2 is for building the TCl prediction module and contains 1,120 samples with measurements of pH, turbidity, TCl, TCloutlet of the respective water plant, and water temperature. Subdataset 3 contains 4,382 data pairs of total bacteria level and TCl concentration, which were used to assess bacterial risk based on TCl concentration (i.e., the bacterial risk identification module in Figure 2). Only 3 out of the 4,382 samples had a higher total bacteria level than allowed by the Chinese national standard (≤100 CFU/ml), and most samples had a zero total bacteria value. Subdataset 4 has 241 data pairs of HAAs and TCl, which were used to assess the risk of DBPs. HAAs were chosen because they are good indicators of chlorination byproducts (WHO 2011) and adequate HAA data are available in this case study. Supplementary Table S1 summarizes the statistics of the four subdatasets.
Implementation of BARCS
Predicting TCl concentrations using machine learning
In this study, the ITI values between the TCl concentration and the candidate input variables were calculated. A larger value of ITI indicates a higher power of the variable in predicting the TCl concentration. Eventually, based on ITI values, plant ID, pH, turbidity, TCloutlet, DIST, and water temperature were selected as the predictors (see Section 3.1 for details), and subdataset 2 was then used to further train and validate the SVR model for the TCl prediction module.
A 5-fold cross-validation was conducted on subdataset 2, with a data segmentation of 4:1 (training:validation). To test the generalization of the model, seven transfer learning experiments were also conducted. In each transfer learning experiment, the data associated with six of the seven water plants were used for training, while the data associated with the remaining plant were reserved for validation. We used the open-source Python machine learning library scikit-learn to implement SVR modeling with an RBF kernel (https://scikit-learn.org/stable/modules/svm.html#svm-regression). A grid search was used to find the optimal parameters for the SVR model, and the final settings were kernel = rbf, γ = 0.1, C = 0.1, and ε = 0.1.
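The two data-partitioning schemes described above can be sketched in plain Python. This is only an illustration of how the splits are formed, not the authors' code: the function names and the plant labels are hypothetical, the folds are built without shuffling for simplicity, and the SVR fitting itself (done with scikit-learn in the study) is left out.

```python
from collections import defaultdict


def k_fold_splits(n_samples, k=5):
    """Yield (train_idx, val_idx) for k equal-sized folds (4:1 when k = 5)."""
    indices = list(range(n_samples))
    # Fold i takes every k-th index starting at i (no shuffling in this sketch).
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def leave_one_plant_out(plant_ids):
    """Yield (held_out_plant, train_idx, val_idx), holding out one plant at a time."""
    by_plant = defaultdict(list)
    for idx, plant in enumerate(plant_ids):
        by_plant[plant].append(idx)
    for held_out in sorted(by_plant):
        val = by_plant[held_out]
        train = [i for p, idxs in by_plant.items() if p != held_out for i in idxs]
        yield held_out, train, val


# Hypothetical labels: 14 samples from seven plants, two samples per plant.
plant_ids = [1, 2, 3, 4, 5, 6, 7] * 2
loo = list(leave_one_plant_out(plant_ids))   # seven experiments
folds = list(k_fold_splits(1120, k=5))       # subdataset 2 has 1,120 samples
```

With 1,120 samples, each of the five folds holds 224 validation samples and 896 training samples, which matches the 4:1 segmentation. In practice the indices would normally be shuffled before the folds are formed.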
The validated SVR model was used to predict the TCl concentration at the 337 sampling locations (see Figure 2). These locations are controlling nodes of the DWDS, which were determined by the local water utility. It is assumed that if the bacterial risk at these locations is well controlled, the entire DWDS would be secured with confidence. These locations are also where online sensors, such as the hardware portion of BARCS, can be equipped. Note that determining the controlling nodes in a DWDS is a prerequisite for the implementation of BARCS and needs separate work. Some advanced approaches to the optimal design of a monitoring network can be found in the literature (Jiang et al. 2020a).
Bacterial risk assessment
To avoid bacterial risk, the lower limit of the TCl concentration, TClTB, for the target DWDS needs to be determined by the bacterial risk identification module in BARCS (Figure 1). In this case study, we took the following steps to build the risk identification module based on subdataset 3:
- (i) The 4,382 samples were split into groups of TCl concentration level with a fixed interval, ΔTCl = 0.05 mg/l Cl, determined by Scott's choice method (Scott 1979). Supplementary method 1 in the Supplementary material provides details about Scott's choice method.
- (ii) The ratio of samples not exceeding a given control level of total bacteria was calculated for each group, defined as the nonexceedance probability of total bacteria (denoted as P).
- (iii) For different control levels of total bacteria, regression models were built between P and the maximum TCl concentration in each group. This study considered four control levels, including 0 CFU/ml (i.e., nondetection), 20 CFU/ml (the local standard for direct drinking water), 50 CFU/ml (the local standard for the outlet water of water plants), and 100 CFU/ml (the Chinese national standard for the outlet water of water plants).
- (iv) Given a control target (e.g., TB ≤ 20 CFU/ml) and an acceptable confidence level α (e.g., α ≥ 0.95), the lower limit of TCl concentration to avoid bacterial risk can be determined through the corresponding regression model by setting P equal to α.
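Steps (i) and (ii) can be sketched as follows. The sample values, function names, and the small rounding guard are assumptions made for illustration. The regression model of step (iii) is not reproduced here because its functional form is not given in this section; `empirical_lower_limit` is a simple stand-in that reads the limit directly from the grouped data rather than from a fitted curve.

```python
import math
from collections import defaultdict


def nonexceedance_by_group(samples, control_level, delta_tcl=0.05):
    """Group (TCl, TB) samples by TCl interval; return [(max TCl in group, P)].

    P is the share of samples in the group with TB <= control_level.
    """
    groups = defaultdict(list)
    for tcl, tb in samples:
        # The tiny offset guards against floating-point error at bin edges.
        groups[math.floor(tcl / delta_tcl + 1e-9)].append((tcl, tb))
    pairs = []
    for key in sorted(groups):
        members = groups[key]
        p = sum(1 for _, tb in members if tb <= control_level) / len(members)
        pairs.append((max(tcl for tcl, _ in members), p))
    return pairs


def empirical_lower_limit(pairs, alpha):
    """Stand-in for step (iv): smallest group maximum such that this group
    and every higher-TCl group reach P >= alpha (None if none qualifies)."""
    limit = None
    for max_tcl, p in reversed(pairs):
        if p < alpha:
            break
        limit = max_tcl
    return limit


# Hypothetical (TCl in mg/l Cl, TB in CFU/ml) samples, not data from the study.
samples = [(0.02, 30), (0.04, 0), (0.07, 0), (0.08, 10), (0.12, 0)]
pairs = nonexceedance_by_group(samples, control_level=20)
limit = empirical_lower_limit(pairs, alpha=0.95)
```

For these toy samples the lowest group (0 to 0.05 mg/l Cl) has P = 0.5, so the stand-in limit falls at the maximum of the next group.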
It is worth emphasizing that BARCS is a general framework and other approaches to building the risk identification module, depending on the actual data condition, can be readily accommodated by BARCS as well.
DBP risk assessment
When the data condition allows, data-driven prediction of DBP species can be performed (Sadiq & Rodriguez 2011) in BARCS to assess DBP risk. However, monitoring data of DBPs in DWDSs are usually insufficient for making such predictions. As DBPs form through the reaction between Natural Organic Matter (NOM) and chlorine-containing disinfectants, a higher TCl concentration is generally associated with a more significant risk of DBPs. Thus, an alternative approach that BARCS can adopt is to identify an upper limit of the TCl concentration, denoted TClDBP, above which the DBP risk is considered possible and further investigation of the water quality is necessary. For example, the Canada Safe Drinking Water Act sets a standard of 0.08 mg/l for HAAs, and therefore, a TCl concentration level above which exceedance of this standard is very likely to happen can be set as TClDBP.
Regulation of TCloutlet
The online parameters monitored by sensors are transmitted to the TCl prediction module, which outputs TClpipe, the predicted TCl concentration in the pipeline under the current TCloutlet. Logically, risks can occur at different steps, as indicated by the three diamond shapes in the BARCS flowchart (Figure 1).
- (i) If TCloutlet > TClDBP, an immediate check of the DBP risk by manual sampling and instrumental analysis can be performed. If TCloutlet < TClDBP, the procedure continues with (ii).
- (ii) If any of the controlling nodes in the DWDS is predicted to experience TClpipe < TClTB, an iterative predict-and-adjust procedure is initiated to determine a new level of TCloutlet, which can be easily controlled at the water treatment plant. In each iteration, TCloutlet is increased by a given interval ΔTCloutlet, and the TClpipe concentrations at all controlling nodes are predicted again. ΔTCloutlet in this study was set to 0.05 mg/l Cl. The iteration ends in one of two ways. If the predicted TClpipe is no lower than TClTB, BARCS proceeds to (iii). If TCloutlet > TClDBP, BARCS cannot provide a strategy for avoiding both bacterial and DBP risk and proceeds to (iv).
- (iii) A second comparison between TClpipe and TClDBP is conducted. When TClpipe < TClDBP, BARCS provides a successful regulation strategy. If any node is predicted to have TClpipe > TClDBP, the risk cannot be eliminated through the regulation of BARCS, and Step (iv) is required.
- (iv) In the cases above, whether the situation is caused by a limitation of BARCS or by a real outbreak event, a timely alert, manual sampling, and further laboratory analysis are needed to confirm the risk.
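The decision logic of steps (i) to (iv) can be sketched as a small function. The name `regulate_outlet`, the status strings, and the toy predictor are hypothetical; in BARCS the prediction would come from the trained TCl prediction module. The text does not say how exact equality with TClDBP is handled, so this sketch treats equality as acceptable.

```python
def regulate_outlet(tcl_outlet, predict_pipe, tcl_tb, tcl_dbp, step=0.05):
    """Return (status, tcl_outlet) following steps (i) to (iv).

    predict_pipe(tcl_outlet) must return the predicted TClpipe at every
    controlling node for the given outlet concentration.
    """
    # (i) Outlet already above the DBP limit: manual DBP check is needed.
    if tcl_outlet > tcl_dbp:
        return "check_dbp", tcl_outlet
    # (ii) Raise the outlet level stepwise while any node is below TClTB.
    while min(predict_pipe(tcl_outlet)) < tcl_tb:
        tcl_outlet = round(tcl_outlet + step, 10)  # avoid float drift
        if tcl_outlet > tcl_dbp:
            return "alert", tcl_outlet  # (iv) no setting avoids both risks
    # (iii) Second comparison: no node may exceed the DBP limit.
    if max(predict_pipe(tcl_outlet)) > tcl_dbp:
        return "alert", tcl_outlet      # (iv)
    return "ok", tcl_outlet


def toy_predictor(tcl_outlet):
    """Hypothetical stand-in: two nodes keep 80% and 50% of the outlet TCl."""
    return [0.8 * tcl_outlet, 0.5 * tcl_outlet]


ok_case = regulate_outlet(0.50, toy_predictor, tcl_tb=0.33, tcl_dbp=1.19)
alert_case = regulate_outlet(0.50, toy_predictor, tcl_tb=1.58, tcl_dbp=1.19)
dbp_case = regulate_outlet(1.30, toy_predictor, tcl_tb=0.33, tcl_dbp=1.19)
```

With the toy predictor and TClTB = 0.33 mg/l Cl, the loop raises TCloutlet from 0.50 to 0.70 mg/l Cl in four steps. With TClTB = 1.58 mg/l Cl, the outlet level passes TClDBP before every node is protected, so an alert is returned.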
RESULTS AND DISCUSSION
Performance of the machine learning model
Detection and regulation of the dual risk
Based on the equations shown in Figure 7, the lower limit, TClTB, can be inferred under different management schemes defined by the combination of the total bacteria target and confidence level α. Table 1 shows an example for water plant 7, which has 90 samples in subdataset 2. In this case, no samples meet the scheme T0-99 (i.e., TB = 0 CFU/ml, confidence level α ≥ 0.99), while only 10 samples meet T0-95, indicating that TB = 0 is too strict to achieve. With α ≥ 0.99, 15 samples had potential risk per the target TB ≤ 20 CFU/ml, 3 samples per TB ≤ 50 CFU/ml, and no samples per TB ≤ 100 CFU/ml. With α ≥ 0.95, all 90 samples meet the targets except TB = 0.
Management scheme | Scheme ID | TClTB | Number of no-risk samples (percentage) |
---|---|---|---|
P = 0.95, TB = 0 CFU/ml | T0-95 | 0.82 mg/l Cl | 10 (11.1%) |
P = 0.95, TB ≤ 20 CFU/ml | T20-95 | 0.33 mg/l Cl | 90 (100%) |
P = 0.95, TB ≤ 50 CFU/ml | T50-95 | 0.23 mg/l Cl | 90 (100%) |
P = 0.95, TB ≤ 100 CFU/ml | T100-95 | 0.08 mg/l Cl | 90 (100%) |
P = 0.99, TB = 0 CFU/ml | T0-99 | 1.58 mg/l Cl | 0 (0%) |
P = 0.99, TB ≤ 20 CFU/ml | T20-99 | 0.59 mg/l Cl | 75 (83.3%) |
P = 0.99, TB ≤ 50 CFU/ml | T50-99 | 0.47 mg/l Cl | 87 (96.7%) |
P = 0.99, TB ≤ 100 CFU/ml | T100-99 | 0.25 mg/l Cl | 90 (100%) |
The concentration of HAAs in subdataset 4 ranges from nondetection to 0.030 mg/l, well below the standard of 0.08 mg/l per the Canada Safe Drinking Water Act. Thus, we were not able to identify a clear boundary between no risk and potential risk based on this subdataset. To demonstrate the implementation of BARCS, the maximum TCl concentration (1.19 mg/l Cl) in subdataset 4 was set as TClDBP, which may represent a very conservative assumption. Under this assumption, some risky events per T0-95 and all risky events per T0-99 could not be eliminated by adjusting the TCl concentration. For example, to meet the control standard T0-99, the TCl concentration should be at least 1.58 mg/l Cl, which is higher than TClDBP.
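The comparison between the lower limits in Table 1 and the assumed upper limit can be checked with a few lines of Python. The values are copied from Table 1 and from the TClDBP assumed above; the function name is hypothetical.

```python
# Lower limits TClTB (mg/l Cl) from Table 1, water plant 7.
TCL_TB = {
    "T0-95": 0.82, "T20-95": 0.33, "T50-95": 0.23, "T100-95": 0.08,
    "T0-99": 1.58, "T20-99": 0.59, "T50-99": 0.47, "T100-99": 0.25,
}
# Upper limit assumed in this case study (maximum TCl in subdataset 4).
TCL_DBP = 1.19


def infeasible_schemes(tcl_tb, tcl_dbp):
    """Schemes whose lower limit lies above the upper limit, so that no
    TCl level can satisfy both the bacterial and the DBP constraint."""
    return [scheme for scheme, lower in tcl_tb.items() if lower > tcl_dbp]


blocked = infeasible_schemes(TCL_TB, TCL_DBP)
```

Only T0-99 is blocked by construction, since its lower limit of 1.58 mg/l Cl exceeds 1.19 mg/l Cl. Every other scheme leaves a nonempty window between TClTB and TClDBP.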
Robustness under data error conditions
Previous results are all based on monitoring data obtained through manual sampling and offline laboratory analysis, while BARCS by nature is designed as an EWS equipped with online sensors. Online monitoring data are less accurate and stable than data acquired by laboratory-based methods. To demonstrate the applicability of BARCS under online-monitoring conditions, a robustness analysis was conducted for the TCl prediction module, the core part of BARCS. The goal is to evaluate both the individual and combined effects of TCloutlet, pH, and turbidity on the prediction accuracy of TClpipe. Water temperature was excluded from the robustness analysis, as online temperature sensors are currently very accurate. According to a local technological specification for online monitoring of drinking water quality (T/WSJ D10-2020), ±10%, ±2%, and ±10% error ranges were set for the online TCloutlet, pH, and turbidity data, respectively. The original sample measurements were then degraded with errors randomly generated within the respective error ranges by assuming uniform distributions, leading to a hypothetical online-monitoring dataset. The data generation process was repeated 500 times. Each hypothetical online-monitoring dataset was fed to the TCl prediction module. The TCl concentrations in different nodes of the DWDS were predicted, and the R2 values were calculated to analyze the robustness of the TCl prediction module.
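The perturbation procedure can be sketched as below. Several points are assumptions for illustration: the error ranges are read as relative errors, the field names and sample values are hypothetical, and `toy_predict` stands in for the trained TCl prediction module.

```python
import random

# Error ranges for the online data, read here as relative errors.
ERROR_RANGES = {"tcl_outlet": 0.10, "ph": 0.02, "turbidity": 0.10}


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def degrade(sample, rng, error_ranges=ERROR_RANGES):
    """Copy a sample and apply a uniform random relative error to each
    listed variable; water temperature is left untouched."""
    noisy = dict(sample)
    for name, rel in error_ranges.items():
        noisy[name] = sample[name] * (1.0 + rng.uniform(-rel, rel))
    return noisy


def robustness_r2(samples, observed, predict, n_repeats=500, seed=0):
    """R2 of the predictor on n_repeats independently degraded datasets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        noisy = [degrade(s, rng) for s in samples]
        scores.append(r_squared(observed, [predict(s) for s in noisy]))
    return scores


def toy_predict(sample):
    """Hypothetical stand-in for the TCl prediction module."""
    return 0.6 * sample["tcl_outlet"]


samples = [{"tcl_outlet": c, "ph": 7.0, "turbidity": 0.2, "water_temp": 25.0}
           for c in (0.4, 0.6, 0.8, 1.0, 1.2)]
observed = [toy_predict(s) for s in samples]
scores = robustness_r2(samples, observed, toy_predict)
```

The spread of `scores` across the 500 repetitions then indicates how sensitive the prediction accuracy is to sensor error. Perturbing one variable at a time, by passing a single-entry `error_ranges` to `degrade`, would separate the individual effects from the combined one.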
Discussion
The online real-time operational system developed in this study, BARCS, forms a closed loop of monitoring, simulating, alarming, and regulating (Figure 1). It has the potential to serve smart water management well. The key information (e.g., pH, turbidity, and water temperature) required by BARCS can be collected in real time using online sensors and wirelessly transferred to the system. The robustness analysis performed in this study shows that BARCS is compatible with in situ water quality monitoring, which involves measurement inaccuracy. The machine learning component in BARCS provides the intelligence to translate the real-time information into TCl concentrations and water risks. With potential risks identified, the system can immediately warn the water plants and suggest TCl regulations through the Internet. Although direct online monitoring of free chlorine is possible, free chlorine sensors still face problems such as short service life, high maintenance cost, and low sensitivity (Qin et al. 2015; Janudin et al. 2022). The machine learning-based method proposed in this study is a better choice until there is a major breakthrough in free chlorine sensors.
BARCS uses machine learning in its prediction module, and it is very easy to iteratively retrain the AI model and continuously improve its accuracy with new data. Therefore, BARCS has great potential for real-world applications in the coming new era of environmental big data.
BARCS has a flexible framework that can be easily expanded to cope with changing conditions at different time scales, such as diurnal fluctuations, seasonal variations, and long-term trends. At a daily timescale, water flow varies significantly in accordance with the water demand pattern. Higher water flow implies a shorter residence time of pipeline water, thus decreasing the TCl decay ratio and inhibiting bacterial regrowth. Therefore, the predictive power of BARCS may be further improved by incorporating flow monitoring data when applicable. In areas where drinking water is sourced from surface water, the quality of the raw water may have significant seasonal patterns, in which case BARCS can be tailored to assimilate meteorological data. Furthermore, as the frequency and severity of floods and droughts tend to increase under changing climate conditions (Pachauri & Meyer 2014) and urbanization and industrialization continue to proceed quickly in developing countries, the raw water quantity is expected to show some long-term trends. In such cases, BARCS may be expanded to embrace socioeconomic big data, and an advanced AI model using deep learning (Jiang et al. 2020b) may be incorporated into the system.
The model can serve not only water treatment plants but also end users. The water quality parameters can be monitored by sensors that are accessible to users, and the distance parameter we used is an estimate that can be calculated without knowing the pipeline network. Specifically, the distance is estimated as half of the perimeter of the rectangle whose opposite corners are the water treatment plant and the sample point. For example, if the locations of the water treatment plant and sample point are (x1, y1) and (x2, y2), the distance can be estimated as |x1–x2| + |y1–y2|. The estimated distance proved helpful even when DWDS network information is not available. The other parameters that BARCS requires can be obtained with low-cost probes. This design makes it possible for users to monitor the bacterial risk at end points on their own, which could be an ideal supplementary measure for risk awareness.
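The distance estimate above is the Manhattan (rectilinear) distance between the two points. A minimal sketch follows, assuming both locations are given as planar coordinates in the same length unit; the function name is hypothetical.

```python
def estimated_distance(plant_xy, sample_xy):
    """Half the perimeter of the axis-aligned rectangle spanned by the two
    points, i.e. |x1 - x2| + |y1 - y2|."""
    (x1, y1), (x2, y2) = plant_xy, sample_xy
    return abs(x1 - x2) + abs(y1 - y2)


# Plant at the origin, sample point 3 units east and 4 units north.
dist = estimated_distance((0.0, 0.0), (3.0, 4.0))
```

For this example the estimate is 7 units, compared with a straight-line distance of 5 units. A rectilinear estimate of this kind is never shorter than the straight-line distance.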
The feedback mechanism can facilitate unattended operation. Combined with a programmable logic controller, pump, and flow sensor, the feedback mechanism is expected to support monitoring and automatic adjustment of bacterial risk in the pipeline network throughout the day. The system can help to keep the bacterial risk of water at any node in the DWDS within the safe range without human intervention.
To intelligently safeguard DWDSs in future smart cities, further developments beyond BARCS are desired. For example, an optimally designed monitoring network can help both increase the accuracy of alerts and save monitoring costs (Jiang et al. 2020a). Water quality in a DWDS is inhomogeneous across end points, high water demand points, and low water demand points. The selection of monitoring points has been proven to be critical for alert systems (Shi et al. 2018a). Another important direction for technology innovation is online monitoring of bacterial indicators. Although aimed at predicting and preventing bacteriological failure, this study relied on an indirect method, as in previous studies. Online monitoring of bacterial indicators is still challenging for various reasons, including practicality, cost, reliability, maintenance, and transparency of results (Feng et al. 2021). It is not yet possible to directly use a bacterial indicator for risk alerts. However, emerging techniques may help solve this problem. For instance, biological sensors that measure the activity of a naturally located enzyme in the periplasmic space of cells using a stable fluorescence-labeled enzyme substrate can provide results in minutes (https://bactiquant.com/). Pioneering studies using AI combined with spectrum characteristics to identify waterborne bacteria have emerged in recent years (Feng et al. 2021). Cross-validation between direct and indirect alert methods is also worth exploring in the future.
In addition, most studies on the microbial risk presented by drinking water focus on bacteria. However, viruses also cause a significant proportion of waterborne diseases (Ekundayo 2021). Many kinds of viruses have been detected in wastewater in the past decade (Farkas et al. 2020). The entire world is battling against the COVID-19 pandemic, a war that is unlikely to end soon and will have a long-lasting impact on human society (Scudellari 2020). SARS-CoV-2, the virus causing COVID-19, has already been detected in wastewater, sewage sludge, and rivers (Mallapaty 2020; Wang et al. 2020; Patel et al. 2021) and has the potential to contaminate drinking water sources, especially in areas lacking adequate sanitary infrastructure and treatment facilities. In this pandemic context, efforts to develop virus-targeting early warning systems for DWDSs are needed more than ever to mitigate the risk of drinking water contamination and protect public health.
CONCLUSIONS
In this study, we have proposed an intelligent EWS prototype for warning and controlling bacterial and DBP risks in urban DWDSs. BARCS is a closed-loop monitoring-prediction-warning-advising system that can help water utilities make appropriate decisions to meet predefined water safety requirements. BARCS makes data-driven predictions based on machine learning and considers the TCl concentration in drinking water as the key indicator for risk alarming and control. The applicability of BARCS in real-world settings has been demonstrated using monitoring data collected in a megacity in South China. The major findings in this study include the following. First, the SVR-based AI model in BARCS provides a reliable prediction of TCl concentration in DWDSs (with R2 values of approximately 0.68 and 0.64 for training and testing stages, similar to those of classic disinfectant decay models) and offers greater flexibility for BARCS to serve as an intelligent EWS in complex water supply settings. Second, TCl concentration has been proven to be a good indicator of bacterial risk in DWDSs, which shows the great potential of in situ surrogate monitoring for microbial contamination in drinking water systems. Third, the robustness analysis has demonstrated that with state-of-the-art water quality monitoring technologies, online implementation of BARCS in real-world settings is feasible because the data error involved in online monitoring would not significantly degrade the predictive accuracy of the AI model in BARCS.
BARCS represents a promising solution to the safe management of drinking water in future smart cities. The feedback mechanism in particular facilitates unattended operation. Further developments are needed before the system can be implemented at the city scale. The relationships among TCl concentration, bacterial risk, and DBP risk need to be further refined with a larger dataset. A more sophisticated AI model (e.g., a deep learning model) needs to be developed to cope with more diversified data sources and a larger volume of data. In addition, an advanced algorithm for optimizing the network of online-monitoring sensors is highly desired to make the solution cost-effective.
ACKNOWLEDGMENTS
This study was supported by the Shenzhen Science and Technology Innovation Commission (No. KQJSCX20180322152024270; No. KCXFZ202002011006491) and the National Natural Science Foundation of China (No. 51979136).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.
REFERENCES
Author notes
Co-first/equal authorship.