To enhance the quality of life and ensure sustainability in crowded cities, safe management of drinking water using cutting-edge technologies is a priority. This study developed an intelligent early warning system (EWS) for alarming and controlling risks from bacteria and disinfection byproducts in a drinking water distribution system (DWDS), named BARCS (Bacterial Risk Controlling System). BARCS adopts an artificial intelligence (AI) approach to data-driven prediction and considers total chlorine (TCl) concentration as the pivot indicator for risk identification and control. First, the machine learning-based AI model in BARCS can provide a reliable prediction of TCl concentration in a DWDS, with an average R2 of 0.64 for the validation set, while offering great flexibility for BARCS to adapt to various conditions. Second, TCl concentration was proven to be a good indicator of bacterial risk in a DWDS, as well as a cost-effective surrogate variable to assess disinfection byproduct risk. Third, the robustness analysis demonstrates that with state-of-the-art water quality monitoring technologies, online implementation of BARCS in real-world settings is feasible. Overall, BARCS represents a promising solution to the safe management of drinking water in future smart cities.

  • BARCS predicts and regulates bacterial risk in pipelines.

  • The TCl prediction module achieves an average R2 of 0.64 for the validation set.

  • The relationship between TCl and CFU is quantified.

  • Online implementation of BARCS in real-world settings is feasible.

As reported by the United Nations, 55% of the world's population was urban as of 2018, and 68% is projected to be urban by 2050 (WUP 2018). It is crucial to enhance the quality of citizens’ lives and ensure sustainability in cities through technological development, and the concept of smart cities (Albino et al. 2015; Peng et al. 2017; Gascó-Hernandez 2018) serves this purpose. Although no unanimous definition exists for smart cities, it is generally accepted that smart cities intelligently use information and communication technology within an interactive infrastructure to provide advanced and innovative services to citizens, impacting quality of life and sustainable management of natural resources (Shi et al. 2018b; Ismagilova et al. 2019; Jiang et al. 2019). Artificial intelligence (AI) is one of the cornerstones of smart cities, and machine learning methods are increasingly applied to research in environmental science. These methods are data-driven: they offer fast computation and high accuracy, and they do not require an explicit description of complex physical processes. In the past decade, widely used machine learning techniques in environmental science have included Random Forest, Convolutional Neural Networks, Long Short-Term Memory neural networks, and Support Vector Machines. Sagan et al. (2020) developed an artificial neural network model for water quality monitoring using remotely sensed data. Ballesté et al. (2020) used Random Deep Forest to classify the sources of environmental samples and to determine the source of contamination in unknown environmental samples. However, while urban traffic (Hu et al. 2022), energy (Liang et al. 2020), and even surface water pollution (Shi et al. 2018b) systems have been the focus of smart city development, the water supply system, another pillar system of modern cities, has received much less attention. A blueprint for water supply systems in future smart cities is yet to be drawn.

Improving drinking water safety ought to be a priority in smart city development, as water pollution has been a global threat to drinking water supply (Vörösmarty et al. 2010; Wang et al. 2021). Bacteria have been identified as the most common type of etiologic agent driving drinking water-associated outbreaks (Beer et al. 2015). For example, more than 60% of such outbreaks in the United States over the years have been related to bacteria (Hlavsa et al. 2011; Benedict et al. 2017). Chlorination using disinfectants such as chlorine, chloramine, and chlorine dioxide has been widely implemented to control bacteria in drinking water distribution systems (DWDSs) for decades (Wolfe et al. 1984; Aieta & Berg 1986), and adjusting the dose of disinfectant used during the water treatment process is a common strategy to inhibit bacterial regrowth in DWDSs. However, the disinfectants can react with natural organic matter to form hazardous disinfection byproducts (DBPs) (Wang et al. 2019). At present, more than 600 DBPs have been identified (Li & Mitch 2018), some of which, such as haloacetic acids (HAAs) and halomethane, are classified as carcinogenic, mutagenic, or toxic for reproduction. Therefore, drinking water disinfected by chlorination encounters the dual risks of bacteria and DBPs. An intelligent system for alarming the dual risks and consequently controlling them by adjusting the total chlorine (TCl) concentration level in DWDSs is highly desired, but significant barriers remain, including the practical difficulty in having a sufficient number of monitoring points, as well as the methodological deficiency in continuous and high-frequency monitoring of bacteria and DBPs in DWDSs.

In China's developed cities, for example, the total bacterial number is typically measured once a day at the water treatment plant, and once to twice a month at limited sampling locations in the pipeline, through manual sampling. The analyses of bacteria and DBPs are typically performed in laboratory facilities. Classic culture-based methods, such as heterotrophic plate count analysis and assimilable organic carbon analysis, require labor-intensive and time-consuming processes, with a typical wait time of several days. Molecular techniques such as quantitative real-time polymerase chain reaction analysis and next-generation sequencing have been applied by researchers to study microbial ecology in DWDSs (e.g., Goraj et al. 2020). These techniques are costly and require experienced operators, making them inapplicable in routine analysis by water utilities. In addition, they still need at least several hours to acquire data. Adenosine triphosphate and flow cytometry are emerging techniques for determining the bacterial activity of drinking water (Zhang et al. 2019). The Adenosine triphosphate method can be used to determine aquatic bacterial activity in minutes (Delahaye et al. 2003). However, this method is sensitive to all living cells, including bacteria as well as cells from fungi, plants, and animals. Moreover, assays for the adenosine triphosphate method require strict storage temperatures of −20 to −80 °C, a condition that is difficult to achieve for onsite and online analysis (ASTM D7463-18 2018). The flow cytometry method can provide bacterial cell concentrations within 15 min and is considered an optimally available technique to replace heterotrophic plate count as a routine assessment of water utility (Nevel et al. 2017). However, the flow cytometry method has a high cost, preventing it from wide installation in DWDSs. 
Similarly, the concentrations of DBPs must be determined by gas chromatography, gas chromatography‒mass spectrometry, and high-performance liquid chromatography after a complex pretreatment process. Thus, the existing approaches only enable after-the-fact assessment and cannot raise an alarm before a potential accident.

While the analyses of bacteria and DBPs are complex, other water quality parameters, such as chlorine, pH, water temperature, and turbidity, can be readily monitored using online instrumentation. Currently, monitoring networks with in situ sensors are commonly constructed in developed regions, producing a large amount of high-frequency online water quality data. This new data source enables surrogate monitoring which refers to using high-frequency in situ monitoring parameters to generate high-frequency estimates of other parameters that are not commonly measured in situ or measured with low frequency due to the constraints of the analysis procedure (Jones et al. 2011). For example, Shi et al. (2018b) performed surrogate monitoring on the surface water quality of the Potomac River, using high-frequency monitoring data (recorded at a fixed interval of 15–60 min and transmitted every hour) of water temperature, specific conductance, and turbidity to infer concentrations of total phosphorus via a regression approach. He et al. (2021) built the artificial neural network and logistic regression models to predict microcystins based on 10 regular water quality parameters. However, studies on surrogate monitoring for bacteria in DWDSs are still very rare. Bacterial regrowth in DWDSs depends on the physical, chemical, and environmental states of water as well as the operating conditions (Berry et al. 2006). TCl, assimilable organic carbon, temperature, pH, conductivity, turbidity, distance from the water plant (DIST), pipe properties, and biofilm flaking have all been mentioned in previous studies as predictors for bacterial regrowth (e.g., Zhang et al. 2016; Seo et al. 2019; Baek et al. 2020; Vavourakis et al. 2020; Wu et al. 2020) and therefore may be considered in surrogate monitoring of bacterial risk in DWDSs.

In this study, we proposed a closed-loop monitoring-prediction-warning-advising system for controlling the dual risk of bacteria and DBPs in urban DWDSs, named BARCS (Bacterial Risk Control System). BARCS adopts AI approaches to make data-driven predictions of bacterial risk levels based on water quality parameters that can be readily measured online through in situ water quality sensors, and informs the regulation of the TCl concentration at the water plant outlet (hereafter denoted as TCloutlet), a key variable for risk control. A case study was conducted in a megacity in South China to demonstrate the applicability of BARCS in real-world settings, as well as its potential to safeguard the drinking water supply in densely populated cities. The key questions addressed in this study include the following. (1) Can AI provide a reliable prediction of the bacterial risk in urban DWDSs based on regular water quality parameters? (2) Can state-of-the-art online water quality monitoring technologies enable the implementation of BARCS in real-world settings? Overall, this study presents a promising solution to the safe management of drinking water in future smart cities.

Framework of BARCS

BARCS (Figure 1) is an intelligent early warning system (EWS). Its core idea is to regulate TCloutlet to guarantee the safety of drinking water with regard to both bacteria and DBPs in downstream pipelines. Implementation of BARCS requires both hardware for monitoring and software for detecting and responding to potential risks. The monitoring hardware ideally consists of online sensors with data acquisition modules (represented by the red fans in Figure 1), which can remotely provide high-frequency data on ambient factors (e.g., temperature, light, and precipitation), water quality parameters, and operational conditions regulated by the water utility, such as TCloutlet, water supply pressure, DIST, and water source switch. The system software processes the online monitoring data and runs the TCl prediction module, the bacterial risk identification module, and the DBP risk assessment module in sequence, and advises regulation of TCloutlet or manual sampling and further laboratory analysis when needed. The TCl prediction module maps ambient factors, water quality parameters, and operational conditions into TCl concentrations in pipelines (TClpipe). The bacterial risk identification module quantifies the relationship between the bacterial level and the TCl concentrations, based on which BARCS can infer a lower limit of TCl (TClTB) required by the DWDS to avoid bacteriological failure. A bacterial risk is detected when TClpipe is predicted to be lower than TClTB, and an iterative calculation is then activated to suggest an enhanced level of TCloutlet. The DBP risk assessment module determines a threshold value, TClDBP, for avoiding overregulation. In general, DBP formation increases with chlorine dosage (Summers et al. 1996). If the measured TCloutlet or the predicted TClpipe exceeds TClDBP, BARCS warns of potential DBP risk and suggests additional actions. 
For example, manual sampling and further laboratory analysis need to be performed to confirm the risk, and pretreatment on or switching to a different water source (e.g., with less organic matter) may also be necessary if the risk is confirmed.
Figure 1

General framework of BARCS. TCl and DBPs in the framework represent total chlorine and disinfection byproducts, respectively. The three modules highlighted with a blue background obtain real-time data from a cloud server, which could help the system update the thresholds. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.


Study area and data

To examine the applicability of BARCS, a case study was conducted in a highly developed city located in South China that hosts a permanent population of over 13 million. In 2019, the total public water supply to the city amounted to 2.16 billion m3, and the domestic water consumption was 0.79 billion m3 based on the local water resource bulletin. The city's water supply is largely sourced from outside by long-distance water transfer, and the potential risk of water contamination is significant. Additionally, the study area has a subtropical monsoon climate with high temperatures, high humidity, and intense insolation. Thus, the quality of the source water is unstable, presenting a major challenge for drinking water security. The DWDS (Figure 2) investigated in this study serves four districts of the city and contributes to approximately 33 and 36% of the city's public water supply and domestic water consumption, respectively. Seven water plants feed water into this DWDS, and the pipelines within the DWDS have a total length of 1,558 km.
Figure 2

Schematic representation of the drinking water distribution system (DWDS) investigated in this study. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.


Based on routine manual sampling and laboratory analysis by the local water supply utility between 2014 and 2016, we compiled a dataset of 4,382 water samples at 280 locations (see Figure 2) within the DWDS. A total of 197 water quality parameters were measured (not for all the samples), including water temperature (Twater), turbidity, pH, total bacteria (TB) in colony-forming units (CFUs), TCl concentration, nitrate, ammonia, and many others. All the parameters were analyzed in accordance with the Chinese national standard GB/T 5750. Four subdatasets were further retrieved from the original dataset for different study purposes. Subdataset 1 is for identifying key input variables of the TCl prediction module; it has 199 samples with measurements of nitrate, ammonia, chemical oxygen demand (COD), chloride, total organic carbon (TOC), TCl, TCloutlet of the respective water plant, pH, water temperature, and turbidity. Subdataset 2 is for building the TCl prediction module; it contains 1,120 samples with measurements of pH, turbidity, TCl, TCloutlet of the respective water plant, and water temperature. Subdataset 3 contains 4,382 data pairs of total bacteria level and TCl concentration, which were used to assess bacterial risk based on TCl concentration (i.e., the bacterial risk identification module in Figure 1). Only 3 of the 4,382 samples exceeded the total bacteria level allowed by the Chinese national standard (≤100 CFU/ml), and most samples had a total bacteria value of zero. Subdataset 4 has 241 data pairs of HAAs and TCl, which were used to assess the risk of DBPs. HAAs were chosen because they are good indicators of chlorination byproducts (WHO 2011) and adequate HAA data are available in this case study. Supplementary Table S1 summarizes the statistics of the four subdatasets.

Two additional variables, water plant ID (1–7) and DIST, are also considered in this study. Different water plants may have varying conditions (e.g., materials, service length, and biofilm flaking) that impact the water quality in the DWDS. Prior to the model training, one-hot encoding (Zhou 2021) was applied to transform the original water plant ID, say i, into a 7-element vector with the ith element being one and the other elements being zero. DIST is considered as a predictor because the TCl concentration generally decays along the pipeline. Because the exact geometry of the pipeline network is not available to the public for security reasons, and only the sampling nodes and water plant locations are known, this study did not use pipeline length to parameterize DIST. Instead, DIST was parameterized as |x1 − x2| + |y1 − y2|, where (x1, y1) and (x2, y2) represent the locations of a water plant and a node in the pipeline downstream of the plant, respectively. Figure 3 plots the histograms of all the input and output variables, except water plant ID, of the TCl prediction model per subdataset 2.
Figure 3

Histogram of all the input and output variables except water plant ID of the TCl prediction model.
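
The two derived predictors described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the example coordinates are hypothetical, and the coordinate units are assumed to be whatever planar units the sampling locations are stored in.

```python
def one_hot_plant_id(plant_id, n_plants=7):
    """Encode plant ID i (1-based) as a vector with a 1 in the i-th position."""
    if not 1 <= plant_id <= n_plants:
        raise ValueError("plant_id must lie in 1..n_plants")
    return [1 if i == plant_id else 0 for i in range(1, n_plants + 1)]


def manhattan_dist(plant_xy, node_xy):
    """DIST = |x1 - x2| + |y1 - y2| between a plant and a downstream node."""
    (x1, y1), (x2, y2) = plant_xy, node_xy
    return abs(x1 - x2) + abs(y1 - y2)


# Hypothetical plant and node coordinates, for illustration only.
print(one_hot_plant_id(3))                     # [0, 0, 1, 0, 0, 0, 0]
print(manhattan_dist((2.0, 5.0), (6.5, 1.0)))  # 8.5
```

The rectilinear distance is only a proxy for the true pipe length, which is why it can be computed from node and plant locations alone.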


Implementation of BARCS

Predicting TCl concentrations using machine learning

Machine learning is applied to predict the TCl concentration at the controlling nodes of the DWDS (Figure 4). BARCS in this case study adopts support vector regression (SVR) as the machine learning technique, which is an extension of the classic support vector machine (SVM) proposed by Vapnik (1998). The SVR can be described as follows:
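$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \tag{1}$$
$$\text{subject to}\quad y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i,\qquad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*,\qquad \xi_i,\ \xi_i^* \ge 0 \tag{2}$$
where xi is the input variable, yi is the observed value, and w is an unknown vector to solve the primal problem. Equations (1) and (2) are written here in the standard ε-insensitive form (Smola & Schölkopf 2004), in which b is the bias term, ξi and ξi* are slack variables, ε is the width of the insensitive tube, and n is the number of training samples. The constant C > 0 controls the tradeoff between the tolerance of training errors and the simplicity (flatness) of the regression function (Smola & Schölkopf 2004). A low C yields a smoother function, while a high C aims at fitting all training examples closely. SVR has many advantages, including its effectiveness in high-dimensional spaces and with limited samples, its fast training, and its flexibility (Dodangeh et al. 2020). In this study, all the input variables were normalized with the respective mean and standard deviation. To capture nonlinearity, a kernel function can be applied to map the input variable xi from its original space X to a new space F. In this study, the radial basis function (RBF) kernel (Scholkopf et al. 1997) was adopted, as it encounters fewer numerical difficulties in solving nonlinear regression problems (Lin & Yeh 2009).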
Figure 4

Technical route of the SVR method employed in this study, and how it incorporates online parameters for real-time prediction.
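
As a rough illustration of the ingredients named in this subsection, the sketch below shows z-score normalization, the RBF kernel, and the ε-insensitive error that SVR penalizes. It assumes the usual textbook definitions; the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Normalize a variable with its mean and (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rbf_kernel(x, z, gamma=0.1):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(standardize([1.0, 2.0, 3.0]))
print(rbf_kernel([0.0, 1.0], [0.0, 1.0]))  # identical points give 1.0
print(eps_insensitive_loss(1.0, 1.05))     # inside the tube: 0.0
```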

Mutual information theory was applied to subdataset 1 to identify relevant predictors from the candidate variables (i.e., nitrate, ammonia, CODcr, chloride, TOC, TCloutlet, pH, water temperature, turbidity, water plant ID, and DIST). Mutual information theory has been widely used in hydrological, climatic, biogeochemical, and other research fields to characterize complex datasets (e.g., Barzegar et al. 2018). In this approach, the marginal entropy of a random variable V, denoted as H(V), is calculated as follows:
$$H(V) = -\sum_{j=1}^{k} p(v_j)\,\log p(v_j) \tag{3}$$
where k is the number of groups of discrete data, vj represents the value of the discrete variable V in group j (j = 1, 2, 3, …, k), and p(vj) denotes the discrete probability of occurrence. The discrete data interval was determined by Scott's choice method (Scott 1979), which asymptotically minimizes the integrated mean squared error. The joint or related entropy of two variables, RE(V, V*), is the total information contained in both variables. It is calculated as follows:
$$RE(V, V^*) = -\sum_{j=1}^{k}\sum_{l=1}^{k^*} p(v_j, v^*_l)\,\log p(v_j, v^*_l) \tag{4}$$
where p(vj, v*l) is the joint probability of V falling in group j and V* falling in group l, and k* is the number of groups of V*. The mutual entropy of two variables, ME(V, V*), is a measure of dependency between two random variables V and V*. It is calculated as follows:
$$ME(V, V^*) = H(V) + H(V^*) - RE(V, V^*) \tag{5}$$
In general, both RE(V, V*) and ME(V, V*) measure the similarity between the variables V and V*. However, a standardized index provides a better way to assess the dependence of two random variables. For this purpose, the information transfer index ITI is calculated as follows:
(6)

In this study, the ITI values between the TCl concentration and the candidate input variables were calculated. A larger value of ITI indicates a higher power of the variable in predicting the TCl concentration. Eventually, based on ITI values, plant ID, pH, turbidity, TCloutlet, DIST, and water temperature were selected as the predictors (see Section 3.1 for details), and subdataset 2 was then used to further train and validate the SVR model for the TCl prediction module.
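
The entropy terms that feed the ITI can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: it uses the usual Shannon definitions of marginal, joint, and mutual entropy with natural logarithms (the base is not stated in the text), Scott's rule in its common form h = 3.49·s·n^(−1/3), and hypothetical function names and sample values. The final normalization into ITI is deliberately left out.

```python
import math
from collections import Counter
from statistics import stdev


def scott_bin_width(values):
    """Scott's rule for the bin width: h = 3.49 * s * n^(-1/3)."""
    return 3.49 * stdev(values) * len(values) ** (-1.0 / 3.0)


def discretize(values):
    """Assign each value to a group index using Scott's bin width."""
    h = scott_bin_width(values)  # assumes the values are not all identical
    lowest = min(values)
    return [int((v - lowest) // h) for v in values]


def entropy(labels):
    """Shannon entropy of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def joint_entropy(a, b):
    """Entropy of the paired labels (the related entropy RE)."""
    return entropy(list(zip(a, b)))


def mutual_entropy(a, b):
    """ME = H(a) + H(b) - RE(a, b)."""
    return entropy(a) + entropy(b) - joint_entropy(a, b)


# Hypothetical TCl (mg/l) and DIST values, for illustration only.
tcl = [0.62, 0.55, 0.48, 0.71, 0.35, 0.40, 0.66, 0.52]
dist = [2.0, 5.5, 8.0, 1.0, 14.0, 11.5, 3.0, 6.0]
print(round(mutual_entropy(discretize(tcl), discretize(dist)), 3))
```

Two independent variables give a mutual entropy of zero, while a variable paired with itself returns its own marginal entropy, which is a quick sanity check for any implementation.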

A 5-fold cross-validation was conducted on subdataset 2, with a data segmentation of 4:1 (training:validation). To test the generalization of the model, seven transfer learning experiments were also conducted. In each transfer learning experiment, the data associated with six of the seven water plants were used for training, while the data associated with the remaining plant were reserved for validation. We used scikit-learn, an open-source machine learning library for Python, to implement SVR modeling with an RBF kernel (https://scikit-learn.org/stable/modules/svm.html#svm-regression). A grid search was used to find the optimal parameters for the SVR model, and the final settings were kernel = rbf, γ = 0.1, C = 0.1, and ε = 0.1.
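
The workflow in this paragraph might look roughly as follows in scikit-learn. This is a sketch on synthetic data, not the authors' script: the data-generating formula, the sample size, and the grid values are all assumptions made for illustration, and the leave-one-plant-out loop is one plausible reading of the seven transfer learning experiments.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for subdataset 2 (all values are made up).
rng = np.random.default_rng(0)
n = 210
plant_id = np.repeat(np.arange(1, 8), n // 7)   # 7 plants, 30 samples each
ph = rng.normal(7.5, 0.2, n)
log_turbidity = rng.normal(-1.0, 0.3, n)
tcl_outlet = rng.uniform(0.6, 1.2, n)
dist = rng.uniform(0.0, 20.0, n)
t_water = rng.uniform(15.0, 30.0, n)
tcl_pipe = tcl_outlet * np.exp(-0.03 * dist) + rng.normal(0.0, 0.03, n)

one_hot = np.eye(7)[plant_id - 1]               # one-hot water plant ID
X = np.column_stack([ph, log_turbidity, tcl_outlet, dist, t_water, one_hot])

# Standardize inputs, then fit an RBF-kernel SVR; tune by 5-fold grid search.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
param_grid = {
    "svr__gamma": [0.01, 0.1, 1.0],
    "svr__C": [0.1, 1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
}
cv5 = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(model, param_grid, cv=cv5, scoring="r2")
search.fit(X, tcl_pipe)

# Train on six plants, validate on the held-out plant, seven times.
logo_scores = cross_val_score(
    search.best_estimator_, X, tcl_pipe,
    groups=plant_id, cv=LeaveOneGroupOut(), scoring="r2",
)
print(search.best_params_)
print(logo_scores.round(2))
```

Holding out a whole plant is a stricter test than random 5-fold splitting, because the held-out plant's one-hot column never takes the value one during training.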

The validated SVR model was used to predict the TCl concentration at the 337 sampling locations (see Figure 2). These locations are controlling nodes of the DWDS, which were determined by the local water utility. It is assumed that if the bacterial risk at these locations is well controlled, the entire DWDS would be secured with confidence. These locations are also where online sensors, such as the hardware portion of BARCS, can be equipped. Note that determining the controlling nodes in a DWDS is a prerequisite for the implementation of BARCS and needs separate work. Some advanced approaches to the optimal design of a monitoring network can be found in the literature (Jiang et al. 2020a).

Bacterial risk assessment

To avoid bacterial risk, the lower limit of the TCl concentration, TClTB, for the target DWDS needs to be determined by the bacterial risk identification module in BARCS (Figure 1). In this case study, we took the following steps to build the risk identification module based on subdataset 3:

  • (i)

    The 4,382 samples were split into groups of TCl concentration level with a fixed interval, ΔTCl = 0.05 mg/l Cl, determined by Scott's choice method (Scott 1979). Supplementary method 1 in the Supplementary material provides details about Scott's choice method.

  • (ii)

    The ratio of samples not exceeding a given control level of total bacteria was calculated for each group, defined as the nonexceedance probability of total bacteria (denoted as P).

  • (iii)

    For different control levels of total bacteria, regression models were built between P and the maximum TCl concentration in each group. This study considered four control levels, including 0 CFU/ml (i.e., nondetection), 20 CFU/ml (the local standard for direct drinking water), 50 CFU/ml (the local standard for the outlet water of water plants), and 100 CFU/ml (the Chinese national standard for the outlet water of water plants).

  • (iv)

    Given a control target (e.g., TB ≤ 20 CFU/ml) and an acceptable confidence level α (e.g., α ≥ 0.95), the lower limit of TCl concentration to avoid bacterial risk can be determined through the corresponding regression model by setting P equal to α.
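
Steps (i) to (iii) above can be sketched as follows. This is a minimal illustration with hypothetical names and made-up sample values; it produces the (maximum TCl, P) pairs to which a regression model would then be fitted, and it does not perform the regression itself.

```python
import math
from collections import defaultdict


def nonexceedance_by_tcl_bin(tcl, tb, control_level, bin_width=0.05):
    """Group samples by TCl bin and return sorted (max TCl in bin, P) pairs.

    P is the share of samples in the bin with total bacteria <= control_level.
    """
    groups = defaultdict(list)
    for c, b in zip(tcl, tb):
        # Small offset guards against floating-point error at bin edges.
        groups[math.floor(c / bin_width + 1e-9)].append((c, b))
    pairs = []
    for key in sorted(groups):
        members = groups[key]
        p = sum(1 for _, b in members if b <= control_level) / len(members)
        pairs.append((max(c for c, _ in members), p))
    return pairs


# Made-up samples: TCl in mg/l Cl, total bacteria in CFU/ml.
tcl = [0.12, 0.13, 0.31, 0.33, 0.34]
tb = [30, 0, 0, 5, 0]
print(nonexceedance_by_tcl_bin(tcl, tb, control_level=20))
# [(0.13, 0.5), (0.34, 1.0)]
```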

It is worth emphasizing that BARCS is a general framework and other approaches to building the risk identification module, depending on the actual data condition, can be readily accommodated by BARCS as well.

DBP risk assessment

When the data condition allows, data-driven prediction of DBP species can be performed (Sadiq & Rodriguez 2011) in BARCS to assess DBP risk. However, monitoring data of DBPs in DWDSs are usually insufficient for making such predictions. As DBPs form through the reaction between Natural Organic Matter (NOM) and chlorine-containing disinfectants, a higher TCl concentration is generally associated with a more significant risk of DBPs. Thus, an alternative approach that BARCS can adopt is to identify an upper limit of the TCl concentration, denoted TClDBP, above which the DBP risk is considered possible and further investigation of the water quality is necessary. For example, the Canada Safe Drinking Water Act sets a standard of 0.08 mg/l for HAAs, and therefore, a TCl concentration level above which exceedance of this standard is very likely to happen can be set as TClDBP.

Regulation of TCloutlet

The online parameters monitored by sensors are transmitted to the TCl prediction module, which predicts the TCl concentration in the pipeline (TClpipe) under the current TCloutlet. Risks can arise at different steps, as indicated by the three diamond shapes in the BARCS framework (Figure 1).

  • (i)

    If TCloutlet > TClDBP, an immediate check of the DBP risk by manual sampling and instrumental analysis can be performed. If TCloutlet < TClDBP, it proceeds to (ii).

  • (ii)

    If any of the controlling nodes in the DWDS is predicted to experience TClpipe < TClTB, an iterative predict-and-adjust procedure is initiated to determine a new level of TCloutlet, which can be easily controlled at the water treatment plant. In each iteration, TCloutlet is increased by a given interval ΔTCloutlet, and the TClpipe concentrations at all controlling nodes are predicted again. ΔTCloutlet in this study was set to 0.05 mg/l Cl. The iteration ends either when the predicted TClpipe is no lower than TClTB at every controlling node, in which case BARCS proceeds to (iii), or when TCloutlet > TClDBP, which indicates that BARCS cannot provide a strategy for avoiding both bacterial and DBP risk, in which case BARCS proceeds to (iv).

  • (iii)

    A second comparison between TClpipe and TClDBP is conducted. When TClpipe < TClDBP, BARCS provides a successful regulation strategy. If any node is predicted to have TClpipe > TClDBP, the risk cannot be eliminated through the regulation of BARCS. Then, Step (iv) is required.

  • (iv)

    In the cases above, whether the situation is caused by a limitation of BARCS or by a real outbreak event, a timely alert, manual sampling, and further laboratory analysis are needed to confirm the risk.
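
The decision logic of steps (i) to (iv) can be sketched as a small loop. This is an illustrative sketch, not the BARCS implementation: `predict_pipe` stands in for the trained TCl prediction module, and the status labels, decay factors, and threshold values are hypothetical.

```python
def regulate_outlet(tcl_outlet, predict_pipe, tcl_tb, tcl_dbp, step=0.05):
    """Return (status, TCl_outlet) after the predict-and-adjust procedure.

    predict_pipe(tcl_outlet) must return the predicted TCl at every
    controlling node for the given outlet concentration.
    """
    # (i) Outlet already above the DBP threshold: check DBP risk manually.
    if tcl_outlet > tcl_dbp:
        return "check_dbp", tcl_outlet
    # (ii) Raise the outlet level until no node falls below TCl_TB.
    while min(predict_pipe(tcl_outlet)) < tcl_tb:
        tcl_outlet = round(tcl_outlet + step, 10)
        if tcl_outlet > tcl_dbp:
            return "alert", tcl_outlet  # (iv) no strategy avoids both risks
    # (iii) Second comparison: no node may exceed the DBP threshold.
    if max(predict_pipe(tcl_outlet)) > tcl_dbp:
        return "alert", tcl_outlet      # (iv)
    return "ok", tcl_outlet


# Hypothetical predictor: three nodes with fixed decay factors.
def toy_predictor(outlet):
    return [0.8 * outlet, 0.5 * outlet, 0.32 * outlet]


print(regulate_outlet(0.9, toy_predictor, tcl_tb=0.33, tcl_dbp=1.5))
# ('ok', 1.05)
```

In an online setting the predictor would be re-evaluated with fresh sensor readings on every call, so the suggested outlet level tracks current conditions.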

Performance of the machine learning model

As Figure 5 shows, nitrate, ammonia, CODcr, chloride, and TOC have relatively low predictive power, with ITI ranging between 18.8 and 29.2, while pH, turbidity, TCloutlet, DIST, and water temperature have significantly higher ITI values (ranging from 32.9 to 52.8). Coincidentally, online monitoring is also less practiced for those parameters with lower ITI values due to technology limitations. Thus, DIST, TCloutlet, pH, water temperature, and log-transformed turbidity are included as the input variables in the final SVR model of TCl concentration, plus water plant ID.
Figure 5

Information transfer index (ITI) calculated for the different input variables of the machine learning model. The turbidity data were log-transformed before the calculation. A larger ITI value indicates a higher power of the variable in predicting TCl concentration. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.

Figure 6(a) and 6(b) illustrate the results of 5-fold cross-validation in the training (R2 = 0.68) and validation (R2 = 0.64) stages, respectively. Figure 6(c) further demonstrates the model performance in transfer learning. The average values of R2 for training and validation are 0.75 (ranging from 0.64 to 0.83 in the seven experiments) and 0.62 (from 0.51 to 0.88), respectively. Figure 6 indicates that the SVR model can appropriately and robustly predict the TCl concentration throughout the DWDS. The performance is comparable to or even better than that of classic disinfectant decay models. For example, in a previous study (Monteiro et al. 2014), the RMSEs of a chlorine decay model were 0.035 (mg/l Cl) and 0.019 (mg/l Cl) at two prediction locations, while the overall RMSE was only 0.014 (mg/l Cl) in this study. BARCS also performs comparably to other machine learning methods. For example, Onyutha (2022) used water temperature (°C), pH, conductivity (μS), and sampling time as input parameters and achieved an R2 of 0.48 on the validation set using an adaptive network-based fuzzy inference system (ANFIS). More importantly, the machine learning-based approach offers great flexibility to BARCS when complete and accurate information of the pipeline network is unavailable.
Figure 6

Performance of the final machine learning model for total chlorine prediction. (a) The 5-fold cross-validation result on the training set. (b) The 5-fold cross-validation result on the validation set. (c) The cross-validation results for different plants. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.


Detection and regulation of the dual risk

In this study, the Monod equation, which is widely used to describe cell growth (e.g., Xu 2020), was found to be a suitable functional form to characterize the relationship between the total bacteria nonexceedance probability (P) and the TCl concentration. Figure 7 illustrates the fitted curves and the respective equations for different total bacteria targets, revealing the tendency previously reported in the literature: the residual chlorine concentration in pipeline water is inversely related to the level of bacteria (Francisque et al. 2009). The fitted curves exhibit turning points at different bacterial levels. When the TCl concentration is lower than the turning point, the nonexceedance probability decreases sharply. The turning points are roughly within the range of 0.2–0.5 mg/l Cl, which is consistent with findings in other studies. For example, LeChevallier et al. (1990) indicated that systems with less than 0.5 mg/l Cl monochloramine in dead-end sections exhibit higher coliform occurrence risks than systems with higher monochloramine levels. Bai et al. (2015) suggested that the growth and regrowth of Heterotrophic Plate Count (HPC) bacteria become uncontrolled when the chloramine value decreases below 0.45–0.5 mg/l Cl and 0.2–0.25 mg/l Cl in pipeline water and tap water, respectively.
Figure 7

Regression curves (dashed lines) for the relationship between the lower TCl limit and total bacteria nonexceedance probability with regard to different control targets. The dots indicate total bacteria nonexceedance probabilities calculated based on the sampling data. The gray zone indicates the range of turning points. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.


Based on the equations shown in Figure 7, the lower limit, TClTB, can be inferred under different management schemes defined by the combination of the total bacteria target and confidence level α. Table 1 shows an example for water plant 7, which has 90 samples in subdataset 2. In this case, no samples meet the scheme T0-99 (i.e., TB = 0 CFU/ml, confidence level α ≥ 0.99), while only 10 samples meet T0-95, indicating that TB = 0 is too strict to achieve. With α ≥ 0.99, 15 samples had potential risk per the target TB ≤ 20 CFU/ml, 3 samples per TB ≤ 50 CFU/ml, and no samples per TB ≤ 100 CFU/ml. With α ≥ 0.95, all 90 samples meet every target except TB = 0.
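As a minimal sketch of how a lower limit could be read off a fitted curve, the snippet below assumes the simplest Monod-type form, P = P_max · TCl / (K + TCl), and inverts it for a target nonexceedance probability. The fitted equations themselves appear only in Figure 7, so the parameter values used here (P_max = 1.0, K = 0.02) and the function names are hypothetical placeholders and not the values obtained in this study.

```python
def monod(tcl, p_max, k):
    """Monod-type curve: nonexceedance probability as a function of TCl (mg/l Cl)."""
    return p_max * tcl / (k + tcl)


def tcl_lower_limit(p_target, p_max, k):
    """Invert the Monod-type curve to get the TCl at which P equals p_target."""
    if not 0.0 <= p_target < p_max:
        raise ValueError("p_target must lie in [0, p_max)")
    return k * p_target / (p_max - p_target)


def count_no_risk(tcl_samples, limit):
    """Count samples whose TCl is at or above the lower limit."""
    return sum(1 for c in tcl_samples if c >= limit)


# Hypothetical parameters, for illustration only.
P_MAX, K = 1.0, 0.02
limit_95 = tcl_lower_limit(0.95, P_MAX, K)
limit_99 = tcl_lower_limit(0.99, P_MAX, K)
```

Under this form, raising the confidence level from 0.95 to 0.99 pushes the required TCl up steeply because the target approaches the plateau of the curve, which matches the pattern of limits reported for the two confidence levels.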

Table 1

Lower TCl limits derived in different management schemes and the number of no-risk samples among the 90 samples of plant 7

| Management scheme | Scheme ID | TClTB | Number of no-risk samples (percentage) |
|---|---|---|---|
| P = 0.95, TB = 0 CFU/ml | T0-95 | 0.82 mg/l Cl | 10 (11.1%) |
| P = 0.95, TB ≤ 20 CFU/ml | T20-95 | 0.33 mg/l Cl | 90 (100%) |
| P = 0.95, TB ≤ 50 CFU/ml | T50-95 | 0.23 mg/l Cl | 90 (100%) |
| P = 0.95, TB ≤ 100 CFU/ml | T100-95 | 0.08 mg/l Cl | 90 (100%) |
| P = 0.99, TB = 0 CFU/ml | T0-99 | 1.58 mg/l Cl | 0 (0%) |
| P = 0.99, TB ≤ 20 CFU/ml | T20-99 | 0.59 mg/l Cl | 75 (83.3%) |
| P = 0.99, TB ≤ 50 CFU/ml | T50-99 | 0.47 mg/l Cl | 87 (96.7%) |
| P = 0.99, TB ≤ 100 CFU/ml | T100-99 | 0.25 mg/l Cl | 90 (100%) |

When a bacterial risk is identified, BARCS suggests an increase in the TCloutlet level through the feedback loop (Figure 1). Figure 8 presents an example of regulating the TCloutlet level of water plant 7. Fourteen samples from three sampling locations (Figure 7(b)) have risk detected on different days per T20-99 (TClTB = 0.59 mg/l Cl). If the regulations were implemented as suggested by BARCS, the mean of TCloutlet would increase from 0.71 to 1.03 mg/l Cl, and, according to the model, TClpipe at the three locations would exceed 0.59 mg/l Cl in all 14 cases.
Figure 8

Predicted chlorine of the nodes with bacterial risk before and after regulation of outlet total chlorine on plant 7 per T20-99. The red dashed line is TClTB = 0.59 for T20-99. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.
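The regulation step can be pictured as a search for the smallest outlet dose at which every monitored node is predicted to reach the lower limit. The sketch below is illustrative only: the real predictor in BARCS is the trained machine learning model, whereas here a hypothetical stand-in multiplies the outlet level by an assumed per-node retention fraction. The function name, the step size, and the retention values are all assumptions; the limits 0.59 and 1.19 mg/l Cl and the starting mean of 0.71 mg/l Cl are taken from the text.

```python
from typing import Callable, Optional, Sequence


def suggest_outlet_tcl(
    predict_pipe_tcl: Callable[[float, int], float],
    nodes: Sequence[int],
    current_outlet: float,
    tcl_lower: float,
    tcl_upper: float,
    step: float = 0.01,
) -> Optional[float]:
    """Return the smallest outlet TCl (on a fixed grid) that keeps every node
    at or above tcl_lower, or None if no dose up to tcl_upper achieves it."""
    n_steps = int((tcl_upper - current_outlet) / step + 1e-9)
    for i in range(n_steps + 1):
        outlet = current_outlet + i * step
        if all(predict_pipe_tcl(outlet, node) >= tcl_lower for node in nodes):
            return round(outlet, 2)
    return None


# Hypothetical stand-in predictor: a fixed fraction of outlet TCl survives to each node.
RETENTION = {0: 0.9, 1: 0.7, 2: 0.6}


def toy_predictor(outlet: float, node: int) -> float:
    return outlet * RETENTION[node]


suggested = suggest_outlet_tcl(toy_predictor, [0, 1, 2], 0.71, 0.59, 1.19)
infeasible = suggest_outlet_tcl(toy_predictor, [0, 1, 2], 0.71, 1.58, 1.19)
```

Capping the search at an upper limit means the function reports failure instead of recommending a dose that would trade bacterial risk for byproduct risk.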


The concentration of HAAs in subdataset 4 ranges from nondetection to 0.030 mg/l, well below the standard of 0.08 mg/l per the Canada Safe Drinking Water Act. Thus, we are not able to identify a clear boundary between no risk and potential risk based on this subdataset. To demonstrate the implementation of BARCS, the maximum TCl concentration (1.19 mg/l Cl) in subdataset 4 was set as TClDBP, which may represent a very conservative assumption. Under this upper limit, some risky events per T0-95 and all risky events per T0-99 could not be eliminated by adjusting the TCl concentration. For instance, to meet the control standard T0-99, the TCl concentration should be at least 1.58 mg/l Cl, which is higher than TClDBP.
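The dual-risk logic described above amounts to checking a TCl value against a lower bound (TClTB) and an upper bound (TClDBP). The following sketch shows one way to express that check. The function names and the three labels are hypothetical, and treating any value above TClDBP as a DBP risk is an assumption that mirrors the conservative choice made in the text; the numeric limits are those reported for plant 7.

```python
def classify_tcl(tcl: float, tcl_tb: float, tcl_dbp: float) -> str:
    """Label a TCl value (mg/l Cl) against the bacterial and DBP limits."""
    if tcl < tcl_tb:
        return "bacterial risk"
    if tcl > tcl_dbp:
        return "DBP risk"
    return "no risk"


def scheme_is_attainable(tcl_tb: float, tcl_dbp: float) -> bool:
    """A scheme can only be met if the lower limit does not exceed the upper limit."""
    return tcl_tb <= tcl_dbp


TCL_DBP = 1.19  # maximum TCl in subdataset 4, used as the upper limit in the text

t20_99_ok = scheme_is_attainable(0.59, TCL_DBP)   # T20-99
t0_99_ok = scheme_is_attainable(1.58, TCL_DBP)    # T0-99
```

When the lower limit exceeds the upper limit, as for T0-99, every TCl value falls into one of the two risk categories, which is why no regulation of the outlet level can clear those events.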

Robustness under data error conditions

Previous results are all based on monitoring data obtained through manual sampling and offline laboratory analysis, while BARCS by nature is designed as an EWS equipped with online sensors. Online monitoring data are less accurate and stable than data acquired by laboratory-based methods. To demonstrate the applicability of BARCS under online-monitoring conditions, a robustness analysis was conducted for the TCl prediction module, the core part of BARCS. The goal is to evaluate both the individual and combined effects of TCloutlet, pH, and turbidity on the prediction accuracy of TClpipe. Water temperature was excluded from the robustness analysis, as online temperature sensors are currently very accurate. According to a local technological specification for online monitoring of drinking water quality (T/WSJ D10-2020), ±10%, ±2%, and ±10% error ranges were set for the online TCloutlet, pH, and turbidity data, respectively. The original sample measurements were then degraded with errors randomly generated within the respective error ranges by assuming uniform distributions, leading to a hypothetical online-monitoring dataset. The data generation process was repeated 500 times. Each hypothetical online-monitoring dataset was fed to the TCl prediction module. The TCl concentrations in different nodes of the DWDS were predicted, and the R2 values were calculated to analyze the robustness of the TCl prediction module.
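The degradation procedure can be sketched as a small Monte Carlo loop. The version below assumes that the stated error ranges are relative (multiplicative) errors and that R2 is the coefficient of determination; both are interpretations, not details confirmed by the text. The predictor passed in is a placeholder for the trained TCl prediction module, and the dictionary keys and function names are hypothetical.

```python
import random

# Relative error ranges for the online measurements, as quoted from T/WSJ D10-2020.
ERROR_RANGES = {"tcl_outlet": 0.10, "ph": 0.02, "turbidity": 0.10}


def perturb(value, rel_error, rng):
    """Degrade a measurement with a uniformly distributed relative error."""
    return value * (1.0 + rng.uniform(-rel_error, rel_error))


def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def robustness_r2(samples, observed, predict, degraded, n_repeats=500, seed=0):
    """Return one R2 per hypothetical online-monitoring dataset.

    samples:  list of dicts holding the input measurements of each sample
    degraded: set of keys to perturb (one key for single-parameter cases,
              two or three keys for the combined cases)
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        noisy = [
            {k: perturb(v, ERROR_RANGES[k], rng) if k in degraded else v
             for k, v in s.items()}
            for s in samples
        ]
        scores.append(r_squared(observed, [predict(s) for s in noisy]))
    return scores
```

Running the function once per combination of degraded inputs yields the distributions of R2 that a box plot can then summarize against the offline baseline.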

Figure 9 shows the results of the robustness analysis, again using water plant 7 as the example. In the single-parameter case, the data error of TCloutlet has the most significant influence on the prediction of TClpipe, with the median R2 equal to 0.50 (the R2 value in the offline monitoring case is 0.56, as indicated by the red dashed line in Figure 9), while the error of turbidity has the least effect, with the median R2 equal to 0.56. Dual or triple error further degrades the prediction performance, but the SVR model still has adequate predictive power. Supplementary Figures S1–S6 provide the results of the robustness analysis for the other six water plants. Note that water plant 7 was chosen as the most difficult case for transfer learning because it is topologically isolated from other plants in the DWDS (Figure 2). In fact, the SVR model performs better in transfer learning for some of the other six plants. The robustness analysis indicates that the TCl prediction module in this study is feasible for online application as long as routine maintenance of the online measurement probes is well organized.
Figure 9

R2 values between observed TCl and predicted TCl concentrations under different hypothetical online-monitoring scenarios. The labels on the y-axis indicate the parameters monitored by online devices in the respective hypothetical scenarios. The red dashed line indicates the R2 value for TCl concentrations predicted with the original offline monitoring data. Box plots are standard Tukey plots, with the center orange lines being the medians, the lower and upper box edges being the first and third quartiles, respectively, and the whiskers indicating 1.5 times the interquartile range. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/aqua.2023.007.


Discussion

The online real-time operational system developed in this study, BARCS, forms a closed loop of monitoring, simulating, alarming, and regulating (Figure 1). It has the potential to serve smart water management well. The key information (e.g., pH, turbidity, and water temperature) required by BARCS can be collected in real time using online sensors and wirelessly transferred to the system. The robustness analysis performed in this study shows that BARCS is compatible with in situ water quality monitoring, which involves measurement inaccuracy. The machine learning component in BARCS represents the intelligence to translate the real-time information into TCl concentration and water risks. With potential risks identified, the system can immediately warn the water plants and suggest TCl regulations through the Internet. Although direct online monitoring of free chlorine is possible, free chlorine sensors still face problems such as short service life, high maintenance cost, and low sensitivity (Qin et al. 2015; Janudin et al. 2022). The machine learning-based method proposed in this study is a better choice until there is a major breakthrough in free chlorine sensors.

BARCS uses machine learning in its prediction module, and it is very easy to iteratively retrain the AI model and continuously improve its accuracy with new data. Therefore, BARCS has great potential for real-world applications in the coming new era of environmental big data.

BARCS has a flexible framework that can be easily expanded to cope with changing conditions at different time scales, such as diurnal fluctuations, seasonal variations, and long-term trends. At a daily timescale, water flow varies significantly in accordance with the water demand pattern. Higher water flow implies a shorter residence time of pipeline water, thus decreasing the TCl decay ratio and inhibiting bacterial regrowth. Therefore, the predictive power of BARCS may be further improved by incorporating flow monitoring data when applicable. In areas where drinking water is sourced from surface water, the quality of the raw water may have significant seasonal patterns, in which case BARCS can be tailored to assimilate meteorological data. Furthermore, as the frequency and severity of floods and droughts tend to increase under changing climate conditions (Pachauri & Meyer 2014) and urbanization and industrialization continue to proceed quickly in developing countries, the raw water quantity is expected to show some long-term trends. In such cases, BARCS may be expanded to embrace socioeconomic big data, and an advanced AI model using deep learning (Jiang et al. 2020b) may be incorporated into the system.

The model can serve not only water treatment plants but also end users. The water quality parameters can be monitored by sensors that are accessible to users, and the distance parameter is an estimate that can be calculated without knowledge of the pipeline network. Specifically, the distance is estimated as half the perimeter of the rectangle whose opposite corners are the water treatment plant and the sampling point. For example, if the locations of the water treatment plant and sampling point are (x1, y1) and (x2, y2), the distance is estimated as |x1 − x2| + |y1 − y2|. The estimated distance proved helpful even when DWDS network information is not available. The other parameters that BARCS requires can be obtained with low-cost probes. This design makes it possible for users to monitor the bacterial risk at end points on their own, which could be an ideal supplementary measure for risk awareness.
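The distance estimate described above is the Manhattan (city-block) distance between the two locations. A minimal sketch follows, assuming both points are given in the same planar coordinate system with consistent units; the function name and the example coordinates are hypothetical.

```python
def estimated_distance(plant, sample_point):
    """Half the perimeter of the axis-aligned rectangle spanned by two points,
    i.e., |x1 - x2| + |y1 - y2|."""
    (x1, y1), (x2, y2) = plant, sample_point
    return abs(x1 - x2) + abs(y1 - y2)


# Hypothetical planar coordinates in kilometres.
d = estimated_distance((0.0, 0.0), (3.0, 4.0))
```

For these example points the estimate is 7 km, compared with a straight-line distance of 5 km; the estimate never falls below the straight-line distance, which suits pipelines that seldom run in a direct line.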

The feedback mechanism can facilitate unattended operation. Combined with a programmable logic controller, pump, and flow sensor, the feedback mechanism is expected to enable round-the-clock monitoring and automatic adjustment of bacterial risk in the pipeline network. The system can help keep the bacterial risk of water at any node in the DWDS within the safe range without human intervention.

To intelligently safeguard DWDSs in future smart cities, further developments beyond BARCS are desired. For example, an optimally designed monitoring network can help both increase the accuracy of alerts and save monitoring costs (Jiang et al. 2020a). In the DWDS, the water quality is inhomogeneous across end-of-network points, high water demand points, and low water demand points. The selection of monitoring points has been proven to be critical for alert systems (Shi et al. 2018a). Another important direction for technology innovation is online monitoring of bacterial indicators. Although aimed at predicting and preventing bacteriological failure, this study relied on an indirect method, as in previous studies. Online monitoring of bacterial indicators is still challenging for various reasons, including practicality, cost, reliability, maintenance, and transparency of results (Feng et al. 2021). It is not yet possible to directly use a bacterial indicator for risk alerts. However, emerging techniques may help solve this problem. For instance, biological sensors that measure the activity of a naturally located enzyme in the periplasmic space of cells using a stable fluorescence-labeled enzyme substrate can provide results in minutes (https://bactiquant.com/). Pioneering studies using AI combined with spectrum characteristics to identify waterborne bacteria have emerged in recent years (Feng et al. 2021). Cross-validation between direct and indirect alert methods is also worth exploring in the future.

In addition, most studies on the microbial risk presented by drinking water focus on bacteria. However, viruses also cause a significant proportion of waterborne diseases (Ekundayo 2021). Many kinds of viruses have been detected in wastewater in the past decade (Farkas et al. 2020). The entire world is battling against the COVID-19 pandemic, a war that is unlikely to end soon and will have a long-lasting impact on human society (Scudellari 2020). SARS-CoV-2, the virus causing COVID-19, has already been detected in wastewater, sewage sludge, and rivers (Mallapaty 2020; Wang et al. 2020; Patel et al. 2021) and has the potential to contaminate drinking water sources, especially in areas lacking adequate sanitary infrastructure and treatment facilities. In this pandemic context, efforts to develop virus-targeting early warning systems for DWDSs are needed more than ever to mitigate the risk of drinking water contamination and protect public health.

In this study, we have proposed an intelligent EWS prototype for warning and controlling bacterial and DBP risks in urban DWDSs. BARCS is a closed-loop monitoring-prediction-warning-advising system that can help water utilities make appropriate decisions to meet predefined water safety requirements. BARCS makes data-driven predictions based on machine learning and considers the TCl concentration in drinking water as the key indicator for risk alarming and control. The applicability of BARCS in real-world settings has been demonstrated using monitoring data collected in a megacity in South China. The major findings in this study include the following. First, the SVR-based AI model in BARCS provides a reliable prediction of TCl concentration in DWDSs (with R2 values of approximately 0.68 and 0.64 for training and testing stages, similar to those of classic disinfectant decay models) and offers greater flexibility for BARCS to serve as an intelligent EWS in complex water supply settings. Second, TCl concentration has been proven to be a good indicator of bacterial risk in DWDSs, which shows the great potential of in situ surrogate monitoring for microbial contamination in drinking water systems. Third, the robustness analysis has demonstrated that with state-of-the-art water quality monitoring technologies, online implementation of BARCS in real-world settings is feasible because the data error involved in online monitoring would not significantly degrade the predictive accuracy of the AI model in BARCS.

BARCS represents a promising solution to the safe management of drinking water in future smart cities. The feedback mechanism in particular facilitates unattended operation. Further developments are needed before the system can be implemented at the city scale. The relationships among TCl concentration, bacterial risk, and DBP risk need to be further refined with a larger dataset. A more sophisticated AI model (e.g., a deep learning model) needs to be developed to cope with more diversified data sources and a larger volume of data. In addition, an advanced algorithm for optimizing the network of online-monitoring sensors is highly desired to make the solution cost-effective.

This study was supported by the Shenzhen Science and Technology Innovation Commission (No. KQJSCX20180322152024270; No. KCXFZ202002011006491) and the National Natural Science Foundation of China (No. 51979136).

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Aieta E. M. & Berg J. D. 1986 A review of chlorine dioxide in drinking water treatment. Journal – American Water Works Association 78 (6), 62–72. https://doi.org/10.1002/j.1551-8833.1986.tb05766.x
Albino V., Berardi U. & Dangelico R. M. 2015 Smart cities: definitions, dimensions, performance, and initiatives. Journal of Urban Technology 22, 3–21. https://doi.org/10.1080/10630732.2014.942092
ASTM D7463-18. 2018 Standard Test Method for Adenosine Triphosphate (ATP) Content of Microorganisms in Fuel, Fuel/Water Mixtures and Fuel Associated Water. ASTM International, West Conshohocken, PA.
Bai X., Zhi X., Zhu H., Meng M. & Zhang M. 2015 Real-time ArcGIS and heterotrophic plate count based chloramine disinfectant control in water distribution system. Water Research 68, 812–820. https://doi.org/10.1016/j.watres.2014.10.041
Ballesté E., Belanche-Muñoz L. A., Farnleitner A. H., Linke R., Sommer R., Santos R., Monteiro S., Maunula L., Oristo S., Tiehm A., Stange C. & Blanch A. R. 2020 Improving the identification of the source of faecal pollution in water using a modelling approach: from multi-source to aged and diluted samples. Water Research 171, 115392.
Barzegar R., Moghaddam A. A., Deo R., Fijani E. & Tziritis E. 2018 Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms. Science of the Total Environment 621, 697–712. https://doi.org/10.1016/j.scitotenv.2017.11.185
Beer K. D., Gargano J. W., Roberts V. A., Hill V. R., Garrison L. E., Kutty P. K., Hilborn E. D., Wade T. J., Fullerton K. E. & Yoder J. S. 2015 Surveillance for waterborne disease outbreaks associated with drinking water – United States, 2011-2012. Morbidity and Mortality Weekly Report 64, 842–848. https://doi.org/10.15585/mmwr.mm6431a2
Benedict K. M., Reses H., Vigar M., Roth D. M., Roberts V. A., Mattioli M., Cooley L. A., Hilborn E. D., Wade T. J., Fullerton K. E., Yoder J. S. & Hill V. R. 2017 Surveillance for waterborne disease outbreaks associated with drinking water – United States, 2013-2014. Morbidity and Mortality Weekly Report 66, 1216–1221.
Berry D., Xi C. & Raskin L. 2006 Microbial ecology of drinking water distribution system. Current Opinion in Biotechnology 17, 297–302. https://doi.org/10.1016/j.copbio.2006.05.007
Delahaye E., Welté B., Levi Y., Leblon G. & Montiel A. 2003 An ATP-based method for monitoring the microbiological drinking water quality in a distribution network. Water Research 37, 3689–3696. https://doi.org/10.1016/S0043-1354(03)00288-4
Dodangeh E., Panahi M., Rezaie F., Lee S., Tien Bui D., Lee C.-W. & Pradhan B. 2020 Novel hybrid intelligence models for flood-susceptibility prediction: meta optimization of the GMDH and SVR models with the genetic algorithm and harmony search. Journal of Hydrology 590, 125423. https://doi.org/10.1016/j.jhydrol.2020.125423
Farkas K., Walker D. I., Adriaenssens E. M., McDonald J. E., Hillary L. S., Malham S. K. & Jones D. L. 2020 Viral indicators for tracking domestic wastewater contamination in the aquatic environment. Water Research 181, 115926. https://doi.org/10.1016/j.watres.2020.115926
Feng C., Zhao N., Yin G., Gan T., Yang R., Chen X., Chen M. & Duan J. 2021 Artificial neural networks combined multi-wavelength transmission spectrum feature extraction for sensitive identification of waterborne bacteria. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy. https://doi.org/10.1016/j.saa.2020.119423
Francisque A., Rodriguez M. J., Miranda-Moreno L. F., Sadiq R. & Proulx F. 2009 Modeling of heterotrophic bacteria counts in a water distribution system. Water Research 43, 1075–1087. https://doi.org/10.1016/j.watres.2008.11.030
Gascó-Hernandez M. 2018 Building a smart city: lessons from Barcelona. Communications of the ACM 61, 50–57. https://doi.org/10.1145/3117800
Goraj W., Pytlak A., Kowalska B., Kowalski D., Grządziel J., Szafranek-Nakonieczna A., Gałązka A., Stępniewska Z. & Stępniewski W. 2020 Influence of pipe material on biofilm microbial communities found in drinking water supply system. Environmental Research, 110433. https://doi.org/10.1016/j.envres.2020.110433
He X., Wang H., Zhuang W., Liang D. & Ao Y. 2021 Risk prediction of microcystins based on water quality surrogates: a case study in a eutrophicated urban river network. Environmental Pollution 275, 116651. https://doi.org/10.1016/j.envpol.2021.116651
Hlavsa M. C., Roberts V. A., Anderson A. R., Hill V. R., Kahler A. M., Orr M., Garrison L. E., Hicks L. A., Newton A., Hilborn E. D., Wade T. J., Beach M. J. & Yoder J. S. 2011 Surveillance for waterborne disease outbreaks and other health events associated with recreational water – United States, 2007–2008. Morbidity and Mortality Weekly Report 60 (SS12), 1–32.
Hu C., Fan W., Zeng E., Hang Z., Wang F., Qi L. & Bhuiyan M. Z. A. 2022 Digital twin-assisted real-time traffic data prediction method for 5G-enabled internet of vehicles. IEEE Transactions on Industrial Informatics 18 (4), 2811–2819. https://doi.org/10.1109/TII.2021.3083596
Ismagilova E., Hughes L., Dwivedi Y. K. & Raman K. R. 2019 Smart cities: advances in research – an information systems perspective. International Journal of Information Management 47, 88–100. https://doi.org/10.1016/j.ijinfomgt.2019.01.004
Janudin N., Kasim N. A. M., Knight V. F., Halim N. A., Noor S. A. M., Ong K. K., Yunus W. M. Z. W., Norrrahim M. N. F., Misenan M. S. M., Razak M. A. I. A., Ahmad M. Z. & Yaacob M. H. 2022 Sensing techniques on determination of chlorine gas and free chlorine in water. Journal of Sensors 2022, 1898417.
Jiang J., Khan A. U., Shi B., Tang S. & Khan J. 2019 Application of positive matrix factorization to identify potential sources of water quality deterioration of Huaihe River, China. Applied Water Science 9 (3), 63. doi:10.1007/s13201-019-0938-4
Jiang J., Tang S., Han D., Fu G., Solomatine D. & Zheng Y. 2020a A comprehensive review on the design and optimization of surface water quality monitoring networks. Environmental Modelling & Software 132, 104792. https://doi.org/10.1016/j.envsoft.2020.104792
Jiang S., Zheng Y. & Solomatine D. 2020b Improving AI system awareness of geoscience knowledge: symbiotic integration of physical approaches and deep learning. Geophysical Research Letters 47. https://doi.org/10.1029/2020GL088229
Jones A. S., Stevens D. K., Horsburgh J. S. & Mesner N. O. 2011 Surrogate measures for providing high frequency estimates of total suspended solids and total phosphorus concentrations. Journal of the American Water Resources Association 47, 239–253.
LeChevallier M. W., Lowry C. D. & Lee R. G. 1990 Disinfecting biofilms in a model distribution system. Journal – American Water Works Association 14, 87–99. https://doi.org/10.1186/1471-2458-14-1092
Li X.-F. & Mitch W. A. 2018 Drinking water disinfection byproducts (DBPs) and human health effects: multidisciplinary challenges and opportunities. Environmental Science & Technology 52, 1681–1689. https://doi.org/10.1021/acs.est.7b05440
Liang X., Ma L., Chong C., Li Z. & Ni W. 2020 Development of smart energy towns in China: concept and practices. Renewable and Sustainable Energy Reviews 119, 109507. https://doi.org/10.1016/j.rser.2019.109507
Lin H.-J. & Yeh J. P. 2009 Optimal reduction of solutions for support vector machines. Applied Mathematics and Computation 214, 329–335. https://doi.org/10.1016/j.amc.2009.04.010
Mallapaty S. 2020 How sewage could reveal true scale of coronavirus outbreak. Nature 580, 176–177. https://doi.org/10.1038/d41586-020-00973-x
Monteiro L., Figueiredo D., Dias S., Freitas R., Covas D., Menaia J. & Coelho S. T. 2014 Modeling of Chlorine Decay in Drinking Water Supply Systems Using EPANET MSX. Procedia Engineering 70, 1192–1200.
Nevel S. V., Koetzsch S., Proctor C. R., Besmer M. D., Prest E. I., Vrouwenvelder J. S., Knezev A., Boon N. & Hammes F. 2017 Flow cytometric bacterial cell counts challenge conventional heterotrophic plate counts for routine microbiological drinking water monitoring. Water Research 113, 191–206.
Onyutha C. 2022 Multiple statistical model ensemble predictions of residual chlorine in drinking water: applications of various deep learning and machine learning algorithms. Journal of Environmental and Public Health 2022, 7104752. doi:10.1155/2022/7104752
Pachauri R. K. & Meyer L. 2014 Climate change 2014: synthesis report. In: Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. IPCC, Geneva, Switzerland. https://doi.org/10.1017/CBO9781107415324
Patel M., Kumar A., Pittman C. U., Mlsna T. & Mohan D. 2021 Coronavirus (SARS-CoV-2) in the environment: occurrence, persistence, analysis in aquatic systems and possible management. Science of the Total Environment 765, 142698.
Peng G. C. A., Nunes M. B. & Zheng L. 2017 Impacts of low citizen awareness and usage in smart city services: the case of London's smart parking system. Information Systems and e-Business Management 15, 845–876. https://doi.org/10.1007/s10257-016-0333-8
Sadiq R. & Rodriguez M. J. 2011 Empirical models to predict disinfection by-products (DBPs) in drinking water. Encyclopedia of Environmental Health, 282–295. https://doi.org/10.1016/B978-0-444-52272-6.00282-8
Sagan V., Peterson K. T., Maimaitijiang M., Sidike P., Sloan J., Greeling B. A., Maalouf S. & Adams C. 2020 Monitoring inland water quality using remote sensing: potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Science Reviews 205, 103187. https://doi.org/10.1016/j.earscirev.2020.103187
Scholkopf B., Kah-Kay S., Burges C. J. C., Girosi F., Niyogi P., Poggio T. & Vapnik V. 1997 Comparing support vector machines with Gaussian kernels to radial basis function classifiers. IEEE Transactions on Signal Processing 45 (11), 2758–2765. doi:10.1109/78.650102
Scott D. W. 1979 On optimal and data-based histograms. Biometrika 66, 605–610. https://doi.org/10.1093/biomet/66.3.605
Scudellari M. 2020 How the pandemic might play out in 2021 and beyond. Nature 584, 22–25. https://doi.org/10.1038/d41586-020-02278-5
Shi B., Jiang J., Sivakumar B., Zheng Y. & Wang P. 2018a Quantitative design of emergency monitoring network for river chemical spills based on discrete entropy theory. Water Research 134, 140–152. https://doi.org/10.1016/j.watres.2018.01.057
Shi B., Wang P., Jiang J. & Liu R. 2018b Applying high-frequency surrogate measurements and a wavelet-ANN model to provide early warnings of rapid surface water quality anomalies. Science of The Total Environment 610–611, 1390–1399. https://doi.org/10.1016/j.scitotenv.2017.08.232
Smola A. J. & Schölkopf B. 2004 A tutorial on support vector regression. Statistics and Computing 14 (3), 199–222. doi:10.1023/B:STCO.0000035301.49549.88
Summers R. S., Hooper S. M., Shukairy H. M., Solarik G. & Owen D. 1996 Assessing DBP yield: uniform formation conditions. Journal – American Water Works Association 88 (6), 80–93. doi:10.1002/j.1551-8833.1996.tb06573.x
Vapnik V. N. 1998 Statistical Learning Theory. John Wiley & Sons, Inc., Chichester, UK.
Vavourakis C. D., Heijnen L., Peters M. C. F. M., Marang L., Ketelaars H. A. M. & Hijnen W. A. M. 2020 Spatial and temporal dynamics in attached and suspended bacterial communities in three drinking water distribution systems with variable biological stability. Environmental Science & Technology 54, 14535–14546. https://doi.org/10.1021/acs.est.0c04532
Vörösmarty C. J., McIntyre P. B., Gessner M. O., Dudgeon D., Prusevich A., Green P., Glidden S., Bunn S. E., Sullivan C. A., Liermann C. R. & Davies P. M. 2010 Global threats to human water security and river biodiversity. Nature 467, 555–561. https://doi.org/10.1038/nature09440
Wang L., Chen Y., Chen S., Long L., Bu Y., Xu H., Chen B. & Krasner S. 2019 A one-year long survey of temporal disinfection byproducts variations in a consumer's tap and their removals by a point-of-use facility. Water Research 159, 203–213. https://doi.org/10.1016/j.watres.2019.04.062
Wang J., Shen J., Ye D., Yan X., Zhang Y., Yang W., Li X., Wang J., Zhang L. & Pan L. 2020 Disinfection technology of hospital wastes and wastewater: suggestions for disinfection strategy during coronavirus disease 2019 (COVID-19) pandemic in China. Environmental Pollution 262, 114665. https://doi.org/10.1016/j.envpol.2020.114665
Wang T., Sun D., Zhang Q. & Zhang Z. 2021 China's drinking water sanitation from 2007 to 2018: a systematic review. Science of the Total Environment 757. https://doi.org/10.1016/j.scitotenv.2020.143923
WHO 2011 Guidelines for drinking-water quality. In: World Health Organization Chronicle, 4th edn., Vol. 38. p. 155. https://doi.org/10.1007/978-1-4020-4410-6_184
Wolfe R. L., Ward N. R. & Olson B. H. 1984 Inorganic chloramines as drinking water disinfectants: a review. Journal – American Water Works Association 76 (5), 74–88. https://doi.org/10.1002/j.1551-8833.1984.tb05337.x
Wu D., Wang H. & Seidu R. 2020 Smart data driven quality prediction for urban water source management. Future Generation Computer Systems 107, 418–432. https://doi.org/10.1016/j.future.2020.02.022
WUP 2018 World Urbanization Prospects 2018: Highlights. United Nations, Department of Economic and Social Affairs, New York, NY.
Zhang X. Y., Ying F. U. & Gao J. 2016 Analysis on biological stability influencing factors of living district water distribution system. In: 2016 5th International Conference on Sustainable Energy and Environment Engineering (ICSEEE 2016).
Zhang K., Pan R., Zhang T., Xu J., Zhou X. & Yang Y. 2019 A novel method: using an adenosine triphosphate (ATP) luminescence-based assay to rapidly assess the biological stability of drinking water. Applied Microbiology and Biotechnology 103 (11), 4269–4277.
Zhou Z.-H. 2021 Machine Learning. Springer Nature, Singapore.

Author notes

Co-first/equal authorship.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data