ABSTRACT
Illicit discharges into sewer systems are a widespread concern within China's urban drainage management. They can result in unforeseen environmental contamination and deterioration in the performance of wastewater treatment plants. Consequently, pinpointing the origin of unauthorized discharges in the sewer network is crucial. This study aims to evaluate an integrative method that employs numerical modeling and statistical analysis to determine the locations and characteristics of illicit discharges. The Storm Water Management Model (SWMM) was employed to track water quality variations within the sewer network and examine the concentration profiles of exogenous pollutants under a range of scenarios. The identification technique employed Bayesian inference fused with the Markov chain Monte Carlo sampling method, enabling the estimation of probability distributions for the position of the suspected source, the discharge magnitude, and the commencement of the event. Specifically, the cases involving continuous release and multiple sources were examined. For single-point source identification, where all three parameters are unknown, concentration profiles from two monitoring sites in the path of pollutant transport and dispersion are necessary and sufficient to characterize the pollution source. For the identification of multiple sources, the proposed SWMM-Bayesian strategy with improved sampling is applied, which significantly improves the accuracy.
HIGHLIGHTS
Identifying pollution sources can be applied for both instantaneous and continuous discharge scenarios.
To characterize a single pollution source, data from two monitoring sites along the pollutant's path are necessary and sufficient.
The strategic placement of monitoring sites and improved sampling enhance the effectiveness and precision of the Bayesian-SWMM approach for identifying multiple unauthorized discharge sources.
INTRODUCTION
In urban drainage systems, illegal connections and illicit discharges represent a common problem. Due to a variety of reasons, domestic and industrial sewage can be inadvertently discharged into storm sewers, leading to pollution in receiving waters and potentially to the dissemination of viruses, which poses a threat to public health and safety. Accidental spills or illicit discharges from manufacturing plants, which contain chemicals toxic to microorganisms, frequently occur in sanitary sewer systems (Li et al. 2017; Shao et al. 2021; Wu & Chen 2023). These illicit discharges reduce biomass activity and deteriorate the performance of wastewater treatment plants (WWTPs). Currently, in the service area of Zhenhai WWTP, Ningbo, China, water quality monitoring at a pumping station has revealed that persistent total phosphorus levels are particularly high; consequently, a pollution identification project is underway.
The EPA has introduced technical guidelines for illicit discharge detection and elimination in municipal storm drains (Brown et al. 2004), in which visual inspection and indicator sampling were considered effective in tracking and isolating the specific source (Irvine et al. 2011; Hachad et al. 2022). However, for unexpected pollution in sanitary and combined sewer systems, identifying the source proves to be significantly more challenging (Banik et al. 2017a, 2017b). Detecting abnormal discharge and/or water quality indicators is relatively straightforward; however, this information alone is insufficient to pinpoint the source. To mitigate the adverse impacts effectively, it is imperative to identify the discharging source promptly. Owing to the complex structure of sewer networks, pinpointing the discharge location, let alone the discharging history, is exceedingly difficult (Shao et al. 2021). In such scenarios, methods capable of tracking down sources and reconstructing the discharge profile are in high demand. Currently, numerous Chinese cities, including Ningbo, are actively engaged in developing smart urban drainage systems, utilizing modern technologies such as the Internet of Things, big data, and artificial intelligence. Through the real-time monitoring of water quantity and quality, intelligent control of the urban drainage system can be achieved. Pollution source identification constitutes a critical component of such systems.
The identification of illicit connections and discharges is categorized as an inverse problem, marked by inherent complexity and potential for error (Laird et al. 2006; Levenspiel 2011; Williams et al. 2011; Moghaddam et al. 2021). Extensive research on source identification has been conducted in water distribution networks (Kessler et al. 1998; Ostfeld & Salomons 2005; Hart & Murray 2010; Liu et al. 2015; Berglund et al. 2020) and surface water bodies such as rivers, lakes (Cheng & Jia 2010; Wang et al. 2018), and groundwater (Skaggs & Kabala 1994; Alapati & Kabala 2000; Moghaddam et al. 2021). Only a few studies have focused on source identification in sewer systems, of which most rely on optimization processes (Jiang et al. 2013). Kim et al. (2013) introduced an optimization method based on artificial neural networks and developed an algorithm to search for pathogens in sewer systems. Banik et al. (2017a, 2017b) implemented a genetic algorithm integrated with the Storm Water Management Model (SWMM) to identify contaminant intrusions in sewer systems. Xu et al. (2021) combined a microbial genetic algorithm with the SWMM to trace illicit connections within a sewer system.
Optimization is a prevalent approach for addressing inverse problems, relying on minimizing the difference between observed and simulated data to find solutions. This approach, however, may not fully consider the uncertainties inherent in the data, which can affect the accuracy of the results (Shao et al. 2021). In sewer networks, uncertainties are amplified due to the variable degradation of chemical substances, adding complexity to source identification (Chen et al. 2012; Ramin et al. 2016). Consequently, identifying sources in sewer systems remains a formidable challenge. Recognizing the limitations of traditional optimization, the focus has shifted toward identifying the stochastic distribution of parameters, which offers a more comprehensive understanding of the problem than a single optimal solution (Yee & Flesch 2010; Wang & Harrison 2013). Recent advancements in Bayesian-based stochastic approaches (Wang & Harrison 2013; Wu et al. 2020) have reinvigorated efforts to solve the source identification problems in sewer systems. Bayesian methods are particularly advantageous because they can reconstruct source information using limited data, surveying the entire solution domain to provide statistically plausible solutions.
The Bayesian method, which can make full utilization of prior information and account for uncertainty, has been employed to solve the inverse problem and determine the probability of incidents, such as the approximate location of pollution sources and the time series of discharges. The results will be represented as a probability density function. The optimal estimation is derived from the known information about the pollution source, leading to the solution of the inverse problem (Neupauer & Wilson 1999). Bayesian inference methods have been applied to address source identification problems in various fields, notably within water distribution networks (Yang et al. 2009; Wang & Harrison 2013, 2014). Researchers have utilized the Bayesian-Markov chain Monte Carlo (Bayesian-MCMC) algorithm to trace the discharge process of multiple pollution sources in river channels (Yang et al. 2016; Wang et al. 2018). Notably, Shao et al. (2021) developed a stochastic source identification model by coupling Bayesian inference with SWMM to reconstruct the profile of an instantaneous pollution event in a sewer network. The Bayesian-SWMM model was demonstrated to be effective and accurate in identifying the unknown source parameters.
This study aims to apply the SWMM-Bayesian approach to investigate more practical scenarios, specifically continuous pollution discharge events (Grbčić et al. 2021) and the presence of multiple pollution sources (Yang et al. 2016; Wang et al. 2021). The SWMM was employed to model the hydraulic dynamics and water quality characteristics associated with identified sources of pollution within the sewer system. The Bayesian-MCMC technique was utilized for the analysis and optimization of model parameters, thereby facilitating the localization of pollution sources via sampling. The algorithm facilitates the simultaneous estimation of three key parameters of the illicit source, which are its location, mass, and discharge time, and devises a pragmatic approach for the identification. Following the validation of the methodology's efficacy, the study further examined situations with multiple sources and also proposed an improved SWMM-Bayesian approach for this purpose.
METHOD
The methodology employed in the current investigation comprises two core computational components, building upon the framework established by Shao et al. (2021). The first component employs a simulation based on the SWMM to elucidate the dynamics of flow and contaminant dispersion throughout the sewer system. Subsequently, the second element applies a Bayesian-MCMC inference technique to estimate the probabilities associated with the undetermined source parameters.
Pollutant transport modeling
Bayesian inference via MCMC
Bayesian statistics
Metropolis–Hastings algorithm
Confronted with the challenge of resolving the posterior distribution in high-dimensional spaces, which is a formidable task for conventional parsing techniques, this study employs the MCMC method to facilitate the sampling process (Kass et al. 1998). The MCMC technique generates a sequence of sampling points whose distribution approximates the posterior probability. These points constitute a Markov chain, the equilibrium distribution of which, upon convergence, represents the sought-after posterior distribution.
Within the M–H algorithm, the proposal distribution is often chosen based on the prior distribution of the parameters. A common choice for the transition probability function q is to select a uniform distribution q(Xt−ε, Xt+1−ε), or alternatively, a normal distribution q(Xt, ε2), in which ε denotes the sampling step size. This step size is indicative of the interval between successive values of the unknown parameter during sampling and is a critical parameter influencing the sampling efficiency.
The workflow of the M–H algorithm can be delineated as follows:
(1) Initialization: Generate an initial state or point X0 based on prior information, such that X0 = Xt. Using the current state Xt, draw a candidate sample Xt+1 from the proposal distribution q(Xt, Xt+1).
(2) Acceptance probability calculation: Calculate the acceptance probability a(Xt, Xt+1) for the candidate state Xt+1.
(3) Decision: If the acceptance probability suggests that the candidate point is more probable than the current point, i.e., p(Xt+1) > p(Xt), set a greater than 1; accept Xt+1 as the new current state, assign Xt = Xt+1, and return to step (1) to draw a new sample. Conversely, if the candidate point is less probable, set a less than 1. The decision to accept the new point is then based on a random comparison with a. If rejected, maintain the current state Xt, and return to step (1) to draw a new sample.
Note that this description simplifies the decision-making process. In practice, a random number u from a uniform distribution U(0,1) is generated, and if u ≤ a(Xt, Xt+1), the new state Xt+1 is accepted; otherwise, it is rejected, and the algorithm remains at the current state Xt. The algorithm then iterates from step (1) with the accepted or retained state to generate a sequence of samples. According to the conclusions obtained by Shao et al. (2021), the step sizes of the unknown parameters Jx, M, and Td are 1, 10, and 1, respectively.
Coupling SWMM and Bayesian inference
The main steps of the coupling are described as follows:
(1) SWMM model construction: Develop the SWMM model, specifying the ‘true values’ of various inputs such as X(Jx, M, Td). Create an initial input .inp file for the SWMM model; run the SWMM model to generate the .rpt file thereby obtaining the concentration value Y at the monitoring node.
(2) Initial sampling: Randomly generate an initial sample X0 for the parameters of the discharge source based on the prior information.
(3) SWMM simulation and likelihood calculation: Rename the .inp file to a .txt file to edit the parameters.
(4) Locate the parameter values within the .txt file and replace them with the initial sample X0. Rename the file back to .inp and run the SWMM simulation through MatSWMM. Extract the calculated concentration time series from the .rpt report file. Calculate the likelihood function p(Y|X0) by comparing the simulated concentration values with the ‘true value’ concentrations from step (1).
(5) Parameter sampling: Use the proposal distribution to sample a new test parameter X* from the current parameter state X0 = Xt. Repeat step (3) to calculate the likelihood function p(Y|X*) for the new sample X*.
(6) Acceptance check: Draw a uniform random number u from the interval [0, 1]. If u ≤ a(Xt, X*), accept the new sample and use it as the initial sample for the next iteration. Otherwise, reject X* and retain Xt for the next iteration.
(7) Iterate: Repeat steps (3)–(5) until the desired number of iterations is reached, to obtain a collection of posterior samples of the discharge source parameters.
(8) Posterior distribution analysis: Analyze the posterior samples to infer the posterior distribution of the model parameters.
Performance evaluation
For the identification process, errors ranging from 1 to 10% were introduced into the simulation data by the addition of white noise. This method aimed to replicate the variability and uncertainty commonly found in real-world data. Analysis revealed that these introduced errors had a minimal impact on the outcomes, indicating that the findings are robust against typical data perturbations encountered in practical scenarios.
Case study sewer network
In this study, a series of numerical experiments were designed to evaluate the performance of the pollution source identification algorithm under various conditions. Table 1 summarizes the key parameters and scenarios used in these experiments. Specifically, different types of pollution sources (continuous and instantaneous discharges) and multiple monitoring point configurations were considered.
Scenario No. . | Source type . | Discharge mode . | Jx . | . | Td (min) . | M (g or g/s) . |
---|---|---|---|---|---|---|
A1 | Single | Instantaneous | 5 | 10, 20, 30 | 800, 1,000, 1,200 | |
A2 | 13 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A3 | 16 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A4 | 19 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A5 | Continuous | 5 | 10, 20, 30 | 800, 1,000, 1,200 | ||
A6 | 13 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A7 | 16 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A8 | 19 | 10, 20, 30 | 800, 1,000, 1,200 | |||
B1 | Multiple | Instantaneous | 5 | 13 | 20 | 1,000 |
B2 | 5 | 16 | ||||
B3 | 13 | 19 | ||||
B4 | Continuous | 5 | 13 | 20 | 1,000 | |
B5 | 5 | 16 | ||||
B6 | 13 | 19 |
Scenario No. . | Source type . | Discharge mode . | Jx . | . | Td (min) . | M (g or g/s) . |
---|---|---|---|---|---|---|
A1 | Single | Instantaneous | 5 | 10, 20, 30 | 800, 1,000, 1,200 | |
A2 | 13 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A3 | 16 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A4 | 19 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A5 | Continuous | 5 | 10, 20, 30 | 800, 1,000, 1,200 | ||
A6 | 13 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A7 | 16 | 10, 20, 30 | 800, 1,000, 1,200 | |||
A8 | 19 | 10, 20, 30 | 800, 1,000, 1,200 | |||
B1 | Multiple | Instantaneous | 5 | 13 | 20 | 1,000 |
B2 | 5 | 16 | ||||
B3 | 13 | 19 | ||||
B4 | Continuous | 5 | 13 | 20 | 1,000 | |
B5 | 5 | 16 | ||||
B6 | 13 | 19 |
Scenario A tests are conducted for single-point source pollution. For example, as the minimum time interval for inputting pollution time series in SWMM is 1 min, it is assumed that on a given day at 0:20 a.m., a pollution event occurs instantaneously at an upstream single node 13, where a total of 1,000 g of pollutants are discharged within 1 min for A1–A4. In tests A5–A8, it was assumed that on a certain day at 0:20 a.m., pollutants were continuously discharged at an upstream single node at a rate of 1,000 g/min for 30 min.
Scenario B tests involved water quality simulations for multiple-point source pollution. For example, in test B3, it was assumed that on a given day at 0:20 a.m., a pollution event occurred instantaneously at node 13, where 1,000 g of pollutants were discharged in 1 min, and simultaneously at node 19, another 1,000 g of pollutants were discharged in 1 min.
RESULTS AND DISCUSSION
Inference of single source
Inference of single parameter
The statistical median and mean of the identified discharge mass and time were calculated and subsequently compared with the actual values in Table 2. Errors associated with the discharge node are not presented, as the node represents a discrete variable and, consequently, the associated errors lack physical significance. Comparisons presented in Table 2 suggest that the Bayesian inference results generally align with the actual solution, indicating the effectiveness of the proposed method. As reported by Shao et al. (2021), for the inverse problems with only one unknown parameter of a single source, inference utilizing the Bayes-MCMC method remains valid and accurate irrespective of whether the illicit source discharges instantaneously or continuously into the sewer system. The efficacy of the Bayes-MCMC method in thoroughly searching the parameter space is contingent upon the transfer probability density function and its sampling step size. The selection of the transfer probability function distribution is governed by the specific problem at hand, as there is no universal standard; thus, detailed problem-specific analysis is required.
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A1–A4 | Td (min) | 10 | 10.11 | 1.1 | 10.01 | 0.1 |
20 | 20.12 | 0.6 | 20.006 | 0.03 | ||
30 | 30.51 | 1.7 | 30.15 | 0.5 | ||
A5–A8 | Td (min) | 10 | 10.181 | 1.81 | 10.002 | 0.02 |
20 | 20.202 | 1.01 | 20 | 0 | ||
30 | 30.48 | 1.6 | 30 | 0 | ||
A1–A4 | M (g) | 800 | 806.8 | 0.85 | 800 | 0 |
1,000 | 1002 | 0.2 | 1,001.3 | 0.13 | ||
1,200 | 1,219.2 | 1.6 | 1,204.8 | 0.4 | ||
A5–A8 | M (g/min) | 800 | 804.56 | 0.57 | 800.96 | 0.12 |
1,000 | 1,000.1 | 0.01 | 100.1 | 0.01 | ||
1,200 | 1,204.92 | 0.41 | 1,200.24 | 0.02 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A1–A4 | Td (min) | 10 | 10.11 | 1.1 | 10.01 | 0.1 |
20 | 20.12 | 0.6 | 20.006 | 0.03 | ||
30 | 30.51 | 1.7 | 30.15 | 0.5 | ||
A5–A8 | Td (min) | 10 | 10.181 | 1.81 | 10.002 | 0.02 |
20 | 20.202 | 1.01 | 20 | 0 | ||
30 | 30.48 | 1.6 | 30 | 0 | ||
A1–A4 | M (g) | 800 | 806.8 | 0.85 | 800 | 0 |
1,000 | 1002 | 0.2 | 1,001.3 | 0.13 | ||
1,200 | 1,219.2 | 1.6 | 1,204.8 | 0.4 | ||
A5–A8 | M (g/min) | 800 | 804.56 | 0.57 | 800.96 | 0.12 |
1,000 | 1,000.1 | 0.01 | 100.1 | 0.01 | ||
1,200 | 1,204.92 | 0.41 | 1,200.24 | 0.02 |
Inference of multiple unknown parameters
When confronted with the absence of details for all three parameters related to the illicit discharge source, the scenario becomes complex owing to the myriad potential parameter combinations, varied hydraulic conditions, and other influential factors. Such complex interactions may yield similar concentration profiles at monitoring points, considerably diminishing the accuracy of concurrent inference for all three unknown parameters, in contrast to the identification in scenarios with only one parameter. The efficacy of Bayesian inference hinges significantly on the likelihood function, given its direct correlation with the number of unknown variables, as shown in Equation (6). An increase in unknowns is anticipated to adversely affect the accuracy of the determined results.
To quantitatively evaluate convergence, an analysis of the standard error of the identification results is conducted, as presented in Table 3. Compared to Table 2, both the mean error and median error are larger than in the case where only one unknown parameter is identified. According to the governing equation of the pollution transport process, the monitored concentration at any node is a function of the initial concentration, and travel time, which can be expressed in terms of travel distance and flow velocity. For a sewer network, there are numerous possible combinations of these three parameters that produce the same monitoring readings. Consequently, Bayesian inference fails to effectively discriminate among these sources.
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A2 | Td (min) | 20 | 14.866 | 25.67 | 15.396 | 23.02 |
M (g) | 1,000 | 1,081 | 8.1 | 1,140 | 11.4 | |
A3 | Td (min) | 20 | 17.27 | 13.65 | 18.5 | 7.5 |
M (g) | 1,000 | 1,118 | 11.8 | 1,071 | 7.1 | |
A4 | Td (min) | 20 | 16.792 | 16.04 | 18.18 | 9.1 |
M (g) | 1,000 | 858 | 14.2 | 899.1 | 10.09 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A2 | Td (min) | 20 | 14.866 | 25.67 | 15.396 | 23.02 |
M (g) | 1,000 | 1,081 | 8.1 | 1,140 | 11.4 | |
A3 | Td (min) | 20 | 17.27 | 13.65 | 18.5 | 7.5 |
M (g) | 1,000 | 1,118 | 11.8 | 1,071 | 7.1 | |
A4 | Td (min) | 20 | 16.792 | 16.04 | 18.18 | 9.1 |
M (g) | 1,000 | 858 | 14.2 | 899.1 | 10.09 |
Identification by adding monitoring sites
As outlined in the preceding section, pinpointing a source with three unknowns presents a significant challenge. The initial sampling generated by the Bayesian-MCMC algorithm is random. The initial point determines where the algorithm commences its iterations, which significantly influences the iterative process of the MCMC algorithm and may lead to instability sometimes. The addition of a new monitoring site upstream of node 1 allows for more targeted data collection. The data help ascertain whether pollution plumes from non-hotspot areas are likely to pass through or bypass the new monitoring node. This targeted approach significantly reduces uncertainty in our model, thereby enhancing both the precision and reliability of the inferential algorithm. Utilizing the sewer network's topology in conjunction with the frequency statistics of potential nodes offers a straightforward and efficient strategy.
The frequency of the discharge node falling in each region of the prior information is counted. In B1, the probability that the true discharge node falls between nodes 9 and 16 is the largest; in B2, the probability that the true discharge node falls between nodes 9 and 18 is the largest. In B3, the probability that the true discharge node is between nodes 9 and 20 is the largest.
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A2 | Td (min) | 20 | 20.44 | 2.2 | 20.04 | 0.2 |
M (g) | 1,000 | 956.9 | 4.31 | 973 | 2.7 | |
A3 | Td (min) | 20 | 19.67 | 1.65 | 19.84 | 0.8 |
M (g) | 1,000 | 973.7 | 2.63 | 988.4 | 1.16 | |
A4 | Td (min) | 20 | 19.46 | 2.7 | 20 | 0 |
M (g) | 1,000 | 1,075 | 7.5 | 1009 | 0.9 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
A2 | Td (min) | 20 | 20.44 | 2.2 | 20.04 | 0.2 |
M (g) | 1,000 | 956.9 | 4.31 | 973 | 2.7 | |
A3 | Td (min) | 20 | 19.67 | 1.65 | 19.84 | 0.8 |
M (g) | 1,000 | 973.7 | 2.63 | 988.4 | 1.16 | |
A4 | Td (min) | 20 | 19.46 | 2.7 | 20 | 0 |
M (g) | 1,000 | 1,075 | 7.5 | 1009 | 0.9 |
Inference of multiple sources
Traditional M–H algorithms for random sampling of more complex parameter spaces are usually unable to search over all ranges. Therefore, based on inverse problems with potentially nonunique solutions, it is incomplete and perhaps even misleading to solve the inverse problem based on samples generated by random wandering that fall into local optimal solutions. To avoid the result of the local optimal solution, a scheme is designed to make it escape from the local optimum when it falls into the local optimum, and iterations are performed several times using different initial values to increase the chance of finding the global optimal solution (Zhu et al. 2009; Yang et al. 2018). Each iteration may converge to a different local optimal solution, but the probability of finding the global optimal solution can be improved by multiple attempts, and the specific improved MCMC method is as follows: record the likelihood of each sample's simulated concentration value and the observed value; in the iterative process, if the current candidate sampling value is not accepted, the sample with the largest likelihood will be used as the initial value for the next iteration; use different initial values for multiple iterations, so that the calculation can be directly used to find a more accurate initial value, thus increasing the likelihood of finding the global optimum and avoiding the large fluctuations in the Markov chain.
For the case of multiple-point sources, it is assumed that the number of sources is 2. Pollutants are discharged simultaneously at two upstream nodes, and the monitoring of the downstream concentration change process is carried out at node 1. The unknown parameter discharge nodes Jx and discharge mass M of the two sources are simultaneously reasoned using the improved SWMM-Bayesian algorithm. It can be seen in the a posteriori histogram in Figure 6 that both unknown parameters converge to the true value.
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
B1 | M (g) | 1,000 | 1,000 | 0 | 1,006.5 | 0.65 |
1,054 | 5.4 | 1,044 | 4.4 | |||
B2 | M (g) | 1,000 | 1,013 | 1.3 | 1,013 | 1.3 |
917.2 | 8.28 | 968.7 | 3.13 | |||
B3 | M (g) | 1,000 | 984.7 | 1.53 | 992 | 0.9 |
989 | 1.1 | 986 | 1.4 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
B1 | M (g) | 1,000 | 1,000 | 0 | 1,006.5 | 0.65 |
1,054 | 5.4 | 1,044 | 4.4 | |||
B2 | M (g) | 1,000 | 1,013 | 1.3 | 1,013 | 1.3 |
917.2 | 8.28 | 968.7 | 3.13 | |||
B3 | M (g) | 1,000 | 984.7 | 1.53 | 992 | 0.9 |
989 | 1.1 | 986 | 1.4 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
B4 | M (g/min) | 1,000 | 1,036 | 3.6 | 1,015 | 1.5 |
1,047 | 4.7 | 1,010 | 1.0 | |||
B5 | M (g/min) | 1,000 | 1,026 | 2.6 | 1,013 | 1.3 |
1,030.2 | 3.02 | 1,024.6 | 2.46 | |||
B6 | M (g/min) | 1,000 | 1,089.9 | 8.89 | 1,098 | 9.8 |
931 | 6.9 | 958 | 4.2 |
Scenario No. . | Parameter . | True value . | The proposed SWMM-Bayesian predictions . | |||
---|---|---|---|---|---|---|
Mean value . | MAE (%) . | Median value . | MedAE (%) . | |||
B4 | M (g/min) | 1,000 | 1,036 | 3.6 | 1,015 | 1.5 |
1,047 | 4.7 | 1,010 | 1.0 | |||
B5 | M (g/min) | 1,000 | 1,026 | 2.6 | 1,013 | 1.3 |
1,030.2 | 3.02 | 1,024.6 | 2.46 | |||
B6 | M (g/min) | 1,000 | 1,089.9 | 8.89 | 1,098 | 9.8 |
931 | 6.9 | 958 | 4.2 |
Discussion
Through the investigation of tests A5–A8, in which the discharge was continuously released for a specific time period, it was found that the identification of the pollution source is as accurate as in instantaneous dumping cases A1–A4. Both instantaneous and continuous pollution releases are deterministic processes for pollution transport and dispersion. The parameters involved in solving the inverse problem do not change substantially, indicating that the accuracy and reliability of the statistical inference achievable in both cases are of the same order. However, it is important to note the assumption of known start times of emissions in continuous pollution release cases.
The effectiveness of introducing an additional monitoring site for single-point source identification problems has been confirmed by prior research (Wang et al. 2018; Shao et al. 2021). The main reason is that it facilitates the reconstruction of the one-dimensional dispersion process of pollutants within the sewer network. As elucidated by Fischer (1968) with regard to the ‘routing procedure’, the concentration of a dispersing cloud is channeled through the sewer network, analogous to the routing of a flood. Given that the entire process is deterministic, a comparison of the upstream and downstream profiles can subsequently be employed to resolve the inverse problem. Consequently, the presence of two monitoring sites enables the reconstruction of the physical process via the concentration profiles. Therefore, in cases involving a single pollution source, the identification of the illicit discharge can be ascertained given two strategically positioned monitoring sites. For instance, the potential hotspot area can be delineated using frequency statistics. This information, when integrated with an analysis of the sewer network topology, allows for the determination of the subsequent monitoring station, thereby enabling an update to the statistical inference results. This iterative process is continued until information pertaining to all pollution sources is ascertained with the requisite degree of confidence.
In the case of identification challenges with multiple-point sources, the effectiveness of the SWMM-Bayesian identification method is limited not only by the multiple combinations of parameters but also by the superposition of the process lines of discharge concentrations at multiple points, resulting in inferred results that are easily limited to local optimal solutions. Thus, the normal distribution is employed as the proposal distribution for MCMC sampling to enhance algorithmic efficiency. In addition, the likelihood of each sample relative to observed values is recorded. During the iterative process, if the current candidate sample is not accepted, the sample with the highest likelihood is selected as the initial value for the subsequent iteration. Repeating iterations using these more accurate initial values increases the probability of identifying the global optimum and reduces substantial fluctuations in the early stages of the Markov chain. Consequently, the SWMM-Bayesian method shows good agreement with the true values.
In the present study, an idealized steady-state flow was assumed to simplify the initial analysis and model development. However, the importance of considering unsteady flow conditions should be noted for future investigations to more accurately reflect the dynamic nature of sewer flows. In addition, the limitations of the proposed strategy for placing upstream monitoring points must be acknowledged, particularly in scenarios where sudden pollution events occur, and the pollution plume may have already traversed these points by the time they are identified.
CONCLUSIONS
A SWMM-Bayesian source identification method was developed and successfully applied in a sewer network. Given a set of monitoring concentration data, the proposed method combines the Bayesian inference theory with an MCMC sampling method to produce probability distributions of the unknown source parameters. By coupling Bayesian-MCMC inference with SWMM simulations, the dispersion process of the unknown source can be accurately reconstructed.
In the context of identifying a single illicit discharge, this method accurately determines the true value of unknown parameters. It was found that the accuracy of the proposed method for both instantaneous and continuous discharge scenarios is satisfactory. In practical scenarios where pertinent information regarding the illicit discharge is often unknown, the precision of identification significantly decreases, resulting in merely an approximate determination of the source range. This limitation can be mitigated by strategically placing additional water quality monitoring stations in areas of high probability.
In scenarios involving multiple sources, the reduced precision of the method is attributable not only to the myriad combinations of undetermined parameters but also to the superimposition of concentration profiles. To address this challenge, an improved sampling method was employed. This method fully utilizes the likelihood function to optimize the initial value of the iterative process, avoids interference from local optimal solutions, and demonstrates that the proposed improvement enhances the global search capability.
FUNDING
The study was supported by the National Key R&D Program of China (2022YFC3203200) and the Key Research and Development Program of Ningbo (2023Z216).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.