It is important to identify the source information after a sudden water contamination incident occurs in a water supply system. The accuracy of the simulation model's parameters determines the accuracy of the identified source information. However, it is difficult to obtain the true values of these parameters with existing methods, so reducing the errors caused by their uncertainty is a crucial problem. In this paper, a source identification framework is established that considers the uncertainty of the model's sensitive parameters and combines Bayesian inference with Markov Chain Monte Carlo (MCMC) simulation, and the South-to-North Water Diversion Project is taken as the case study. Compared with a framework that does not consider the uncertainty of the model's parameters, the proposed framework reduces the error caused by a wrong choice of model parameters and obtains more accurate results. In addition, versions of the proposed framework based on traditional MCMC and on Delayed Rejection and Adaptive Metropolis MCMC (DRAM-MCMC) are compared, showing that DRAM-MCMC converges better and is more accurate. Finally, the proposed framework based on DRAM-MCMC is shown to be practical and general for the studied long-distance water diversion project.
INTRODUCTION
In recent years, sudden water contamination incidents in water supply systems and rivers, resulting from transport accidents, sewage pipe bursts, unregulated discharges from sewage treatment plants, terrorist attacks and extreme natural disasters (such as earthquakes), have occurred more frequently and have had environmental, economic and societal consequences, in particular in developing countries such as China (He et al. 2011; Tang et al. 2014). Rapid and accurate identification of the contamination source is crucial to reducing the potential impacts of such incidents and is therefore an essential step in developing mitigation and adaptation strategies for water quality management.
The contamination source identification problem, which is normally regarded as an ill-posed inverse problem, has attracted a great deal of attention. Many methods have been developed in the literature, such as classical regularization methods, simulation-optimization methods and Bayesian inference methods (Hamdi & Mahfoudhi 2013). The Tikhonov regularization method, one of the classical regularization methods, was the first to be used for such problems; it has proved reliable and simple and can also cope with noise in experimental data (Nguyen et al. 1999; Akçelik et al. 2003; Wang et al. 2013; Qi et al. 2016). With the growth of computing power, optimization algorithms such as genetic algorithms (Khlaifi et al. 2009), heuristic harmony search algorithms (Ayvaz 2010) and parallel evolutionary strategies (Mirghani et al. 2009) have been combined with water quality simulation models for source identification. Such algorithms are likely to find more accurate solutions than classical regularization methods. However, regularization and optimization methods provide only point estimates and cannot account for uncertainties in the inverse problem; as more model parameters become uncertain, this increases the risk of inaccurate identification and reduces the reliability of the optimal solution. By contrast, Bayesian approaches have a number of distinctive advantages and have been used in many areas (Wang et al. 2013; Zhang et al. 2015). They provide a posterior probability distribution of the source parameters and quantify random errors in the data (Hassan et al. 2009).
Takaishi (2013) combined Bayesian inference with a Markov Chain Monte Carlo (MCMC) method implemented through importance sampling for the generalized autoregressive conditional heteroscedasticity (GARCH) model and found that the approach could reduce the statistical error of the GARCH parameters. Wang & Harrison (2013) used Bayesian and MCMC methods to identify the contaminant profile in water distribution systems; they examined a statistical learning approach that builds a regression model between the proposed parameters and the likelihood for each pair of source and sensor nodes in the network, and showed that the method was feasible and efficient. Shao et al. (2014) combined Bayesian approaches with MCMC simulation to identify water quality model parameters, and the results showed that the method has high reliability and anti-noise capability. These studies suggest that combining Bayesian approaches with MCMC is a promising way to solve the contamination source identification problem.
In previous research, model parameters were regarded as deterministic. These parameters are generally obtained by the tracer tracking method, the analogy method or the empirical formula method, and they carry high uncertainties that directly affect the accuracy of the results (Van Griensven & Meixner 2007; Blasone et al. 2008; Xu et al. 2009; Zhang et al. 2013; Tian et al. 2014). Therefore, how the uncertainties of model parameters are taken into account is crucially important. This paper proposes a framework that replaces the deterministic values of the model parameters with prior probability distributions obtained from practical measurements, the analogy method and the empirical formula method, to make identification more accurate and faster. Furthermore, a modified MCMC method, DRAM-MCMC, which combines Delayed Rejection (DR) and Adaptive Metropolis (AM) with MCMC (Haario et al. 2006), is used to improve the efficiency and accuracy of source identification.
The remainder of this paper gives an overview of the source identification framework, which considers the uncertainties of the parameters, and describes the scenario design, followed by details of the case study. The results of the sensitivity analysis of the parameters and the uncertainty analysis of the different scenarios are then presented, and conclusions are drawn.
METHODOLOGY
Water quality model
Bayesian inference and MCMC
Bayesian inference
Bayesian inference is a useful approach in which prior knowledge is taken into account naturally and which allows the user to quantify the uncertainty of the estimated parameters (Zio & Zoia 2008; Wang & Chen 2013; Zhao et al. 2014).
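To make the Bayesian setup concrete, the sketch below evaluates an unnormalised log-posterior for a hypothetical instantaneous spill. It rests on assumptions that are ours, not the paper's: (i) the forward model is the textbook one-dimensional advection-dispersion solution with first-order decay, since the water quality model equations are not reproduced in this section; (ii) the source terms (mass, release location, release time) and two uncertain model parameters (velocity u and dispersion coefficient D) all receive uniform priors, mirroring the idea of replacing fixed parameter values with prior distributions; (iii) the likelihood is Gaussian with a known noise level sigma. The names `ade_concentration` and `log_posterior`, and all numbers, are illustrative.

```python
import numpy as np


def ade_concentration(x, t, mass, x0, t0, u, D, area, k):
    """Hypothetical forward model: 1D advection-dispersion with first-order decay
    for an instantaneous release of `mass` at location x0 and time t0."""
    tau = np.atleast_1d(np.asarray(t, dtype=float)) - t0  # time since release
    c = np.zeros_like(tau)
    ok = tau > 0  # concentration is zero before the release
    tk = tau[ok]
    c[ok] = (mass / (area * np.sqrt(4.0 * np.pi * D * tk))
             * np.exp(-(x - x0 - u * tk) ** 2 / (4.0 * D * tk))
             * np.exp(-k * tk))
    return c


def log_posterior(theta, bounds, x_obs, t_obs, c_obs, sigma, area, k):
    """Unnormalised log-posterior of theta = (mass, x0, t0, u, D).

    Uniform priors: -inf outside `bounds`, a constant (dropped) inside.
    Gaussian likelihood with standard deviation `sigma`."""
    theta = np.asarray(theta, dtype=float)
    if np.any(theta < bounds[:, 0]) or np.any(theta > bounds[:, 1]):
        return -np.inf
    mass, x0, t0, u, D = theta
    c_model = ade_concentration(x_obs, t_obs, mass, x0, t0, u, D, area, k)
    return -0.5 * np.sum(((c_obs - c_model) / sigma) ** 2)


# Illustrative synthetic case: one sensor 5 km downstream, noise-free data
bounds = np.array([[5e4, 2e5],          # mass (g)
                   [0.0, 3000.0],       # release location x0 (m)
                   [-1000.0, 1000.0],   # release time t0 (s)
                   [0.8, 1.2],          # velocity u (m/s), +/-20 % around 1.0
                   [16.0, 24.0]])       # dispersion D (m^2/s), +/-20 % around 20
theta_true = np.array([1e5, 1000.0, 0.0, 1.0, 20.0])
t_obs = np.linspace(600.0, 12000.0, 40)
c_obs = ade_concentration(5000.0, t_obs, *theta_true, area=100.0, k=1e-5)
```

With noise-free data the log-posterior is maximal (zero) at the true parameters; any parameter set outside the prior box has zero posterior probability.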
MCMC sampling
The main methods for constructing the transition probabilities of the Markov chain include the Gibbs sampling algorithm, the Metropolis–Hastings algorithm and the Metropolis algorithm (Cowles & Carlin 1996; Cowles & Rosenthal 1998; Zio & Zoia 2008; Haghighattalab et al. 2012). However, previous studies (e.g., Haario et al. 2006; Mbalawata et al. 2015) show that the challenge of these standard methods is that it is very hard to find a good proposal distribution for complicated high-dimensional models. Haario et al. (2006) proposed a modified Metropolis algorithm, Delayed Rejection and Adaptive Metropolis (DRAM-MCMC), and showed that it was more efficient than standard MCMC. In DR, when a candidate is rejected, a higher-stage candidate is proposed at the same time step, with an acceptance probability chosen so that the Markov property and the reversibility of the chain with respect to the distribution of interest are preserved. DR can also save simulation time by exploiting the hierarchy between the proposal kernels. AM, which has the correct ergodic properties, updates the proposal distribution along the process using the full information accumulated in the chain. More details and theory can be found in Haario et al. (2006).
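The difficulty of choosing a proposal distribution is easy to see in a plain random-walk Metropolis sampler, sketched below as an assumed stand-in for the "traditional MCMC" (the paper does not spell out its exact variant). The proposal covariance `prop_cov` is fixed by the user and never changes, so a poor choice (steps too large, too small, or ignoring correlations between parameters) slows mixing. The function name and signature are hypothetical.

```python
import numpy as np


def random_walk_metropolis(log_target, x0, prop_cov, n_iter, rng=None):
    """Random-walk Metropolis with a FIXED Gaussian proposal N(x, prop_cov).

    Returns the chain (n_iter x d) and the acceptance rate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    L = np.linalg.cholesky(np.asarray(prop_cov, dtype=float))
    chain = np.empty((n_iter, x.size))
    n_acc = 0
    for i in range(n_iter):
        y = x + L @ rng.standard_normal(x.size)
        lp_y = log_target(y)
        # Symmetric proposal, so the acceptance ratio is pi(y) / pi(x)
        if np.log(rng.uniform()) < lp_y - lp:
            x, lp = y, lp_y
            n_acc += 1
        chain[i] = x  # a rejected proposal repeats the current state
    return chain, n_acc / n_iter
```

A typical call would be `random_walk_metropolis(log_post, theta0, prop_cov, 50_000)`, where `log_post`, `theta0` and `prop_cov` are placeholders; everything hinges on `prop_cov`, which is exactly what AM later adapts automatically.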
In this paper, there are not enough data to construct the likelihood distribution. Because of this lack of data, the likelihood distribution in the traditional MCMC is assumed to follow a normal distribution (Wang & Chen 2013; Zhao et al. 2014), whereas in DRAM-MCMC a correlated Gaussian distribution is assumed, following Haario et al. (2006). The key problems are then how to set the covariance and how many DR stages to build. The covariance is determined by the initial covariance $C_0$, the length $N_0$ of the initial non-adaptive period, the target dimension $d$ and the standard optimal scaling factor $s_d$. According to Haario et al. (2006), the target dimension $d$ should be less than 15; here $d = 6$. The length of the initial non-adaptive period, which is related to $d$, is set to $N_0$ = 1,000. The Gaussian proposal is scaled by the standard optimal factor $s_d = 2.4^2/d$ (Gelman et al. 1996). As for the number of stages, a single second-stage proposal is used in DR.
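The DRAM settings described above (initial covariance C0, non-adaptive period n0, scaling factor s_d = 2.4²/d of Gelman et al. (1996), one second-stage proposal) can be put together in a compact sampler. This is a sketch that follows Haario et al. (2006) only in outline: stage 1 is an ordinary Gaussian Metropolis step; on rejection, stage 2 proposes a smaller step from the current state with covariance gamma·C and accepts it with the delayed-rejection probability that keeps the chain reversible; after n0 iterations the proposal covariance is rebuilt from the whole chain history. Simplifications that are ours: the covariance is recomputed with `np.cov` every `adapt_every` steps rather than recursively, and `gamma = 0.01`, `eps = 1e-8` are illustrative defaults. All names are hypothetical.

```python
import numpy as np


def _accept_prob(lp_from, lp_to):
    """First-stage Metropolis acceptance probability min(1, pi(to) / pi(from))."""
    diff = lp_to - lp_from
    return 1.0 if diff >= 0 else float(np.exp(diff))


def _log_q(to, frm, cov_inv):
    """Log of the unnormalised Gaussian first-stage proposal density q1(frm -> to)."""
    r = to - frm
    return -0.5 * r @ cov_inv @ r


def dram(log_target, x0, C0, n_iter, n0=1000, gamma=0.01, eps=1e-8,
         adapt_every=100, rng=None):
    """Two-stage delayed rejection + adaptive Metropolis (DRAM) sampler sketch."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    s_d = 2.4 ** 2 / d                       # scaling factor s_d = 2.4^2 / d
    C = np.asarray(C0, dtype=float)          # C0 is used during the first n0 steps
    L, C_inv = np.linalg.cholesky(C), np.linalg.inv(C)
    lp = log_target(x)
    chain = np.empty((n_iter, d))
    for i in range(n_iter):
        # Stage 1: y1 ~ N(x, C)
        y1 = x + L @ rng.standard_normal(d)
        lp1 = log_target(y1)
        a1 = _accept_prob(lp, lp1)
        if rng.uniform() < a1:
            x, lp = y1, lp1
        else:
            # Stage 2 (delayed rejection): smaller step y2 ~ N(x, gamma * C)
            y2 = x + np.sqrt(gamma) * (L @ rng.standard_normal(d))
            lp2 = log_target(y2)
            # a y2 outside the support is simply rejected
            a1_rev = _accept_prob(lp2, lp1) if np.isfinite(lp2) else 1.0
            if a1_rev < 1.0:
                # second-stage ratio that keeps the chain reversible w.r.t. the target
                log_num = lp2 + _log_q(y1, y2, C_inv) + np.log1p(-a1_rev)
                log_den = lp + _log_q(y1, x, C_inv) + np.log1p(-a1)
                if np.log(rng.uniform()) < log_num - log_den:
                    x, lp = y2, lp2
        chain[i] = x
        # Adaptive Metropolis: after n0 steps, rebuild C from the chain history
        if i + 1 >= n0 and (i + 1) % adapt_every == 0:
            emp = np.atleast_2d(np.cov(chain[: i + 1], rowvar=False))
            C = s_d * emp + s_d * eps * np.eye(d)
            L, C_inv = np.linalg.cholesky(C), np.linalg.inv(C)
    return chain
```

For the settings in the text, a call would look like `dram(log_post, theta0, C0, 50_000, n0=1000)` with placeholder `log_post`, `theta0` and `C0`; with d = 6 the scaling factor is s_d = 5.76/6 = 0.96. Because the second-stage proposal is centred at the current state, its density cancels in the stage-2 ratio, which is why only the first-stage densities appear.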
Scenario design
Three scenarios for identifying the source are designed for comparison. Scenarios 1 and 2 compare the traditional framework, which does not consider the uncertainty of the model parameters, with the framework proposed in this paper. Scenarios 2 and 3 are used to demonstrate the advantage of the modified MCMC. The mode of the posterior distribution of each source term is used to represent the characteristics of the contamination event. The number of iterations is set to 50,000 in all three scenarios so that they are comparable. The scenarios are detailed in Table 1.
| Scenario | S1 | S2 | S3 |
|---|---|---|---|
| Parameter uncertainty considered? | No | Yes | Yes |
| Sampling method | Traditional MCMC | Traditional MCMC | DRAM-MCMC |
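Since the posterior mode serves as the point estimate of each source term, a simple way to extract it from MCMC output is to take the centre of the most populated histogram bin of that term's marginal samples, after discarding burn-in. This is a sketch under the assumption that the marginal mode is what is wanted; a kernel density estimate would be a smoother alternative. The function name, bin count and the Gamma test distribution (chosen because its mode, 2, differs from its mean, 3) are illustrative.

```python
import numpy as np


def posterior_mode(samples, bins=50, burn_in=0):
    """Approximate the mode of a 1-D marginal posterior as the centre of the
    most populated histogram bin, after discarding `burn_in` samples."""
    s = np.asarray(samples, dtype=float)[burn_in:]
    counts, edges = np.histogram(s, bins=bins)
    j = int(np.argmax(counts))
    return 0.5 * (edges[j] + edges[j + 1])


# Illustrative skewed "posterior": Gamma(shape=3, scale=1) has mode 2 but mean 3
rng = np.random.default_rng(0)
draws = rng.gamma(shape=3.0, scale=1.0, size=100_000)
mode_est = posterior_mode(draws)
```

For a skewed posterior the mode and the mean can differ noticeably, so the choice of summary statistic matters when reporting a single Load, Location or Time.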
Scenario 1 (S1): the main purpose of this scenario is to illustrate that parameter uncertainty can easily lead to large errors in the results. Because of this uncertainty, it is difficult to select a single value within the range that matches the true value. In each identification of this scenario, a set of fixed parameter values is chosen from the ranges of the model parameters, and the traditional MCMC is used to estimate the source terms without considering model parameter uncertainty.
Scenario 2 (S2) is based on the proposed framework and uses traditional MCMC, and Scenario 3 (S3) is based on the proposed framework and uses the modified MCMC, DRAM-MCMC.
The scenarios designed above can demonstrate the accuracy and efficiency of the proposed framework for one specific event. However, they cannot show that the framework is suitable for other events that differ in occurrence time, total pollutant load and location. Therefore, 2,000 events with different levels of Load, Location and Time are designed and identified with S3.
CASE STUDY
To demonstrate the accuracy and versatility of the framework, the middle route of the South-to-North Water Diversion Project was chosen as a case study. The project transfers water from the Danjiangkou Reservoir in Hubei province, crosses the Yangtze, Huaihe, Haihe and Yellow River basins, and finally reaches Tuancheng Lake in Beijing. The total length of the project is 1,277 km, and it crosses nearly 150 cities and 151,000 hectares of land via open channels, culverts and pipes. The project spans a very large area, and 936 structures of different types, including 44 railways and 571 bridges over the channel, have been built. In addition, many factories have been built near the project, especially around Shijiazhuang in Hebei province, so the probability of a sudden water pollution incident there is high. Accordingly, a 20 km section near Shijiazhuang is taken as an example to identify the source terms of a chemical contaminant.
| Parameter | Interval | Default value |
|---|---|---|
|  | [980, 1,470] | 1,225 |
|  | [28.8, 43.2] | 36 |
|  | [17.28, 25.92] | 21.6 |
|  | [0.096, 0.144] | 0.12 |
|  | [0.08, 0.12] | 0.1 |
|  | [0.8, 1.2] | 1 |
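Each interval in the parameter table is exactly the default value ±20%. The sketch below reproduces the intervals from the defaults and then, assuming uniform distributions over these intervals (the prior form is not stated in this section), draws fixed parameter sets of the kind chosen for each identification in S1. The parameter symbols are not recoverable from the table, so the values are handled only by position; the function names are hypothetical.

```python
import numpy as np

# Default values from the parameter table, in table order
defaults = np.array([1225.0, 36.0, 21.6, 0.12, 0.1, 1.0])


def prior_bounds(defaults, rel=0.20):
    """Return an (n, 2) array of [default*(1-rel), default*(1+rel)] intervals."""
    defaults = np.asarray(defaults, dtype=float)
    return np.column_stack([defaults * (1.0 - rel), defaults * (1.0 + rel)])


def sample_uniform(bounds, n, rng):
    """Draw n parameter sets uniformly inside the bounds (assumed prior form)."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + (hi - lo) * rng.random((n, len(lo)))


bounds = prior_bounds(defaults)
fixed_sets = sample_uniform(bounds, 5, np.random.default_rng(0))
```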
RESULTS AND DISCUSSION
Sensitivity analysis
Uncertainty analysis
Results are assumed to be acceptable only if the relative errors of all three source terms (Load, Location and Time) are less than 20%. As Figure 4(a) shows, only 27.2% of the 2,000 results obtained with S1 are acceptable, because it is difficult to choose the correct parameter values within these small ranges. More than 60% of the 2,000 results deviate from the true values by more than 50%. This reveals that when the uncertainty of model parameters is not considered, the identification of the contamination event is very likely to fail.
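The acceptance criterion can be written as a short function: an event counts as correctly identified only if the relative errors of all three source terms are below 20%. This sketch uses hypothetical names and toy numbers, and it assumes the relative error is |estimate − true| / |true|, which is undefined when a true value is zero (for instance a location measured from the section inlet).

```python
import numpy as np


def relative_errors(estimates, truths):
    """|estimate - true| / |true| for each event (row) and source term (column)."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(truths, dtype=float)
    return np.abs(est - ref) / np.abs(ref)


def acceptance_rate(estimates, truths, tol=0.20):
    """Fraction of events whose relative errors are below `tol` for ALL source terms."""
    ok = np.all(relative_errors(estimates, truths) < tol, axis=1)
    return float(ok.mean())


# Toy example: columns are (Load, Location, Time) for four hypothetical events
truths = np.array([[100.0, 5000.0, 3600.0]] * 4)
estimates = np.array([[110.0, 5200.0, 3500.0],   # all errors < 20 %     -> accepted
                      [150.0, 5000.0, 3600.0],   # Load error 50 %       -> rejected
                      [100.0, 4100.0, 3600.0],   # Location error 18 %   -> accepted
                      [95.0, 5100.0, 3700.0]])   # all errors < 20 %     -> accepted
rate = acceptance_rate(estimates, truths)
```

Here three of the four toy events pass, so `rate` is 0.75; applied to 2,000 identified events, the same function gives the acceptance percentages discussed in this section.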
Figure 4(b) shows that 63% of the results obtained with S2 are acceptable, which implies that the scenario based on the traditional MCMC reduces the uncertainty and makes the results more accurate. The disadvantage of this scenario is that for more than 17% of the 2,000 results the relative error exceeds 100%. This indicates that convergence deteriorates when the uncertainty of the model's sensitive parameters is considered, i.e., when ranges are used instead of fixed values. To solve this problem, the modified method DRAM-MCMC (Haario et al. 2006) is adopted in S3. As shown in Figure 4(c), 96.55% of the 2,000 results are acceptable, revealing that DRAM-MCMC is more efficient and accurate than the traditional MCMC. Compared with the traditional MCMC, it improves convergence because DR preserves the Markov property and reversibility while adding higher-stage candidates, and AM improves 'global' adaptation based on the past history of the chain.
The analysis above suggests that the generality of the scenario based on DRAM-MCMC is better than that of the scenario based on traditional MCMC. To demonstrate the applicability of this scenario, the mean, standard deviation and quantiles of the relative errors were calculated, as listed in Table 3. The 80th quantiles, the means and the standard deviations of the relative errors are all below 20%, which is relatively small, and the 90th quantiles of the relative errors do not exceed 31.18%.
| Statistic | Distance (D) | Time (T) | Quality (Q) |
|---|---|---|---|
| Standard deviation of relative error | 11.89% | 16.54% | 12.65% |
| 80th quantile of relative error | 17.80% | 19.08% | 14.20% |
| 90th quantile of relative error | 28.45% | 31.18% | 21.01% |
| Mean of relative error | 11.50% | 11.56% | 9.28% |
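The summary statistics above (standard deviation, 80th and 90th quantiles, and mean of the relative errors for each source term) can be computed from an events × source-terms array of relative errors. This is an illustrative sketch: the sample standard deviation (`ddof=1`) and NumPy's default linear interpolation for quantiles are assumptions, since the conventions used for the table are not stated; the function name is hypothetical.

```python
import numpy as np


def summarize_relative_errors(rel_err, names=("D", "T", "Q")):
    """Std, 80th/90th percentiles and mean of relative errors per source term.

    rel_err: array of shape (n_events, n_terms), errors as fractions (0.1 = 10 %)."""
    rel_err = np.asarray(rel_err, dtype=float)
    out = {}
    for j, name in enumerate(names):
        col = rel_err[:, j]
        out[name] = {
            "std": float(np.std(col, ddof=1)),   # sample standard deviation (assumed)
            "p80": float(np.percentile(col, 80)),
            "p90": float(np.percentile(col, 90)),
            "mean": float(np.mean(col)),
        }
    return out


# Toy input: 11 events with evenly spaced errors in each column
x = np.linspace(0.0, 1.0, 11)
stats = summarize_relative_errors(np.column_stack([x, 2 * x, x]))
```

Multiplying the returned fractions by 100 gives percentages in the same form as the table.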
CONCLUSIONS
Previous research shows that the uncertainty of the parameters of the model used to identify the source leads to large errors in the results. Therefore, a framework based on MCMC that considers the uncertainty of the model's sensitive parameters is proposed to improve the accuracy of the results. Two improvements are made in this framework. First, the sensitive parameters are selected by sensitivity analysis and assigned prior probability distributions instead of constant values. Second, DRAM-MCMC is used to improve the accuracy of the results and the convergence of the identification.
This study provides a step-wise analysis of the source identification framework, which considers the uncertainty of the model's sensitive parameters. The main findings can be summarized as follows: (1) the sensitivity analysis of the framework that does not consider the uncertainty of the model parameters shows that Dx and Ux are the sensitive parameters, which lead to very large uncertainties in the results; (2) a comparison between the traditional framework, which does not consider the uncertainty of the model's parameters, and the proposed framework shows that the proposed framework obtains more accurate results; (3) a comparison of the proposed framework based on the traditional MCMC with that based on DRAM-MCMC reveals that the latter performs better in improving the accuracy and convergence of the source terms; and (4) when the proposed framework based on DRAM-MCMC is used to identify many different events, the 80th quantiles, the means and the standard deviations of the relative errors are all below 20%. These results show that the proposed framework is effective for the South-to-North Water Diversion Project case study, but it should be tested further on more case studies in the future.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (Grant No. 51320105010). This study was also partly funded by the National Science and Technology Major Project (Grant No. 2014ZX03005001).