Blind source separation (BSS) theory was introduced and a trend and amplitude (TAA) model was established to overcome the shortcomings of some traditional global leakage discharge analysis models in water distribution systems (WDS). The TAA model treats the leakage discharge as one part of the total water supply flow and consists of a constrained independent component analysis (CICA) model and an amplitude solving model. In the CICA model, the CICA algorithm was chosen and two reference vectors were constructed to obtain the trend of the leakage discharge. In the amplitude solving model, two-element coupled linear overdetermined equations were derived and the amplitude was calculated. The TAA model was optimized and verified with data from three kinds of WDS (a laboratory WDS, an emulational WDS and an actual WDS). The simulation accuracy of the TAA model was high when the total water supply flow was a non-Gaussian signal in a WDS with only one entrance; the TAA model can effectively avoid the complexity (and reflect the uncertainty) of the relationship between leakage discharge and pressure head. More importantly, the model has good transplant performance.

## INTRODUCTION

*et al.* 2006; Van Zyl & Cassa 2013), where the quantities are the leakage discharge of a single leak, the leakage area, the discharge coefficient, the acceleration due to gravity *g* and the pressure head *h* (the kinetic head is negligible). Many field and laboratory tests have shown that the leakage discharge is more sensitive to *h* than the square-root relationship in Equation (1) suggests (Walski *et al.* 2009; Ferrante 2012; Van Zyl & Cassa 2013), because the leakage area and the discharge coefficient vary. To better understand the variation of the leakage area, the area was split into two categories in the fixed area and variable area discharge (FAVAD) concept model proposed by May (1994): the fixed leakage area when the pressure head is 0, and the variable leakage area. Assuming that the leakage area varies linearly with *h*, Cassa *et al.* (2010) improved the FAVAD model as Equation (3), which introduces the pressure–area slope constant. Equation (3) widens the sensitivity, measured by the power exponent, from 0.5 to the range 0.5–1.5. Using a stress balance analysis of a circular hole in a pressurized pipe, Van Zyl & Clayton (2007) improved the FAVAD model theoretically as Equation (4), where *C* is a constant. Equation (4) raises the maximum power exponent to 2.5. Both the orifice equation model and the FAVAD models are usually simplified to the power function model described in Equation (5), which is characterized by a leakage coefficient and a leakage exponent. The leakage exponent has a wide range: it is often considerably larger than 0.5 and typically varies between 0.5 and 2.79, with a median of 1.15 (Farley & Trow 2003; Greyvenstein & Van Zyl 2007). Its value is affected by many factors, such as the leakage hydraulics, pipe material behavior, soil hydraulics and water demand (Van Zyl & Clayton 2007), as well as the pipe thickness, discharge conditions, ratio of leakage discharge to the upstream flow and leakage shape (Brunone & Ferrante 2001; Van Zyl & Clayton 2007; Ferrante *et al.* 2010). The parameter values in Equation (5) can differ because the leakage area is taken as invariant there, whereas the leak area actually changes owing to elastic deformation, plastic deformation, hysteresis of viscoelastic behavior (Pezzinga 2002; Covas *et al.* 2004; Van Zyl & Cassa 2013), and even the process of pressure change (Ferrante 2012). Because the location, shape and size of almost all leaks cannot be accurately acquired, it is difficult to obtain the leakage coefficient or the leakage exponent for any single underground leak.
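For orientation, the local models discussed above are commonly written in the forms below. This is a sketch only: the symbols ($q$ for the discharge of a single leak, $C_d$ for the discharge coefficient, $A$ for the leakage area, $A_0$ for the fixed area, $m$ for the pressure–area slope, $c$ and $\alpha$ for the leakage coefficient and exponent) are assumptions chosen here, not necessarily the notation of the original equations, and the Van Zyl & Clayton (2007) expression is not reproduced.

$$
\begin{aligned}
q &= C_d\,A\,\sqrt{2gh} && \text{(orifice equation)}\\
q &= C_d\,\sqrt{2g}\,\left(A_0\,h^{0.5} + m\,h^{1.5}\right) && \text{(FAVAD with } A = A_0 + m h\text{)}\\
q &= c\,h^{\alpha} && \text{(power function model)}
\end{aligned}
$$

The second line follows from substituting $A = A_0 + m h$ into the first, which is why the effective exponent moves from 0.5 towards 1.5 as the variable-area term becomes dominant.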

In the global models, the local models are applied to all the leakage in the WDS. Because the power function model has a more general form and can be applied directly in the EPANET software, it is widely accepted in practical applications. The global model that exactly simulates the actual situation would be the ideal global model (IGM), in which the leakage parameters of all leaks are assumed to be known, as well as their locations. However, the IGM cannot be solved theoretically, because the number of unknowns is at least twice the number of leaks and the known conditions are very limited. To simulate the actual situation, practical global models were developed by simplifying the IGM; by degree of simplification they can be classified as follows:

1. Conventional method model: the leakage discharge is proportional to the total supply flow, with no pressure dependency of leakage, so it cannot evaluate leakage reduction in pressure management projects (Wu *et al.* 2011).
2. Uniform leakage coefficient model: the leakage is allocated to all nodes and the leakage coefficients of all leaks are uniform (Giustolisi *et al.* 2008).
3. Assorted leakage coefficient model (or minimum night flow (MNF) based leakage distribution model): the pipes are classified according to their characteristics, and a leakage weighting factor is used to determine the leakage coefficients (Burrows *et al.* 2003).
4. Proportional leakage coefficient model: the leakage is allocated to all nodes and the leakage coefficient is proportional to the measured consumption and the length of pipe connected to the leakage node (Almandoz *et al.* 2005).
5. Conjectural leakage coefficient model (or bursts and background estimates (BABE) concept model): this attempts to estimate every burst and background leak; the method is similar to component analysis (McKenzie & Seago 2005; Yuan *et al.* 2011).
6. Equivalent single leak model: this treats all the leakage in the system as one single leak; it has the same expression as Equation (5), with the pressure head *h* replaced by the average pressure head over the system (Thornton & Lambert 2005).

The global models mentioned above generally take the total leakage volume as a known condition, obtained by a method such as the top-down real loss assessment proposed by Wu *et al.* (2011). A uniform value is then assumed for one leakage parameter, and the only remaining unknown is solved by inverse analysis until the simulated leakage discharge at the MNF hour matches that obtained from the MNF test. However, the leakage discharge from the MNF test has a fairly large error, so the simulation accuracy of the models cannot be evaluated. In addition, the aforementioned global models are built on the view that system leakage is made up of all individual leaks, so the definite and monotonic relationship between leakage discharge and pressure head in the local models of Equations (1)–(5) is carried over to the system. However, studies have found an uncertain relationship, which was interpreted as hysteresis of viscoelastic behavior (Ferrante *et al.* 2010; Ferrante 2012; Massari *et al.* 2012), so theoretical defects must exist in the above global models. Furthermore, because of the complicated relationship between leakage discharge and pressure head, it is difficult to determine the location and magnitude of the leakage. Many unreasonable assumptions have inevitably been introduced into some global models, so establishing a global leakage discharge analysis model for practical use remains a big challenge (Wu *et al.* 2011).

The aim of this paper is to propose a new idea and methods for modeling global leakage discharge. The model is expected to avoid the complicated relationship and reflect the uncertain relationship, thus improving the simulation accuracy.

## METHODOLOGY

### Theory

The linear mixing model of BSS can be written as

$$X = AS \qquad (6)$$

where *X* is the observed signals matrix (a known quantity), *A* is the mixing matrix (an unknown quantity) and *S* is the source signals matrix (an unknown quantity). Equation (6) has an infinite number of solutions, but a unique solution exists under the restriction of a specific target function. The solution takes the form

$$Y = WX \qquad (7)$$

where *Y* is the separated signals matrix and *W* is the separation matrix.
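The non-uniqueness of the mixing model can be seen with a small numerical sketch. Assuming the linear form X = AS, dividing one column of the mixing matrix by a factor and multiplying the matching source row by the same factor leaves the observations unchanged. The matrices, the factor `k` and the helper `matmul` are hypothetical values chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical 2x2 mixing matrix and two short source sequences (rows of S).
A = [[1.0, 1.0],
     [0.5, -2.0]]
S = [[4.0, 5.0, 6.0, 5.0],
     [1.0, 1.5, 1.2, 0.8]]
X = matmul(A, S)

# Rescale: divide the first column of A by k, multiply the first row of S by k.
k = 4.0
A_alt = [[row[0] / k, row[1]] for row in A]
S_alt = [[k * v for v in S[0]], list(S[1])]
X_alt = matmul(A_alt, S_alt)

# Both (A, S) and (A_alt, S_alt) explain exactly the same observations.
same_observations = all(
    abs(X[i][j] - X_alt[i][j]) < 1e-12
    for i in range(len(X)) for j in range(len(X[0]))
)
print(same_observations)
```

This amplitude ambiguity is the reason a separate step is needed to recover the physical scale of the separated signals.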

### Determination of observed signals and source signals

*et al.* 2011). Therefore, the total water supply flow can be divided into two parts: leakage discharge and actual consumed water flow. Leakage discharge corresponds to real losses, and actual consumed water flow corresponds to the other three components. In steady state, the total water supply flow equals the sum of the actual consumed water flow and the leakage discharge (Equation (8)). The actual consumed water flow and the leakage discharge in time series are considered as the two source signals; neither of them is separately measurable. Leakage discharge is strongly affected by pressure head, so a change in one is theoretically reflected in the other; therefore, the total water supply flow and the pressure head at the entrance, both in time series, are considered as the two observed signals in this research.

### Algorithm selection

Three types of algorithm are available to solve the BSS problem: the independent component analysis (ICA) algorithm, the sparse component analysis (SCA) algorithm and the non-negative matrix factorization algorithm (Yu & Hu 2011). The values of both source signals are always positive, so it is difficult to achieve the sparsity required by the SCA algorithm. Pressure head also varies with the node demand, which changes the leakage discharge; therefore, there is a first-order statistical correlation between the two source signals. As a representative of the ICA algorithms, the constrained independent component analysis (CICA) algorithm is selected. It takes the maximal negative entropy (negentropy) as the objective function and is solved iteratively based on Lagrange function analysis (Yu & Hu 2011). The advantages of applying the CICA algorithm to a water distribution system are as follows: the algorithm can use a priori information to determine the order of the source signals, which changes a blind problem into a semi-blind one; and the algorithm remains applicable even though the two source signals are not statistically independent.

### Approach of solving the source signals’ order uncertainty

In Equations (9) and (10), the following quantities appear: the reference vector of the actual consumed water flow; the reference vector of the leakage discharge; the rounding function; a function that returns the value in the *m*-th position when the sequence *x* is arranged in descending order; two dividing factors of the total water supply flow; the pressure head at the entrance of the WDS; two dividing factors of the pressure head; *T*, the total number of samples in the observed signal sequence; and *t*, the time index, *t* = 1, 2, …, *T*.

Because the amplitude of the source signals is uncertain, the separated source signal provides waveform information only, and its physical meaning is lost. This work employed flow balance equations to solve for the standard deviations of the two kinds of flow, and obtained their means by the methods proposed in other global models.

### Evaluation method of waveform information's accuracy

## MODELING

Both a CICA model and an amplitude solving model are included in the new global model. The CICA model obtains the waveform information of the source signals, and the amplitude solving model obtains their real amplitude. The global model is called the trend and amplitude (TAA) model. Because the CICA algorithm cannot solve every BSS problem, it must first be judged whether the CICA algorithm is applicable to leakage discharge analysis in the WDS.

### Judgment of CICA algorithm's applicability

A premise of the CICA algorithm is that no more than one source signal is a Gaussian signal. Because neither source signal in the WDS is separately measurable, historical data for either source signal are absent, and it is impossible to construct the probability density function of either source signal. It is therefore difficult to determine the statistical distribution of a source signal from sample data or a probability density function. However, the sum of the two source signals is measurable according to Equation (8). Suppose the total water supply flow is a non-Gaussian signal; the derivation is then as follows. Known condition: the total water supply flow is a non-Gaussian signal. Derivation: assume that both source signals are Gaussian signals. A linear combination of Gaussian signals is still a Gaussian signal, so the total water supply flow would be a Gaussian signal according to Equation (8). This conclusion contradicts the known condition. Hence, if the total water supply flow is a non-Gaussian signal, no more than one source signal is a Gaussian signal.

Thus the CICA algorithm can be used to separate the source signals from the observed signals when the total water supply flow in the WDS is a non-Gaussian signal.
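The applicability condition can be screened on measured data before any separation is attempted. The text does not say which normality check was used, so the sketch below is only one possibility: it computes the sample skewness and excess kurtosis of a flow series, both of which are close to zero for a Gaussian signal. The function names, the tolerance `tol` and the synthetic series are hypothetical.

```python
import random

def skewness_and_excess_kurtosis(series):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((v - mean) ** 2 for v in series) / n
    m3 = sum((v - mean) ** 3 for v in series) / n
    m4 = sum((v - mean) ** 4 for v in series) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def looks_non_gaussian(series, tol=0.5):
    """Crude screen: flag the series when either statistic is far from 0."""
    skew, kurt = skewness_and_excess_kurtosis(series)
    return abs(skew) > tol or abs(kurt) > tol

# Synthetic stand-ins for a total water supply flow record (assumed values).
random.seed(0)
gaussian_flow = [random.gauss(100.0, 10.0) for _ in range(20000)]
uniform_flow = [random.uniform(80.0, 120.0) for _ in range(20000)]

print(looks_non_gaussian(gaussian_flow), looks_non_gaussian(uniform_flow))
```

A formal hypothesis test would be preferable for short records, where sample moments are noisy; the fixed tolerance here is a simplification.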

### CICA model

*X* is whitened with the whitening matrix of *X*. In the separation model of BSS by the CICA algorithm, *y* is the simulated source signal in time series, a dimensionless estimate of one source signal (either the actual consumed water flow or the leakage discharge), and *w* is the transpose of one row of the matrix *W*, so its entries are elements of *W*. A non-quadratic non-linear function is chosen, and negative entropy is approximated by an expression that involves the function used for solving negative entropy, a positive constant, and a Gaussian random variable with zero mean and unit variance. The objective function and the constraint function are then defined. Equation (18) is solved by the Newton iterative method as in Equation (20), and the iteration stops when the condition in Equation (21) is met. The quantities that appear in these equations are: the learning rate; a covariance matrix; the first derivative of the Lagrangian function with respect to *w*; the Lagrange multipliers; the first and second derivatives used in the iteration; the mean square error norm; the reference vector *r*, one for each source signal; the threshold; and the scalar processing parameter.
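As a rough illustration of two ingredients named above, the sketch below whitens a two-channel observation and evaluates a negentropy approximation of the form $J(y) \approx [E\{G(y)\} - E\{G(\nu)\}]^2$, where $\nu$ is a standard Gaussian variable. The choice $G(y) = \log\cosh y$ is common in the ICA literature but is an assumption here, as are all function names and the synthetic data; the source does not state which non-linear function or constants were used, and the positive scaling constant is omitted.

```python
import math
import random

def whiten_2ch(x1, x2):
    """Whiten two sequences: zero mean, unit variance, mutually uncorrelated.

    Assumes the two channels are not perfectly collinear."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    a = sum(v * v for v in c1) / n               # var(x1)
    d = sum(v * v for v in c2) / n               # var(x2)
    b = sum(u * v for u, v in zip(c1, c2)) / n   # cov(x1, x2)
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + d * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + d * cs * cs
    z1 = [(cs * u + sn * v) / math.sqrt(lam1) for u, v in zip(c1, c2)]
    z2 = [(-sn * u + cs * v) / math.sqrt(lam2) for u, v in zip(c1, c2)]
    return z1, z2

def log_cosh(y):
    """Numerically stable log(cosh(y))."""
    ay = abs(y)
    return ay + math.log1p(math.exp(-2.0 * ay)) - math.log(2.0)

def gaussian_expectation(func, half_width=8.0, steps=4000):
    """E{func(v)} for v ~ N(0, 1), by trapezoidal integration."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        v = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

G_GAUSS = gaussian_expectation(log_cosh)

def negentropy_approx(y):
    """Approximate negentropy of a zero-mean, unit-variance sequence."""
    mean_g = sum(log_cosh(v) for v in y) / len(y)
    return (mean_g - G_GAUSS) ** 2

# Synthetic observed signals (assumed): a flow record and a correlated head.
random.seed(1)
flow = [random.uniform(80.0, 120.0) for _ in range(5000)]
head = [30.0 - 0.1 * q + random.gauss(0.0, 0.5) for q in flow]
z1, z2 = whiten_2ch(flow, head)
print(negentropy_approx(z1), negentropy_approx(z2))
```

Any whitening transform differs from another only by a rotation, so the rotation-based form used here is one valid choice among several.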

### Amplitude solving model

Assuming that the waveform of *y* and that of *s* are identical, Equation (22) is derived; it relates the simulated (dimensionless) actual consumed water flow and leakage discharge in time series to their true values in time series. The simulated source signals are then centered as in Equation (23) by subtracting their respective averages. Because the variance of a simulated source signal is 1 according to the CICA algorithm, the two centered sequences have an average value of 0 and a variance of 1. Equation (24) then expresses each true source signal as its standard deviation multiplied by the corresponding centered sequence, plus its mean. The two standard deviations and the two means are considered as the four unknowns, and the flow balance equations in the WDS, Equation (25), state that at each time *t* the two source signals sum to the real total water supply flow, which is a value observed by a real-time flowmeter. In addition, Equation (26) holds in the WDS: the two means sum to the average of the total water supply flow. Substituting Equation (26) into Equation (25) gives Equation (27), a set of multi-element coupled linear overdetermined equations whose coefficient matrix has full column rank, so there is a unique least-squares solution. The values of the two standard deviations are solved in this way, and the values of the two means are obtained from the water balance analysis report. After the values of the four unknowns are obtained, the amplitudes of the source signals are restored according to Equation (24).
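The least-squares step can be sketched as follows. Writing $\hat y_c(t)$ and $\hat y_l(t)$ for the centered, unit-variance simulated signals and $Q(t)$ for the measured total flow (symbols assumed for this sketch), the centered balance $Q(t) - \bar Q = \sigma_c \hat y_c(t) + \sigma_l \hat y_l(t)$ has two unknowns and $T$ equations, and the normal equations give the solution in closed form. The function names, the synthetic waveforms and the two means are hypothetical, and the additional constraints mentioned elsewhere in the paper are not included.

```python
import math

def solve_std_devs(y_c, y_l, q_total):
    """Least-squares estimate of (sigma_c, sigma_l) from the centered balance."""
    n = len(q_total)
    q_mean = sum(q_total) / n
    rhs = [q - q_mean for q in q_total]
    # Normal equations (A^T A) s = A^T b for the T x 2 matrix A = [y_c, y_l].
    s_cc = sum(v * v for v in y_c)
    s_ll = sum(v * v for v in y_l)
    s_cl = sum(u * v for u, v in zip(y_c, y_l))
    b_c = sum(u * r for u, r in zip(y_c, rhs))
    b_l = sum(v * r for v, r in zip(y_l, rhs))
    det = s_cc * s_ll - s_cl * s_cl
    if abs(det) < 1e-12:
        raise ValueError("waveforms are collinear; matrix is rank deficient")
    sigma_c = (b_c * s_ll - b_l * s_cl) / det
    sigma_l = (b_l * s_cc - b_c * s_cl) / det
    return sigma_c, sigma_l

def standardize(seq):
    """Shift and scale a sequence to zero mean and unit variance."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    return [(v - mean) / std for v in seq]

# Synthetic waveforms for 24 steady states (assumed shapes, not measured data).
hours = range(24)
y_c = standardize([math.sin(2 * math.pi * t / 24) for t in hours])
y_l = standardize([math.cos(2 * math.pi * t / 24)
                   + 0.3 * math.sin(4 * math.pi * t / 24) for t in hours])
q_total = [50.0 + 8.0 * a + 2.0 * b for a, b in zip(y_c, y_l)]

sigma_c, sigma_l = solve_std_devs(y_c, y_l, q_total)

# Means would come from a water balance report; these values are made up.
mu_c, mu_l = 42.0, 8.0
q_c = [mu_c + sigma_c * v for v in y_c]   # restored consumed flow
q_l = [mu_l + sigma_l * v for v in y_l]   # restored leakage discharge
print(round(sigma_c, 6), round(sigma_l, 6))
```

Because the standard deviations are fitted to the measured total flow, the quality of the restored amplitudes depends directly on how well the separated waveforms match the true ones.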

## OPTIMIZATION AND VALIDATION OF TAA MODEL

### Experiment and data acquisition

As mentioned above, it is impossible to obtain separate sample data for leakage discharge or actual consumed water flow in an actual WDS. However, such data are needed to optimize and verify the TAA model. As a remedy, this paper obtains the sample data from three sources: the Leakage Test Laboratory of WDS at Harbin Institute of Technology (as shown in Figure S1 in the Supplementary Information, available with the online version of this paper); an emulational WDS (as shown in the Supplementary Information), which is generated with the software EPANET 2.0; and an actual WDS (as shown in the Supplementary Information).

### Optimization of TAA model

The optimization of the TAA model is based on the laboratory test data; the optimization goal is to maximize the two similarity coefficients. For one laboratory test in a single-entrance multi-leaks ring WDS, the maximum similarity coefficient reaches about 0.95 for leakage discharge, and the average of the absolute value of the relative error is about 3.2% and about 4.3% for the two quantities evaluated. The optimized parameters are shown in Table 1.

| Parameter name | Value/value range |
|---|---|
| | 0.15–0.5 |
| | 0.15–0.5 |
| | 0.15–0.4 |
| | 0.15–0.4 |
| | $10^{-9}$–$10^{-7}$ |
| | 0.1–0.6 |
| | 200 |
| or | No specific requirements |
| | No specific requirements |
| | No specific requirements |

### Validation of TAA model

All parameter values are taken from Table 1 and kept fixed. The performance of the TAA model is shown in Table 2.

| Source of sample data | Type of the WDS | No. of steady states | Similarity coefficient | Range of relative error (%) | Average of the absolute value of the relative error (%) |
|---|---|---|---|---|---|
| Leakage Test Laboratory of WDS | Single-entrance multi-leaks ring | 24 | 0.9478 | (−6.03, 9.69) | 3.61 |
| Leakage Test Laboratory of WDS | Single-entrance multi-leaks branched | 24 | 0.9664 | (−9.14, 12.33) | 4.90 |
| Leakage Test Laboratory of WDS | Single-entrance multi-leaks ring, with a large transmission flow | 24 | 0.8855 | (−13.88, 9.79) | 4.53 |
| An emulational WDS | Single-entrance multi-leaks ring, with a large transmission flow | 96 | 0.9312 | (−13.60, 7.29) | 2.71 |
| An actual WDS | A short-distance water transportation pipeline | 24 | 0.8909 | (−4.95, 1.34) | 0.71 |

## DISCUSSION OF TAA MODEL

For the TAA model, the input parameters are the total water supply flow and the pressure head at the entrance in time series, and the physical loss volume over a given period; the output parameters are the actual consumed water flow and the leakage discharge in time series. Fortunately, all the input parameters can be easily obtained from the supervisory control and data acquisition system and the water balance analysis report.

*W* was determining a new expression of the system's leakage characteristics. Because the pressure head at each node is not needed, the considerable work of building a WDS hydraulic model is unnecessary.

In the TAA model, even when the pressure head is kept constant, the output leakage discharge will vary with the total water supply flow. In other words, the amount of change will be reflected in the leakage discharge. Thus, the TAA model offers a theoretical innovation: the uncertain relationship is expressed.

Table 1 shows that all the optimized parameters either have a reasonably large range or have no strict requirement, which indicates that the CICA algorithm is well suited to separating the leakage discharge from the total water supply flow. More importantly, none of the parameters is a property parameter of the system, so the optimized CICA model can be transferred to another WDS without changing the value of any parameter. This advantage is well explained by BSS theory: information on the properties of the system does not need to be known, because what the model uses is the relationship between different source signals, especially the signals' high-order statistical characteristics.

Table 2 shows the performance of the TAA model in analyzing the leakage discharge of several WDSs: all similarity coefficients are greater than 0.88, all the averages of the absolute value of the relative error are no more than 4.9%, and the absolute value of the maximum error is only about 13.9%, which indicates that the TAA model has high simulation accuracy and good transplant performance.
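Error statistics of the kind reported in Table 2 can be computed from paired series along the lines below. The relative error is assumed here to be (simulated − true)/true × 100%; the text does not define it explicitly, and the function name and sample values are hypothetical.

```python
def relative_error_stats(simulated, true):
    """Return (min, max, mean absolute) relative error in percent."""
    errors = [100.0 * (s - t) / t for s, t in zip(simulated, true)]
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    return min(errors), max(errors), mean_abs

# Made-up leakage discharge values for four steady states.
true_leak = [10.0, 12.0, 8.0, 10.0]
sim_leak = [10.5, 11.4, 8.0, 9.0]

low, high, mean_abs = relative_error_stats(sim_leak, true_leak)
print(f"range = ({low:.2f}, {high:.2f}) %, mean |error| = {mean_abs:.2f} %")
```

Reporting the signed range alongside the mean absolute value, as Table 2 does, shows whether the model tends to over- or underestimate.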

Unlike some global models, the TAA model involves no subjective evaluation of any parameter, so its simulation accuracy should theoretically be high. However, an error of about 5.0% remains. When the observation errors are ignored (as for the emulational WDS, which has no observation error), the error comes from two parts. One part is the inaccuracy of the trend from the CICA model. Assuming a linear mixing of the source signals may seem to be a mistake, because the relationship between leakage discharge and pressure head is not linear. However, the linear mixing should not be criticized excessively, because it is used only to obtain the trend and not the amplitude. More importantly, the observed signals matrix is whitened in Equation (13) before the separating process, so the linear relationship is destroyed in the process of solving the amplitude. We recognize that a non-linear mixing relationship would be better, but it is extremely difficult to express because of its complexity. The other part may be the inaccuracy of the solution of the equations in the amplitude solving model. Accordingly, many constraints, including the implicit relationship between the two standard deviations, are adopted to solve the multi-element coupled linear overdetermined equations. In short, with this series of measures the simulation accuracy is acceptable for engineering purposes.

In addition, the reference vector is very important, not only to determine the order of the source signals but also to restrict the relative size of the source signal. As shown in Equations (9) and (10), two points can be agreed on. First, the actual consumed water flow will be larger when the total water supply flow is larger, and smaller when it is smaller, because the actual consumed water flow is the main component of the total water supply flow in most actual WDSs. Second, a monotonically increasing relationship between the global leakage discharge and the pressure head is probable, so the pressure head can generally describe the trend of the leakage discharge. Nevertheless, the reference vector cannot present the accurate trend of the source signal, so the threshold is adopted to measure the gap between the reference vector and the real source signal, and this threshold needs to be optimized. Additionally, better reference vectors, constructed by other methods, may give higher similarity coefficients.

## CONCLUSIONS

The CICA algorithm of BSS theory was introduced into WDS analysis. A global leakage discharge analysis model, the TAA model, was established to separate the leakage discharge from the total water supply flow. The model can be applied well in a WDS with only one entrance when the total water supply flow follows a non-Gaussian distribution. The TAA model can effectively avoid the complexity (and reflect the uncertainty) of the relationship between leakage discharge and pressure head. Moreover, the simulation accuracy is high and the optimized model has good transplant performance in different WDSs, even without a WDS hydraulic model. This research uses the signals' higher-order statistical characteristics rather than the system's characteristics to achieve the desired objectives, and BSS theory is expected to solve more problems concerning WDS.

## ACKNOWLEDGEMENTS

This research is supported by a Marie Curie International Research Staff Exchange Scheme within the 7th European Community Framework Programme – SmartWater (318985) and National Natural Science Foundation of China (51278148). The TAA model in the actual WDS was validated by the team of ‘Research and Application of Urban Water Supply Network Leakage Detection and Control Technology’ program (2011A090200040).