The lack of geometrical and hydraulic information about sewer networks often prevents the adoption of in-depth modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining a prioritization scheme for preventive maintenance strategies, based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (Italy). The novel aspects include, among others, treating sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for evaluating a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit of each distribution is finally tested by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers and prioritizing sewer inspections in order to fulfill rehabilitation requirements.
pipe age (-)
pipe width (m)
cover depth (m)
equivalent diameter (m)
theoretical pdf (-)
pipe height (m)
pipe length (m)
mean square error
number of failure events (-)
number of variables (-)
sum of squared errors
shape factor (-)
Box-Cox transformation parameter (-)
A sewer network typically follows the street layout, resulting in a complex and widespread infrastructure that requires specific maintenance planning and scheduling. Because of its wide extent in urban areas (Ariaratnam et al. 2001), failures in a sewer network can cause considerable damage, such as traffic disruption, sinkholes, back-ups, spills and flooding, and pollution of nearby water bodies. All things considered, the responsible authorities must develop management strategies that can guarantee an efficient service to the population.
The field of asset management has undergone a paradigm shift from an a posteriori approach, which responds to failures with rehabilitation and replacement projects, to an a priori approach, which predicts failures before they occur and mitigates the risk through risk assessment and preventive maintenance strategies (Allbee & Byrneb 2009). A successful asset management program should provide prioritization strategies for funds management by adopting predictive tools to anticipate sewer failures and to assess the risks associated with such failures (Fenner 2000; WRC 2001).
A large number of papers deal with quantifying the deterioration process of infrastructures, especially wastewater infrastructures; such deterioration models must provide a predictive tool to assess the failure probability of sewers at any given time. However, sewer failures result from a complex process that is not only time-dependent but is also affected by other parameters, whose influence on failure probability must be carefully determined (Davies et al. 2001a; Baur & Herz 2002; Hahn et al. 2002). Such factors can be roughly separated into two groups, relating to structural and hydraulic deterioration, respectively (Davies et al. 2001a; Wirahadikusumah et al. 2001; Del Giudice & Farina 2007). Structural deterioration involves the weakening of pipe structural integrity, eventually resulting in collapse, whereas hydraulic deterioration refers to the reduced ability of the sewer to convey sewage, resulting in surcharges, spills, or flooding. Various sewer deterioration models have been used in the literature to assess the condition of sanitary and storm sewers, such as statistical, deterministic and artificial intelligence models (Kleiner & Rajani 2001; Savic et al. 2006; Tran et al. 2006; Berardi et al. 2008; Yamijala et al. 2009; Khan et al. 2010). Most of them require records of pipe failures over a number of years to predict pipe deterioration due to aging and the consequent failure rates; however, time-dependent data are difficult to obtain and often prove unavailable for sewer networks (Egger et al. 2013).
Furthermore, the construction of wastewater infrastructure usually spans centuries, with limited or no data available; another problem is the absence of a standardized method for describing sewer conditions, apart from expensive CCTV inspections regulated by international norms such as EN 13508-2 (CEN 2003), so that in many countries there is a general lack of systematic information about sewer networks (Fenner 2000). In turn, the lack of data contributes to the lack of modeling tools for predicting failure patterns and assessing the risks associated with physical damage and the consequent disruption of service.
Because of the above-mentioned problems, there is a need for failure models that rely on easily collectable physical data about the asset and on generic historical records of pipe failure events. The present paper provides an analysis of the combined sewer network of Naples (Italy): for this system, both an asset database and a failure database are available for developing a model that locates the sewer branches prone to failure by means of a statistical analysis of failure records. The proposed model treats the sewer information in the failure dataset as a set of statistical variables. As a novel element, the proposed analysis remains applicable when the variables are correlated, that is, when the hypothesis of stochastic independence is violated. Moreover, the parameters are treated as continuous variables, whereas several statistical models in the literature require the data to be divided into classes, so that sewer parameters must be considered as discrete variables (Lei & Saegrov 1998; Caruso et al. 2002; Savic et al. 2006; Wright et al. 2006).
DATABASE OVERVIEW AND PREPROCESSING
In order to obtain a more complete analysis, an additional sewer factor should be considered, namely sewer age, which is an important parameter in the deterioration process of a conduit (Davies et al. 2001a). Age is presumably related to the construction material used for sewers over time, shifting from stone to plastic; it also implies changes in the shape of sewer sections. No information about age is reported in the asset database (AD), so a rough approximation is obtained by assigning each sewer an age based on the construction period of the corresponding urban area (Davies et al. 2001b; Ahmadi et al. 2014): in the city of Naples, three different urban expansion periods can be identified, corresponding to the years <1900, 1900–1950 and >1950, respectively. For the sake of mathematical analysis, a polytomous age variable a was considered, equal to 1 for construction before 1900, 2 for construction between 1900 and 1950, and 3 for construction after 1950.
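The age coding above amounts to a lookup from construction period to class. The sketch below assumes that each pipe record carries a period label inherited from its urban area; the label strings and the function name `age_class` are hypothetical.

```python
# Hypothetical period labels for the three urban expansion periods of Naples,
# mapped to the polytomous age variable a (1, 2, 3).
AGE_CLASS = {"<1900": 1, "1900-1950": 2, ">1950": 3}


def age_class(period: str) -> int:
    """Return the age class a for a construction-period label (hypothetical labels)."""
    try:
        return AGE_CLASS[period]
    except KeyError:
        raise ValueError(f"unknown construction period: {period!r}") from None


districts = ["<1900", ">1950", "1900-1950", ">1950"]
print([age_class(d) for d in districts])  # [1, 3, 2, 3]
```

Raising an error on unknown labels, rather than defaulting to a class, keeps unmapped districts from silently entering the failure sample.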
Frequencies in Figure 1 show a strong asymmetry for all the considered sewer parameters: about 90% of the whole network length is characterized by diameters ranging from the minimum value up to 2 m, cover depths up to 6 m, slopes up to 0.02 m/m, and shape factors up to 1.5. As concerns age, 60% of sewers belong to the third class (>1950).
Information on failure events was provided by the Sewer Office of the Municipality of Naples in the form of an incident database (ID) covering the years 2002–2011; for each record (914 records in total), the database contains the event id, the date and the geographic point location of the failure. Each point was then associated with the corresponding pipe and its physical characteristics by means of a geographic information system (GIS) tool. The database only refers to ordinary maintenance operations due to blockage events, which will be referred to as ‘failure events’.
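The text does not state how the GIS tool links each failure point to a pipe. A common approach is nearest-feature snapping, sketched below in plain Python. It assumes that each pipe is a straight segment between two nodes in projected (metric) coordinates; the pipe ids and coordinates are hypothetical, and a production GIS would also use a spatial index instead of scanning every pipe.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: both end nodes coincide
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment line, clamped to the segment ends
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_pipe(point, pipes):
    """Return the id of the pipe whose axis lies closest to a failure point."""
    return min(pipes, key=lambda pid: point_segment_distance(point, *pipes[pid]))


# Hypothetical pipes: id -> (start node, end node), coordinates in metres
pipes = {
    "P1": ((0.0, 0.0), (100.0, 0.0)),
    "P2": ((100.0, 0.0), (100.0, 80.0)),
}
print(nearest_pipe((40.0, 5.0), pipes))   # P1
print(nearest_pipe((95.0, 60.0), pipes))  # P2
```

Once a failure point is matched to a pipe id, the pipe's physical attributes can be joined from the AD to the failure record.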
Figure 1 shows the distribution of the physical parameters within the ID. The histograms show that the highest failure frequency, i.e., the ratio of the number of failure events in each class to the total number of failures, occurs for small diameters, small slopes, small depths and tall rectangular cross-sections, revealing a strong asymmetry in the frequency distribution of events for each of the considered parameters. It should be noted that pipes with these features are also the most frequent in the AD. Similarly, almost 60% of failure records belong to the most recent age class, but this is merely because class 3 is also the most widespread in the city.
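The failure frequency defined above (events in a class divided by the total number of failures) can be computed per parameter with a histogram. A minimal sketch, using hypothetical diameters and class edges:

```python
import numpy as np


def relative_frequencies(values, bin_edges):
    """Share of failure events falling in each class of a sewer parameter.

    Each class count is divided by the total number of failures; values
    outside the outermost edges are not assigned to any class.
    """
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=bin_edges)
    return counts / values.size


# Hypothetical equivalent diameters (m) of six failed pipes
de_failed = [0.3, 0.4, 0.5, 0.8, 1.2, 2.5]
freq = relative_frequencies(de_failed, [0.0, 0.5, 1.0, 2.0, 3.0])
print(freq.round(3))  # [0.333 0.333 0.167 0.167]
```

Applying the same function to the AD pipes gives the network frequencies that the text uses to explain the shape of the failure histograms.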
Statistical distribution of failure frequency
It is possible to regard the incident dataset as a statistical sample of failures; for each parameter, a theoretical pdf $f(x_i)$ can be sought that best fits the observed one, with $i = 1, \ldots, p$, where p is the number of considered parameters. However, this procedure is not enough to characterize failure frequency, as there could be interactions among the parameters. Thus, each $f(x_i)$ can be regarded as a marginal pdf of a multivariate distribution referring to the vector of variables $\mathbf{x} = (x_1, \ldots, x_p)$; if the variables are independent, the multivariate pdf can be computed as the product of the marginal pdfs, $f(\mathbf{x}) = \prod_{i=1}^{p} f(x_i)$. If a correlation exists, the joint pdf takes a more complex form, which must account for the variance-covariance matrix. To identify possible correlations between pairs of variables, it is more convenient to analyze the matrix of Pearson indices, which gives dimensionless correlation estimates.
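The sketch below computes a Pearson matrix with NumPy and contrasts it with the product-of-marginals rule, which holds only under independence. The two variables are synthetic and deliberately built with a correlation of about 0.8; they are not the Naples data. A sizeable off-diagonal index signals that the product rule would misrepresent the joint density.

```python
import numpy as np
from statistics import NormalDist

# Synthetic sample of two correlated variables (not the Naples data)
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.8 * x1 + rng.normal(0.0, 0.6, n)  # population correlation = 0.8
X = np.column_stack([x1, x2])

# Pearson matrix: dimensionless p x p correlation estimates
R = np.corrcoef(X, rowvar=False)


def independent_joint_pdf(x, marginals):
    """Joint pdf as the product of marginal pdfs: valid ONLY under independence."""
    return float(np.prod([m.pdf(xi) for m, xi in zip(marginals, x)]))


marginals = [NormalDist(x1.mean(), x1.std(ddof=1)),
             NormalDist(x2.mean(), x2.std(ddof=1))]
print(R.round(2))
print(independent_joint_pdf([0.0, 0.0], marginals))
```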
In order to obtain information concerning sewer system failure probability, a multivariate normal distribution can be tested, since such a model can account for dependences among variables by means of the variance-covariance matrix. If the original variables each have a different marginal pdf, a normal model may not fit their multivariate distribution. However, if the single variables have normal marginal pdfs, the likelihood that their multivariate pdf is normal increases. To this end, the Box-Cox transformation (Box & Cox 1964) can be applied to each sewer parameter to make its marginal pdf approximately normal, and the normality of each parameter distribution can then be tested by means of a Q-Q plot. Further, bivariate normal distributions are assumed for each possible pair of parameters and tested by plotting contour lines of the bivariate normal density functions. Finally, a joint p-variate normal distribution is adopted; a normality test can be conducted by means of a Chi-squared plot (Johnson & Wichern 2007).
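The standard Box-Cox transformation of a strictly positive variable is $y = (x^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $y = \ln x$ for $\lambda = 0$. The sketch below implements it directly and checks it against `scipy.stats.boxcox`, which estimates λ by maximum likelihood when no λ is supplied. The lognormal sample is synthetic. For such data the estimated λ should be close to 0, i.e., close to a log transform.

```python
import numpy as np
from scipy import stats


def box_cox(x, lam):
    """Box-Cox transform of strictly positive data for a given lambda."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam


rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=460)  # synthetic right-skewed sample

# With lmbda omitted, scipy returns the transformed data and the ML estimate of lambda
y, lam_hat = stats.boxcox(x)
print(round(lam_hat, 2))
```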
Marginal probability distributions
The normality of the transformed data can be tested for each variable by means of a Q-Q plot, which compares the observed quantiles, standardized with the mean and standard deviation of the transformed data, with the theoretical quantiles of the standard normal distribution; normality is confirmed if the fitting line of the data resembles the 1:1 line.
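A Q-Q check of this kind can be sketched with NumPy and the standard library alone. The sketch assumes plotting positions $(j - 1/2)/n$, a common choice in Johnson & Wichern (2007), and a synthetic normal sample in place of a transformed sewer parameter. A slope near 1 and a correlation near 1 correspond to the "fitting line resembles the 1:1 line" criterion.

```python
import numpy as np
from statistics import NormalDist


def qq_points(y):
    """Theoretical standard-normal quantiles vs standardized ordered data."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    z_obs = (y - y.mean()) / y.std(ddof=1)   # standardized observed quantiles
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (j - 1/2)/n
    z_theo = np.array([NormalDist().inv_cdf(pr) for pr in probs])
    return z_theo, z_obs


rng = np.random.default_rng(2)
z_theo, z_obs = qq_points(rng.normal(3.0, 2.0, 460))  # synthetic stand-in
slope, intercept = np.polyfit(z_theo, z_obs, 1)
r = np.corrcoef(z_theo, z_obs)[0, 1]
print(round(slope, 3), round(intercept, 3), round(r, 4))
```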
Joint bivariate and p-variate normal distributions
Once the marginal distributions of all parameters are confirmed to be normal, an intermediate step consists of testing the normality of the bivariate distributions corresponding to each possible pair of transformed variables in the case study (Johnson & Wichern 2007). The test consists of observing the data scatter plots. For each pair, normality is accepted if about 50% of the data fall within the ellipse corresponding to the 50th percentile of a Chi-squared distribution with 2 degrees of freedom and, simultaneously, about 95% of the data fall within the ellipse corresponding to the 95th percentile of the same distribution (Johnson & Wichern 2007). Note that, denoting by $\chi^2_2(q)$ the q-quantile of a Chi-squared distribution with 2 degrees of freedom, $\chi^2_2(0.50) = 1.39$ and $\chi^2_2(0.95) = 5.99$.
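The ellipse test can be sketched by computing the squared generalized distance of each point of a pair and counting how many fall below the two Chi-squared percentiles. For 2 degrees of freedom these percentiles have the closed form $-2\ln(1-q)$. The optional `mean` and `cov` arguments let a validation sample be tested against calibration-sample estimates. The function name and the synthetic data are illustrative only.

```python
import numpy as np

Q50_2DOF = -2.0 * np.log(0.50)  # 50th percentile of chi-square(2), about 1.39
Q95_2DOF = -2.0 * np.log(0.05)  # 95th percentile of chi-square(2), about 5.99


def ellipse_percentages(pair, mean=None, cov=None):
    """Percent of points inside the 50% and 95% chi-square(2) ellipses.

    mean/cov default to the sample estimates; pass calibration-sample values
    to test a validation sample against the calibrated model.
    """
    X = np.asarray(pair, dtype=float)
    mu = X.mean(axis=0) if mean is None else np.asarray(mean, dtype=float)
    S = np.cov(X, rowvar=False) if cov is None else np.asarray(cov, dtype=float)
    D = X - mu
    d2 = np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)  # squared generalized distances
    return 100.0 * np.mean(d2 <= Q50_2DOF), 100.0 * np.mean(d2 <= Q95_2DOF)


rng = np.random.default_rng(3)
pair = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=2000)
print(ellipse_percentages(pair))
```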
The final step of the procedure consists of verifying the hypothesis of a multivariate normal distribution accounting for all p variables together. The test can be performed by evaluating the squared generalized distances $d_j^2$ by means of Equation (4), with a variance-covariance matrix including the whole set of available parameters. The computed distances are then compared with the theoretical quantiles of a Chi-squared distribution with p degrees of freedom by means of a so-called Chi-squared plot (Johnson & Wichern 2007). The test is passed if the fitting line of the data resembles the 1:1 line. In addition, about 50% of the data must have $d_j^2 \le \chi^2_p(0.50)$ and about 95% must have $d_j^2 \le \chi^2_p(0.95)$, where $\chi^2_p(q)$ denotes the q-quantile of the Chi-squared distribution with p degrees of freedom. Note that, for five degrees of freedom, $\chi^2_5(0.50) = 4.35$ and $\chi^2_5(0.95) = 11.07$.
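A Chi-squared plot can be sketched as follows. It assumes that Equation (4) is the usual squared generalized (Mahalanobis) distance $d_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})^\top \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$ and that the plotting positions are $(j - 1/2)/n$. The 5-variable sample is synthetic multivariate normal, standing in for the transformed sewer parameters.

```python
import numpy as np
from scipy import stats


def chi_square_plot(X, mean=None, cov=None):
    """Return (chi-square quantiles, ordered squared generalized distances)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0) if mean is None else np.asarray(mean, dtype=float)
    S = np.cov(X, rowvar=False) if cov is None else np.asarray(cov, dtype=float)
    D = X - mu
    d2 = np.sort(np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D))
    q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return q, d2


rng = np.random.default_rng(4)
p = 5
A = rng.normal(size=(p, p))
X = rng.multivariate_normal(np.zeros(p), A @ A.T + p * np.eye(p), size=460)

q, d2 = chi_square_plot(X)
slope, intercept = np.polyfit(q, d2, 1)           # should resemble the 1:1 line
within50 = np.mean(d2 <= stats.chi2.ppf(0.50, p))  # about 0.50 expected
within95 = np.mean(d2 <= stats.chi2.ppf(0.95, p))  # about 0.95 expected
print(round(slope, 2), within50, within95)
```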
Adaptation to data and goodness-of-fit measures
Different multivariate distributions can be obtained by varying the set of variables involved. This makes it possible to investigate the amount of information gained by adding explanatory variables to the model, since this cannot be inferred simply by evaluating the goodness-of-fit of multivariate Chi-squared plots.
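The results table later reports the mean and standard deviation of residuals, SSE and AIC for each model, but this section does not give their definitions. The sketch below is therefore only one plausible reading, and each choice is an assumption. Residuals are taken as the Chi-squared cdf of each ordered distance minus its plotting position $(j - 1/2)/n$. AIC uses the least-squares form $n \ln(\mathrm{SSE}/n) + 2k$, with $k = p + p(p+1)/2$ (mean vector plus covariance entries). The `fit_measures` function is hypothetical.

```python
import numpy as np
from scipy import stats


def fit_measures(d2, p):
    """Goodness-of-fit measures for a chi-square plot (ASSUMED definitions)."""
    d2 = np.sort(np.asarray(d2, dtype=float))
    n = d2.size
    pos = (np.arange(1, n + 1) - 0.5) / n   # empirical plotting positions
    e = stats.chi2.cdf(d2, df=p) - pos      # ASSUMED residual definition
    sse = float(np.sum(e ** 2))
    k = p + p * (p + 1) // 2                # ASSUMED count: means + covariances
    aic = n * np.log(sse / n) + 2 * k       # ASSUMED least-squares AIC form
    return {"mu": float(e.mean()), "sigma": float(e.std(ddof=1)),
            "sse": sse, "aic": float(aic)}


rng = np.random.default_rng(6)
d2_sample = rng.chisquare(5, size=460)  # stands in for distances of a 5-variable model
good = fit_measures(d2_sample, p=5)
bad = fit_measures(d2_sample, p=3)      # deliberately wrong degrees of freedom
print(good, bad)
```

The AIC penalty grows with the number of variables, which is what allows models with different variable sets to be compared beyond the visual fit of their Chi-squared plots.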
CASE STUDY AND DISCUSSION
To test the robustness of the proposed methodology, the multivariate model was calibrated using a split-sample technique. The original ID was randomly divided into a calibration sample (N = 460) and a validation sample (N = 454). The former was used to compute the transformation coefficients λ, and all the normality tests were performed as discussed in the previous sections. The latter was transformed using the same λ values and, together with the mean vector and the variance-covariance matrix of the calibration sample, was subjected to all the previously described normality tests. Table 1 provides the mean vector and the Pearson matrix of the whole incident dataset (ID). A strong correlation between diameter and cover depth can be found: this was expected, since larger pipes require deeper excavations to be laid. Weak dependences can also be found among slope, age and shape factor.
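The split-sample procedure can be sketched as follows. Randomly partition the records, estimate one Box-Cox λ per variable plus the mean vector and covariance matrix on the calibration part only, then transform the validation part with those same λ values. The data are synthetic positive values shaped like the 914-record ID, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def split_sample(X, n_cal, seed=0):
    """Randomly split the rows of X into calibration and validation samples."""
    X = np.asarray(X, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(X))
    return X[idx[:n_cal]], X[idx[n_cal:]]


def calibrate(cal):
    """Estimate one Box-Cox lambda per column, then the mean vector and covariance."""
    cols, lams = [], []
    for col in cal.T:
        y, lam = stats.boxcox(col)  # maximum-likelihood lambda
        cols.append(y)
        lams.append(lam)
    Y = np.column_stack(cols)
    return np.array(lams), Y.mean(axis=0), np.cov(Y, rowvar=False)


def transform(X, lams):
    """Apply previously estimated (calibration) lambdas to new data."""
    return np.column_stack([stats.boxcox(col, lmbda=lam) for col, lam in zip(X.T, lams)])


rng = np.random.default_rng(5)
X = rng.lognormal(mean=0.0, sigma=0.7, size=(914, 3))  # synthetic positive attributes
cal, val = split_sample(X, n_cal=460)
lams, mu_cal, cov_cal = calibrate(cal)
Y_val = transform(val, lams)  # validation data on the calibration scale
```

`Y_val`, `mu_cal` and `cov_cal` can then be passed to the same Q-Q, ellipse and Chi-squared checks used for the calibration sample.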
|Variables mean and λ values||Pearson matrix|
As concerns bivariate distributions, Table 2 shows the percentages of data lying within the 50th and 95th percentile ellipses. The values support the assumption of normal bivariate distributions for each pair of variables, except for the pair de-a. For this pair, the first percentage is considerably lower than 50%, whereas the second is higher than 95%; this was not unexpected, since assigning age by urban district is a rough way of accounting for pipe age.
|Calibration sample (%)||Validation sample (%)|
|Model||p||μresiduals||σresiduals||SSE × 10²||AIC|
Based on an asset database (AD) and records of failure events for the sewer system of the city of Naples (Italy), which contain only basic physical information about sewer pipes, a statistical model is provided that allows for the estimation of failure probability and the location of critical sewers. The model can be successfully coupled with an intervention strategy aimed at optimizing the allocation of economic resources for the management of a sewer network. In the case study, the marginal cdfs are lognormal, and the best fit occurs when the joint failure model neglects age information.
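This section does not specify how the fitted joint distribution is mapped onto the network to locate critical sewers. One hypothetical way is to score every AD pipe by the fitted failure density evaluated at its own attributes. Under the Box-Cox model, the density of the original variables is the multivariate normal density of the transformed values times the Jacobian $\prod_i x_i^{\lambda_i - 1}$. All parameter values and pipe attributes below are invented for illustration.

```python
import numpy as np
from scipy import stats


def joint_density(X, lams, mu, cov):
    """Fitted failure density at the original (untransformed) pipe attributes."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    lams = np.asarray(lams, dtype=float)
    if np.any(X <= 0):
        raise ValueError("Box-Cox requires strictly positive attributes")
    safe = np.where(lams == 0, 1.0, lams)  # avoid division by zero for lambda = 0
    Y = np.where(lams == 0, np.log(X), (X ** lams - 1.0) / safe)
    phi = np.atleast_1d(stats.multivariate_normal(mean=mu, cov=cov).pdf(Y))
    jac = np.prod(X ** (lams - 1.0), axis=1)  # |dy/dx| of the Box-Cox map
    return phi * jac


# Hypothetical fitted parameters for two variables and three AD pipes
lams = [0.0, 0.5]
mu = [-0.5, 1.0]
cov = [[0.3, 0.1], [0.1, 0.4]]
pipes = np.array([[0.4, 2.0], [1.5, 5.0], [0.6, 3.0]])
dens = joint_density(pipes, lams, mu, cov)
ranking = np.argsort(-dens)  # pipe indices, highest density first
print(dens, ranking)
```

Mapping these scores onto the pipe geometries would give a vulnerability map. Whether the density should first be normalized by how common each pipe type is in the AD is a modeling choice this sketch does not address.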
Within the statistical framework, and compared with regression models in the literature, the proposed procedure does not require dividing the data into classes, so that computing is straightforward and each variable enters the model with its own value; also, dependences among parameters can be taken into account by means of the variance-covariance matrix. Moreover, the model is flexible, since it can be applied with any number p of variables, simply requiring a larger number of marginal and bivariate normality tests. Conversely, a considerable drawback is that the model is stationary, in the sense that it does not account for variations in time; also, the ID covers a limited number of years. Consequently, failure probability estimates should be considered short-term predictions. Another, minor drawback is that the model does not automatically update sewer conditions by taking into account historical records of interventions: this shortcoming is negligible for ordinary maintenance operations, but could invalidate failure probability estimates for extraordinary repairs, which often entail the complete replacement of pipes. In this case, the replaced pipes should be removed from the database as a preliminary operation.