The planning and scheduling of maintenance operations of large conventional sewer systems generate a complex decision-making environment due to the difficulty in the collection and analysis of the spatiotemporal information about the operational and structural condition of their components (e.g. pipes, gully pots and manholes). As such, water utilities generally carry out these operations following a corrective approach. This paper studies the impact of the spatiotemporal correlation between these failure events using Log-Gaussian Cox Process (LGCP) models. In addition, the association of failure events to physical and environmental covariates was assessed. The proposed methods were applied to analyze sediment-related blockages in the sewer system of an operative zone in Bogotá (Colombia). The results of this research allowed the identification of significant covariates that were further used to model spatiotemporal clusters with high sediment-related failure risk in sewer systems. The LGCP model proved to be more accurate in comparison to those models that build upon a fundamental assumption that a failure is equally likely to occur at any time regardless of the state of the system and the system's history of failures (i.e. a homogeneous Poisson process model).

Urban drainage systems provide fundamental services to the health and well-being of populations, as well as to the protection of the environment (Santos et al. 2017). However, it is well known that wastewater and urban runoff transport a variety of solid particles that can create obstructions in the sewer system, thus deteriorating its operation. Sediment-related blockages may result in the release of raw sewage into the environment due to a reduced hydraulic capacity that increases the probability of local surcharge and/or combined sewer overflow events (Xie et al. 2017). The malfunctioning of the sewer system and subsequent flooding constitute a risk to human health as a result of exposure to potential water-borne pathogens (ten Veldhuis et al. 2010; Collender et al. 2016). Furthermore, the release of untreated wastewater (e.g. via sewer overflows) into water bodies such as rivers and lakes negatively alters the water quality and health of fluvial and lacustrine ecosystems, affecting fish and other types of aquatic life (Munawar et al. 1993; Rodríguez-Jeangros et al. 2018). In general, the consequences of these contaminated water flow releases range from mild disruption for a small number of residents to events that may seriously damage the environment (Xie et al. 2017).

Urban drainage services account for a significant portion of the built infrastructures, which require large investments for their construction, protection, and maintenance (Santos et al. 2017). In different studies, it has been stated that blockage events account for a significant proportion of reported failures in sewer systems (Jin & Mukherjee 2010; Xie et al. 2017). These blockages are normally detected when a component of the system, or the system as a whole, stops performing as expected (Santos et al. 2017). Although many sewer system failures triggering mechanisms have been identified, sediment-related sewer blockages often appear at random (Rodríguez et al. 2012). Thus, it is necessary to improve our understanding of the processes involved in the system's deterioration to support its maintenance and guarantee adequate levels of performance (Torres et al. 2017).

Urban water management generates a particularly complex decision-making environment that is subject to engineering, social, environmental, and economic constraints (Makropoulos et al. 2003). The latter is particularly marked when planning sewer maintenance operations: as a consequence of the subterraneous nature of the system and due to time and cost constraints, it is normally unfeasible to collect comprehensive information about the state of its components (Fenner 2000; Yang & Su 2007). Although the amount of data relevant to sewer asset management and its variety is expected to increase in the near future (e.g. new low-cost sensors, growing use of the Internet of Things, and the potential of social media platforms to report different types of problems) (Sirkïa et al. 2017), the lack of knowledge of the current state of the system is the reason why most water utilities carry out maintenance operations with a corrective approach (Abraham et al. 1998).

In the case of corrective maintenance, the system is exposed to overflows and flooding until the maintenance operation is effectively performed (Fontecha et al. 2016), thus generating cost overruns with respect to preventive maintenance. In particular, such cost overruns include, but are not limited to: costs for service disruptions, adverse publicity costs, and costs associated with public health and safety problems (Korving et al. 2006; Rodríguez et al. 2012; Hickcox 2013). Thus, the prediction of blockages plays an important role in the management of the infrastructure of urban water systems. Deterioration models of sewer systems can be classified into two main groups: (a) physically based, and (b) statistical models. Physically based models are developed based on the understanding of the physical mechanisms that govern sewer system deterioration. However, scarcity of data about local conditions leads to great uncertainty about the suitability of these models (Fenner et al. 2007). Consequently, most deterioration models are of the statistical kind based on observations of failure events (Rodríguez et al. 2012).

Much of the research on sewer blockages has either focused on predicting mean blockage rates and on examining how these failure rates change over time (Savic et al. 2006; Ugarelli et al. 2009; Jin & Mukherjee 2010; Rodríguez et al. 2012; Post et al. 2016; Santos et al. 2017; Xie et al. 2017), or on representing the spatial distribution of sewer failures and/or flood hazards (Caradot et al. 2011; Cherqui et al. 2015; Post et al. 2017). However, blockages come in clusters, perhaps due to weather conditions, or maybe simply because releasing sediment from one part of the system leads to a knock-on blockage in another structural unit (Rodríguez et al. 2012). Despite the fact that sediment-related blockages in sewer systems occur in a spatiotemporal context, to the best knowledge of the authors of this paper, there is no previous research on modeling the intensity of these failures and relative risk as continuous phenomena in both time and space, considering that there may be a statistical residual component of spatiotemporal variation in these response surfaces that will only be captured by including in the model one or more latent stochastic process (Taylor et al. 2015).

This paper proposes a modeling approach to assess the impact of the spatiotemporal correlation between sediment-related blockages considering those factors that are associated with spatiotemporal clustering of failures, which may be physical properties of the sewer system or characteristics of the location where the failure event occurs. The blockage data is modeled as spatiotemporal point process by means of a spatiotemporal Log-Gaussian Cox Process (LGCP). The principal advantage is that LGCP provide a flexible and relatively tractable class of empirical models for describing spatially and temporally correlated phenomena (Diggle et al. 2013).

Log-Gaussian Cox Process

The LGCP is a class of statistical models for spatial and spatiotemporal point processes. Contrary to other point process models, the LGCP does not rely on a random contagious mechanism to create clusters and spatial correlations. Instead, it uses a stochastic intensity function to capture the aggregation of events driven by environmental conditions. The resulting process is extremely flexible as it enables the presence of both fixed and random effects in the space-time behavior (Taylor et al. 2013). It can be stated as a generalization of the Poisson process, but allows more realistic correlations between non-overlapping spatial/temporal regions and the incorporation of covariables effects.

To define the LGCP process, let N(A, T*) be the number of events located inside a region and in the interval time T* = [t1, t2] T, where S and T are the region and horizon of study, respectively. Then, conditional on the intensity function R(s, t), this random variable follows a standard Poisson process, that is,
(1)
where the intensity R(s, t) is considered a positive stochastic function on (Cox process), and r(s, t) is a particular realization of this function. The LGCP models R(s, t) as the logarithm of a Gaussian process which, allowing the incorporation of spatiotemporal covariates, results in an intensity function that can be expressed as (Møller et al. 1998):
(2)
where (s, t) is a known offset that represents some measure of exposure, Z(s, t) are the covariates (explicitly measured in space and/or time); and Y(s, t) is the latent Gaussian process on . Note that Y(s, t) is the component that makes the intensity a non-deterministic function. In the stationary case, the process Y is a second-order process with constant mean and parameterized spatiotemporal covariance function, which is assumed to be separable; that is,
(3)

The parameter represents the variance and is the scale parameter of spatial dependence, which determines the strength of the spatial correlation. is the scale of temporal dependence, which determines the temporal correlation in Y. For convenience, we use the standard approach of parameterizing (Diggle et al. 2013; Taylor et al. 2013, 2015), that makes . This parameterization allows for to be interpreted as covariate-adjusted relative risk.

The parameters in the model are (the vector of covariate effects), and , the parameters of the spatiotemporal Gaussian process Y transformed onto log-scale (Taylor et al. 2015).

For computational estimation, a discrete representation of space and time is necessary. The discretization of the space is performed considering squared cells and the period of reference is divided into intervals of the same size. Implicitly, it is assumed that the intensity is constant or presents little variation within each cell (voxel). In this study, the authors consider spatial cells according to the strength of the spatial correlation, whereas time intervals coincide with the months in order to match the measurement of some temporal covariates.

Bayesian inference

Given the observed data of the point process X =(x1, …, xn), where xi=(si, ti), and the measurements of covariates Z(s, t); the inferential problem is to estimate the unknown parameters and . There are three main methodologies for inference of LGCP in the literature (see e.g. Diggle et al. (2013)): moment based estimation, maximum likelihood, and Bayesian inference. The first one is considered a simplistic approach rarely used in applications. Both maximum likelihood and Bayesian estimation require the joint probability distribution of the data points, which considering the double stochastic nature of the model, is not easy to optimize. That is, the price of having a flexible probabilistic generation of the point process implies that the likelihood is derived as:
(4)
where the integral on the right is made over the random intensity function that is, by definition, an infinite dimensional probability measure. This difficulty results in the use of numerical methods that project the continuous process on a discrete representation of time and space.
In this paper, Bayesian estimation is used given the complications of the maximum likelihood direct approach. Additionally, using a computational approximation of the posterior parametric distributions has several advantages, such as being able to incorporate the predictive probability model of new data points and forecasting the process during the periods that follow the observational window. In the Bayesian approach, Y, and are considered random variables, and therefore, assigning prior probability densities to and allows to make inferential statements using the posterior (also the predictive) distribution, expressed as:
(5)
where g(,) is the prior distribution and the symbol means proportionality up to a constant.

The complexity of the posterior distribution requires the use of numerical integration by computational methods. A sampling mechanism of the posterior through Markov chain Monte Carlo (MCMC) is used, in particular, selecting the Metropolis-adjusted Langevin algorithm (MALA), which mimics a Langevin diffusion process, and has been recommended by several authors for this type of spatiotemporal models (see, for example, Roberts & Tweedie (1996) and Brix & Diggle (2001, 2003)). Details about the computational implementation are presented in Appendix A (available with the online version of this paper).

Predictive analysis

The Bayesian estimation procedure with a double stochastic process allows the modeler to obtain a posteriori distributions of the whole intensity function given the observed data points. Therefore, it is possible to sample the behavior of Y (s, t) given the data X, and from it construct posteriori distributions of different target parameters of interest. In particular, the predictive functionality is very useful for simulating a new set of data points, affected conditionally by parameters and , which take into account the effects of covariates and spatiotemporal correlations and also the effect of the previous set of points. One of the main advantages of this method is the smooth transition from estimation to predictive analysis, on which the distribution of new m data points X* = (x*1 , . . ., x*m) can be represented as:
(6)

This distribution allows a prediction (probabilistically) of where and when new cases will appear. The number of cases can also be derived directly from the conditional inhomogeneous Poisson (predictive) distribution. Forecasting can also be performed for future periods as a regular temporal process where, conditional to the observations until time T, it is possible to use the temporal correlation structure to predict future events. The prediction and forecasting features of the model will be used to perform validation with real data.

Case study

Bogotá is the capital of Colombia and the country's largest city, with a population of approximately eight million inhabitants. The water utility of the city, Empresa de Acueducto y Alcantarillado de Bogotá (EAB), serves over 1.7 million residential users, approximately 400,000 commercial users, and provides services to 11 neighboring municipalities. In order to provide integral and efficient service to citizens, the water utility divided the city into five operative zones. Each zone is responsible for the operation and maintenance of water supply and sewage networks in a specific area of the city. For a detailed description of Bogotá’s drainage system see Rodríguez et al. (2008). In this research, the proposed statistical analyses methods are evaluated in the case study of sediment-related blockages in the sewer system of Zone 2, which drains an area of approximately 100 km2 by means of both combined and separate sewer systems (see Figure 1).

Figure 1

Case study area in the north of Bogotá (Colombia) (Zone 2 is highlighted) and an example of the spatial and temporal distribution of customer complaints.

Figure 1

Case study area in the north of Bogotá (Colombia) (Zone 2 is highlighted) and an example of the spatial and temporal distribution of customer complaints.

Close modal

Customer complaints database

The record of sewer failures was obtained from a platform managed by Bogotá’s water utility that collects and handles customer complaints. The customer complaints database available for testing the methods proposed in this research corresponds to a period of seven years and four months (from January 2004 to April 2011). For each reported failure, the following information is available: geographic coordinates where the failure was indicated, report date and the system component in which the failure was observed (i.e. pipe, gully pot, or manhole). The database contains 32,097 reports of sediment-related failures that occurred in the case study area during the period of analysis. For a detailed description of the customer complaints database see Rodríguez et al. (2012).

Covariates

Covariates are the factors that are suspected to be associated with the occurrence of sediment-related failures. They include sewer system properties as well as environmental factors. Twenty-one covariates were available and considered for this study, of which 20 were of spatial type and the remaining were of temporal type (i.e. monthly precipitation). Models can also include other temporal explanatory variables, such as soil wetness, water consumption rates, and pipe structural condition, to simulate the time-dependence of hydraulic deterioration. However (both in general and in this case study), such data is not available. Covariates are described in Table 1. Note that categorical covariates (i.e. pipe material and land use) were transformed into continuous covariates.

Table 1

Description of the covariates considered in the analysis

CovariatesDescription
Stormwater pipe length Stormwater pipe length in the cell 
Sanitary and combined pipe length Sanitary and combined pipe length in the cell 
Stormwater pipe diameter Mean stormwater pipe diameter in the cell 
Sanitary and combined pipe diameter Sanitary and combined pipe diameter in the cell 
Stormwater pipe depth Mean stormwater pipe depth in the cell 
Sanitary and combined pipe depth Mean sanitary and combined pipe depth in the cell 
Stormwater pipe slope Mean stormwater pipe slope in the cell 
Sanitary and combined pipe slope Mean sanitary and combined pipe slope in the cell 
Stormwater pipe age Mean stormwater pipe age in the cell 
Sanitary and combined pipe age Mean sanitary and combined pipe age in the cell 
Stormwater concrete pipe Proportion of stormwater pipe length of the cell made from concrete 
Stormwater reinforced concrete pipe Proportion of stormwater pipe length of the cell made from reinforced concrete 
Sanitary and combined concrete pipe Proportion of sanitary and combined pipe length of the cell made from concrete 
Sanitary and combined stoneware pipe Proportion of sanitary and combined pipe length of the cell made from stoneware 
Number of trees Number of trees in a 1 km buffer centered in the cell centroid 
Number of gully pots Number of gully pots in a 1 km buffer centered in the cell centroid 
Number of manholes Number of manholes in a 1 km buffer centered in the cell centroid 
Commercial – services Proportion of the cell area assigned to commercial-services use 
Residential Proportion of the cell area assigned to residential use 
Population Population density in the cell 
Monthly precipitation Average monthly precipitation (in mm) 
CovariatesDescription
Stormwater pipe length Stormwater pipe length in the cell 
Sanitary and combined pipe length Sanitary and combined pipe length in the cell 
Stormwater pipe diameter Mean stormwater pipe diameter in the cell 
Sanitary and combined pipe diameter Sanitary and combined pipe diameter in the cell 
Stormwater pipe depth Mean stormwater pipe depth in the cell 
Sanitary and combined pipe depth Mean sanitary and combined pipe depth in the cell 
Stormwater pipe slope Mean stormwater pipe slope in the cell 
Sanitary and combined pipe slope Mean sanitary and combined pipe slope in the cell 
Stormwater pipe age Mean stormwater pipe age in the cell 
Sanitary and combined pipe age Mean sanitary and combined pipe age in the cell 
Stormwater concrete pipe Proportion of stormwater pipe length of the cell made from concrete 
Stormwater reinforced concrete pipe Proportion of stormwater pipe length of the cell made from reinforced concrete 
Sanitary and combined concrete pipe Proportion of sanitary and combined pipe length of the cell made from concrete 
Sanitary and combined stoneware pipe Proportion of sanitary and combined pipe length of the cell made from stoneware 
Number of trees Number of trees in a 1 km buffer centered in the cell centroid 
Number of gully pots Number of gully pots in a 1 km buffer centered in the cell centroid 
Number of manholes Number of manholes in a 1 km buffer centered in the cell centroid 
Commercial – services Proportion of the cell area assigned to commercial-services use 
Residential Proportion of the cell area assigned to residential use 
Population Population density in the cell 
Monthly precipitation Average monthly precipitation (in mm) 

Covariates data was handled and analyzed using ArcGIS 10.2.2 (ESRI 2014) and R (R Core Team 2015). The effect of pipe material on sediment-related blockages was evaluated by comparing the two most common materials used in the sewer network (i.e. concrete and reinforced concrete for the stormwater collection system and concrete and stoneware for the sanitary and combined system). The impact of land use on sediment-related failures was also evaluated by comparing the two most common uses in the case study area (i.e. residential and commercial). Monthly precipitation data was provided by Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM). Because there was available information from only three rainfall stations within the case study area, this feature worked as a temporal covariate.

Model implementation

The dataset that was used in the construction of the model comprises reported failures from January 2004 to February 2011. According to the procedure for Bayesian Modeling of LGCP proposed by Taylor et al. (2015), approximate estimates of the parameters were computed using the minimum contrast method, which resulted in initial values for the MCMC routine of =log(), =log(283) and = log(0.5). The initial values for were not defined; by default, LGCP initialized the MCMC using the estimates obtained from an over dispersed Poisson generalized linear model fitted to the cell counts against the covariates, ignoring spatial correlation. Considering that = 283 m was the approximate distance beyond which the failure events were no longer spatially correlated, it was necessary to select a cell width that captured these dependence properties and also generated an efficient computational grid size. Thus, this value was set as equal to 400 m. Under the assumption that the estimated temporal correlation parameter was approximately 0.55, the temporal correlation in the Y process was expected to drop to 0.04 after six months . Therefore, an observation window width of six months was used, comprising February 2011 and the preceding five months of data.

Multivariate Gaussian prior densities for the parameters and multivariate Gaussian prior densities on the log-scale for the parameters were defined as follows: N (0, 106) (it was a diffuse prior), logN (log (1.2), 0.12), logN (log (700), 0.05) and logN (log (0.2), 0.1). Additionally, it was specified that the spatial dependence properties of the latent field Y should follow an exponential covariance function.

An offset was not defined in this study. By default, LGCP package assigns the value of 1 to (s, t) if the cell of the computational grid falls within the case study area, and 0 otherwise. The extension parameter of the computational grid in the x and y directions was set as equal to 2. The target acceptance probability of the MCMC algorithm was set to 0.574, which would be approximately optimal for a Gaussian target explored by MALA proposal as the dimension of the problem tends to infinity (Taylor et al. 2013, 2015).

The MALA algorithm was run for 2,000,000 iterations, with an initial burn-in of 200,000 iterations, followed by a further 1,800,000 iterations, of which every 3,600th sample was retained so as to give a sample size of 500. For the base model (i.e. model with all covariates), the total computation time was around 4.47 days on a 2.4 Ghz Intel Xeon E5-2695 v2 PC with 64 Gb RAM. For the final model (i.e. model with significant covariates), it was approximately 3.772 days on the same PC.

Convergence and mixing diagnostics of the final model are presented in Appendix B (available with the online version of this paper). These analyses indicate that the chain appeared to have reached stationarity with low autocorrelation. Figure 2 compares the prior (represented by the continuous curve) and posterior (represented by the histogram) distributions of parameters and of the spatiotemporal LGCP model. The plots suggest that the covariate effects are well identified by the data, and with respect to , it is observed that whilst there was information in the data on parameter , the parameters and were not so well identified.

Figure 2

Plots of the prior (represented by the continuous curve) and posterior (represented by the histogram) distributions for parameters and ( for the variance of the model, for the spatial dependence scale, and for the temporal dependence scale).

Figure 2

Plots of the prior (represented by the continuous curve) and posterior (represented by the histogram) distributions for parameters and ( for the variance of the model, for the spatial dependence scale, and for the temporal dependence scale).

Close modal

Significant covariates

The model was first run with all 21 covariates. The decision on which covariates were to play a role in the analysis was based on their statistical significance and the redundant information (multicollinearity) among groups of variables. Of the 21 covariates considered, four were selected for the final model. These were: population density, monthly precipitation, sanitary and combined sewer pipe length, and stormwater pipe diameter. Thus, the final model was run with these four covariates. Table 2 summarizes the parameter estimates from this model (i.e. median, 0.025 and 0.975 quantiles from the posterior density that correspond to the credible intervals).

Table 2

Parameter estimates for LGCP final model: Medians of the posterior distributions and limits of the 95% credible intervals (CRI)

VariableMedianLower 95% CRIUpper 95% CRI
 0.4635 0.3512 0.5747 
 6.7520 6.5091 6.9927 
 0.6194 0.3559 0.9352 
 −14.5874 −16.1129 −12.9713 
 0.0003 0.0002 0.0005 
 0.0001 0.0001 0.0002 
 −0.8339 −1.6113 −0.2001 
 0.0053 0.0025 0.0078 
VariableMedianLower 95% CRIUpper 95% CRI
 0.4635 0.3512 0.5747 
 6.7520 6.5091 6.9927 
 0.6194 0.3559 0.9352 
 −14.5874 −16.1129 −12.9713 
 0.0003 0.0002 0.0005 
 0.0001 0.0001 0.0002 
 −0.8339 −1.6113 −0.2001 
 0.0053 0.0025 0.0078 

These results show that the estimated number of sediment-related failures was higher in places with higher population density and, therefore, in places with higher domestic wastewater discharges. In addition, the estimated intensity was higher in those months in which the precipitation levels were greater, probably due to increased runoff sediment loads into the sewer system. Finally, an increase in the sanitary and combined pipe length and a decrease in the stormwater pipe diameter led to an increase in expected number of blockages which are both expected results.

Intensity and relative risk

Figure 3 presents the resulting maps of the expected number of sediment-related blockage events for the observation window comprising September 2010 to February 2011. The intensity of failures shows a clear spatial and temporal variation. The highest values are concentrated mainly in the center and northwest of the case study area, coinciding with the most populated zones.

Figure 3

Plots of the Poisson intensity of failures. From top left to bottom right: September 2010, October 2010, November 2010, December 2010, January 2011 and February 2011, respectively.

Figure 3

Plots of the Poisson intensity of failures. From top left to bottom right: September 2010, October 2010, November 2010, December 2010, January 2011 and February 2011, respectively.

Close modal

Figure 4 shows the predicted covariate-adjusted relative risk surfaces derived from the model for all six months of the observation window. That is, after removing the effect of the significant variables, there is a relative risk estimation that explains the spatial and temporal expected intensity. These relative intensities may be the effect of different variables, or a product of the natural correlation of the process. These plots reveal some small clusters of raised risk, mainly in the center, northwest, and east of the studied area, which were probably formed as a consequence of the spatiotemporal correlation.

Figure 4

Plots of the covariate-adjusted relative risk. From top left to bottom right: September 2010, October 2010, November 2010, December 2010, January 2011 and February 2011, respectively.

Figure 4

Plots of the covariate-adjusted relative risk. From top left to bottom right: September 2010, October 2010, November 2010, December 2010, January 2011 and February 2011, respectively.

Close modal

Forecasting performance

In this study, Poisson intensity was forecasted for March 2011 and April 2011. The resulting maps are shown in Figure 5. For each month, we measured the spatial forecasting performance of the size of the error using the mean square error (MSE), which was calculated on a monthly basis as the average of the sum of the squares of errors:
(7)
where I is the set of cells, Yit is the actual number of sediment-related failures in cell i occurred during month t, Fit is the forecasted number of failures in cell i during month t, and n is the number of cells.
Figure 5

Plots of the forecasted Poisson intensity of failures for two months: March 2011 (left) and April 2011 (right).

Figure 5

Plots of the forecasted Poisson intensity of failures for two months: March 2011 (left) and April 2011 (right).

Close modal

Given that the actual total number of failures occurred was not high enough (i.e. 396 for March 2011 and 437 for April 2011), a large number of cells of 400 × 400 m2 were not subject to any blockage events, which could affect the results of the performance measure. As a consequence, bigger cells of 800 × 800 m2 comprising four cells of the original grid were created, each of which was assigned values of Yit and Fit equivalent to the sum of the values obtained by the corresponding four cells. Under this scheme, the total number of cells was equal to 1024.

The forecasted total number of failures was of 317 for March 2011 and 419 for April 2011. The corresponding MSE values were 0.891 and 0.837, respectively. For comparison, on the same grid and with the same dataset, a homogeneous Poisson process was also fitted independently in each cell. This baseline model produced MSE values of 0.894 and 0.863 for March and April, respectively, proving that the LGCP can give more accurate predictions given its ability to model the complete stochastic process and explain the covariables effect.

In addition to the MSE measure, the area under the Receiver Operating Characteristic (ROC) curve (AUC) measure was also employed, considering the variable related to the probability that the number of failures in a cell of 800 × 800 m2 is greater than or equal to 1. The AUC values were 0.938 for March 2011 and 0.940 for April 2011 (Figure 6). The Poisson process, per cell, gives almost the same AUC performance.

Figure 6

ROC curves of the forecasted Poisson intensity of failures for two months: March 2011 (left) and April 2011 (right).

Figure 6

ROC curves of the forecasted Poisson intensity of failures for two months: March 2011 (left) and April 2011 (right).

Close modal

Finally, confusion matrices were generated for both months, assuming a threshold value of 0.5 (see Table 3). These matrixes present the contrast of the predicted classes (failure or no-failure) versus the real classes using both testing datasets (March and April of 2011). On the diagonal, the numbers of cells with correct prediction are presented. Sensitivity values (percentage of cells with failures well predicted) were of 0.729 for March 2011 and 0.732 for April 2011. Specificity values (percentage of cells with no-failures well predicted) were 0.942 and 0.938 for both months, respectively. The lower sensitivity of the model is due to the fact that most of 1024 cells of 800 × 800 m2 did not present any failure events during March and April 2011.

Table 3

Confusion matrix of number of cells of 800 × 800 m2 which have a number of failures greater than or equal to 1 (Threshold = 0.5) for March 2011 and April 2011

March 2011
April 2011
ActualActual
  
Predicted 782 53 785 50 
 49 140 53 136 
March 2011
April 2011
ActualActual
  
Predicted 782 53 785 50 
 49 140 53 136 

Results of the predictive performance, by MSE, AUC and metrics of the confusion matrix, indicate that the proposed model has a very good capacity to anticipate the spatial distribution of failures in the sewer system. At the scale we defined the grid, this indicates that it is possible to plan ahead maintenance operations or rehabilitation strategies. It is important to take into account that the forecasting of several months in the future could be much less precise. However, the model can be estimated periodically in order to have better predictions.

It is known that sediment-related blockages in sewer systems can be considered a spatiotemporal failure process (e.g. Rodríguez et al. 2012). However, there is a lack of studies assessing appropriate methods to model such processes simultaneously in time and space. In this work, we applied a LPCP model to fill this knowledge gap. The LGCP model uses a stochastic intensity function to capture the aggregation of events driven by diverse physical and environmental conditions. Population density, monthly precipitation, sanitary and combined sewer pipes length and stormwater pipes diameter were identified as significant physical and environmental covariates in our case study. The LGCP model led to more accurate predictions than the homogenous Poisson model, given its ability to model the complete stochastic process and explain the covariables effect. Due to data availability restrictions, it was not possible to include other potential temporal explanatory variables (e.g. soil wetness, water consumption rates, and pipe structural condition, among others); thus, further work is required using case studies with more comprehensive datasets.

Abraham
D. M.
,
Wirahadikusumah
R.
,
Short
T. J.
&
Shahbahrami
S.
1998
Optimization modeling for sewer network management
.
Journal of Construction Engineering and Management
124
(
5
),
402
410
.
Brix
A.
&
Diggle
P. J.
2001
Spatiotemporal prediction for log-Gaussian Cox processes
.
Journal of the Royal Statistical Society. Series B (Statistical Methodology)
63
(
4
),
823
841
.
Brix
A.
&
Diggle
P. J.
2003
Corrigendum: spatiotemporal prediction for log-Gaussian Cox processes
.
Journal of the Royal Statistical Society. Series B. Statistical Methodology
65
(
4
),
946
.
Caradot
N.
,
Granger
D.
,
Chapgier
J.
,
Cherqui
F.
&
Chocat
B.
2011
Urban flood risk assessment using sewer flooding databases
.
Water Science and Technology
64
(
4
),
832
840
.
Cherqui
F.
,
Belmeziti
A.
,
Granger
D.
,
Sourdril
A.
&
Le Gauffre
P.
2015
Assessing urban potential flooding risk and identifying effective risk-reduction measures
.
Science of the Total Environment
514
,
418
425
.
Collender
P. A.
,
Cooke
O. C.
,
Bryant
L. D.
,
Kjeldsen
T. R.
&
Justin
V. R.
2016
Estimating the microbiological risks associated with inland flood events: bridging theory and models of pathogen transport
.
Critical Reviews in Environmental Science and Technology
46
(
23–24
),
1787
1833
.
Diggle
P.
,
Moraga
P.
,
Rowlingson
B.
&
Taylor
B.
2013
Spatial and Spatio-Temporal Log-Gaussian Cox processes: extending the geostatistical paradigm
.
Statistical Science
28
(
4
),
542
563
.
ESRI
2014
ArcGIS Desktop. Redlands: Environmental Systems Research Institute. https://support.esri.com/es/download/2093
.
Fenner
R. A.
2000
Approaches to sewer maintenance: a review
.
Urban Water
2
(
4
),
343
356
.
Fenner
R. A.
,
McFarland
G.
&
Thorne
O.
2007
Case-based reasoning approach for managing sewerage assets
.
Proceedings of the Institution of Civil Engineers – Water Management
160
(
1
),
15
24
.
Fontecha
J. E.
,
Akhavan-Tabatabaei
R.
,
Duque
D.
,
Medaglia
A. L.
,
Torres
M. N.
&
Rodríguez
J. P.
2016
On the preventive management of sediment-related sewer blockages: a combined maintenance and routing optimization approach
.
Water Science and Technology
74
(
2
),
302
308
.
Hickcox
C.
2013
Save money with proactive substation maintenance
.
Powergrid International
18
(
10
),
12
15
.
Jin
Y.
&
Mukherjee
A.
2010
Modeling blockage failures in sewer systems to support maintenance decision making
.
Journal of Performance of Constructed Facilities
24
(
6
),
622
633
.
Korving
H.
,
Fraņcois
C.
&
van Noortwijk
J. M.
2006
Statistical modeling of the serviceability of sewage pumps
.
Journal of Hydraulic Engineering
132
(
10
),
1076
1085
.
Makropoulos
C. K.
,
Butler
D.
&
Maksimovic
C.
2003
Fuzzy logic spatial decision support system for urban water management
.
Journal of Water Resources Planning and Management
129
(
1
),
69
77
.
Møller
J.
,
Syversveen
A.
&
Waagepetersen
R.
1998
Log Gaussian Cox processes
.
Scandinavian Journal of Statistics
25
(
3
),
451
482
.
Munawar
M.
,
Munawar
I. F.
,
McCarthy
L.
,
Page
W.
&
Gilron
G.
1993
Assessing the impact of sewage effluent on the ecosystem health of the Toronto Waterfront (Ashbridges Bay), Lake Ontario
.
Journal of Aquatic Ecosystem Health
2
(
4
),
287
315
.
Post
J. A.
,
Pothof
I. W. M.
,
Dirksen
J.
,
Baars
E. J.
,
Langeveld
J. G.
&
Clemens
F. H. L. R.
2016
Monitoring and statistical modelling of sedimentation in gully pots
.
Water Research
88
,
245
256
.
Post
J.
,
Langeveld
J.
&
Clemens
F.
2017
Analysing spatial patterns in lateral house connection blockages to support management strategies
.
Structure and Infrastructure Engineering
13
(
9
),
1146
1156
.
R Core Team
2015
R: A Language and Environment for Statistical Computing
.
R Foundation for Statistical Computing
,
Vienna, Austria
.
Rodríguez
J. P.
,
Díaz-Granados
M. A.
,
Camacho
L. A.
,
Raciny
I. C.
,
Maksimovic
C.
&
McIntyre
N.
2008
Bogotá’s urban drainage system: context, research activities and perspectives
.
BHS 10th National Hydrology Symposium
378
386
.
Rodríguez
J. P.
,
McIntyre
N.
,
Díaz-Granados
M. A.
&
Maksimovic
C.
2012
A database and model to support proactive management of sediment-related sewer blockages
.
Water Research
46
(
15
),
4571
4586
.
Rodríguez-Jeangros
N.
,
Camacho
L. A.
,
Rodríguez
J. P.
&
McCray
J. E.
2018
Integrated urban water resources model to improve water quality management in data-limited cities with application to Bogotá, Colombia
.
Journal of Sustainable Water in the Built Environment
4
(
2
),
1
15
.
Santos
P.
,
Amado
C.
,
Coelho
S. T.
&
Leitão
J. P.
2017
Stochastic data mining tools for pipe blockage failure prediction
.
Urban Water Journal
14
(
4
),
343
353
.
Savic
D.
,
Giustolisi
O.
,
Berardi
L.
,
Shepherd
W.
,
Djordjevic
S.
&
Saul
A.
2006
Modelling sewer failure by evolutionary computing
.
Proceedings of the Institution of Civil Engineers – Water Management
159
(
2
),
111
118
.
Sirkiä
J.
,
Laakso
T.
,
Ahopelto
S.
,
Ylijoki
O.
,
Porras
J.
&
Vahala
R.
2017
Data utilization at finnish water and wastewater utilities: current practices vs. state of the art
.
Utilities Policy
45
,
69
75
.
Taylor
B. M.
,
Davies
T. M.
,
Rowlingson
B. S.
&
Diggle
P. J.
2013
Lgcp: an R package for inference with spatial and Spatio-Temporal Log-Gaussian Cox processes
.
Journal of Statistical Software
52
(
4
),
1
40
.
Taylor
B. M.
,
Davies
T. M.
,
Rowlingson
B. S.
&
Diggle
P. J.
2015
Bayesian inference and data augmentation schemes for spatial, spatiotemporal and multivariate Log-Gaussian Cox processes in R
.
Journal of Statistical Software
63
(
7
),
1
48
.
ten Veldhuis
J. A. E.
,
Clemens
F. H. L. R.
,
Sterk
G.
&
Berends
B. R.
2010
Microbial risks associated with exposure to pathogens in contaminated urban flood water
.
Water Research
44
(
9
),
2910
2918
.
Torres
M. N.
,
Rodríguez
J. P.
&
Leitão
J. P.
2017
Geostatistical analysis to identify characteristics involved in sewer pipes and urban tree interactions
.
Urban Forestry & Urban Greening
25
,
36
42
.
Ugarelli
R.
,
Kristensen
S.
,
Røstum
J.
,
Sægrov
S.
&
Di Federico
V.
2009
Statistical analysis and definition of blockages-prediction formulae for the wastewater network of Oslo by evolutionary computing
.
Water Science and Technology
59
(
8
),
1457
1470
.
Xie
Q.
,
Bharat
C.
,
Nazim Khan
R.
,
Best
A.
&
Hodkiewicz
M.
2017
Cox proportional hazards modelling of blockage risk in vitrified clay wastewater pipes
.
Urban Water Journal
14
(
7
),
669
675
.
Yang
M.
&
Su
T.
2007
An optimization model of sewage rehabilitation
.
Journal of the Chinese Institute of Engineers
30
(
4
),
651
659
.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Supplementary data