Municipal water managers rely on pipe deterioration models to plan maintenance, repair, and replacement. Although efforts have been made to increase their accuracy, the predictions of these models remain uncertain. In this paper, a procedure is proposed to optimize the Bayesian updating period of the parameters of an existing deterioration model, so as to sequentially reduce the uncertainty in the predicted water pipe breakage rate. The breakage rate is modeled using a structured geoadditive regression technique where covariates are allowed to have linear (e.g., categorical) and nonlinear (e.g., continuous) relationships with the response variable. Unknown and unobserved covariates are included in the model through a geospatial component that captures spatial autocorrelations and local heterogeneities. The optimization procedure searches through the time series data to identify the optimal updating horizon, defined as the one that minimizes the difference between the coefficients of determination (computed between predictions and observations) of the non-updated and updated models. The process is repeated until the entire time series is covered. The application of this approach to failure data of a large Canadian urban water system shows a significant reduction in the uncertainty of the parameters and an increase in the accuracy of the predicted response variable.

  • Optimization.

  • Bayesian Updating.

  • Gaussian Markov Random Fields.

  • P-Splines.

  • Deterioration model.

Different statistical deterioration models (Kleiner & Rajani 2001; Liu et al. 2012; Renaud et al. 2014; Scheidegger et al. 2015) have been developed to assess the failure of water pipes. Among these models, the most used techniques are the probabilistic approach and artificial neural networks (Scheidegger et al. 2015). A set of factors assumed to have an impact on the selected response variable (pipe breakage rate, PBR) is included in the model following a predefined set of relationship types (Kapelan et al. 2011). Defining the types of relationships between the covariates and the response aims at building an accurate representation of the structural deterioration occurring in water pipes (Kleiner & Rajani 2001). However, the models are still affected by uncertainty in the predictions. Different techniques are used to reduce uncertainty in degradation models, including nonparametric estimators and statistical tests (Kleiner & Rajani 1999; Fuchs-Hanusch et al. 2012), Bayesian model averaging (Kabir et al. 2015), and cross-validation (Balekelayi & Tesfamariam 2020), which is a robust approach to assess the predictive uncertainty for a given set of covariates. In addition, Balekelayi & Tesfamariam (2019a, 2019b) introduced the geospatial location of the pipes as a covariate to account for the unobserved and unknown factors that contribute to the deterioration of the physical condition of buried pipes. Their approaches led to a better representation of the deterioration process for both sewer and water pipes. Kakoudakis et al. (2017) proposed clustering water pipes (i.e., forming homogeneous groups) before developing the deterioration model. Most of the proposed deterioration models are used to quantify the failure rate within the time frame of the available data sets.
For future forecasting, the variance of the forecast response variable increases as the prediction period lengthens if the parameters of the model are not updated. Lu & Madanat (1994) used Bayesian updating to improve the precision of forecasting models by successively updating the parameters of the model. Kabir et al. (2016) applied Bayesian updating to reduce the uncertainty in a water pipe failure model. They developed a Bayesian Weibull Proportional Hazard Model that is updated every time new information becomes available from new observations. Their results showed a significant improvement in model performance. In their approach, the entire data set was arbitrarily divided into four groups of data ranging from 25 to 5 years long, and the first data set was used to determine the posteriors of the deterioration model parameters. Noninformative priors were selected for the parameters in the first step. The posterior obtained in the first step was used as the prior for the following step and updated with the new data in the second data set. While this approach reduces the uncertainty in the response variable, it does not follow a systematic approach in subdividing the data series into groups. It is important to determine the time at which the uncertainty band around the predicted pipeline breakage rate becomes large, but this time is not easy to determine. In this paper, the geoadditive regression-based deterioration model proposed by Balekelayi & Tesfamariam (2019a, 2019b) is used as the baseline model. It is an enhanced Bayesian semi-parametric predictive model whose nonparametric part considers flexible nonlinear relationships between the continuous covariates and the response variable. The parametric part of the model equation is formed by the discrete covariates, which are evaluated using the least-squares algorithm.
The correlation between the unknown and unobserved covariates, captured through a geospatial variable, is evaluated using the Markov random field approach. Uncertainties in this model are sequentially reduced using Bayesian updating coupled with an optimization process to improve the forecast (Figure 1). The optimal Bayesian updating periods (Optimization 1, 2, etc. in Figure 1) correspond to the optimal values of the decision variable obtained after each minimization of an objective function defined as a difference between future predicted and actual errors. This paper is organized as follows. Section 2 describes the proposed methodology. The case study, the available data, and the optimization of the updating period from a baseline deterioration model are presented in Section 3. The accuracy of the baseline, the changes in the uncertainty of the effect of the factors on the response, and the prediction of the response variable are discussed in the Results and Discussion section, before the conclusion is drawn in Section 5.
Figure 1

Bayesian updating process.

The proposed framework is shown in Figure 2. Inspection data are chronologically divided into two groups. The first group is obtained by analyzing significant periods in the development of the water system under study and its pipe failure history. For instance, after observing many failures on metallic pipes, the City of Calgary implemented cathodic protection (CP) in its water system and significantly reduced the proportion of metallic pipe failures. This factor should be incorporated in the baseline so that its effect on the evolution of the PBR is captured. The upper limit of the first group is defined iteratively: a deterioration model is developed for every year after the installation of the CP, and the resulting coefficients of determination (R2) are compared. The model with the highest R2 is selected as the baseline. The failure database covers a fixed range of years, and the baseline is built on the data from the first year of the database up to the selected upper limit. The remaining period, from the end of the baseline to the last year of the database, defines the search area in which the updating should be performed. An optimization procedure is defined to determine how often new information should be provided to the deterioration model to keep its uncertainty low for future predictions. The optimal time period is the time during which the model predictions are acceptable compared with a predefined threshold. After this time, the variance of the response increases, and the model cannot effectively support decision-making with an acceptable level of uncertainty. In the optimization process, the decision variable t is set to vary within the remaining period, with the bounds of this period defined as the constraints. The selected baseline is used to predict the expected PBR over the next t years, and the R2 of the prediction is calculated. Then, the baseline is updated with the data of the coming t years, and predictions are made with the updated model. The difference between the R2 of the baseline predictions and that of the updated model is minimized in the optimization process. The optimal time is used to extract, from the remaining data, the subset that starts at the end of the baseline and spans the optimal number of years. This subset is used to update the baseline. In the next optimization, the decision variable is defined over the period left after removing this subset. The process runs until no data remain.
Figure 2

Optimization of Bayesian updating framework for water main failure.


Geoadditive regression model

Flexible models better represent the complex nonlinear relationships between the regressors and the response variables. The addition of noise to account for over-dispersion caused by unobserved heterogeneities and possible autocorrelation increases the susceptibility of a statistical model to capture the stochastic nature of the deterioration. Assume the observations are elements of the water failure database, where , , represents the PBR (continuous), vectors of categorical (e.g., material) and continuous covariates (e.g., age). The relation between the observations and the predictions is defined as:
(1)
where i is a subset of , that is not necessary all latent are observed through the data . F is an unknown distribution function of error terms and may depend on some additional parameters . The model in Equation (1) is rewritten as follows (Kneib et al. 2009; Hastie 2017):
(2)
where are the nonparametric functions capturing the partial effect of each continuous variable, are the coefficients of the parametric part of the proposed model, and for all k. The vector backfitting algorithm is used to fit univariate factors. The updating of the is obtained through the minimization of the partial residual (Hastie & Tibshirani 1987). The final proposed deterioration model for water pipes proposed is given as (Balekelayi & Tesfamariam 2019b):
(3)
The variables in Equation (3) are the pipe breakage rate (the response), the existence of cathodic protection, the length of the considered pipe segment, the diameter of the pipe segment, the time between the installation and the failure date, the number of residential and domestic connections, the number of commercial connections, the number of previous failures on the considered pipe segment, the freezing index, the thawing index, the rain deficit, the manufacturing period, the district identification number, and the Soil Corrosivity Index as proposed in Demissie et al. (2015). The model's accuracy is estimated using the coefficient of determination. Different deterioration models, covering different chronological data periods, are developed, and the model presenting the highest coefficient of determination is selected as the baseline that will be updated with new information. The selected deterioration modeling approach is Bayesian, which allows learning from newly available data and updating the parameters of the model.
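As an illustration of the accuracy measure, the sketch below computes a coefficient of determination together with the root mean squared error (RMSE) that is reported alongside it in the results. It assumes the common definition 1 − SS_res/SS_tot; the paper may instead use the squared correlation between observations and predictions. The function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, assuming R2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical breakage rates, for illustration only.
observed_pbr = [1.0, 2.0, 3.0, 4.0]
predicted_pbr = [1.1, 1.9, 3.2, 3.8]
example_r2 = r_squared(observed_pbr, predicted_pbr)
example_rmse = rmse(observed_pbr, predicted_pbr)
```

A model that predicts the mean of the observations for every pipe would score 0 under this definition, which makes it a convenient reference point when comparing candidate baselines.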

Bayesian updating

Bayesian updating is a consistent and effective framework that incorporates prior knowledge about the parameters and updates this belief based on new observations. The updated belief becomes the prior distribution for the next updating cycle when new data become available. This iterative process reduces the uncertainties in the parameters. The approach is applicable to models in which the prior beliefs about the parameters and hyperparameters can be updated in the presence of new information obtained from new measurements (Watson et al. 2004; Ching & Leu 2009). In the proposed deterioration model, the prior distributions of the nonparametric functions f and of the parametric coefficients are updated at each iteration when new data become available. Specifically, the posterior distributions of the uncertain parameters of the deterioration model are updated after every time period determined from the optimization process. Bayes' theorem is used to update the prior knowledge about the parameters $\theta$ based on the newly available data $D$. The resulting posterior is obtained using the following equation:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, d\theta} \tag{4}$$
where $p(D \mid \theta)$ is the likelihood of the experimental data $D$ and $p(\theta)$ is the prior probability for the trainable parameter $\theta$. The denominator is a normalizing constant, and it couples the randomness inherent to the model and the statistical uncertainty in the parameters. The n-fold integral in Equation (4) is not easy to evaluate directly, so the posterior is sampled using the Markov Chain Monte Carlo (MCMC) technique, an efficient algorithm for sampling from a posterior distribution (Ching & Chen 2007). MCMC only requires the posterior probability density function up to a proportionality constant:
$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta) \tag{5}$$

Bayesian belief updating is the transformation of prior beliefs into posterior beliefs when new information is observed. The posterior distribution quantifies the uncertainty of model parameters and structure using the best available information and data (Watson et al. 2004). The proposed approach here considers a multistage updating process where the posterior obtained after the first inference becomes the prior in the second inference and the new posterior is estimated from it and the newly collected data. The state of knowledge is updated with information gained at each step.
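The multistage scheme can be illustrated with a deliberately simple conjugate model, which is not the geoadditive model used in this paper: a normal mean with known observation variance, where the posterior from one batch of data is reused as the prior for the next. The class, the function names, and all numbers below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class NormalBelief:
    """Gaussian belief about one unknown parameter (toy stand-in for a model parameter)."""
    mean: float
    var: float


def update(prior: NormalBelief, data: Sequence[float], obs_var: float) -> NormalBelief:
    """Conjugate normal-normal update with known observation variance obs_var."""
    n = len(data)
    if n == 0:
        return prior
    precision = 1.0 / prior.var + n / obs_var
    mean = (prior.mean / prior.var + sum(data) / obs_var) / precision
    return NormalBelief(mean=mean, var=1.0 / precision)


def sequential_update(prior: NormalBelief,
                      batches: Sequence[Sequence[float]],
                      obs_var: float) -> List[NormalBelief]:
    """Apply the update batch by batch; each posterior is the next prior."""
    beliefs = [prior]
    for batch in batches:
        beliefs.append(update(beliefs[-1], batch, obs_var))
    return beliefs


# Hypothetical batches of new observations, one per updating period.
batches = [[2.1, 1.9, 2.4], [2.0, 2.2], [1.8, 2.1, 2.0, 2.3]]
vague_prior = NormalBelief(mean=0.0, var=100.0)  # weakly informative starting point
history = sequential_update(vague_prior, batches, obs_var=0.25)
variances = [belief.var for belief in history]  # shrinks after every update
```

In this toy setting the posterior variance decreases with every batch, and the final belief is the same as the one obtained by processing all data at once, which is the property that makes stage-wise updating consistent.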

Optimization of the updating time

Finding the optimal interval time for collecting new data and updating the built prediction model can be solved using optimization. Several optimization algorithms are used to solve complex problems where the objective function is obtained after running another simulation model. Robust optimization tools should therefore be selected to ensure the convergence to an optimal global solution (Balekelayi & Tesfamariam 2017). Genetic algorithms are classified in the group of evolutionary computation because they mimic the biological processes of reproduction and natural selection to find the ‘fittest’ individual that corresponds to the optimal solution. Genetic algorithms are more powerful and more efficient than simple linear, dynamic programming, random search, and exhaustive search algorithms with the same level of information (Balekelayi et al. 2022). They are used to search for solutions in complex problems where there is a lack of continuity, derivatives, and linearity such as municipal infrastructure (Kielhauser & Adey 2018). The inherent nonlinearity of deterioration models makes the selection of this algorithm a good choice for assessing the time at which the predictions are done with reduced uncertainty. The components of the genetic algorithms are the fitness function for optimization, the population of chromosomes, the selection from which chromosomes will reproduce, the crossover to produce the next generation of chromosomes, and the random mutation of chromosomes in the new generation. The genetic algorithm begins with a randomly chosen assortment of chromosomes, which serves as the first generation (initial population). Then, each chromosome in the population is evaluated by the fitness function to test how well it solves the problem. The selection operator chooses some of the chromosomes for reproduction based on a probability distribution defined by the user. The fitter a chromosome is, the more likely it is to be selected. 
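For example, if f is a non-negative fitness function, then the probability that chromosome $c_i$ is chosen to reproduce is given by (Rao 2019):
$$P(c_i) = \frac{f(c_i)}{\sum_{j=1}^{N} f(c_j)} \tag{6}$$
where N is the number of chromosomes in the population.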
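The fitness-proportionate selection step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study; the function names, the candidate values, and the fitness values are hypothetical.

```python
import random
from typing import List, Sequence


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Probability of each chromosome being selected, proportional to its fitness."""
    if not fitness:
        raise ValueError("population must not be empty")
    if any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness)
    if total == 0:
        # Assumption: fall back to uniform selection when every fitness is zero.
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


def select_parents(population: Sequence[int], fitness: Sequence[float],
                   k: int, rng: random.Random) -> List[int]:
    """Draw k parents with replacement using fitness-proportionate selection."""
    probs = selection_probabilities(fitness)
    return rng.choices(population, weights=probs, k=k)


# Hypothetical chromosomes (candidate updating periods in years) and fitness values.
population = [1, 2, 3, 4]
fitness = [1.0, 3.0, 4.0, 2.0]
probs = selection_probabilities(fitness)
parents = select_parents(population, fitness, k=6, rng=random.Random(0))
```

Because the objective in this study is minimized, a fitness of this kind would have to be derived from the objective (for example by inverting or shifting it) before selection; that mapping is left out of the sketch.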
In this study, the proposed objective function is the difference between the coefficient of determination of the model prediction over the t coming years and the updated prediction over the same time period. The optimization will minimize this function under the constraint that the decision variable is contained within the range of the available data. The proposed optimization objective function is formulated as follows:
(7)
where the first term is the coefficient of determination between the observed data and the predictions made using the baseline deterioration model over the next t years, and the second term is the coefficient of determination between the observed data and the predictions made using the updated model over the same t years. The optimal t defines the updating period, which starts at the end time of the baseline model. The updated model is then treated as the new baseline, and the next optimization is run over the time range that remains after this updating period. The process is repeated until the remaining time range is zero.
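The sequential search described above can be sketched as follows. Several choices here are assumptions rather than the paper's implementation: the objective is taken as the absolute difference between the two coefficients of determination, t is restricted to whole years, and an exhaustive search replaces the genetic algorithm because the search range is small. The two scoring functions are toy placeholders for fitting and evaluating the real deterioration model.

```python
from typing import Callable, List, Tuple

# (start_year, horizon_in_years) -> coefficient of determination
Score = Callable[[int, int], float]


def best_horizon(start: int, end: int, r2_baseline: Score, r2_updated: Score) -> int:
    """Horizon t in 1..(end - start) minimizing |R2_baseline - R2_updated| (assumed form)."""
    horizons = range(1, end - start + 1)
    return min(horizons, key=lambda t: abs(r2_baseline(start, t) - r2_updated(start, t)))


def schedule_updates(start: int, end: int,
                     r2_baseline: Score, r2_updated: Score) -> List[Tuple[int, int]]:
    """Repeat the search until the remaining time range is exhausted."""
    periods = []
    while start < end:
        t = best_horizon(start, end, r2_baseline, r2_updated)
        periods.append((start, start + t))
        start += t  # the updated model becomes the new baseline
    return periods


def toy_r2_baseline(start: int, t: int) -> float:
    """Placeholder: accuracy of the non-updated model decays with the horizon."""
    return 0.9 - 0.05 * t


def toy_r2_updated(start: int, t: int) -> float:
    """Placeholder: accuracy of the updated model is held constant."""
    return 0.8


periods = schedule_updates(1990, 1997, toy_r2_baseline, toy_r2_updated)
```

With these placeholder scores the search returns two-year updating periods until only one year of data is left. In a real application each call to a scoring function would refit or update the model, so the scores would change from one run to the next.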
The proposed optimization approach is applied to the water distribution system (WDS) of the City of Calgary. The City of Calgary (Alberta, Canada) is situated at 1,099.10 m above sea level and has a humid continental climate. The WDS conveys 8.4 million of drinking water per month for an estimated demand of 7 per month per capita. The water infrastructure consists of a network of 49,531 pipes with a total length of 4,281 km. Pipe materials are distributed as follows: 21.92% ductile iron (DI), 16.10% cast iron (CI), 54.64% polyvinyl chloride (PVC), and 7.34% other pipes (asbestos, concrete cylinder, steel, and copper). CI pipes were installed from 1910 to 2012, and the last recorded failure was reported in 2015. DI pipes were installed from 1962 to 2012. After 1970, failed pipes were replaced with PVC pipes, which showed better resistance to corrosion than metallic pipes. This preference for PVC pipes made the proportion of plastic pipes increase steadily over the years (Figure 3). In this figure, which represents the situation in the City of Calgary in 2015, the proportion of plastic pipes is higher than that of any other material type in the network. In addition, it can be observed that metallic pipes remain installed only in the inner part of the city.
Figure 3

Water distribution system in the City of Calgary.

The City of Calgary started a systematic recording of pipe breaks in 1956, and its current database contains 13,692 individual pipe breaks. Figure 3 shows that the majority of pipe breaks occurred in areas where CI (63.73%) and DI (32.08%) pipes are installed, whereas PVC and other (i.e., asbestos, concrete, copper, and steel) pipes experienced very few breaks (1.18% and 3.01%, respectively). The density of CI and DI pipe breaks in different years is shown in Figure 4, which indicates that the city network experienced a steady increase in the number of breaks for both CI and DI pipes in the first 30 years (1956–1986) of data recording. However, due to the retrofitting program that started in 1961, the utility has experienced a drop in the number of breaks over the last two decades, especially for DI pipes. The number of breaks was also reduced by the replacement program, which targeted pipes showing high break rates, and by the installation of PVC pipes in the system. As CI and DI pipe breaks represent about 96% of the total number of pipe breaks in the network, and financial resources are scarce, the management is interested in tools that can support its replacement decisions. This study investigates the failure of CI and DI pipes in the City of Calgary. From the GIS database of the City of Calgary, pipe characteristics such as installation year, age (years), street block length (m), diameter (mm), manufacturing period or vintage, number of residential and domestic connections of each pipe (for land use determination), and soil resistivity (Ω-cm) are collected and processed. The soil corrosivity index was proposed in Demissie et al. (2017). It is a continuous variable that depends on several soil characteristics, including the soil resistivity, pH, redox potential, sulfide content, and moisture content.
The weather data used in this study, such as temperature, rainfall, and precipitation, were collected from Environment Canada and the Calgary International Airport weather station. Table 1 gives the factors included in the model, their units, and their descriptions. The flexibility of the proposed approach to build the baseline and the need to increase the accuracy of the failure model led to the selection of all the available factors without any sensitivity analysis. The type of relationship between each factor and the output response is defined by the user, based on an estimate of the impact the factor can have on the response. The selected response variable is the PBR, which represents the number of failures per 100 km of pipe during one year. This rate is obtained by dividing the number of reported failures during a year by the length of the pipe in kilometers and multiplying the result by 100.
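The PBR definition above amounts to a one-line calculation, sketched here with a hypothetical function name and made-up numbers.

```python
def pipe_breakage_rate(n_failures: int, length_km: float) -> float:
    """Breaks per 100 km per year: yearly failures / length in km * 100."""
    if length_km <= 0:
        raise ValueError("pipe length must be positive")
    return n_failures / length_km * 100.0


# Hypothetical example: 12 failures in one year on 48 km of pipe.
example_pbr = pipe_breakage_rate(12, 48.0)  # 25.0 breaks per 100 km per year
```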
Table 1

Factors included in the deterioration model and their descriptions

Covariates (Physical) | Unit | Description
Material | NA(a) | Designed material of pipes (categorical: 1 = cast iron, 2 = ductile iron)
Age | years | The difference between the reported failure date and the installation date (continuous)
Length | m | The manhole to manhole distance (continuous)
Diameter | mm | Size of the pipes (continuous)
Rservs | NA | Number of residential connections to the pipe (continuous)
Cservs | NA | Number of commercial buildings connected to the pipe (continuous)
Covariates (Soil characteristics) | |
Soil corrosivity index (SCI) | NA | The nature of soil representing its aggressiveness to metallic pipes (continuous)
Cathodic protection (CP) | NA | Is the cathodic protection in place (categorical)
Covariates (Climatic) | |
Thawing index (TI) | degree-days | Magnitude of thawing season (continuous)
Freezing index (FI) | degree-days | Severity of freezing period (continuous)
Rain deficit (RD) | cm | Difference between received and evaporated precipitation (continuous)
Geospatial location | NA | The geographic location of the water pipe
PBR (Response variable) | NA | Pipe breakage rate: the number of breaks per year per 100 km (continuous)

(a) NA: not applicable.

Figure 4

Continuous factors affecting the water pipe failure data in the City of Calgary.


Covariates are divided into two groups (continuous and discrete covariates) forming the two parts of the proposed additive model. The first part is the nonparametric part which is composed of the effect of continuous factors. All the continuous covariates in the database are assumed to have nonlinear effects on the response PBR. Categorical covariates will only have a linear effect. The geospatial component of the model captures the unknown and unobserved factors that can affect the degradation of the structural properties of pipes (see details in Balekelayi & Tesfamariam (2019a)). The parametric components of the model are estimated using the least squared error algorithm. The continuous covariates are modeled nonlinearly using P-splines. A Bayesian representation of the model is selected to account for the stochastic behavior of the studied process.

Baseline selection

This section explains how the most accurate model is built and selected as the baseline. Two parameters are considered in the selection of the baseline model: the R2 and the time period of simulation. The model with the highest R2 is selected as the baseline. As stated in the description of the case study, the city implemented CP after 1980. Thus, to account for this important factor, which has demonstrated a positive impact against the deterioration of water pipes, baseline models are developed for periods going from 1956, when the city started recording failures, to a year after 1980. However, the analysis of the failure database showed that the first pipes protected with the CP failed after 1986. Thus, multiple runs have been performed from 1986 to 2006. Table 2 shows how the R2 and the RMSE changed for the different runs between 1986 and 1993. It is observed that the maximum R2 and the minimum RMSE were obtained in 1990.

Table 2

Optimization runs and optimal updating time

Run | Model R2 | RMSE | Time period
Baseline 0 | 0.78 | 0.467 | 1956–1986
Baseline 1 | 0.785 | 0.432 | 1956–1987
Baseline 2 | 0.8 | 0.410 | 1956–1988
Baseline 3 | 0.815 | 0.389 | 1956–1989
Baseline 4 | 0.83 | 0.363 | 1956–1990
Baseline 5 | 0.83 | 0.373 | 1956–1991
Baseline 6 | 0.77 | 0.402 | 1956–1992
Baseline 7 | 0.76 | 0.405 | 1956–1993
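The selection rule can be checked against the values in Table 2 with a short sketch. The data structure and function name are hypothetical, and breaking R2 ties by the lower RMSE is an assumption consistent with the observation that both optima occur for the run ending in 1990.

```python
from typing import List, NamedTuple


class BaselineRun(NamedTuple):
    name: str
    r2: float
    rmse: float
    end_year: int


# Values copied from Table 2; every run starts in 1956.
RUNS: List[BaselineRun] = [
    BaselineRun("Baseline 0", 0.78, 0.467, 1986),
    BaselineRun("Baseline 1", 0.785, 0.432, 1987),
    BaselineRun("Baseline 2", 0.8, 0.410, 1988),
    BaselineRun("Baseline 3", 0.815, 0.389, 1989),
    BaselineRun("Baseline 4", 0.83, 0.363, 1990),
    BaselineRun("Baseline 5", 0.83, 0.373, 1991),
    BaselineRun("Baseline 6", 0.77, 0.402, 1992),
    BaselineRun("Baseline 7", 0.76, 0.405, 1993),
]


def select_baseline(runs: List[BaselineRun]) -> BaselineRun:
    """Pick the run with the highest R2; ties are broken by the lowest RMSE (assumption)."""
    return max(runs, key=lambda run: (run.r2, -run.rmse))


best = select_baseline(RUNS)  # Baseline 4, covering 1956-1990
```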

Figure 5 shows the coefficient of determination of the selected baseline model. This baseline (ending in 1990) is used in the first optimization run to define the time period for updating. Details about how the selected factors fit the response are given in the next section. In the following figures, the vertical axes represent the partial-effect functions f of Equation (3), and the dark and light bands are the 95% and 80% credible intervals, respectively.
Figure 5

Baseline regression coefficient.


Geoadditive regression model

The developed deterioration model is the enhanced Bayesian deterioration model proposed in Balekelayi & Tesfamariam (2019a, 2019b). The factors available in the historical database of the City of Calgary, together with the geospatial information, are described in Table 1. Pipes that share a neighborhood are assumed to have similar deterioration patterns due to unobserved and unknown factors, which are captured by the id of the district where the pipes are buried. The parameters of the flexible nonlinear functions (see Equation (3)) that capture the data pattern for each covariate are defined by trial and error. The parameters defined for each function are the number of basis functions, the prior, and the interpolating intervals over which the basis functions are optimized. The results confirm the assumptions that the categorical covariates affect the response variable linearly (Figure 12(e) and 12(f)) and that the continuous covariates affect it nonlinearly (see Figure 6(a) and 6(b)).
Figure 6

Partial effect of (a) the length and (b) diameter covariates on the PBR.


The agreement between the observed values and those predicted by the baseline deterioration model for the first time period, measured by the coefficient of determination, is R2 = 0.83 (see Figure 5), with a root mean squared error (RMSE) of 0.363. This model is selected as the starting point for updating the pipe failure model. The novelty of the proposed approach lies in the definition of the updating time periods, which are obtained through a successive optimization process. The global effect of the selected covariates on the response is obtained by summing the partial effects of the individual covariates. The advantage of additive models is that each factor's impact is modeled and analyzed before it is added to the global model. The categorical variables have the expected impact on the response PBR. DI pipes are more prone to failure than CI pipes. The failure rate is higher when there is no CP than after its installation. The CP reduces the effect of electrochemical corrosion in metallic pipes and mitigates the corrosive effect of the soil on them.

The analysis of the age covariate in Figure 7 is in agreement with previous studies (Røstum 2000). However, the 80% and 95% credible intervals are wide. The dispersion around the mean value shows how uncertain the predictions are. This is why it is important to update the model with new data (information) to enhance the learning of the process. The same observations hold for the number of previous failures (NOPF) (Figure 7). The uncertainty band is narrower for NOPF ranging from 10 to 15. Outside this interval, the uncertainty in the partial effect of this covariate on the response variable PBR is large. The length covariate (Figure 6) shows a high effect for short pipes, and the effect diminishes as the length of the pipe increases. During the installation of water pipes, long pipes (subject to high losses if a failure occurs) are treated with care and installed by qualified personnel, whereas short pipes receive less attention and can be installed with poor workmanship. On the other hand, one would expect higher degradation for long pipes, since their external surface exposed to the soil environment is larger than that of short pipes. This should, however, be analyzed in conjunction with the age covariate, which in Figure 7 shows that old pipes are more prone to a high breakage rate, especially when their age is above 40 years.
Figure 7

Partial effect of (a) the age and (b) the number of previous failure NOPF on the PBR.

When the soil is too dry, the effect of the soil on the failure is small because the migration of ions is reduced. However, when the soil is humid, the failure rate is strongly affected: humid soil facilitates the migration of ions and accelerates corrosion (Figure 8). RD, FI, and TI are the climatic factors introduced in the baseline to account for seasonal and climate changes. The results show that they nonlinearly impact the output PBR; their partial effects are between −1.5 and 1 for the FI, between −4.5 and 4.5 for the RD, and between −1 and 1 for the TI. For low values of TI (cold temperatures), the TI partial effect is high, and the same holds for high values of FI (cold temperatures) (see Figure 8). The selected baseline deterioration model is used in the optimization process to calculate the coefficient of determination between the predicted and the observed values over the next t years. Then, the model updated over the next t years is used to calculate the R2 using the data of the t years following the end of the baseline. The difference between the two R2 values is minimized, with the time t constrained to be positive and within the defined range.
Figure 8

Partial effect of (a) the freezing index and (b) the rain deficit on the PBR.
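
The search for the updating period described above can be read as a one-dimensional bounded minimization. The sketch below is one possible reading, not the authors' implementation. It assumes integer candidate periods, the absolute difference between the two R² values as the objective, and two hypothetical callables, `r2_baseline(t)` and `r2_updated(t)`, that return the coefficient of determination of each model over the t years following the baseline. The toy R² curves are invented for illustration only.

```python
from typing import Callable, Tuple


def optimal_updating_period(
    r2_baseline: Callable[[int], float],
    r2_updated: Callable[[int], float],
    t_min: int,
    t_max: int,
) -> Tuple[int, float]:
    """Return the integer t in [t_min, t_max] with the smallest |R2 gap|.

    Both callables are hypothetical stand-ins for fitting and evaluating
    the deterioration models; they are not part of the original study.
    """
    if t_min <= 0 or t_max < t_min:
        raise ValueError("t must be positive and t_min <= t_max")
    best_t, best_gap = t_min, float("inf")
    for t in range(t_min, t_max + 1):
        gap = abs(r2_updated(t) - r2_baseline(t))
        if gap < best_gap:  # strict '<' keeps the shortest period on ties
            best_t, best_gap = t, gap
    return best_t, best_gap


# Toy R2 curves (assumed shapes, not results from the paper).
def toy_r2_baseline(t: int) -> float:
    return 0.90 - 0.05 * t


def toy_r2_updated(t: int) -> float:
    return 0.80


best_t, best_gap = optimal_updating_period(toy_r2_baseline, toy_r2_updated, 1, 10)
print(best_t, round(best_gap, 3))
```

A grid search is used here because the candidate periods are whole years within a short range; a continuous bounded optimizer would be an alternative if t were treated as real-valued.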


Optimization and model updating

The optimization is run sequentially. After the first run, the deterioration model is updated and the range of the time period is redefined (see Figure 2). Multiple optimization runs with the updated deterioration models are used to calculate the updating time periods within the available range of historical data. The R² and RMSE obtained from the different runs are given in Table 3. This table shows how long the deterioration model can be used to make predictions and the accuracy expected during that time. It also shows how the Bayesian updating improves the accuracy of the deterioration model. For all the selected covariates, the uncertainty band narrows as the model is updated, until the final model is selected for long-period prediction. The changes in the partial effects of the length, age, and diameter covariates during the optimization process are briefly described in the following subsections.
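
The sequential procedure can be sketched as a loop that repeatedly finds an updating period, updates the model, and moves the start of the next window. This is a minimal sketch under stated assumptions: `find_optimal_t` is a hypothetical callable standing in for one optimization run, and the period lengths in the stub are simply those implied by the Time period column of Table 3 (for example, 1990–1992 is taken as 2 years).

```python
from typing import Callable, List, Tuple


def sequential_updating_windows(
    start_year: int,
    end_year: int,
    find_optimal_t: Callable[[int, int], int],
) -> List[Tuple[int, int]]:
    """Chain updating windows until the historical record is covered.

    find_optimal_t(current, end_year) is a hypothetical stand-in for one
    optimization run; it must return a positive number of years.
    """
    windows: List[Tuple[int, int]] = []
    current = start_year
    while current < end_year:
        t = find_optimal_t(current, end_year)
        if t <= 0:
            raise ValueError("updating period must be positive")
        stop = min(current + t, end_year)  # never run past the data
        windows.append((current, stop))
        # In a full workflow the model parameters would be updated here
        # with the data of this window before the next optimization run.
        current = stop
    return windows


# Stub: period lengths implied by the time periods listed in Table 3.
implied_periods = {1990: 2, 1992: 2, 1994: 3, 1997: 11}


def stub_optimizer(current: int, end_year: int) -> int:
    return implied_periods[current]


windows = sequential_updating_windows(1990, 2008, stub_optimizer)
print(windows)
```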

Table 3

Optimization runs and optimal updating time

| Run | Model R² | RMSE | Optimal time (years) | Time period |
| --- | --- | --- | --- | --- |
| Baseline | 0.83 | 0.363 | | 1956–1990 |
| Optimization 1 | 0.965 | 0.152 | | 1990–1992 |
| Optimization 2 | 0.96 | 0.099 | | 1992–1994 |
| Optimization 3 | 0.96 | 0.099 | | 1994–1997 |
| Optimization 4 | 0.914 | 0.125 | 11 | 1997–2008 |

Length covariate analysis

The partial effect of the length covariate is shown in Figure 9, which gives the evolution of the impact of the length covariate from the baseline to the last optimization run. Short pipes have a higher breakage rate than long pipes. This result contradicts several previous studies (e.g., Balekelayi & Tesfamariam 2019b). However, it can be understood as a result of localized soil conditions that are well represented by incorporating spatial data. Predicting the response PBR with both the baseline and the updated model shows that the updated model is more accurate. This study demonstrates that uncertainty can be reduced through sequential updating of an initial model built from a reduced amount of data. The updating time period that increases the accuracy is obtained through optimization.
Figure 9

The partial effect of the length covariate from the baseline to the last optimization run.


Considering the entire data set as a single simulation period leads to high uncertainty. Figure 9 shows how the uncertainty band narrows from one optimization run to the next until the end of the simulation period. The credible interval of the length effect is narrow for short pipes (below 400 m) and widens as the length of the pipes increases. However, the partial effect of the length takes values below zero for long pipes, meaning that the length is not a significant factor increasing the failure rate of metallic pipes in the City of Calgary. Long pipes are not prone to failure.
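
The narrowing of the uncertainty band under sequential updating can be illustrated with a much simpler model than the geoadditive regression used in this study. The sketch below swaps in a conjugate normal-normal update of a single mean with known observation variance; the prior, the noise variance, and the data batches are all invented for illustration. It shows only the general mechanism: each batch adds precision, so the posterior variance shrinks, and updating batch by batch gives the same posterior as using all the data at once.

```python
from typing import List, Sequence, Tuple


def update_normal_mean(
    prior_mean: float,
    prior_var: float,
    data: Sequence[float],
    noise_var: float,
) -> Tuple[float, float]:
    """Conjugate update of a normal mean with known observation variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


# Illustrative batches of observations (not data from the study).
batches: List[List[float]] = [
    [0.40, 0.60, 0.50],
    [0.55, 0.45],
    [0.50, 0.52, 0.48, 0.50],
]
NOISE_VAR = 0.04            # assumed known observation variance
mean, var = 0.0, 100.0      # vague (weakly informative) prior

variances = [var]
for batch in batches:
    # The posterior of one step becomes the prior of the next step.
    mean, var = update_normal_mean(mean, var, batch, NOISE_VAR)
    variances.append(var)

# Same prior, all data in a single update, for comparison.
all_data = [x for batch in batches for x in batch]
batch_mean, batch_var = update_normal_mean(0.0, 100.0, all_data, NOISE_VAR)
print(variances)
```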

Age covariate analysis

The age of a pipe is defined as the time elapsed from its installation to its failure date. This factor is considered accurate because the system is well monitored and, in this study, it is assumed that failures are located promptly and carefully documented. In the baseline model, the partial effect of the age covariate is a function of time that starts from zero (before the pipe is installed, it has no effect on the PBR), decreases (when new pipes are installed, the PBR tends to decrease), and then increases monotonically after 30 years (see Figure 10). This curve is comparable to the bathtub curve that is assumed to represent the life of pipes in water distribution systems (Rajani & Kleiner 2001). The uncertainty band of the partial effect of the age covariate in the baseline model is wide (from −1 to +1) and does not allow accurate predictions. After the first optimization and the updating of the baseline, the increasing trend of the partial effect is maintained, and the uncertainty band narrows because of the new information brought to the model through the updating process (Figure 10). This period coincides with the early experimentation with CP in the system. Fine-tuning the current potential and time to find the best operating parameters of the newly installed technology may explain the increasing trend obtained. After 2 years, it is assumed that the technology is known and well deployed, and the expected results are observed through a significant reduction of the failure rate of metallic pipes in the system. During the following 5 years, the breakage rate of pipes is inversely correlated with their age. During the same period, the city started installing plastic pipes to replace metallic pipes that were more than 50 years old and in declining structural condition. The role of CP is therefore reduced, and the lives of the plastic pipes follow the ‘bathtub’ theory (which has a direct effect on the increase of the failure rate).
Figure 10

The partial effect of the Age covariate from the baseline to the last optimization run.


Diameter covariate analysis

The diameter of the pipes is an important factor that should be considered when estimating the risk of pipe failure. For this reason, municipal operators assign the installation of large-diameter pipes to experienced personnel, since the consequences of failure of large pipes are severe. The trend in the baseline confirms the assumption that large-diameter pipes are less prone to failure: their partial effect on the failure rate is low (<0.2) and falls below zero for pipes whose diameter is greater than 400 mm. During the optimization, the pattern of the partial effect of the diameter changes and leads to almost no significant effect of the diameter in the last optimization run (see Figure 11). The small number of instances in the data set did not allow adequate sampling to update the parameters of the deterioration model. An additional reason is the small range of diameters among the failed pipes, which can also affect the distribution of the covariate.
Figure 11

The partial effect of the Diameter covariate from the baseline to the last optimization run.

Figure 12

PBR prediction using single factors.


Model predictions

The uncertainty bounds of the output response also show an improvement. The updated models present a narrower distribution range of the effect on the response output. These models are used to predict the PBR of the pipes in the network.

Figure 12 gives the univariate predictions of the PBR using the age, the length, the diameter, the number of previous failures, the material, and the cathodic protection (CP). The presented figures are obtained by fixing all the factors in Equation (4) to constant values and changing only the selected factor. Figure 12(a) shows how well the updated model after the fourth optimization performs compared with the baseline for the age covariate. The uncertainty is reduced and, for pipes above 60 years old, the prediction accuracy is very high. For pipes between 20 and 60 years old, some uncertainty remains, but it is lower than that obtained with the baseline. Figure 12(b) shows the length prediction under the same conditions as the age; it is also an improvement compared with the baseline. The same conclusions are drawn for the diameter, the number of previous failures, and the material (Figure 12(c)–12(e)), where the uncertainty bands are reduced in the updated model. However, the CP behavior is different. The final optimization run covers 1997 to 2008, a period during which the CP was already installed, so all the pipes had protection. This is observed in Figure 12(f), where the baseline has both protected and unprotected pipes, whereas the updated model only considers protected pipes.
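
The univariate predictions described above amount to sweeping one covariate while holding the others at reference values. The following is a minimal sketch of that idea only. `toy_predict` is a hypothetical additive predictor with invented linear effects, not Equation (4), and the covariate names and reference values are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence


def univariate_sweep(
    predict: Callable[[Dict[str, float]], float],
    reference: Dict[str, float],
    factor: str,
    values: Sequence[float],
) -> List[float]:
    """Predict while varying one factor and holding all others fixed."""
    if factor not in reference:
        raise KeyError(f"unknown factor: {factor}")
    predictions = []
    for value in values:
        covariates = dict(reference)  # copy so the reference is not modified
        covariates[factor] = value
        predictions.append(predict(covariates))
    return predictions


def toy_predict(c: Dict[str, float]) -> float:
    # Hypothetical additive predictor with made-up linear effects.
    return 0.02 * c["age"] - 0.001 * c["length"] + 0.1 * c["nopf"]


reference = {"age": 40.0, "length": 100.0, "nopf": 2.0}  # assumed constants
ages = [0.0, 20.0, 40.0, 60.0]
age_curve = univariate_sweep(toy_predict, reference, "age", ages)
print(age_curve)
```

Repeating the sweep with the baseline and the updated model, and with posterior draws instead of a single predictor, would give curves and bands comparable in form to those in Figure 12.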

Comparison of the 5-year forecast

Since the first optimal updating period obtained from the optimization is 2 years, the baseline model parameters are updated using the next 2 years of data (1991–1992), and the new model is named the ‘updated model’. This updated model is used to predict the pipe breakage rate (PBR) over the next 5 years (1993–1997), and the results are compared with the PBR predictions made with the baseline model over the same time period.

A visual analysis of the correlation between the predicted and observed values obtained using the baseline model shows that this model overpredicts the PBR (Figure 13(a)): the majority of data points are above the line. After updating the model at the optimally selected time period (2 years), the correlation between the predicted and the observed PBR increases, as seen in Figure 13(b), where the data points are aligned along the line. Model improvement can also be evaluated by comparing the coefficients of determination of both models. The baseline model is less accurate, with R² = 0.76, while the updated model gives more accurate predictions, with R² = 0.89. This shows how important it is to update the models, and that the selection of an optimal updating period has a strong impact on the gain in accuracy and reduces the uncertainty in the model, as seen in the previous sections.
Figure 13

Correlation between the observed and the predicted.
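
The comparison metrics used in this section can be computed as in the sketch below. It assumes that the coefficient of determination is defined as 1 − SS_res/SS_tot (the squared Pearson correlation is another common choice) and that overprediction means a predicted value larger than the observed one. The observed and predicted values are synthetic and are not the Calgary data.

```python
import math
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    _check(observed, predicted)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


def share_overpredicted(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of points where the prediction exceeds the observation."""
    _check(observed, predicted)
    return sum(p > o for o, p in zip(observed, predicted)) / len(observed)


# Synthetic breakage rates, for illustration only.
observed = [0.10, 0.20, 0.30, 0.40, 0.50]
baseline_pred = [0.20, 0.30, 0.35, 0.50, 0.60]   # systematically too high
updated_pred = [0.12, 0.18, 0.31, 0.41, 0.48]    # closer to the observations

for name, pred in [("baseline", baseline_pred), ("updated", updated_pred)]:
    print(name, r_squared(observed, pred), rmse(observed, pred),
          share_overpredicted(observed, pred))
```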


In this paper, an individual deterioration model is developed using Bayesian inference, with P-splines and a Gaussian Markov random field representing the nonlinear functions of the continuous covariates and the geospatial correlation of the pipes' locations, respectively. The model considers the effects of physical, soil, and climate factors on the structural condition of water pipes through linear, nonlinear, and geospatial effects on the expected mean value of the PBR. The Bayesian inference captures the uncertainty in the data through appropriate polynomial and degree selections. Spatial autocorrelation and local heterogeneities are introduced to capture unknown covariates that may influence the deterioration of water pipes. The individual effects of the covariates show a level of uncertainty that affects future predictions of the response variable. Optimal Bayesian updating is used to reduce the uncertainty in the initial model using the learning flexibility of Bayes's theorem. The number of years over which the PBR predictions made with the deterioration model remain acceptable is obtained through an optimization process that determines the optimal time after which the deterioration model should be updated. The updated model is obtained by updating the parameters of the current model with the newly acquired information provided by the data collected during that time period. Sequential updating is executed from an initial model built from noninformative prior distributions of the covariates. The optimization coupled with the updating process improves the future predictions of the PBR, and the updated deterioration model performs well even though the optimization requires short time periods to update the model. In the application of the proposed framework to predict the failure of water pipes in the City of Calgary, the optimal updating time periods vary from 2 to 10 years. Updating according to this timing significantly improves the assessment of the uncertainty in the partial effects of the input factors on the response PBR and the accuracy of that response. On the other hand, short updating time periods induce costs (failure data evaluation and recording) that should be included in the decision-making process. The cost of this gain in accuracy should be evaluated to decide whether it is acceptable to collect the information needed. The final proposed model allows predictions over the next 11 years, which is an acceptable time to allow two successive inspections of the pipes in a system. Future work will consider the life cycle cost together with the updating through optimization to allow optimal decision-making under uncertainty.

The authors acknowledge the financial support through the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05584) under the Discovery Grant programs.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Balekelayi N. & Tesfamariam S. 2017 Optimization techniques used in design and operations of water distribution networks: a review and comparative study. Sustainable and Resilient Infrastructure 2 (4), 153–168.
Balekelayi N. & Tesfamariam S. 2019a Geoadditive Bayesian regression models for water mains failure rate prediction. In: 13th International Conference on Application of Statistics and Probability in Civil Engineering, May 26–30, 2019, Seoul, South Korea. CERRA.
Balekelayi N. & Tesfamariam S. 2019b Statistical inference of sewer pipe deterioration using Bayesian geoadditive regression model. Journal of Infrastructure Systems 25 (3), 04019021.
Balekelayi N., Woldesellasse H. & Tesfamariam S. 2022 Comparison of the performance of a surrogate based Gaussian process, NSGA2 and PSO multi-objective optimization of the operation and fuzzy structural reliability of water distribution system: case study for the City of Asmara, Eritrea. Water Resources Management 2022, 1–17.
Demissie G., Tesfamariam S. & Sadiq R. 2015 Prediction of soil corrosivity index: a Bayesian belief network approach. In: International Conference on Applications of Statistics and Probability in Civil Engineering, Canada. 211 p, July 12–15, 2015, Vancouver, Canada.
Demissie G., Tesfamariam S. & Sadiq R. 2017 Prediction of pipe failure by considering time-dependent factors: dynamic Bayesian belief network model. ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering 3 (4), 04017017.
Fuchs-Hanusch D., Friedl F., Möderl M., Sprung W., Plihal H., Kretschmer F. & Ertl T. 2012 Risk and performance oriented sewer inspection prioritization. In: World Environmental and Water Resources Congress 2012: Crossing Boundaries, pp. 3711–3723.
Hastie T. J. 2017 Generalized additive models. In: Statistical Models in S (J. M. Chambers & T. J. Hastie, eds.). Routledge, Pacific Grove, CA, pp. 249–307.
Hastie T. & Tibshirani R. 1987 Generalized additive models: some applications. Journal of the American Statistical Association 82 (398), 371–386.
Kabir G., Tesfamariam S. & Sadiq R. 2015 Predicting water main failures using Bayesian model averaging and survival modelling approach. Reliability Engineering & System Safety 142, 498–514.
Kabir G., Tesfamariam S., Loeppky J. & Sadiq R. 2016 Predicting water main failures: a Bayesian model updating approach. Knowledge-Based Systems 110, 144–156.
Kapelan Z., Banyard J. K., Randall-Smith M. & Savić D. A. 2011 Asset planning and management. In: Water Distribution Systems (D. A. Savić & J. K. Banyard, eds.). ICE Publishing, London, UK, pp. 227–261.
Kielhauser C. & Adey B. T. 2018 Determination of intervention programs for multiple municipal infrastructure networks: considering network operator and service costs. Sustainable and Resilient Infrastructure 5 (1–2), 1–13.
Kleiner Y. & Rajani B. 1999 Using limited data to assess future needs. Journal-American Water Works Association 91 (7), 47–61.
Liu Z., Kleiner Y., Rajjani B., Wang L. & Condit W. 2012 Condition Assessment of Water Transmission and Distribution Systems. Technical Report, Tech. Rep. EPA/600/R-12/017, United States Environmental Protection Agency.
Lu Y. & Madanat S. 1994 Bayesian updating of infrastructure deterioration models. Age 15 (20), 25.
Rao S. S. 2019 Engineering Optimization: Theory and Practice. John Wiley & Sons, New York, NY.
Renaud E., Bremond B. & Le Gat Y. 2014 Water pipes: why ‘lifetime’ is not an adequate concept on which to base pipe renewal strategies. Water Practice and Technology 9 (3), 307–315.
Røstum J. 2000 Statistical Modelling of Pipe Failures in Water Networks. PhD Thesis, Norwegian University of Science and Technology.
Watson T., Christian C., Mason A., Smith M. & Meyer R. 2004 Bayesian-based pipe failure model. Journal of Hydroinformatics 6 (4), 259–264.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).