## Abstract

Municipal water managers rely on pipe deterioration models to plan maintenance, repair, and replacement. Although efforts have been made to increase their accuracy, the predictions of these models remain uncertain. In this paper, a procedure is proposed to optimize the Bayesian updating period of the parameters of an existing deterioration model so as to sequentially reduce the uncertainty in the predicted water pipe breakage rate. The breakage rate is modeled using a structured geoadditive regression technique in which covariates are allowed to have linear (e.g., categorical) and nonlinear (e.g., continuous) relationships with the response variable. Unknown and unobserved covariates are included in the model through a geospatial component that captures spatial autocorrelations and local heterogeneities. The optimization procedure searches through the time series data to identify the optimal updating horizon, defined as the one that minimizes the difference between the coefficients of determination (predictions versus observations) obtained with the un-updated and updated models. The process is repeated until the entire time series is covered. The application of this approach to failure data of large Canadian urban water systems shows a significant reduction in the uncertainty of the parameters and an increase in the accuracy of the predicted response variable.

## HIGHLIGHTS

Optimization.

Bayesian Updating.

Gaussian Markov Random Fields.

P-Splines.

Deterioration model.

## INTRODUCTION

*et al.* 2012; Renaud *et al.* 2014; Scheidegger *et al.* 2015) have been developed to assess the failure of water pipes. Among these models, the most used techniques are the probabilistic approach and artificial neural networks (Scheidegger *et al.* 2015). A set of factors that are assumed to have an impact on the selected response variable (pipe breakage rate, PBR) are included in the model following a predefined set of relationship types (Kapelan *et al.* 2011). The definition of the types of relationships between the covariates and the response aims at building an accurate representation of the structural deterioration occurring in water pipes (Kleiner & Rajani 2001). However, the models are still affected by uncertainty in the predictions. Different techniques are used to reduce uncertainty in degradation models, including nonparametric estimators and statistical tests (Kleiner & Rajani 1999; Fuchs-Hanusch *et al.* 2012), Bayesian model averaging (Kabir *et al.* 2015), and cross-validation (Balekelayi & Tesfamariam 2020), which is a robust approach to assess the predictive uncertainty for a given set of covariates. In addition, Balekelayi & Tesfamariam (2019a, 2019b) introduced the geospatial location of the pipes as a covariate to account for the unobserved and unknown factors that contribute to the deterioration of the physical condition of buried pipes. Their approaches led to a better representation of the deterioration process for both sewer and water pipes. Kakoudakis *et al.* (2017) proposed clustering water pipes (i.e., forming homogeneous groups) before developing the deterioration model.

Most of the proposed deterioration models are used to quantify the failure rate within the time frame of the available data sets. For future forecasting, the variance of the forecast response variable increases as the prediction period lengthens when the parameters of the model are not updated. Lu & Madanat (1994) used Bayesian updating to improve the precision of forecasting models by successively updating the parameters of the model. Kabir *et al.* (2016) applied Bayesian updating to reduce the uncertainty in a water pipe failure model. They developed a Bayesian Weibull Proportional Hazard Model that is updated every time new information becomes available from new observations. Their results showed a significant improvement in model performance. In their approach, the entire data set was arbitrarily divided into four groups of data ranging from 25 to 5 years long, and the first data set was used to determine the posteriors of the deterioration model parameters. Noninformative priors were selected for updating the parameters in the first step. The posterior obtained in the first step was then used as the prior for the following step and updated based on the new data in the second data set. While this approach reduces the uncertainty in the response variable, it does not follow a systematic approach in subdividing the data series into groups. It is important to determine the time beyond which the uncertainty band around the predicted pipe breakage rate becomes large. This time is not easy to determine.

In this paper, the geoadditive regression-based deterioration model proposed by Balekelayi & Tesfamariam (2019a, 2019b) is used as the baseline model. It is an enhanced Bayesian semi-parametric predictive model that considers, in its nonparametric part, flexible nonlinear interactions between the continuous covariates and the response variable. The parametric part of the model equation is formed by the discrete covariates, evaluated using the least squared error algorithm. The correlation between the unknown and unobserved covariates, captured through a geospatial variable, is evaluated using the Markov random field approach. Uncertainties in this model are sequentially reduced using Bayesian updating coupled with an optimization process to improve the forecast (Figure 1). The optimal Bayesian updating times (Optimization 1, 2, etc. in Figure 1) correspond to the optimal decision variable values obtained after each minimization of an objective function defined as a difference between future predicted and actual errors.

This paper is organized as follows: Section 2 describes the proposed methodology. The description of the case study, the available data, and the optimization of the updating period from a baseline deterioration model are presented in Section 3. The accuracy of the baseline and the changes in the uncertainty of the effect of the factors on the response and in the prediction of the response variable are discussed in the Results and Discussion section before the conclusion is drawn in Section 5.

## METHODOLOGY

*PBR* over the next *t* years. The R^{2} of the prediction is calculated. Then, the baseline is updated for the coming *t* years, and predictions are made with the updated model. The difference between the R^{2} of the baseline predictions and that of the updated model is minimized in the optimization process. The optimal time is then used to subset the remaining data, from the end of the baseline period to the optimal time, and this subset is used to update the baseline. The next optimization has its decision variable defined over the range from the new end of the baseline to the end of the available data. The process runs until the remaining data set is empty.

### Geoadditive regression model

*i* is a subset of , that is not necessary all latent are observed through the data .

*F* is an unknown distribution function of the error terms and may depend on some additional parameters. The model in Equation (1) is rewritten in additive form (Kneib *et al.* 2009; Hastie 2017), with nonparametric functions capturing the partial effect of each continuous variable, coefficients for the parametric part of the proposed model, and a constraint imposed for all *k*. The vector backfitting algorithm is used to fit the univariate factors, and each function is updated by minimizing the partial residual (Hastie & Tibshirani 1987).

The final proposed deterioration model for water pipes, Equation (3) (Balekelayi & Tesfamariam 2019b), relates the pipe breakage rate (*PBR*) to the following covariates: the existence of cathodic protection (CP), the length of the considered pipe segment, the diameter of the pipe segment, the time between the installation and the failure date, the number of residential and domestic connections, the number of commercial connections, the number of previous failures on the considered pipe segment, the freezing index (FI), the thawing index (TI), the rain deficit (RD), the manufacturing period, the district identification number, and the Soil Corrosivity Index (SCI) as proposed in Demissie *et al.* (2015).

The model's accuracy is estimated using the coefficient of determination. Deterioration models are developed for several chronological data periods, and the model with the highest coefficient of determination is selected as the baseline to be updated with new information. The selected deterioration modeling approach is Bayesian, which allows learning from newly available data and updating the parameters of the model.
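The two accuracy measures reported throughout the paper can be sketched as follows. This is a minimal illustration that assumes the usual definitions, R^{2} = 1 − SS_res/SS_tot and the root mean squared error; the paper does not state which variant of R^{2} it uses, and the function names and sample values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical breakage rates, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

A candidate baseline would be scored by calling both functions on its predicted and observed *PBR* values for the period of interest.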

### Bayesian updating

*et al.* 2004; Ching & Leu 2009). In the proposed deterioration model, the prior distributions of the nonparametric functions *f* and the parametric coefficients are updated at each iteration when new data become available. Specifically, the posterior distributions of the uncertain parameters of the deterioration model are updated at every time period determined from the optimization process. Bayes' theorem is used to update the prior knowledge about the parameters $\theta$ based on the newly available data $D$. The resulting posterior is obtained using the following equation:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta)\, p(\theta)\, \mathrm{d}\theta} \tag{4}$$

where $p(D \mid \theta)$ is the likelihood of the experimental data $D$ and $p(\theta)$ is the prior probability of the trainable parameter $\theta$. The denominator is a normalizing constant; it couples the randomness inherent to the model and the statistical uncertainty in the parameters. The *n*-fold integral in Equation (4) is not easy to evaluate directly, so it is handled through Markov Chain Monte Carlo (MCMC) sampling, an efficient technique for sampling from the posterior distribution (Ching & Chen 2007). The posterior probability density function then needs to be known only up to a proportionality constant, as follows:

$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta) \tag{5}$$
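An MCMC sampler needs only the unnormalized posterior, that is, the likelihood times the prior. The sketch below illustrates this with a random-walk Metropolis sampler for a single Poisson break rate under a Gamma prior. This toy model is an assumption chosen for brevity and is not the paper's geoadditive model or sampler; the counts, prior values, and function names are hypothetical.

```python
import math
import random


def log_unnormalized_posterior(rate, counts, prior_shape=1.0, prior_rate=0.5):
    """log[p(D | rate) * p(rate)], dropping every term that does not depend on rate."""
    if rate <= 0.0:
        return -math.inf  # outside the support of the Gamma prior
    log_prior = (prior_shape - 1.0) * math.log(rate) - prior_rate * rate
    log_likelihood = sum(k * math.log(rate) - rate for k in counts)
    return log_prior + log_likelihood


def metropolis(counts, n_samples=20000, step=1.0, seed=0):
    """Random-walk Metropolis draws from the posterior of a Poisson rate."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_unnormalized_posterior(rate, counts)
    draws = []
    for _ in range(n_samples):
        proposal = rate + rng.gauss(0.0, step)
        candidate = log_unnormalized_posterior(proposal, counts)
        # Only a ratio of posteriors is needed, so the normalizing
        # constant cancels and never has to be computed.
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            rate, current = proposal, candidate
        draws.append(rate)
    return draws[n_samples // 5:]  # discard the first 20% as burn-in


# Hypothetical yearly break counts, for illustration only.
draws = metropolis([4, 6, 5])
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 2))
```

For this conjugate toy case the exact posterior is Gamma(16, 3.5), so the sample mean can be checked against 16/3.5; in the geoadditive setting no such closed form is available, which is why sampling is used.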

Bayesian belief updating is the transformation of prior beliefs into posterior beliefs when new information is observed. The posterior distribution quantifies the uncertainty of model parameters and structure using the best available information and data (Watson *et al.* 2004). The proposed approach here considers a multistage updating process where the posterior obtained after the first inference becomes the prior in the second inference and the new posterior is estimated from it and the newly collected data. The state of knowledge is updated with information gained at each step.
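The multistage idea, in which each posterior becomes the next prior, can be sketched with a conjugate Gamma-Poisson model, where the update has a closed form. This is a simplified stand-in for the MCMC-based updating of the geoadditive model; the class name, prior values, and break counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about a yearly break rate."""
    shape: float
    rate: float

    @property
    def mean(self):
        return self.shape / self.rate

    @property
    def variance(self):
        return self.shape / self.rate ** 2


def update(prior, yearly_counts):
    """Conjugate update: add observed breaks to shape, observed years to rate."""
    return GammaBelief(prior.shape + sum(yearly_counts),
                       prior.rate + len(yearly_counts))


# Hypothetical prior and yearly break counts, for illustration only.
prior = GammaBelief(shape=1.0, rate=0.5)
stage1 = update(prior, [4, 6, 5])   # first inference
stage2 = update(stage1, [5, 7])     # stage-1 posterior reused as the prior

for label, belief in [("prior", prior), ("stage 1", stage1), ("stage 2", stage2)]:
    print(label, round(belief.mean, 3), round(belief.variance, 3))
```

In this toy case, updating in two stages gives the same posterior as a single update on all five years, and the variance shrinks at each stage, which mirrors the narrowing uncertainty bands described in the text.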

### Optimization of the updating time

*et al.* 2022). They are used to search for solutions in complex problems that lack continuity, derivatives, and linearity, such as those arising in municipal infrastructure (Kielhauser & Adey 2018). The inherent nonlinearity of deterioration models makes this algorithm a good choice for assessing the time at which predictions are made with reduced uncertainty. The components of a genetic algorithm are the fitness function to be optimized, the population of chromosomes, the selection of the chromosomes that will reproduce, the crossover that produces the next generation of chromosomes, and the random mutation of chromosomes in the new generation. The genetic algorithm begins with a randomly chosen assortment of chromosomes, which serves as the first generation (initial population). Then, each chromosome in the population is evaluated by the fitness function to test how well it solves the problem. The selection operator chooses some of the chromosomes for reproduction based on a probability distribution defined by the user. The fitter a chromosome is, the more likely it is to be selected. For example, if *f* is a non-negative fitness function, then the probability that chromosome $c_i$ is chosen to reproduce is given by (Rao 2019):

$$P(c_i) = \frac{f(c_i)}{\sum_{j=1}^{N} f(c_j)}$$

where $N$ is the number of chromosomes in the population.
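Fitness-proportionate (roulette-wheel) selection gives each chromosome a selection probability equal to its share of the total fitness. The sketch below assumes this standard form of the operator; the function names and fitness values are hypothetical and are not taken from the paper's implementation.

```python
import random


def selection_probabilities(fitness):
    """Share of total fitness held by each chromosome (fitness must be non-negative)."""
    if any(f < 0 for f in fitness):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    return [f / total for f in fitness]


def select_parents(population, fitness, k, rng):
    """Draw k parents with replacement, weighted by fitness."""
    return rng.choices(population, weights=fitness, k=k)


# Hypothetical chromosomes (candidate updating times in years) and fitness values.
population = [1, 2, 3, 4]
fitness = [1.0, 3.0, 4.0, 2.0]
print(selection_probabilities(fitness))
print(select_parents(population, fitness, k=4, rng=random.Random(0)))
```

Because the paper minimizes its objective, a real implementation would first have to map objective values to a non-negative fitness in which smaller objective values score higher; that mapping is left out here.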

*t* coming years and the updated prediction over the same time period. The optimization minimizes this function under the constraint that the decision variable is contained within the range of the available data. The proposed objective function is the difference between two coefficients of determination: the R^{2} between the predictions made using the baseline deterioration model over the next *t* years and the observed data, and the R^{2} between the predictions made using the updated model over the next *t* years and the observed data. The optimal point defines the updating period, which runs from the end time of the baseline model to the optimal time. The updated model is then considered as the baseline, and the next optimization is run over the time range from the optimal time to the end of the available data. The process is repeated until the time range goes to zero.
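The sequential procedure can be sketched as a loop that picks an updating horizon, advances the baseline, and repeats until the data are exhausted. The sketch assumes the objective is the absolute difference between the two R^{2} values, and it uses an exhaustive search over integer horizons in place of the genetic algorithm. The callables `r2_baseline` and `r2_updated` are hypothetical placeholders for fitting and scoring the actual models.

```python
from typing import Callable, List, Tuple

# (start_year, t) -> R^2 of predictions over the t years after start_year
R2Function = Callable[[int, int], float]


def optimal_horizon(r2_baseline: R2Function, r2_updated: R2Function,
                    start: int, end: int) -> int:
    """Integer t in [1, end - start] minimizing |R2_baseline - R2_updated|."""
    candidates = range(1, end - start + 1)
    return min(candidates,
               key=lambda t: abs(r2_baseline(start, t) - r2_updated(start, t)))


def sequential_periods(r2_baseline: R2Function, r2_updated: R2Function,
                       start: int, end: int) -> List[Tuple[int, int]]:
    """Repeat the optimization until the remaining time range is empty."""
    periods = []
    while start < end:
        t = optimal_horizon(r2_baseline, r2_updated, start, end)
        periods.append((start, start + t))
        start += t  # the updated model becomes the new baseline
    return periods


# Toy stand-ins for the model scores, for illustration only.
def toy_baseline(start: int, t: int) -> float:
    return 0.90 - 0.05 * abs(t - 2)


def toy_updated(start: int, t: int) -> float:
    return 0.90


print(sequential_periods(toy_baseline, toy_updated, 1990, 1995))
```

Exhaustive search is workable here because the decision variable is a small integer number of years; a genetic algorithm, as used in the paper, is the more general choice when the search space is larger.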

## CASE STUDY AND DATA DESCRIPTION

*et al.* (2017). It is a continuous variable that depends on several soil characteristics, including the soil resistivity, pH, redox potential, sulfide content, and moisture content. The weather data used in this study, such as temperature, rainfall, and precipitation, were collected from Environment Canada and the Calgary International Airport weather station. Table 1 gives the factors included in the model, their units, and their description. The flexibility of the proposed approach to build the baseline and the need to increase the accuracy of the failure model led to the selection of all the available factors without any sensitivity analysis. The type of relationship between each factor and the output response is defined by the user, based on an estimate of the possible impact the factor can have on the output response. The selected response variable is the *PBR*, which represents the number of failures per 100 km of pipe per year. This rate is obtained by dividing the number of failures reported during a year by the length of the pipe in kilometers and multiplying the result by 100.
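The breakage rate calculation described above amounts to a single line of arithmetic. A minimal sketch follows; the function name and the sample numbers are hypothetical.

```python
def pipe_breakage_rate(n_failures: int, length_km: float) -> float:
    """Failures per 100 km of pipe for one year of records."""
    if length_km <= 0:
        raise ValueError("length_km must be positive")
    return n_failures / length_km * 100.0


# Hypothetical example: 12 failures reported in one year on 48 km of pipe.
print(pipe_breakage_rate(12, 48.0))  # 25.0
```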

Covariate | Unit | Description |
---|---|---|
Covariates (Physical) | | |
Material | NA^{a} | Designed material of pipes (categorical: 1 = cast iron, 2 = ductile iron) |
Age | years | The difference between the reported failure date and the installation date (continuous) |
Length | m | The manhole to manhole distance (continuous) |
Diameter | mm | Size of the pipes (continuous) |
Rservs | NA | Number of residential connections to the pipe (continuous) |
Cservs | NA | Number of commercial buildings connected to the pipe (continuous) |
Covariates (Soil characteristics) | | |
Soil corrosivity index (SCI) | NA | The nature of soil representing its aggressiveness to metallic pipes (continuous) |
Cathodic protection (CP) | NA | Is the cathodic protection in place (categorical) |
Covariates (Climatic) | | |
Thawing index (TI) | degree-days | Magnitude of thawing season (continuous) |
Freezing index (FI) | degree-days | Severity of freezing period (continuous) |
Rain Deficit (RD) | cm | Difference between received and evaporated precipitation (continuous) |
Geospatial location | NA | The geographic location of the water pipe |
PBR (Response variable) | NA | Pipe breakage rate is the number of breakages per year per 100 km (continuous) |

^{a}Not applicable.

Covariates are divided into two groups (continuous and discrete covariates) forming the two parts of the proposed additive model. The first part is the nonparametric part which is composed of the effect of continuous factors. All the continuous covariates in the database are assumed to have nonlinear effects on the response *PBR*. Categorical covariates will only have a linear effect. The geospatial component of the model captures the unknown and unobserved factors that can affect the degradation of the structural properties of pipes (see details in Balekelayi & Tesfamariam (2019a)). The parametric components of the model are estimated using the least squared error algorithm. The continuous covariates are modeled nonlinearly using P-splines. A Bayesian representation of the model is selected to account for the stochastic behavior of the studied process.
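The structure described in this paragraph can be written compactly. The block below is a sketch of a generic structured additive predictor with Bayesian P-spline and Markov random field priors, as commonly formulated in the geoadditive regression literature. The notation, the second-order random walk, and the neighborhood-based spatial prior are assumptions made for illustration and are not taken from the paper's own equations.

```latex
\begin{aligned}
\eta_i &= \sum_{j=1}^{p} f_j(x_{ij}) + f_{\mathrm{geo}}(s_i) + \mathbf{u}_i^{\top}\boldsymbol{\gamma}, \\
f_j(x) &= \sum_{m=1}^{M_j} \beta_{jm} B_{jm}(x), \qquad
\beta_{jm} = 2\beta_{j,m-1} - \beta_{j,m-2} + \varepsilon_{jm}, \quad
\varepsilon_{jm} \sim \mathcal{N}(0, \tau_j^{2}), \\
f_{\mathrm{geo}}(s) \mid f_{\mathrm{geo}}(r),\ r \neq s
&\sim \mathcal{N}\!\left( \frac{1}{N_s} \sum_{r \in \partial_s} f_{\mathrm{geo}}(r),\
\frac{\tau_{\mathrm{geo}}^{2}}{N_s} \right).
\end{aligned}
```

Here $x_{ij}$ are the continuous covariates, $\mathbf{u}_i$ collects the categorical covariates with linear coefficients $\boldsymbol{\gamma}$, $B_{jm}$ are B-spline basis functions, $s_i$ is the district containing pipe $i$, $\partial_s$ is the set of districts neighboring $s$, and $N_s$ is the number of such neighbors. The variances $\tau_j^{2}$ and $\tau_{\mathrm{geo}}^{2}$ control the smoothness of each effect.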

## RESULTS AND DISCUSSION

### Baseline selection

This section explains how the most accurate model is selected as the baseline. The model with the highest R^{2} is selected. Two parameters are considered in the selection of the baseline model: the R^{2} and the time period of simulation. As stated in the description of the case study, the city implemented CP after 1980. Thus, to account for this important factor, which has demonstrated a positive impact against the deterioration of water pipes, baseline models are developed for periods running from 1956, when the city started recording failures, to dates after 1980. However, the analysis of the failure database showed that the first pipes protected with CP failed after 1986. Thus, multiple runs have been performed with end dates from 1986 to 2006. Table 2 shows how the R^{2} and the *RMSE* changed for the runs ending between 1986 and 1993. The maximum R^{2} and the minimum *RMSE* were obtained for the run ending in 1990; the run ending in 1991 matches this R^{2} but has a higher *RMSE*.

Run | Model | R^{2} | RMSE | Time period |
---|---|---|---|---|
1 | Baseline 0 | 0.78 | 0.467 | 1956–1986 |
2 | Baseline 1 | 0.785 | 0.432 | 1956–1987 |
3 | Baseline 2 | 0.8 | 0.410 | 1956–1988 |
4 | Baseline 3 | 0.815 | 0.389 | 1956–1989 |
5 | Baseline 4 | 0.83 | 0.363 | 1956–1990 |
6 | Baseline 5 | 0.83 | 0.373 | 1956–1991 |
7 | Baseline 6 | 0.77 | 0.402 | 1956–1992 |
8 | Baseline 7 | 0.76 | 0.405 | 1956–1993 |
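The selection rule applied to Table 2 can be sketched as follows, using the values from the table. Breaking the R^{2} tie between the 1990 and 1991 runs by the lower *RMSE* is an assumption consistent with the choice reported in the text; the variable names are hypothetical.

```python
# (model, R^2, RMSE, time period), copied from Table 2
runs = [
    ("Baseline 0", 0.78, 0.467, "1956-1986"),
    ("Baseline 1", 0.785, 0.432, "1956-1987"),
    ("Baseline 2", 0.8, 0.410, "1956-1988"),
    ("Baseline 3", 0.815, 0.389, "1956-1989"),
    ("Baseline 4", 0.83, 0.363, "1956-1990"),
    ("Baseline 5", 0.83, 0.373, "1956-1991"),
    ("Baseline 6", 0.77, 0.402, "1956-1992"),
    ("Baseline 7", 0.76, 0.405, "1956-1993"),
]

# Highest R^2 first; among equal R^2 values, prefer the lowest RMSE.
best = max(runs, key=lambda run: (run[1], -run[2]))
print(best)
```

With these values the rule returns Baseline 4 (1956–1990), the model used as the starting point for the updating.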

*f* function represented in Equation (3) and the dark and light bands are the 95 and 80% credible intervals, respectively.

### Geoadditive regression model

*id* of the district where the pipes are buried. The parameters of the flexible nonlinear functions (see Equation (3)) that capture the data pattern for each covariate are defined by trial and error. The parameters defined for each function are the number of basis functions, the selection of the prior, and the definition of the interpolating intervals where the basis functions are optimized. The results confirm the assumptions that the categorical covariates affect the response variable linearly (Figure 12(e) and 12(f)) and that the continuous covariates affect it nonlinearly (see Figure 6(a) and 6(b)).

The agreement between the observed and predicted values using the baseline deterioration model for the first time period is R^{2} = 0.83 (see Figure 5), with a root mean squared error (RMSE) of 0.363. This model is selected as the starting point for updating the pipe failure model. The novelty of the proposed approach lies in the definition of the updating time periods, which are obtained through a successive optimization process. The global effect of the selected covariates on the response output is the sum of the partial effects of the individual covariates on the response. The advantage of additive models lies in the way each factor's impact is modeled and analyzed before it is added to the global model. The categorical variables have the expected impact on the response *PBR*. DI pipes are more prone to failure than CI pipes. The failure rate is higher when there is no CP than after its installation. The CP reduces the effect of electrochemical corrosion in metallic pipes and mitigates the corrosive effect of the soil on them.

*NOPF* ranging from 10 to 15. Outside this interval, the uncertainty in the partial effect of this covariate on the response variable *PBR* is large. The length covariate (Figure 6) shows a high effect for short pipes, and the effect diminishes as the length of the pipe increases. During the installation of water pipes, long pipes (subject to high losses if a failure occurs) are treated with care and installed by qualified personnel. Short pipes, however, receive less attention and can be installed with poor workmanship. On the other hand, one would expect high degradation for long pipes, since the external surface exposed to the soil environment is larger than that of short pipes. But this should be analyzed in conjunction with the age covariate, which in Figure 7 shows that old pipes are more prone to a high breakage rate, especially when their age is above 40 years.

*RD*, *FI*, and *TI* are the climatic factors introduced in the baseline to account for seasonal and climate changes. The results show that they impact the output *PBR* nonlinearly, with partial effects between −1.5 and 1 for the FI, between −4.5 and 4.5 for the RD, and between −1 and 1 for the TI. For low values of TI (cold temperatures), the TI partial effect is high, as it is for high values of FI (cold temperatures) (see Figure 8).

The selected baseline deterioration model is used in the optimization process to calculate the coefficient of determination between the predicted and the observed values over the next *t* years. Then, the model updated over the coming *t* years is used to calculate the R^{2} using the data of the *t* years following the end of the baseline. The difference between the two is minimized, with the time *t* constrained to be positive and within the defined range.

### Optimization and model updating

The optimization is run sequentially. After the first run, the deterioration model is updated and the range of the time period is redefined (see Figure 2). Multiple optimization runs with the updated deterioration models are used to calculate the updating time periods within the available range of historical data. Results from the different runs, with the obtained R^{2} and *RMSE*, are given in Table 3. This table shows how long the deterioration model can be used to make predictions and the accuracy expected during that time. It also shows how the Bayesian updating improves the accuracy of the deterioration model. For all the selected covariates, the uncertainty band narrows as the model is updated, until the final model is selected for long-period prediction. Brief descriptions of how the partial effects of the length, diameter, and age covariates change during the optimization process are presented below.

Run | Model | R^{2} | RMSE | Optimal time (years) | Time period |
---|---|---|---|---|---|
1 | Baseline | 0.83 | 0.363 | | 1956–1990 |
2 | Optimization 1 | 0.965 | 0.152 | 2 | 1990–1992 |
3 | Optimization 2 | 0.96 | 0.099 | 2 | 1992–1994 |
4 | Optimization 3 | 0.96 | 0.099 | 3 | 1994–1997 |
5 | Optimization 4 | 0.914 | 0.125 | 11 | 1997–2008 |
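The time periods in Table 3 follow from chaining the optimal times onto the end of the baseline period. The short sketch below reproduces them from the table's values; the variable names are hypothetical, and `itertools.accumulate` with `initial` requires Python 3.8 or later.

```python
from itertools import accumulate

baseline_end = 1990              # end of the selected baseline period
optimal_times = [2, 2, 3, 11]    # years, from Table 3

# Running end years: each optimal time extends the previous end.
ends = list(accumulate(optimal_times, initial=baseline_end))
periods = list(zip(ends[:-1], ends[1:]))
print(periods)
```

The four periods together cover 18 years, from the end of the baseline in 1990 to the end of the records in 2008.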

#### Length covariate analysis

*PBR* using both the updated model and the baseline shows an increased accuracy in the predictions made with the updated model. This study demonstrates that the uncertainty can be reduced through sequential updating of an initial model built from a reduced amount of data. The updating time period that increases the accuracy is obtained through optimization.

Considering the entire data set as a single simulation period leads to high uncertainty. Figure 9 shows how the uncertainty band is reduced from one optimization run to another until the end of the simulation period. The length has a reduced credible interval for short pipes (below 400 m). As the length of the pipes increases, the uncertainty increases. However, the partial effect of the length takes values below zero for long pipes, meaning that length is not a significant factor that can increase the failure rate of metallic pipes in the City of Calgary. Long pipes are not prone to failure.

#### Age covariate analysis

*PBR*), decreases (when new pipes are installed, there is a tendency to observe a decrease in *PBR*), and then starts a monotonic increase after 30 years (see Figure 10). This curve is comparable to the bathtub curve that is assumed to represent the life of pipes in water distribution systems (Rajani & Kleiner 2001). The uncertainty band of the partial effect of the age covariate in the baseline model is large (going from −1 to +1) and does not allow accurate predictions. After the first optimization and the updating of the baseline, the increasing trend of the partial effect is maintained, and the uncertainty band is reduced owing to the new information brought to the model through the updating process (Figure 10). This period coincides with the early experimentation with CP in the system. Fine-tuning the current potential and time to find the best operating parameters of the newly installed technology may explain the increasing trends obtained. After 2 years, it is assumed that the technology is known and well deployed, and the expected results are observed through a significant reduction of the failure rate of metallic pipes in the system. During the following 5 years, the breakage rate of pipes is inversely correlated with the age of the pipes. During the same period, the city started installing plastic pipes to replace metallic pipes that were above 50 years old and had a declining structural condition. So the role of CP is reduced and plastic pipes' lives follow the ‘bathtub’ theory (which has a direct effect on the increase of the failure rate).

#### Diameter covariate analysis

#### Model predictions

The analysis of the uncertainty bound in the output response also shows an improvement. The updated models are presenting a reduced distribution range of the effect on the response output. These models are used to predict the *PBR* of the pipes in the network.

Figure 12 gives the univariate prediction of the *PBR* using the age, the length, the diameter, the number of previous failures, the material, and the cathodic protection. The presented figures are obtained by fixing all the factors in Equation (3) to constant values and changing only the selected factor. Figure 12(a) shows how well the updated model after the fourth optimization performs compared with the baseline for the age covariate. The uncertainty is reduced, and for pipes above 60 years old the prediction's accuracy is very high. For pipes between 20 and 60 years old, some uncertainty remains, but it is lower than in the baseline. Figure 12(b) shows the length prediction under the same conditions as the age; it is also an improvement compared with the baseline. The same conclusions are drawn for the diameter, the number of previous failures, and the material (Figure 12(c)–12(e)), where the uncertainty bands are reduced in the updated model. However, CP behaves differently. The final optimization run goes from 1997 to 2008, during which the CP was already installed, so all the pipes had protection. This is observed in Figure 12(f), where the baseline has both protected and unprotected pipes, whereas the updated model only considers the protected pipes.

### Comparison of the 5 years forecast

Since the first optimal updating period obtained from the optimization is 2 years, the baseline model parameters are updated using the next 2 years' data (1991–1992) and the new model is named the ‘updated model’. This updated model is used to make future pipe breakage (*PBR*) predictions over the next 5 years (1993–1997) and compare the results with *PBR* predictions made with the baseline model over the same time period.

*PBR* (Figure 13(a)). The majority of data points are above the line. After updating the model at an optimally selected time period (2 years), an increased correlation appears between the predicted *PBR* and the observed *PBR*, as seen in Figure 13(b), where the data points are aligned along the line. Model improvement can also be evaluated by comparing the coefficients of determination of both models. The baseline model is less accurate, with R^{2} = 0.76, while the updated model gives more accurate predictions, with R^{2} = 0.89. This shows how important it is to update models, and that the selection of an optimal updating period has a high impact on the increase in accuracy and reduces the uncertainty in the model, as seen in the previous sections.

## CONCLUSION

In this paper, an individual deterioration model is developed using Bayesian inference with P-splines and a Gaussian Markov random field to represent the nonlinear functions of the continuous covariates and the geospatial correlation of the pipes' locations, respectively. The model considers the effects of physical, soil, and climate factors on the structural condition of water pipes using linear, nonlinear, and geospatial effects on the expected mean value of the *PBR*. The Bayesian inference captures the uncertainty in the data through appropriate polynomials and degree selections. Spatial autocorrelation and local heterogeneities are introduced to capture unknown covariates that may influence the deterioration of water pipes. The individual effects of the covariates show a given level of uncertainty that affects the future predictions of the response variable. Optimal Bayesian updating is used to reduce the uncertainty in the initial model using the learning flexibility of Bayes' theorem. The number of years up to which the *PBR* predictions made using the deterioration model are acceptable is obtained through an optimization process that determines the optimal time after which the deterioration model should be updated. The updated model is obtained by updating the parameters of the current model using the newly acquired information provided by the data collected during that time period. Sequential updating is executed from an initial model built from noninformative prior distributions of the covariates. The optimization coupled with the updating process enhances the future predictions of the *PBR*, and the updated deterioration model performs well even though the optimization requires short time periods to update the model. In the application of the proposed framework to predict the failure of water pipes in the City of Calgary, the optimal updating time periods vary from 2 to 11 years. Updating at these times shows a significant improvement in the assessment of the uncertainty in the partial effects of the input factors on the response *PBR* and in the accuracy of the same response. On the other hand, short updating time periods induce costs (failure data evaluation and recording) that should be included in the decision-making process. The gain in accuracy therefore comes at a cost that should be evaluated to decide whether collecting the needed information is acceptable. The final proposed model allows making predictions over the next 11 years, which is an acceptable time to allow two successive inspections of the pipes in a system. Future work will consider the life cycle cost together with the updating through optimization to allow optimal decision-making under uncertainty.

## ACKNOWLEDGEMENTS

The authors acknowledge the financial support through the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05584) under the Discovery Grant programs.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*Statistical Modelling of Pipe Failures in Water Networks*