Abstract
Municipal water managers rely on pipe deterioration models to plan maintenance, repair, and replacement. Although efforts have been made to increase their accuracy, the predictions of these models remain subject to uncertainty. In this paper, a procedure is proposed that optimizes the Bayesian updating period of the parameters of an existing deterioration model in order to sequentially reduce the uncertainty in the predicted water pipe breakage rate. The breakage rate is modeled using a structured geoadditive regression technique in which covariates are allowed to have linear (e.g., categorical) and nonlinear (e.g., continuous) relationships with the response variable. Unknown and unobserved covariates are included in the model through a geospatial component that captures spatial autocorrelations and local heterogeneities. The optimization procedure searches through the time series data to identify the optimal updating horizon, defined as the one that minimizes the error between the coefficients of determination (computed between predictions and observations) of the unupdated and updated models. The process is repeated until the entire time series is covered. The application of this approach to failure data of large Canadian urban water systems shows a significant reduction in the uncertainty of the parameters and an increase in the accuracy of the predicted response variable.
HIGHLIGHTS
Optimization.
Bayesian Updating.
Gaussian Markov Random Fields.
P-Splines.
Deterioration model.
INTRODUCTION
METHODOLOGY
Geoadditive regression model
Bayesian updating
Bayesian belief updating is the transformation of prior beliefs into posterior beliefs when new information is observed. The posterior distribution quantifies the uncertainty of the model parameters and structure using the best available information and data (Watson et al. 2004). The approach proposed here uses a multistage updating process: the posterior obtained from the first inference becomes the prior for the second inference, and the new posterior is estimated from that prior and the newly collected data. The state of knowledge is thus updated with the information gained at each step.
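The multistage idea can be illustrated with a minimal sketch. It is not the geoadditive model used in this paper: it swaps in a simple conjugate normal model (unknown mean, known noise variance) so that the update has a closed form. The class `NormalBelief`, the function `update`, and all numerical values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalBelief:
    """Normal belief about an unknown mean (hypothetical helper type)."""
    mean: float
    var: float


def update(prior, observations, noise_var):
    """Conjugate normal-normal update with known observation noise variance."""
    n = len(observations)
    if n == 0:
        return prior
    post_precision = 1.0 / prior.var + n / noise_var
    post_mean = (prior.mean / prior.var + sum(observations) / noise_var) / post_precision
    return NormalBelief(post_mean, 1.0 / post_precision)


# Illustrative values only; these are not data from the study.
prior = NormalBelief(mean=0.0, var=100.0)   # vague starting belief
stage1 = [1.2, 0.8, 1.1]                    # data available at the first inference
stage2 = [0.9, 1.0]                         # newly collected data
noise_var = 0.25

post1 = update(prior, stage1, noise_var)    # first inference
post2 = update(post1, stage2, noise_var)    # posterior reused as the next prior
batch = update(prior, stage1 + stage2, noise_var)  # all data in a single step
```

In this toy setting the two-stage posterior `post2` coincides with the single-step posterior `batch`, and the variance shrinks at every stage, which is the behavior the multistage scheme relies on.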
Optimization of the updating time
CASE STUDY AND DATA DESCRIPTION
Covariates (Physical) | Unit | Description |
---|---|---|
Material | NAa | Designed material of pipes (categorical: 1 = cast iron, 2 = ductile iron) |
Age | years | The difference between the reported failure date and the installation date (continuous) |
Length | m | The manhole to manhole distance (continuous) |
Diameter | mm | Size of the pipes (continuous) |
Rservs | NA | Number of residential connections to the pipe (continuous) |
Cservs | NA | Number of commercial buildings connected to the pipe (continuous) |
Covariates (Soil characteristics) | ||
Soil corrosivity index (SCI) | NA | The nature of soil representing its aggressiveness to metallic pipes (continuous) |
Cathodic protection (CP) | NA | Is the cathodic protection in place (categorical) |
Covariates (Climatic) | ||
Thawing index (TI) | degree-days | Magnitude of thawing season (continuous) |
Freezing index (FI) | degree-days | Severity of freezing period (continuous) |
Rain Deficit (RD) | cm | Difference between received and evaporated precipitation (continuous) |
Geospatial location | NA | The geographic location of the water pipe |
PBR (Response variable) | NA | Pipe breakage rate: the number of breaks per year per 100 km (continuous) |
a Not applicable.
Covariates are divided into two groups (continuous and discrete), which form the two parts of the proposed additive model. The first, nonparametric part comprises the effects of the continuous covariates: all continuous covariates in the database are assumed to have nonlinear effects on the response PBR and are modeled using P-splines. The second, parametric part comprises the categorical covariates, which have only linear effects and are estimated by least squares. The geospatial component of the model captures the unknown and unobserved factors that can affect the degradation of the structural properties of pipes (see details in Balekelayi & Tesfamariam (2019a)). A Bayesian representation of the model is selected to account for the stochastic behavior of the studied process.
RESULTS AND DISCUSSION
Baseline selection
This section explains how the most accurate model is built to serve as the baseline. Two criteria are considered in the selection: the R² and the time period of simulation, and the model with the highest R² is selected. As stated in the description of the case study, the city implemented CP after 1980. Thus, to account for this important factor, which has demonstrated a positive impact against the deterioration of water pipes, baseline models are developed for periods running from 1956, the date when the city started recording failures, to dates after 1980. However, the analysis of the failure database showed that the first pipes protected with CP failed after 1986. Thus, multiple runs have been performed with end years from 1986 to 2006. Table 2 shows how the R² and the RMSE changed for the runs ending between 1986 and 1993. The maximum R² and the minimum RMSE were both obtained for the period ending in 1990.
Run | Model | R² | RMSE | Time period |
---|---|---|---|---|
1 | Baseline 0 | 0.78 | 0.467 | 1956–1986 |
2 | Baseline 1 | 0.785 | 0.432 | 1956–1987 |
3 | Baseline 2 | 0.8 | 0.410 | 1956–1988 |
4 | Baseline 3 | 0.815 | 0.389 | 1956–1989 |
5 | Baseline 4 | 0.83 | 0.363 | 1956–1990 |
6 | Baseline 5 | 0.83 | 0.373 | 1956–1991 |
7 | Baseline 6 | 0.77 | 0.402 | 1956–1992 |
8 | Baseline 7 | 0.76 | 0.405 | 1956–1993 |
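The selection rule can be sketched in a few lines. The R² and RMSE values are those reported in Table 2. The metric definitions (R² as one minus the ratio of residual to total sum of squares) and the tie-breaking rule (highest R² first, then lowest RMSE) are assumptions of this sketch, since the exact formulas are not stated in this section, and the function names are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot form."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# (model, R2, RMSE, end year of the period starting in 1956), from Table 2.
TABLE_2 = [
    ("Baseline 0", 0.78, 0.467, 1986),
    ("Baseline 1", 0.785, 0.432, 1987),
    ("Baseline 2", 0.8, 0.410, 1988),
    ("Baseline 3", 0.815, 0.389, 1989),
    ("Baseline 4", 0.83, 0.363, 1990),
    ("Baseline 5", 0.83, 0.373, 1991),
    ("Baseline 6", 0.77, 0.402, 1992),
    ("Baseline 7", 0.76, 0.405, 1993),
]


def select_baseline(rows):
    """Pick the row with the highest R2, breaking ties with the lowest RMSE."""
    return max(rows, key=lambda row: (row[1], -row[2]))


best = select_baseline(TABLE_2)
print(best[0], best[3])  # Baseline 4 1990
```

Baseline 4 and Baseline 5 share the same R², so the RMSE decides between them in this sketch.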
Geoadditive regression model
The agreement between the observed and predicted values using the baseline deterioration model for the first time period is R² = 0.83 (see Figure 5), with a root mean squared error (RMSE) of 0.363. This model is selected as the starting point for updating the pipe failure model. The novelty of the proposed approach lies in how the updating time periods are defined: they are obtained through a successive optimization process. The global effect of the selected covariates on the response is obtained as the sum of the partial effects of the individual covariates. The advantage of additive models is that each factor's impact is modeled and analyzed individually before it is added to the global model. The categorical variables have the expected impact on the response PBR. Ductile iron (DI) pipes are more prone to failure than cast iron (CI) pipes. The failure rate is higher when there is no CP than after its installation, because CP reduces the effect of electrochemical corrosion in metallic pipes and mitigates the corrosive effect of the soil on them.
Optimization and model updating
The optimization is run sequentially. After the first run, the deterioration model is updated and the range of the time period is redefined (see Figure 2). Multiple optimization runs with the updated deterioration models are used to calculate the updating time periods within the available range of historical data. The R² and RMSE obtained in the different runs are given in Table 3. The table shows how long each deterioration model can be used to make predictions and the accuracy expected over that time. It also shows how the Bayesian updating improves the accuracy of the deterioration model. For all the selected covariates, the uncertainty band narrows as the model is updated, until the final model is selected for long-period prediction. The changes in the partial effects of the length, age, and diameter covariates during the optimization process are briefly described below.
Run | Model | R² | RMSE | Optimal time (years) | Time period |
---|---|---|---|---|---|
1 | Baseline | 0.83 | 0.363 | | 1956–1990 |
2 | Optimization 1 | 0.965 | 0.152 | 2 | 1990–1992 |
3 | Optimization 2 | 0.96 | 0.099 | 2 | 1992–1994 |
4 | Optimization 3 | 0.96 | 0.099 | 3 | 1994–1997 |
5 | Optimization 4 | 0.914 | 0.125 | 11 | 1997–2008 |
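The sequential structure of the procedure can be sketched as a loop that redefines the time window after each update. The function `sequential_windows` and its `pick_horizon` callback are hypothetical names. In this sketch the callback is a plain lookup of the optimal times reported in Table 3 and stands in for the actual optimization, which is not reproduced here.

```python
def sequential_windows(start, end, pick_horizon):
    """Chain updating windows from `start` until `end` is reached.

    `pick_horizon(year, end)` returns the updating period (in years) chosen
    for the window starting at `year`; it stands in for the optimization step.
    """
    windows = []
    year = start
    while year < end:
        horizon = pick_horizon(year, end)
        # Keep the horizon at least one year and inside the available data.
        horizon = max(1, min(horizon, end - year))
        windows.append((year, year + horizon))
        year += horizon
    return windows


# Optimal times from Table 3, keyed by the first year of each window.
REPORTED_HORIZONS = {1990: 2, 1992: 2, 1994: 3, 1997: 11}

windows = sequential_windows(1990, 2008, lambda year, end: REPORTED_HORIZONS[year])
print(windows)  # [(1990, 1992), (1992, 1994), (1994, 1997), (1997, 2008)]
```

Each window starts where the previous one ends, which reproduces the time periods listed in Table 3.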
Length covariate analysis
Treating the entire data set as a single simulation period leads to high uncertainty. Figure 9 shows how the uncertainty band narrows from one optimization run to the next until the end of the simulation period. The credible interval of the length effect is narrow for short pipes (below 400 m) and widens as pipe length increases. However, the partial effect of length takes values below zero for long pipes, meaning that length is not a significant factor in increasing the failure rate of long metallic pipes in the City of Calgary. Long pipes are not prone to failure.
Age covariate analysis
Diameter covariate analysis
Model predictions
The analysis of the uncertainty bounds of the output response also shows an improvement: the updated models yield a narrower distribution of the covariate effects on the response. These models are used to predict the PBR of the pipes in the network.
Figure 12 gives the univariate prediction of the PBR using the age, the length, the diameter, the number of previous failures, the material, and the cathodic protection. Each panel is obtained by fixing all the factors in Equation (4) to constant values and varying only the selected factor. Figure 12(a) shows how well the updated model obtained after the fourth optimization performs compared with the baseline for the age covariate. The uncertainty is reduced, and for pipes above 60 years old the prediction accuracy is very high. For pipes between 20 and 60 years old, some uncertainty remains, but it is lower than in the baseline, which is an improvement. Figure 12(b) shows the length prediction under the same conditions as the age; it is also an improvement compared with the baseline. The same conclusions are drawn for the diameter, the number of previous failures, and the material (Figure 12(c)–12(e)), where the uncertainty bands are reduced in the updated model. The CP behavior, however, is different. The final optimization run covers 1997 to 2008, a period during which the CP was already installed, so all the pipes had protection. This is visible in Figure 12(f): the baseline covers both protected and unprotected pipes, whereas the updated model considers only protected pipes.
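The way such univariate curves are produced can be sketched as follows. Equation (4) is not reproduced in this section, so `additive_predictor` is a hypothetical stand-in with made-up partial effects. Its shapes and coefficients are illustrative assumptions and not the effects estimated in this study.

```python
import math


def additive_predictor(age, length, diameter):
    """Hypothetical additive model: an intercept plus one partial effect per covariate."""
    f_age = 0.02 * age                                # made-up effect of age
    f_length = -0.001 * length                        # made-up effect of length
    f_diameter = 0.5 * math.exp(-diameter / 200.0)    # made-up effect of diameter
    return 1.0 + f_age + f_length + f_diameter


def univariate_curve(model, fixed, name, grid):
    """Vary one covariate over `grid` while holding the others at `fixed` values."""
    curve = []
    for value in grid:
        inputs = dict(fixed)      # copy so the reference values are not modified
        inputs[name] = value
        curve.append((value, model(**inputs)))
    return curve


# Reference values at which the other covariates are held constant (assumed).
fixed = {"age": 40.0, "length": 150.0, "diameter": 200.0}
age_curve = univariate_curve(additive_predictor, fixed, "age", [0, 20, 40, 60, 80])
```

Because the model is additive, changing the fixed values of the other covariates shifts the curve up or down without changing its shape.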
Comparison of the 5 years forecast
Since the first optimal updating period obtained from the optimization is 2 years, the baseline model parameters are updated using the data of the next 2 years (1991–1992), and the new model is named the ‘updated model’. This updated model is used to predict the pipe breakage rate (PBR) over the following 5 years (1993–1997), and the results are compared with the PBR predictions made with the baseline model over the same time period.
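The data partition behind this comparison can be sketched as follows. The year ranges come from the text above. The function `split_by_period` and the sample records are hypothetical and serve only to exercise the code.

```python
def split_by_period(records, periods):
    """Group (year, value) records by named, inclusive year ranges."""
    groups = {name: [] for name in periods}
    for year, value in records:
        for name, (first, last) in periods.items():
            if first <= year <= last:
                groups[name].append((year, value))
                break
    return groups


PERIODS = {
    "baseline": (1956, 1990),   # data used to fit the baseline model
    "update": (1991, 1992),     # data used for the first Bayesian update
    "forecast": (1993, 1997),   # years over which both models are compared
}

# Made-up records purely to exercise the function; not data from the study.
records = [(1989, 0.4), (1991, 0.5), (1992, 0.6), (1995, 0.3), (2001, 0.2)]
groups = split_by_period(records, PERIODS)
```

Records outside all three ranges, such as the one dated 2001 here, are left out of the comparison.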
CONCLUSION
In this paper, an individual deterioration model is developed using Bayesian inference, with P-splines representing the nonlinear functions of the continuous covariates and a Gaussian Markov random field representing the geospatial correlation of the pipe locations. The model considers the effects of physical, soil, and climate factors on the structural condition of water pipes through linear, nonlinear, and geospatial effects on the expected mean value of the PBR. The Bayesian inference captures the uncertainty in the data through appropriate polynomial and degree selections. Spatial autocorrelation and local heterogeneities are introduced to capture unknown covariates that may influence the deterioration of water pipes. The individual effects of the covariates carry a level of uncertainty that affects future predictions of the response variable. Optimal Bayesian updating is used to reduce the uncertainty in the initial model by exploiting the learning flexibility of Bayes's theorem. The number of years over which the PBR predictions of the deterioration model remain acceptable is determined through an optimization process that identifies the optimal time after which the model should be updated. The updated model is obtained by updating the parameters of the current model with the information provided by the data collected during that time period. Sequential updating starts from an initial model built from noninformative prior distributions of the covariates. Coupling the optimization with the updating process enhances the future predictions of the PBR, and the updated deterioration model performs well even though the optimization calls for short updating periods. In the application of the proposed framework to predict the failure of water pipes in the City of Calgary, the optimal updating time periods vary from 2 to 11 years (Table 3).
Updating at these intervals significantly improves the assessment of the uncertainty in the partial effects of the input factors on the response PBR, as well as the accuracy of the predicted response. On the other hand, short updating periods induce costs (failure data evaluation and recording) that should be included in the decision-making process: the gain in accuracy must be weighed against the cost of collecting the required information. The final proposed model allows predictions over the next 11 years, an acceptable period that allows two successive inspections of the pipes in a system. Future work will combine life cycle cost with the optimized updating to support optimal decision-making under uncertainty.
ACKNOWLEDGEMENTS
The authors acknowledge the financial support through the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05584) under the Discovery Grant programs.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.