A water distribution network (WDN) is a critical urban infrastructure whose proper functioning significantly affects human life. However, aging pipe assets require periodic investment plans to reduce the risk of leaks. To maximize the value of the existing water infrastructure and optimize asset investment, assessing and predicting pipe life in water distribution systems has become very important. Up to now, most attention has been devoted to determining the relevant variables and the occurrence of pipe failures, which has scientific value but does not directly assist real operations in the water industry. To add practical value to pipe life assessment and prognosis methods, this paper makes three contributions: (1) several comparable data-driven approaches are proposed to quantify pipe deterioration and the influencing variables, such as pipe diameter, material, and age; (2) a prediction method for the remaining useful life of pipe assets, based on these approaches, is introduced; and (3) an easy-reading risk-level checklist is presented for all pipe assets to assist the water industry with the daily operation, maintenance, and renewal of their water networks. All these approaches are implemented in a real-life case study, the Barcelona WDN.

  • Several data-driven approaches are proposed to quantify pipe deterioration, considering influencing factors.

  • A method is introduced to predict the remaining useful life of pipe assets in a water network.

  • An easy-reading checklist is presented to assist the water industry with asset maintenance and renewal.

  • The proposed approaches and methods are demonstrated in a real case study: the Barcelona water distribution network.

Water scarcity in many cities around the world is driven by climate change, increasing demand, and significant losses during transportation. Water distribution networks (WDNs) are critical infrastructures that supply and distribute water to customers; they are growing in complexity while facing more failures as their pipes age. According to the US Environmental Protection Agency (EPA), around 30% of water pipes need immediate action (Allbee 2005). Maintaining or renewing these aging pipes requires a large amount of investment. In the member states of the Organization for Economic Co-operation and Development, around 0.5% of the annual gross domestic product is needed (OECD 2006). Furthermore, according to the water balance report from the International Water Association (IWA), pipe failures can account for losses of up to 27% of the total extracted water (Berg 2015). Moreover, maintaining or renewing aging pipes faces a number of barriers due to the lack of adequate knowledge to assess the state of the assets, particularly those underground. Innovative tools, mathematical models, and geographical information systems that assess pipes’ health and predict their remaining useful life (RUL) in a WDN can reduce asset investment.

However, assessing pipes’ life is quite complex because it requires knowledge of the influencing factors and of their relation to pipe failure. There are mainly three types of approaches to assessing pipes’ life (Winkler et al. 2018): reliability, physical degradation, and data-driven models. Among these, the reliability approaches use historical data to characterize the failure probability of buried water pipes, which allows one to determine their RUL (Pietrucha-Urbanik & Pociask 2016) depending on important parameterized factors, such as diameter, material, and age (Su et al. 1987; Fujiwara & Tung 1991; Khomsi et al. 1996).

Beyond reliability, the degradation model uses physical principles to describe pipes' aging process. The event type, the smallest element to be described, and the process to be modeled are the three dimensions that classify degradation models (Kleiner & Rajani 2001). Event type can affect and characterize the models of failure occurrence and RUL by representing different processes of physical degradation. The dimension of the smallest element decides whether the study object is a large or a small scale of a pipe network. All three dimensions can be integrated for specific objectives.

The data-driven approaches were reviewed by Scheidegger et al. (2015). Among these, the dependence of the failure rate on different features (diameter, age, and total length) has been characterized using evolutionary regression models (Berardi et al. 2008). Moreover, a neural network (NN)-based approach, the adaptive neuro-fuzzy inference system, has also been proposed (Tabesh et al. 2009) to map the pipe failure rate to pipe pressure, installation depth, length, age, and diameter. Compared with other methods, such as non-linear regression and reliability models, NN-based models were reported to perform much better. An automatic damage segmentation framework for buried sewer pipes, based on machine vision techniques using a dataset of 3,558 images, was proposed by Wang et al. (2023). However, images of buried pipes are quite difficult to obtain, especially for analyzing historical records dating back 100 years. The performance of different machine learning models for pipe breaks, using datasets from multiple utilities, was evaluated by Chen et al. (2022), who concluded that the availability of reliable historical break data significantly affects model accuracy. Snider & McBean (2021) combined machine learning with survival statistics to predict the remaining service life of water mains. However, although many studies have addressed the occurrence patterns of pipe failures and their association with different features (Sun et al. 2020), an applicable approach that can assist water managers in the daily maintenance and management of their water assets is still missing.

Owing to the fast development of sensing and communication techniques, a vast amount of data is available, which has led to groundbreaking advances in data-driven approaches. Pipes’ life prognosis can be treated as a time-to-event prediction problem, which aims to predict when a future event will happen and why. For this problem, Kvamm et al. (2019) proposed a method that integrates the survival model with machine learning by extending the Cox proportional hazards model with neural networks (CoxMLP). In this way, the flexibility of neural networks is added while events are modeled in continuous time, which makes the method a good candidate for pipes’ life prognosis. Besides survival models based on CoxMLP, evolutionary polynomial regression (EPR) is another method that can provide optimal solutions by processing and learning input parameters from a large amount of data (Kvamm et al. 2019). Owing to its strong capacity to handle noisy data with missing inputs, the EPR approach was first applied in the water field (Giustolisi & Savic 2006) and has since been widely used to predict river discharge (Balf et al. 2018; Rezaie-Balf & Kisi 2018) and water supply (Mounce et al. 2015).

To obtain reliable solutions with comparable results, this paper proposes two data-driven approaches: a CoxMLP-based survival model and an EPR-based model, chosen for their proven performance in prediction and optimization (as in Awolusi et al. 2018) and in modeling failure features (Kvamm et al. 2019), respectively. To strengthen the evaluation of the proposed data-driven approaches, the cumulative Weibull distribution (CDF) has been used to develop a statistical reliability model as an additional comparison. The CDF has already been widely used in many different applications, such as life and response time distributions, breakage data, and probabilistic reliability analysis (Gifty & Bharathi 2020). Since the CDF can closely approximate practical distributions, it can model numerous failure characteristics while providing satisfactory performance for pipes’ life prognosis and assessment.

The main contributions of this paper are:

  • To develop several comparable data-driven models to quantify pipe deterioration using survival, EPR, and statistical reliability models. The correlations between pipes’ failure and different features have also been studied. Through analysis using state-of-the-art methods, diameter, material, and age have been identified as the most influential features for representing pipes’ failure rate. Pipes’ life prognosis is carried out by the survival model, which consists of an NN with Cox nonlinear proportional hazards regression (CoxMLP). The CDF is used to model the failure rate using techniques from reliability theory.

  • To predict the RUL of pipes through pipes’ life evolution using the aging models obtained from data.

  • To create an easy-reading checklist with risk-level categories and corresponding advice to assist water managers in the daily operation, maintenance, and investment of their water supply assets.

  • To validate the performance of the proposed approaches through a real-life demonstration based on the Barcelona WDN case study.

Following this section, the considered case study is first presented in order to prepare a dataset for use later on. A preliminary analysis of the different features of pipe failures is provided in the ‘Case Study’ section. Afterward, all the proposed modeling approaches, which include the CoxMLP-based survival model, the EPR model, as well as reliability model based on CDF, are presented in the ‘Proposed Approaches’ section. The demonstration and application results of all the proposed approaches based on the Barcelona WDN case study are provided in the ‘Results’ section, including the checklist generated for the water industry to assist operators in planning their maintenance or renewal. Conclusions and future work are discussed in the ‘Conclusions’ section.

Barcelona WDN

The case study used in this paper is based on the Barcelona WDN. The considered database covers the years 2002 to 2015 and contains 153,285 recorded failure events, together with 15 other related variables. To prepare a valid dataset for analysis and validation, useful records were extracted from the raw measurements, focusing only on the pipe failure records that contain the required information. Records with incomplete data were deleted.

After the dataset had been prepared, the failure rate of each record was computed for the corresponding pipe in order to develop and learn models suitable for predicting temporal evolution. There are many ways to compute the failure rate (Tabesh et al. 2009). In this paper, the pipe failure rate is calculated from historical data as follows:
$$\lambda = \frac{N_f}{A \cdot L} \tag{1}$$
where $N_f$ is the number of failures, $A$ is the age of the pipe, and $L$ is the length of the pipe.
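As a minimal illustration of the failure rate calculation, the sketch below assumes that the rate is the number of failures divided by the product of pipe age and pipe length. The function name `failure_rate`, its argument names, and the units (years and kilometers) are assumptions made for this example and are not taken from the Barcelona database.

```python
def failure_rate(n_failures: int, age_years: float, length_km: float) -> float:
    """Failures per year of pipe age and per kilometer of pipe (assumed units)."""
    if age_years <= 0 or length_km <= 0:
        raise ValueError("age and length must be positive")
    return n_failures / (age_years * length_km)


# Hypothetical pipe: 3 recorded failures, 40 years old, 0.5 km long.
example_rate = failure_rate(3, 40.0, 0.5)
print(round(example_rate, 3))  # 0.15
```

Here the age would be obtained as the failure year minus the installation year of the pipe.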
In Tabesh et al. (2009), the failure rate is defined per kilometer of pipe length where the leak occurs. However, in this work, failures on each pipe are more concentrated: several failures may occur on the same pipe, so the number of failures per pipe can be larger than 1. The age of the pipe is defined as the Failure Year, when the failure was detected, minus the Installation Year of the pipe. Compared with the features considered in other references, such as Age, Diameter, Length, Pressure, and Head in Tabesh et al. (2009), or the Number of Previous Breaks, Material, Diameter, Length, and Traffic in Gifty & Bharathi (2020), the Barcelona WDN database contains more features that could be used. In particular, Pressure zone code, Pressure, Usage, Length, Diameter, Material, Temperature, and Age are used as input features for the research presented in this paper. The Pressure zone code describes the pressure zone where the pipe is installed. In WDNs, it is usual to divide the network into zones according to their pressure/elevation; in the Barcelona case study, pressure zones are defined by the water company according to the elevations of different areas (see Figure 1). In the literature, the elevation plus the water pressure is also known as the head (Tabesh et al. 2009). The Number of Previous Breaks is another feature that needs to be explained. This variable records, for each break, the number of previous breaks of the pipe. To simplify the analysis, all the breaks of a pipe are grouped together in this paper, assuming that the pipe resistance is uniform in time. Therefore, for each pipe, all the breaks in the studied period are considered together. The breaks could alternatively have been studied separately because, once a pipe fails, it is repaired and its resistance changes.
The Temperature was taken from the historical meteorological measurements provided by Barcelona's local government in an open-access database.
Figure 1

Barcelona WDN pressure zones identified with different colors and codes.


Preliminary analysis

In general, the use of more feature/factor inputs could potentially lead to a better model but more data measurements are also needed, which also adds difficulties and uncertainties to practical implementation. For developing approaches that are much easier to apply in practice, a preliminary analysis is conducted first to determine the importance of different features. Only the most influencing features will be taken into account for predicting failure rate temporal evolution later on.

The preliminary analysis is carried out by applying a correlation analysis to the valid dataset and the pipe failure rates computed in Section 2.1, using the pandas.DataFrame.corr function in the Python environment. The results are shown in Figure 2, which indicates the importance of the different features as percentages. Figure 2 shows that Material, Age, and Diameter are the three features most strongly related to pipes’ failures, with a total importance of 95.64%. Therefore, the models and approaches proposed in this paper focus mainly on Material, Age, and Diameter. However, the Pressure Zone could also be used to focus the analysis on parts of the network, according to the requirements of the water company.
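The following sketch shows one way such a correlation analysis could be run with `pandas.DataFrame.corr`. The records, the column names, the integer encoding of the material, and the normalization of absolute correlations into percentages are all assumptions for illustration; the paper does not state how the percentages in Figure 2 were derived.

```python
import pandas as pd

# Hypothetical records; column names and values are illustrative only.
records = pd.DataFrame({
    "material": ["gray cast iron", "ductile iron", "asbestos cement",
                 "gray cast iron", "ductile iron", "asbestos cement"],
    "age": [80, 20, 55, 95, 30, 60],
    "diameter": [100, 150, 100, 80, 200, 100],
    "pressure": [4.1, 3.9, 4.0, 4.3, 3.8, 4.2],
    "failure_rate": [0.21, 0.03, 0.12, 0.30, 0.04, 0.15],
})

# Encode the categorical material as integer codes (an assumption: the
# encoding used before the correlation is not specified in the text).
encoded = records.assign(
    material=records["material"].astype("category").cat.codes
)

# Pearson correlation of every feature with the failure rate.
corr = encoded.corr(method="pearson")["failure_rate"].drop("failure_rate")

# Normalize the absolute correlations so that they sum to 100%.
importance = 100 * corr.abs() / corr.abs().sum()
print(importance.sort_values(ascending=False).round(2))
```

Integer codes impose an arbitrary order on the materials, so a one-hot encoding or a rank-based correlation may be preferable in practice.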
Figure 2

Features correlation importance regarding pipes’ failure rate.


Two data-driven approaches, a CoxMLP-based survival model and an EPR model, are developed for data-based pipe prognosis, according to their performances in prediction (Awolusi et al. 2018) and modeling numerous failure characteristics (Kvamm et al. 2019). As an additional comparison for data-driven methods, a CDF-based statistical reliability model has also been presented.

The application of all the approaches follows the same procedure:

Step 1: Failure rate calculation: Based on the historical database, the pipe failure rate (1) is calculated for each tuple (material, age, and diameter).

Step 2: Training/validation phase: A model for the pipe failure rate is built for each tuple obtained in Step 1 using 70% of the data in the considered database. The remaining 30% of the data are used for model validation.

Step 3: Prediction phase: The trained/validated model is used to forecast the evolution of the failure rate in the future (several years ahead), determining the RUL and the maintenance checklist.
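Steps 1 and 2 can be sketched as follows. This is a minimal illustration on made-up records: the column names, the aggregation of failures and lengths by summation within each tuple, and the random way of drawing the 70/30 split are assumptions, since the text does not specify them.

```python
import pandas as pd

# Hypothetical failure records; all values are illustrative.
records = pd.DataFrame({
    "material": ["ductile iron"] * 5 + ["gray cast iron"] * 5,
    "age": [10, 10, 20, 20, 30, 40, 40, 50, 50, 60],
    "diameter": [100] * 10,
    "n_failures": [1, 2, 1, 3, 2, 4, 3, 5, 4, 6],
    "length_km": [0.5, 1.0, 0.4, 0.9, 0.6, 0.8, 0.7, 1.0, 0.9, 1.1],
})

# Step 1: failure rate for each (material, age, diameter) tuple.
grouped = (
    records.groupby(["material", "age", "diameter"], as_index=False)
    .agg(n_failures=("n_failures", "sum"), length_km=("length_km", "sum"))
)
grouped["failure_rate"] = grouped["n_failures"] / (
    grouped["age"] * grouped["length_km"]
)

# Step 2: random 70/30 split into training and validation sets.
train = grouped.sample(frac=0.7, random_state=0)
valid = grouped.drop(train.index)

print(len(grouped), len(train), len(valid))
```

Fixing `random_state` keeps the split reproducible, which helps when the three modeling approaches are compared on the same validation data.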

The CoxMLP model

In the Cox model (Scheidegger et al. 2015), the failure rate can be described as a function of time $t$ and the influencing features $x$ (presented in Section 2):
$$\lambda(t \mid x) = \lambda_0(t)\,\exp\big(g(x)\big) \tag{2}$$
where $\lambda_0(t)$ is the baseline failure rate and $g(x)$ is a function estimated from historical data using a regression approach (see Harrel 2001, for more details). The relative failure rate is expressed by an exponential term that depends only on the studied covariates.

Owing to the prediction performance of NNs and their flexibility in modeling failure times continuously (see e.g. Awolusi et al. 2018; Kvamm et al. 2019), in this paper the function $g(x)$ is obtained using a multilayer perceptron (NN) instead of a linear predictor ($g(x) = \beta^{\top} x$) as in the classical Cox model (Scheidegger et al. 2015). This is why the method is called CoxMLP. This parameterization of the failure rate function (2) does not affect the proportionality constraint of the model. Moreover, this paper also proposes a parametric method that does not require grouping the matching features. Time can be used as a regular covariate, which permits interactions between time and the other covariates. As in survival analysis, time-dependent covariates can be taken into account to model the non-proportional effect of a covariate.
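To make the idea of replacing the linear predictor by a multilayer perceptron more concrete, the sketch below evaluates the Cox negative partial log-likelihood for risk scores produced by a small, untrained MLP. It is written in plain NumPy for illustration and is not the authors' implementation: the network size, the random data, and the function names are assumptions, and tied event times are not treated specially.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp_risk_score(x, w1, b1, w2, b2):
    """g(x): one hidden ReLU layer followed by a linear output."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return (hidden @ w2 + b2).ravel()


def neg_partial_log_likelihood(g, time, event):
    """Cox negative partial log-likelihood, averaged over observed events."""
    order = np.argsort(-time)  # sort samples by descending time
    g, event = g[order], event[order]
    # log of the sum of exp(g) over the risk set {j : t_j >= t_i}
    log_risk = np.logaddexp.accumulate(g)
    return -np.sum((g - log_risk) * event) / max(event.sum(), 1)


# Hypothetical data: 3 features (e.g. coded material, age, diameter).
n, d, h = 50, 3, 8
x = rng.normal(size=(n, d))
time = rng.exponential(scale=10.0, size=n)
event = rng.integers(0, 2, size=n)  # 1 = failure observed, 0 = censored

w1, b1 = rng.normal(scale=0.1, size=(d, h)), np.zeros(h)
w2, b2 = rng.normal(scale=0.1, size=(h, 1)), np.zeros(1)

loss = neg_partial_log_likelihood(
    mlp_risk_score(x, w1, b1, w2, b2), time, event
)
print(float(loss))
```

Training would minimize this loss with respect to the network weights using a gradient-based optimizer; the baseline failure rate is then estimated separately from the fitted risk scores.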

The EPR model

The EPR model is an evolutionary computation-based data-driven approach that can handle pseudo-polynomial structures and represent a physical system precisely. The EPR model involves two steps: (1) finding suitable model structures using a genetic algorithm based on an evolutionary procedure; and (2) carrying out a linear regression using least-squares optimization to compute the model constants. More detailed mathematics about EPR can be found in Berardi et al. (2008) and Mounce et al. (2015), who have already proposed the EPR method to build models for different materials in WDNs. For pipes without known failures, the same model coefficient as given by EPR is used; for pipes whose failures are monitored, the model coefficient is computed from their historical burst data (Berardi et al. 2008).

In this work, both healthy and faulty pipes are characterized jointly by applying linear regression to the model coefficient parameter for every material. Pipes that did not have any faults are also included in the normalization step, so that the failure rate over time can be modeled independently of whether a failure exists. This explains why the number of recorded faults is larger than or equal to 0. The method to obtain the model coefficient parameter for different materials is as follows:

  • Failures for each age at different years of the monitoring period have been computed and accumulated. The cumulative failure numbers are then normalized considering total pipe numbers, as well as the length of each material.

  • Compute the model coefficient parameter for the failures at the same age and at the same monitoring year.

  • Compute the model coefficient parameter for each material by weighing each with the number of total failures for each age over the total number of failures.
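The last step of the list above can be read as a failure-weighted average of the per-age coefficients. The sketch below illustrates that reading only; the function name, the dictionary layout, and the numbers are hypothetical, and the computation of the per-age coefficients themselves is not shown.

```python
def material_coefficient(coeff_by_age: dict, failures_by_age: dict) -> float:
    """Weight each per-age coefficient by its share of the total failures."""
    total_failures = sum(failures_by_age.values())
    if total_failures == 0:
        raise ValueError("no failures recorded for this material")
    return sum(
        coeff_by_age[age] * failures_by_age[age] / total_failures
        for age in coeff_by_age
    )


# Hypothetical per-age coefficients and failure counts for one material.
coeff_by_age = {20: 0.02, 40: 0.05, 60: 0.08}
failures_by_age = {20: 10, 40: 30, 60: 60}

coefficient = material_coefficient(coeff_by_age, failures_by_age)
print(round(coefficient, 4))  # 0.065
```

Ages with more recorded failures therefore contribute more to the coefficient of the material.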

Afterward, considering that the failure rate of pipe i evolves according to
(3)
with
(4)
(5)
then the EPR model is derived as follows:
(6)
where denotes the corresponding EPR model coefficient; is the recorded failures for the ith pipe; is the equivalent age of the pipe class at the end of the monitoring period, and t is the time variable. is the slope that can return , is the y-intercept for each pipe at Equation (6). can be obtained using Equation (5). T represents the monitoring period, and is the pipe's length.
Thus, the failure rate of every pipe can be computed using the following equation:
(7)
where
where means the number of failures for the pipe at the age j, while is the coefficient model parameter for the pipes at age j.
The number of predicted failures per pipe can be computed by integrating all pipes at Equation (8) regarding the h time horizon:
(8)
Then, the failure probability () can be computed according to
(9)

The statistical reliability model

The statistical reliability model is obtained by learning the model parameters from historical data, which yields the probability of failure at a given time for a certain pipe. The cumulative failure probability (CDF) provides the failure probability of one specific pipe at a certain age with respect to the cumulative number of pipes of the same age. Different distribution functions can be used to obtain a good fit between the model and the analyzed data. Regarding the physical evolution of the pipe failures (Vladeanu & Koo 2015), the probability density function is given by
$$f(x) = \frac{k}{c}\left(\frac{x}{c}\right)^{k-1} \exp\left[-\left(\frac{x}{c}\right)^{k}\right] \tag{10}$$
where $c$ is the scale (or characteristic) life parameter, $k$ is the shape parameter, also called the Weibull slope, and $x$ is the considered pipe feature.

In order to compute the failure probability (FP) of a given material, we have to consider how many failures have happened at a certain date since the pipe installation date. Then, the pipe failure probability at a certain date is computed from the CDF as follows:
$$F(x) = 1 - \exp\left[-\left(\frac{x}{c}\right)^{k}\right] \tag{11}$$
where the parameters $c$ and $k$ need to be calibrated to obtain an optimal fit. From (10) and (11), the failure rate can be calculated as follows:
$$\lambda(x) = \frac{f(x)}{1 - F(x)} = \frac{k}{c}\left(\frac{x}{c}\right)^{k-1}$$
Figure 3 presents the evolution of the CDF for different values of the parameter $k$ (9.5908 and 3) for pipes of diameter 100 mm. The abscissa axis represents the age of the pipe for a certain material in a scaled form, in order to show the whole time evolution of the CDF. The material of a pipe is a categorical variable, so a number is assigned to each studied material. As shown in Table 1, only three materials have been considered: Asbestos Cement, Gray Cast Iron, and Ductile Iron; therefore, the materials are coded as 1, 2, and 3. To obtain the annual failure probability, the axis should be scaled up to the total number of years. Note that $k > 1$ implies that the failure rate increases with time. In addition, for higher $k$ values the distributions reach the total number of failures later than for lower $k$ values.
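The behavior described for the Weibull model can be checked numerically with the short sketch below, which assumes the standard two-parameter Weibull density and cumulative distribution with scale $c$ and shape $k$. The scale $c = 1$ mimics a scaled age axis and is an assumption; the two $k$ values are those quoted for Figure 3.

```python
import math


def weibull_pdf(x: float, c: float, k: float) -> float:
    """Two-parameter Weibull probability density function."""
    return (k / c) * (x / c) ** (k - 1) * math.exp(-((x / c) ** k))


def weibull_cdf(x: float, c: float, k: float) -> float:
    """Cumulative failure probability up to x."""
    return 1.0 - math.exp(-((x / c) ** k))


def weibull_failure_rate(x: float, c: float, k: float) -> float:
    """Failure rate f(x) / (1 - F(x))."""
    return weibull_pdf(x, c, k) / (1.0 - weibull_cdf(x, c, k))


c = 1.0  # scaled age axis (assumption)
for k in (3.0, 9.5908):
    rates = [weibull_failure_rate(x, c, k) for x in (0.25, 0.5, 0.75)]
    print(k, round(weibull_cdf(0.8, c, k), 3), [round(r, 3) for r in rates])
```

For both shape values the failure rate grows with the scaled age, and at the same scaled age the curve with the larger $k$ has accumulated a smaller failure probability.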
Table 1

25-year-ahead failure prognosis

| Material | Time horizon | Weibull FP (%) | Cox FP (%) | EPR FP (%) |
|---|---|---|---|---|
| Asbestos cement | 5 years | 0.9537 | 0.8749 | 0.8994 |
| | 10 years | 2.1618 | 2.0638 | 1.8556 |
| | 15 years | 3.0661 | 2.9728 | 2.8685 |
| | 20 years | 4.0284 | 3.8419 | 3.9382 |
| | 25 years | 5.1749 | 5.0454 | 5.0647 |
| Gray cast iron | 5 years | 0.9214 | 0.7921 | 0.8589 |
| | 10 years | 2.0871 | 1.9762 | 1.7789 |
| | 15 years | 2.9584 | 2.9042 | 2.7601 |
| | 20 years | 3.8848 | 3.5587 | 3.8023 |
| | 25 years | 4.9879 | 4.9247 | 4.9057 |
| Ductile iron | 5 years | 0.3256 | 0.2696 | 0.2540 |
| | 10 years | 0.7393 | 0.6741 | 0.5429 |
| | 15 years | 1.0503 | 0.9929 | 0.8668 |
| | 20 years | 1.3822 | 1.2193 | 1.2258 |
| | 25 years | 1.7781 | 1.6395 | 1.6197 |
Figure 3

CDF with different k values.


Pressure zone and material statistical validation

Once the probability distributions for all pressure zones have been obtained, we will carry out an analysis of the RUL for a certain material. Knowing the total materials in each zone will help us to discover the weight of each material in different zones. This will enable obtaining the joint probability of materials in a particular pressure zone. Considering the failures in different pressure zones, only the zones with the highest number of failures have been selected, for illustrative purposes, from a total of 157 pressure zones:

  • Pressure zone 10101 (with 14,048 failures);

  • Pressure zone 10004 (with 12,470 failures); and

  • Pressure zone 10103 (with 12,197 failures).

Afterward, we can weight the associated failure probabilities according to the number of failures per material ($i = 1, \ldots, n_m$) in the zone under study, so that the joint accumulated failure distribution of each pressure zone will be given by the following equation:
(12)
In addition, the failure probability of each material in each pressure zone at a future time (e.g. in 5, 25, 50, and 75 years) can be easily obtained. On the other hand, the failure probabilities of each material in each pressure zone can also be obtained by considering the distribution of the material in the current pressure zone and the material probabilities over all pressure zones, as follows:
(13)
where we use the probabilities of the materials that appear in the pressure zone and a weighting factor given by the total failures for each material over the number of failures for all materials, which can also be computed as a percentage. With only the material distribution in each pressure zone and the material probabilities obtained for the whole network, we can estimate the failure probabilities of each pressure zone.

The motivation behind this verification is to ensure that the failure probabilities obtained per pressure zone from the material distribution in that zone coincide to a great extent with the failure probabilities obtained directly per pressure zone. This results in a better understanding of the failure probability of each feature and also allows verifying that the probability computed for each material at the macro level can be extrapolated to a specific pressure zone.
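The weighting idea used in this subsection can be illustrated with a simple weighted average, in which each material's failure probability is weighted by its share of the failures in the zone. This is only a sketch of the principle under that assumption, not a reproduction of Equations (12) and (13). The failure counts below are hypothetical, while the probabilities are the 25-year Weibull values from Table 1.

```python
def zone_failure_probability(material_fp: dict, failures_per_material: dict) -> float:
    """Weighted average of material failure probabilities in one zone."""
    total = sum(failures_per_material.values())
    if total == 0:
        raise ValueError("the zone has no recorded failures")
    return sum(
        material_fp[material] * count / total
        for material, count in failures_per_material.items()
    )


# 25-year Weibull failure probabilities (%) taken from Table 1.
material_fp = {
    "asbestos cement": 5.1749,
    "gray cast iron": 4.9879,
    "ductile iron": 1.7781,
}

# Hypothetical failure counts per material in one pressure zone.
failures_in_zone = {
    "asbestos cement": 200,
    "gray cast iron": 500,
    "ductile iron": 300,
}

zone_fp = zone_failure_probability(material_fp, failures_in_zone)
print(round(zone_fp, 4))
```

Because the weights sum to one, the zone probability always lies between the smallest and the largest material probability.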

Prediction and prognosis

As discussed in the previous subsections, the CoxMLP can predict the failure probability based on the features selected in Section 2. CoxMLP can also be applied to pipe failure prognosis. For the prognosis, the pipe status is predicted over a 25-year-ahead horizon with a 5-year step. The features considered are the most influential ones: material, diameter, and age. Furthermore, over the same period, the failure rate is also computed using the Weibull distributions and the EPR model.

To prepare for the prediction and prognosis, we have listed the material types and the usage percentage of each material in the Barcelona WDN. Eighteen different types of material have been used in the network. Among these, ductile iron is the most used, with 41.69%, followed by gray cast iron with 22.35% and asbestos cement with 13.92%. The remaining most-used materials are high-density polyethylene (10.88%), low-density polyethylene (5.60%), reinforced concrete (1.50%), and reinforced concrete welded joint (1.48%). As not enough data can be collected for the materials with lower usage percentages, the prognosis focuses only on the material types and diameters with higher usage percentages. Therefore, the pipes’ failure probability prediction is carried out for asbestos cement, gray cast iron, and ductile iron. In addition to material, diameter and age are also considered during the prediction and prognosis processes.

In this section, the proposed approaches are validated using the Barcelona WDN presented in Section 2. The results for the considered case study show:

  • comparable results among the survival CoxMLP, EPR, and CDF models in predicting future pipe failures;

  • comparable performance between the survival CoxMLP model and the CDF model regarding the RUL over time;

  • an easy-reading maintenance and renewal plan for the coming 25 years in the form of a checklist.

Failure rate prediction

The Barcelona WDN includes different types of pipes, which differ in material, installation period, diameter, and construction process. The failure rate is calculated using (1) for all the methods, as discussed at the beginning of Section 3. As the calculation is a kind of prediction, the new leakages are accumulated over the horizon. As shown in Table 1, the prediction of the pipe failure probability covers the three main materials and the most common diameter (100 mm) over the coming 25 years, with a 5-year time step, using the three proposed methods. For the EPR approach, the model coefficient of each material has been estimated from historical data using the procedure described in Berardi et al. (2008):

  • 0.0560 for asbestos cement;

  • 0.0858 for gray cast iron;

  • 0.0199 for ductile iron.

Notably, Table 1 shows that the prognosis results provided by the three approaches are quite close. For a clearer interpretation of the probability prognosis, the prediction evolution for each method is shown in Figures 4–6. The pipe failure probabilities calculated by the Weibull approach are relatively high, giving rise to an upper bound, while the results calculated by Cox and EPR are quite close to each other.
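The upper-bound behavior of the Weibull results can be verified directly from the values reported in Table 1. The sketch below only re-reads the table; the dictionary layout and variable names are chosen for this example.

```python
# Failure probabilities (%) from Table 1 as (Weibull, Cox, EPR),
# keyed by material and time horizon in years.
table1 = {
    "asbestos cement": {
        5: (0.9537, 0.8749, 0.8994), 10: (2.1618, 2.0638, 1.8556),
        15: (3.0661, 2.9728, 2.8685), 20: (4.0284, 3.8419, 3.9382),
        25: (5.1749, 5.0454, 5.0647),
    },
    "gray cast iron": {
        5: (0.9214, 0.7921, 0.8589), 10: (2.0871, 1.9762, 1.7789),
        15: (2.9584, 2.9042, 2.7601), 20: (3.8848, 3.5587, 3.8023),
        25: (4.9879, 4.9247, 4.9057),
    },
    "ductile iron": {
        5: (0.3256, 0.2696, 0.2540), 10: (0.7393, 0.6741, 0.5429),
        15: (1.0503, 0.9929, 0.8668), 20: (1.3822, 1.2193, 1.2258),
        25: (1.7781, 1.6395, 1.6197),
    },
}

rows = [fp for horizons in table1.values() for fp in horizons.values()]

# Is the Weibull value at least as large as both other methods everywhere?
weibull_is_upper_bound = all(w >= max(cox, epr) for w, cox, epr in rows)

# Largest gap between Weibull and the lowest of the other two methods.
max_gap = max(w - min(cox, epr) for w, cox, epr in rows)

print(weibull_is_upper_bound, round(max_gap, 4))
```

For the tabulated horizons, the Weibull value is the largest of the three in every row, and the gap to the lowest prediction stays below half a percentage point.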
Figure 4

Gray cast iron failure probability (%).

Figure 5

Ductile iron failure probability (%).

Figure 6

Asbestos cement failure probability (%).


For pipes with a current age greater than 60 years, intensive monitoring of the evolution of the pipe is highly recommended, because the prediction for all materials and diameter ranges indicates that such pipes will enter the high-risk zone in less than 5 years.

RUL estimation

To give more information about the pipes’ status in the coming years, we have developed the reliability and survival models in terms of RUL over time for the three main materials in the Barcelona WDN (gray cast iron, asbestos cement, and ductile iron). The RUL is usually defined as the estimated amount of time an asset has until it becomes unusable or requires replacement. To calculate the RUL, a threshold value for the probability of failure is fixed; the time required for the probability of failure to reach this threshold determines the RUL. Figure 7 presents the RUL evolution for gray cast iron pipes over 118 years, based on the CDF reliability model and the CoxMLP survival model. In this figure, the RUL evolution is shown on the vertical axis, starting from the value ‘0’, which means that no failures have been experienced, and increasing as time goes by. The curve rises sharply from 0 until around 50 years, remains smooth from around 50 to around 100 years, and afterward rises sharply again, tending asymptotically to ‘1’, which means that the pipe is reaching the end of its life. This can be related to the classical bathtub curve (Berardi et al. 2008). Initially, installed pipes have a major risk of failure, as there can be errors in the installation or other factors. Once they start working, this risk decreases quickly until it becomes stable for some years. After some point, however, the risk increases again as time causes damage to the pipes. This kind of curve is common in other fields of application. As shown in Figure 7, the curve starts from 0 because installation and other early failures are not considered, so it shows only half of the U-shape.
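The threshold-based definition of the RUL can be sketched with a Weibull cumulative distribution, for which the age at which the failure probability reaches a threshold $p$ has the closed form $c\,(-\ln(1-p))^{1/k}$. The parameter values, the threshold, and the function names below are hypothetical and are not the values calibrated for the Barcelona WDN.

```python
import math


def age_at_probability(p: float, c: float, k: float) -> float:
    """Age at which the Weibull CDF 1 - exp(-(x/c)^k) reaches probability p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return c * (-math.log(1.0 - p)) ** (1.0 / k)


def remaining_useful_life(current_age: float, p_threshold: float,
                          c: float, k: float) -> float:
    """Years left until the failure probability reaches the threshold."""
    return max(0.0, age_at_probability(p_threshold, c, k) - current_age)


# Hypothetical parameters: scale c = 100 years, shape k = 3, threshold 50%.
c, k, threshold = 100.0, 3.0, 0.5
for age in (20.0, 60.0, 95.0):
    print(age, round(remaining_useful_life(age, threshold, c, k), 1))
```

A pipe that is already older than the threshold age gets an RUL of zero, which corresponds to the end-of-life region of the curve.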
Figure 7

RUL based on Weibull and CoxMLP.


To compare the performance of the two approaches more clearly, further details are given in Table 2. As this additional comparative analysis has been carried out on a subset of the entire network, the conclusions apply at the subset level, with limitations for the entire set. According to Table 2, when age and material are considered using the CDF (second method), the material with a higher usage percentage has more failures. As the CoxMLP method (first method) needs more calibrated parameters, the RUL percentage is lower because each intervening factor adds a given weight to the total failure. Not all the factors are considered in the second method, which may decrease the factual percentage. Moreover, as the number of failures decreases for the less used materials, the CoxMLP method shows that the RUL is lower with a higher percentage of failures. However, including more factors in the CoxMLP method allows the pipes’ behavior to be characterized more precisely.

Table 2

RUL regarding material (probability in % of pipe failures)

| Material (presence) | Date (years) | Failures | First method | Second method | Difference (%) |
| --- | --- | --- | --- | --- | --- |
| Asbestos cement (13.92%) | (56–101) | 1,844 | 70.8% | 74.9% | 4.1% |
| Gray cast iron (22.35%) | (63–101) | 4,258 | 63.8% | 67.2% | 3.4% |
| Ductile iron (41.69%) | (59–102) | 2,985 | 67.8% | 71.4% | 3.6% |

From Table 2, it is important to note that some materials present more problems than others. For example, gray cast iron has more failures than ductile iron even though the latter has a larger presence in the WDN. From this, we can deduce that gray cast iron is a more problematic pipe material than ductile iron.
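The comparison between materials can be made explicit with a short worked calculation. The sketch below copies the values of Table 2 and reads the unlabeled count column as the number of recorded failures (an interpretation, consistent with the 'Failure 4258' entry for gray cast iron in Table 3). Dividing failures by presence is an illustrative normalization chosen here, not a metric defined in the paper.

```python
# Values copied from Table 2: presence (% of the network), failure count
# (assumed meaning of the unlabeled column), and failure probability (%)
# from the first (CoxMLP) and second (CDF) methods.
TABLE_2 = {
    "Asbestos cement": {"presence": 13.92, "failures": 1844, "first": 70.8, "second": 74.9},
    "Gray cast iron": {"presence": 22.35, "failures": 4258, "first": 63.8, "second": 67.2},
    "Ductile iron": {"presence": 41.69, "failures": 2985, "first": 67.8, "second": 71.4},
}


def failures_per_presence_point(row: dict) -> float:
    """Failures per percentage point of network presence (illustrative metric)."""
    return row["failures"] / row["presence"]


def method_difference(row: dict) -> float:
    """Difference between the second and first method, in percentage points."""
    return round(row["second"] - row["first"], 1)


for material, row in TABLE_2.items():
    print(
        f"{material:16s} "
        f"failures per presence point: {failures_per_presence_point(row):6.1f}  "
        f"difference: {method_difference(row):.1f}"
    )
```

Under this normalization gray cast iron ranks highest and ductile iron lowest, which supports the reading that gray cast iron is the more problematic material; the method differences reproduce the last column of Table 2 and all stay below 5%.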

Maintenance and renewal plan

After presenting all the prediction, prognosis, and analysis results, our final aim is to assist WDN managers or operators in deciding how to proceed with renewal plans. Based on the obtained results, an easy-reading checklist has been developed for pipeline monitoring and maintenance plans according to diameter, age, and material. The checklist includes three types of operations (continue to work, should be supervised, and should be replaced), assigned according to the status analysis of the pipes in terms of four assessment zones: A, B, C, and D. The procedure for building this checklist is as follows:

  • (1) Zone A corresponds to the period that does not need assessment, as the failure probability in this zone is quite low.

  • (2) The assessment starts from zones B and C, where zone B refers to [65–70]% of the age and zone C refers to [75–80]% of the age. Zones B and C represent periods in which failures occur relatively infrequently.

  • (3) Zone D corresponds to the period in which the failure rate indicates a very high probability of failure, so no assessment is needed.

Zones B and C are used as thresholds to evaluate the possible operations for the pipes: whether they should continue to work, need to be supervised, or need to be replaced immediately. The important parameters of age, diameter, and pressure, which have a strong impact on failure occurrence, are considered. Zones B and C are explained further as follows:

Zone B:

  • If only one of the three important parameters has a value higher than the defined threshold, the monitoring operation is activated. Thresholds for the different features are determined from the obtained results and the practical knowledge of the end-users.

  • If more than one of the three important parameters is higher than its threshold, the asset replacement operation is activated.

Zone C:

  • If only one of the three important parameters has a value higher than the defined threshold, the asset replacement operation is suggested.

  • If more than one of the three important parameters is higher than its threshold, the asset replacement operation is activated, as in Zone B.
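The zone rules described above can be summarized as a small decision function. This is a minimal sketch, not the authors' implementation: the function name is hypothetical, and two cases that the text does not spell out are filled with labeled assumptions (Zone D maps to replacement, and a pipe in Zone B or C with no parameter above its threshold continues to work).

```python
def recommend_operation(zone: str, params_over_threshold: int) -> str:
    """Map an assessment zone and the number of parameters above their
    thresholds (age, diameter, pressure) to one of the three operations."""
    zone = zone.upper()
    if zone == "A":
        # Failure probability is low: no assessment needed.
        return "continue to work"
    if zone == "D":
        # Assumption: very high failure probability implies replacement.
        return "should be replaced"
    if zone not in ("B", "C"):
        raise ValueError(f"unknown zone: {zone!r}")
    if params_over_threshold <= 0:
        # Assumption: this case is not described in the text.
        return "continue to work"
    if zone == "B" and params_over_threshold == 1:
        # Zone B, exactly one parameter over threshold: monitoring.
        return "should be supervised"
    # Zone B with more than one parameter over threshold, or Zone C with any.
    return "should be replaced"


for zone, count in [("A", 0), ("B", 1), ("B", 2), ("C", 1), ("C", 3), ("D", 0)]:
    print(zone, count, "->", recommend_operation(zone, count))
```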

A checklist for gray cast iron pipes is extracted for Zone B and Zone C. Moreover, the assessment of a pipe of 59 years (asset enters monitoring phase) is shown in Table 3, where WAR (Within Allowed Range of value) and OAR (Outside Allowed Range of value) indicate that a parameter is within or outside the allowed range of values, respectively.

Table 3

Checklist for water managers

| Gray cast iron (Failure: 4258) | Zone B: Critical phase | Zone B: Assess | Zone B: Annotate | Zone C: Immediate renewal | Zone C: Assess | Zone C: Annotate |
| --- | --- | --- | --- | --- | --- | --- |
| Age | [68–74] | 76 | OAR | [79–87] | 79 | – |
| Diameter | 100, 150, 200, 80 | 100 | WAR | 100, 150, 200, 80 | 100 | – |
| Min pressure (m.w.c) | [48–62] | 52 | WAR | [44–66] | 52 | – |
| Max pressure (m.w.c) | [49–63] | 58 | WAR | [46–70] | 58 | – |
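The WAR/OAR annotation of Table 3 can be reproduced with a short check. The sketch below uses the Zone B limits and the assessed values from the table; treating the bracketed ranges as closed intervals and the diameter entry as a set of allowed values are assumptions, and the names are hypothetical.

```python
# Zone B limits for gray cast iron, copied from Table 3.
# Ranges are treated as closed intervals (assumption); diameter is a set
# of allowed values in mm (assumption about how the list is meant).
ZONE_B_LIMITS = {
    "age": (68, 74),
    "diameter": {80, 100, 150, 200},
    "min_pressure": (48, 62),
    "max_pressure": (49, 63),
}

# Assessed values from the "Assess" column of Zone B in Table 3.
PIPE = {"age": 76, "diameter": 100, "min_pressure": 52, "max_pressure": 58}


def annotate(value, allowed) -> str:
    """Return 'WAR' if value is within the allowed range, else 'OAR'."""
    if isinstance(allowed, (set, frozenset)):
        inside = value in allowed
    else:
        low, high = allowed
        inside = low <= value <= high
    return "WAR" if inside else "OAR"


annotations = {name: annotate(PIPE[name], limit) for name, limit in ZONE_B_LIMITS.items()}
outside = sum(1 for tag in annotations.values() if tag == "OAR")

print(annotations)
print("parameters outside allowed range:", outside)
```

For the pipe in Table 3 only the age falls outside its allowed range, which matches the single OAR entry in the Zone B "Annotate" column.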

Likewise, a checklist has been built to incorporate failure prediction for the next 25 years. It provides additional information to water managers by means of time intervals, in years, that fall inside three assessment zones: low, medium, and high risk. These follow the same criteria applied for zones A, B, and C, that is, less than 65%, [65–70]%, and [75–80]% of the asset age, respectively. Thus, a low-risk interval of [0–10] years indicates that the given material will enter the medium-risk zone, where the risk of asset failure is already significant, after 10 years. The age parameter (years) has been divided into three intervals: [0–30], [30–60], and [60 – ]. Regarding materials, asbestos cement, gray cast iron, and ductile iron have been included. The diameter (mm) has been split into three intervals: [0–100], [100–150], and [150 – ]. This facilitates the adoption of maintenance tasks for pipes with given properties. The checklist is given in Table 4.

Table 4

Failure prediction in the next 25 years

Prognosis of failure in the next 25 years

| Age | Material | Diameter | Low-risk zone | Medium-risk zone | High-risk zone |
| --- | --- | --- | --- | --- | --- |
| [0–30] | Asbestos cement | [0–100] | [0–25) | [25–37) | [37 – onwards] |
| [0–30] | Asbestos cement | [100–150] | [0–19) | [19–31) | [31 – onwards] |
| [0–30] | Asbestos cement | [150 – ] | [0–29) | [29–38) | [38 – onwards] |
| [0–30] | Gray cast iron | [0–100] | [0–21) | [21–35) | [35 – onwards] |
| [0–30] | Gray cast iron | [100–150] | [0–20) | [20–28) | [28 – onwards] |
| [0–30] | Gray cast iron | [150 – ] | [0–25) | [25–37) | [37 – onwards] |
| [0–30] | Ductile iron | [0–100] | [0–25) | [25–36) | [36 – onwards] |
| [0–30] | Ductile iron | [100–150] | [0–18) | [18–32) | [32 – onwards] |
| [0–30] | Ductile iron | [150 – ] | [0–27) | [27–39) | [39 – onwards] |
| [30–60] | Asbestos cement | [0–100] | [0–11) | [11–18) | [18 – onwards] |
| [30–60] | Asbestos cement | [100–150] | [0–13) | [13–22) | [22 – onwards] |
| [30–60] | Asbestos cement | [150 – ] | [0–9) | [9–17) | [17 – onwards] |
| [30–60] | Gray cast iron | [0–100] | [0–14) | [14–23) | [23 – onwards] |
| [30–60] | Gray cast iron | [100–150] | [0–12) | [12–20) | [20 – onwards] |
| [30–60] | Gray cast iron | [150 – ] | [0–16) | [16–25) | [25 – onwards] |
| [30–60] | Ductile iron | [0–100] | [0–9) | [9–18) | [18 – onwards] |
| [30–60] | Ductile iron | [100–150] | [0–7) | [7–15) | [15 – onwards] |
| [30–60] | Ductile iron | [150 – ] | [0–16) | [16–24) | [24 – onwards] |
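Table 4 can be read as a lookup from pipe properties to risk zones over time. The following sketch encodes three of its rows and classifies a number of years ahead using the half-open intervals shown in the table; the dictionary layout, the string band labels, and the function name are hypothetical choices made for illustration.

```python
# (age band, material, diameter band) -> (end of low-risk zone,
# end of medium-risk zone), in years. Three rows copied from Table 4.
TABLE_4_SAMPLE = {
    ("[0-30]", "Gray cast iron", "[100-150]"): (20, 28),
    ("[30-60]", "Gray cast iron", "[100-150]"): (12, 20),
    ("[30-60]", "Ductile iron", "[100-150]"): (7, 15),
}


def risk_zone(years_ahead: float, age_band: str, material: str,
              diameter_band: str) -> str:
    """Return 'low', 'medium', or 'high' for a given number of years ahead."""
    if years_ahead < 0:
        raise ValueError("years_ahead must be non-negative")
    low_end, medium_end = TABLE_4_SAMPLE[(age_band, material, diameter_band)]
    if years_ahead < low_end:       # [0, low_end)
        return "low"
    if years_ahead < medium_end:    # [low_end, medium_end)
        return "medium"
    return "high"                   # [medium_end, onwards]


key = ("[0-30]", "Gray cast iron", "[100-150]")
for years in (10, 20, 27, 28):
    print(years, "years ->", risk_zone(years, *key))
```

Keeping the band labels as strings avoids having to decide which band a boundary value such as an age of 30 or a diameter of 100 belongs to, which the table does not specify.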

Future broader implication

To explore the broader implications of the proposed methods, the approaches are compared in Table 5. Moreover, to indicate how the proposed methods and maintenance plan can be implemented for pipe life prognosis, a flowchart covering the steps from data collection to checklist generation has been developed, as shown in Figure 8.

Table 5

A comparable table for the proposed approaches

| Method | Types | Characteristic | Pipe prognosis | RUL |
| --- | --- | --- | --- | --- |
| CoxMLP | Survival model | Flexible neural network, suitable for continuous time-to-event problems | Quite close pipe failure probability for pipes with different materials and time horizons; for more details, see Table 1 | RUL values for the additional comparisons are quite close, with less than a 5% difference; for more details, see Table 2 |
| EPR | Evolutionary polynomial regression model | Strong capacity in handling noisy data with missing inputs; can represent a physical system precisely | | |
| CDF | Statistical reliability model | Can closely approximate practical distributions; has the potential to model numerous failure characteristics | | |

Figure 8

Application flowchart.


This paper has proposed multiple comparable data-driven approaches for pipe life prognosis in water distribution networks from both scientific and practical perspectives. Quantitative analysis has been carried out for the first time to evaluate the impact of different influential factors (mainly material, diameter, and age) on pipe failure. Regularities in the occurrence of pipe failures are revealed in a comparable way through the CDF-based statistical reliability model, the CoxMLP-based survival model, and the evolutionary polynomial regression (EPR) model. The similar performance in predicting future failure rates confirms the consistency of the proposed approaches. The RUL evolution over around 120 years, with no more than a 5% difference between the survival and reliability models, reinforces these similarities. In the considered case study, the CoxMLP-based survival model yields a lower RUL because more factors are involved. Furthermore, it can be concluded that more failures happen to the materials with higher usage percentages in the network. To put this research into practice, an easy-reading checklist has also been provided, which divides all pipes into four assessment zones (A, B, C, and D) with the corresponding operations: continue to work, should be supervised, and should be replaced. For further clarity, the failure probability for the next 25 years is also predicted and presented in an easy-reading table that uses low, medium, and high risk levels for each pipe. This work contributes to asset monitoring and maintenance from both scientific and practical perspectives. In future research, the information extracted in this study will be used to enhance current leak localization methodologies based on pressure monitoring and to develop optimal pipe renewal plans that seek the best trade-off between investment and quality of service.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare there is no conflict.

Allbee, S. (2005) America's pathway to sustainable water and wastewater systems, Water Asset Management International, 1, 9–14.

Awolusi, T., Oke, O., Sojobi, A. & Aluko, O. (2018) Performance comparison of neural network training algorithms in the modeling properties of steel fiber reinforced concrete, Heliyon, 5, e01115.

Balf, M. R., Noori, R., Berndtsson, R., Ghaemi, A. & Ghiasi, B. (2018) Evolutionary polynomial regression approach to predict longitudinal dispersion coefficient in rivers, Journal of Water Supply: Research and Technology-Aqua, 67 (5), 447–457.

Berardi, L., Giustolisi, O., Kapelan, Z. & Savic, D. A. (2008) Development of pipe deterioration models for water distribution systems using EPR, Journal of Hydroinformatics, 10 (2), 113–126.

Chen, T. Y. J., Vladeanu, G., Yazdekhasti, S. & Daly, C. M. (2022) Performance evaluation of pipe break machine learning models using datasets from multiple utilities, Journal of Infrastructure Systems, 28 (2), 5022002.

Fujiwara, O. & Tung, H. (1991) Reliability improvement for water distribution networks through increasing pipe size, Water Resources Research, 27 (7), 1395–1042.

Giustolisi, O. & Savic, D. (2006) A symbolic data-driven technique based on evolutionary polynomial regression, Journal of Hydroinformatics, 8 (3), 207–222.

Harrel, F. E. (2001) Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.

Khomsi, D., Walters, G. A., Thorley, A. R. D. & Ouazar, D. (1996) Reliability tester for water distribution networks, Journal of Computing in Civil Engineering, 10 (1), 10–19.

Kvamm, H., Borgan, Ø. & Scheel, I. (2019) Time-to-event prediction with neural networks and Cox regression, Journal of Machine Learning Research, 20, 6–8.

Mounce, S. R., Blokker, E. J. M., Husband, S. P., Furnass, W. R., Schaap, P. G. & Boxall, J. B. (2015) Multivariate data mining for estimating the rate of discolouration material accumulation in drinking water distribution systems, Journal of Hydroinformatics, 18 (1), 96–114.

OECD (2006) Infrastructure to 2030: Telecom, Land Transport, Water and Electricity. Paris, France: OECD Publishing.

Pietrucha-Urbanik, K. & Pociask, K. (2016) Analysis and assessment of water distribution subsystem failure, Journal of KONBiN, 40, 47–62.

Snider, B. & McBean, E. A. (2021) Combining machine learning and survival statistics to predict remaining service life of watermains, Journal of Infrastructure Systems, 27 (3), 1–14.

Su, Y. C., Mays, L. W., Duan, N. & Lansey, K. E. (1987) Reliability based optimization model for water distribution systems, Journal of Hydraulic Engineering, 12 (114), 1539–1556.

Sun, C. C., Parellada, B., Puig, V. & Cembrano, G. (2020) Leak localization in water distribution networks using pressure and data-driven classifier approach, Water, 1 (12), 54.

Tabesh, M., Soltani, J., Farmani, R. & Savic, D. A. (2009) Assessing pipe failure rate and mechanical reliability of water distribution networks using data-driven modeling, Journal of Hydroinformatics, 11 (1), 1–17.

Winkler, D., Haltmeier, M., Kleidorfer, M., Rauch, W. & Tscheikner-Gratl, F. (2018) Pipe failure modelling for water distribution networks using boosted decision trees, Structure and Infrastructure Engineering, 14, 1402–1411.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).