Using deterioration modelling to simulate sewer rehabilitation strategy with low data availability.

Most cities face the problem of aging infrastructure in need of extensive and ongoing repair, renovation or replacement. Since the 1980s, CCTV has been the industry standard for sewer system inspection and the main source of information for structural performance evaluation. Because inspection rates are low and information about sewer condition is scarce, deterioration models have been developed to simulate the condition of non-inspected sewers and to assess the influence of different rehabilitation scenarios. This paper presents an innovative modelling tool for long-term sewer rehabilitation planning that integrates a deterioration model and a rehabilitation model. The tool is demonstrated at full scale using CCTV and sewer data from the city of Sofia, Bulgaria. The results provide tangible evidence of investment needs for sewer rehabilitation and support the utility in negotiating budgets with the municipality. Since age is a key variable for deterioration modelling, a new method is proposed to estimate missing construction years in the utility database with a prediction error of less than 7 years for 80% of the pipes.


INTRODUCTION
Insufficient public and municipal investment represents a major challenge for the long-term management of urban drainage systems. Almost 10 years ago, the American Water Works Association already estimated that a 'replacement era' was dawning, in which the USA would need to massively rehabilitate the water and sewer networks built by previous generations (AWWA ). In many cities worldwide, the underground infrastructure is nearing the end of its technical lifetime and will soon need renewal. A recent survey of 397 water and wastewater industry participants in the USA and Canada highlighted that aging infrastructure and the management of capital and operational costs are the two main issues facing the industry (Black & Veatch ). The ASCE estimated the capital investment required to maintain and upgrade water infrastructure in the USA at $91 billion (ASCE ). However, only $36 billion of this amount was funded, leaving a capital funding gap of nearly $55 billion.
Closed-circuit television (CCTV) inspection has been the industry standard for sewer inspection and structural performance evaluation since the 1980s. It provides visual data (images or videos) of the internal surface of the inspected pipe (here, the term 'pipe' denotes a pipe segment from manhole to manhole). Analysis of the images makes it possible to identify the type and location of defects such as offset joints, cracks, leaks, sediment, debris and root intrusion. Due to budget restrictions, inspection rates are generally low and municipalities tend to inspect only a small part of their network (Harvey & McBean ). Few data exist on national inspection rates worldwide, but many publications highlight the low availability of CCTV data, e.g. in Colombia (Hernández et al. ), Canada (Harvey & McBean ), France (ONEMA ), Portugal (Sousa et al. ) and Australia (Tran ).
In current practice, CCTV data are crucial to support asset management decisions. However, the quality and uncertainty of sewer condition assessment are rarely questioned. Dirksen et al. () highlighted the high subjectivity of the inspection procedure at three main steps: (1) the recognition of defects, (2) the description of defects and (3) the evaluation of sewer condition. The probability of a false positive is in the order of a few percent, whereas the probability of a false negative is in the order of 25%. Individual inspectors were shown to arrive at different results when evaluating the same set of CCTV reports, which highlights the subjectivity of image interpretation. Caradot et al. (a) found that the probability of correctly classifying a pipe in poor condition is close to 80-85%; hence the probability of overestimating the condition of such a pipe (false negative) is close to 15-20%. Uncertainties in sewer condition assessment stem not only from the inspection procedure but also from undocumented rehabilitation (i.e. the rehabilitation of a segment has not been recorded in the database) and from inspections carried out for a specific purpose, e.g. to identify house connections, which lead to an incorrect inventory of defects.
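The effect of these misclassification rates on the observed share of pipes in poor condition can be sketched with a simple binary (poor / not poor) model. This is a minimal illustration, assuming fixed false negative and false positive rates that apply uniformly to all pipes; the function names and the 30% / 20% / 2% values are hypothetical and not results from the cited studies.

```python
def observed_poor_share(true_poor: float, fn_rate: float, fp_rate: float) -> float:
    """Share of pipes that inspectors classify as 'poor'.

    true_poor: true share of pipes in poor condition
    fn_rate:   probability that a poor pipe is rated as not poor (false negative)
    fp_rate:   probability that a non-poor pipe is rated as poor (false positive)
    """
    return true_poor * (1.0 - fn_rate) + (1.0 - true_poor) * fp_rate


def corrected_poor_share(observed: float, fn_rate: float, fp_rate: float) -> float:
    """Invert the relation above to recover the true share (requires fn + fp < 1)."""
    return (observed - fp_rate) / (1.0 - fn_rate - fp_rate)


# Illustrative values only: 30% truly poor, 20% false negatives, 2% false positives
obs = observed_poor_share(0.30, 0.20, 0.02)
print(round(obs, 3))                                    # 0.254
print(round(corrected_poor_share(obs, 0.20, 0.02), 3))  # 0.3
```

Under these assumptions, a false negative rate of 20% makes the inspected network look noticeably healthier than it is, which is why such uncertainties matter for model calibration.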
Since the definition of rehabilitation strategies is limited by the lack of information about sewer condition and remaining life, deterioration models have been developed to forecast the evolution of the system from its current and past condition. Deterioration models can be used (i) to estimate the condition class of non-inspected pipes and (ii) to forecast the evolution of the system condition. Such model outputs provide key information to operators and municipalities for scheduling inspection programs (i.e. detecting sewers in critical condition) and planning rehabilitation budgets (i.e. comparing different sewer rehabilitation scenarios and evaluating the necessary investment rates). A wide range of modelling approaches has been developed over the past 20 years (see e.g. Mashford et al. , Kley & Caradot () and Santos et al. ()).
Sewer construction year is generally the key variable for the calibration and prediction of deterioration models. However, many utilities have not systematically recorded the construction year of their pipes, which therefore remains unknown for a large part of the network. This issue is neglected by most studies on deterioration modelling. It is nevertheless crucial to be able to estimate missing sewer ages (as well as other missing characteristics) in order to apply a deterioration model to the entire sewer network. Ward et al. () proposed a methodology to attribute pipe dates retrospectively using a logical data hierarchy. The approach takes advantage of various data sources such as the construction year of adjacent pipes and distribution mains and the age of surrounding properties. The methodology is relevant and intuitive but could not be fully validated due to the lack of reliable data.
This paper introduces a new simulation module to support long-term sewer rehabilitation planning under low data availability. The module is based on (1) a statistical deterioration model to simulate the evolution of the network condition and (2) a rehabilitation model to assess the influence of annual rehabilitation strategies on a series of indicators. The main innovation is the integration of both deterioration and rehabilitation models in the same tool, combined with a methodology to handle missing data. Both models are implemented in the software prototype RELIABLE SEWER operated by Veolia. The input data of the module are (1) the pipe characteristics; (2) a set of inspected pipes for model calibration; (3) the annual rehabilitation budget over the simulation period; and (4) the current local prices for construction and repair activities. The tool handles three types of rehabilitation measures: replacement, CIPP lining and repair.
This paper demonstrates the successful application of the tool to the sewer network of Sofia, Bulgaria, and presents the methodology developed to handle missing data.

Input data
The city of Sofia is the capital of Bulgaria. The urban area has a population of 1.5 million inhabitants. The sewer network is operated by Sofiyska Voda (SV). It has a length of about 1,729 km and consists of 49,930 pipes. The system is mainly combined (88% of the network) and most pipes are made of concrete (75% of the network). Construction of the network started at the beginning of the 20th century and progressed slowly until 1950. The expansion of the network accelerated after 1950 to cope with the city's population growth. Most pipe characteristics are available in the GIS database: sewer type, material, shape, length, diameter and depth. The construction year is missing for 30% of the pipes.
In 2019, the CCTV database contained 685 inspected pipes with a total length of 31.4 km. The inspected pipes were selected randomly across the entire network: for the project's scope, pipes with known construction year, material, type and profile were randomly selected in order to obtain a representative sample of the network. The relevance of a randomly selected inspection sample has already been highlighted by several studies (e.g. Duchesne et al. ). CCTV inspections have been carried out since 2012 and inspection reports are systematically encoded using the European standard EN - (). Structural condition assessment is performed using the French RERAU methodology (Le Gauffre et al. ); the scores range from 1 to 4, with 4 being the worst condition. Inspections with inconsistent defect coding, without age information (inspection year or construction year missing) or that could not be linked to a specific sewer pipe have been discarded from the database. Figure 1 shows the evolution of the condition distribution of the inspected pipes for different age groups.
It is interesting to note that the condition seems to improve for pipes in the age classes ≥ 40 and ≥ 50 years (i.e. pipes between 40 and 59 years of age) compared to younger pipes. A further investigation of the pipe properties (such as material, depth and diameter) did not reveal any particularity of these two age classes compared to the others. The analysis of the detailed reports reveals that for pipes in the age class ≥ 50 (pipes constructed in the 1960s), the densities of the codes BAB (crack) and BAC (break/collapse) are lower than the corresponding densities in the age classes ≥ 30 and ≥ 40 (pipes constructed in the 1970s and 1980s).

Estimation of missing construction years
Pipe age is generally the main explanatory variable for sewer deterioration (Caradot et al. b): old pipes are expected to be in worse condition than young pipes. This trend is confirmed by the evolution of the condition distribution shown in Figure 1. Since sewer age is missing for 30% of the pipes, two approaches have been developed to estimate the missing construction years.
The first method attributes an estimated construction year using the median construction year of pipes with the same material and depth category in the same neighbourhood (or in the same region, a geographical entity larger than the neighbourhood). This approach assumes that pipes have been installed following urban growth, so that most pipes in a given area were installed in the same period (the city is divided into 219 neighbourhoods). This hypothesis is confirmed by the geographical distribution of pipe age: older pipes are located in the city centre, and suburban neighbourhoods show a relatively homogeneous construction year corresponding to the period of network expansion.
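The grouping logic of the first method can be sketched as follows. This is a minimal sketch, assuming each pipe is a dictionary with the hypothetical fields `id`, `neighbourhood`, `region`, `material`, `depth_class` and `year` (`None` when missing), and assuming the region is used only as a fallback when no neighbourhood group exists; the paper's variants (methods 0-1 to 1-3) combine these keys in different ways.

```python
from collections import defaultdict
from statistics import median


def group_median_years(pipes, keys):
    """Median construction year of pipes with known year, grouped by `keys`."""
    groups = defaultdict(list)
    for p in pipes:
        if p["year"] is not None:
            groups[tuple(p[k] for k in keys)].append(p["year"])
    return {g: median(years) for g, years in groups.items()}


def impute_by_area(pipes):
    """Estimate missing years by neighbourhood, falling back to the larger region."""
    by_nbh = group_median_years(pipes, ("neighbourhood", "material", "depth_class"))
    by_region = group_median_years(pipes, ("region", "material", "depth_class"))
    estimates = {}
    for p in pipes:
        if p["year"] is not None:
            continue
        key_n = (p["neighbourhood"], p["material"], p["depth_class"])
        key_r = (p["region"], p["material"], p["depth_class"])
        # None when neither the neighbourhood nor the region group exists
        estimates[p["id"]] = by_nbh.get(key_n, by_region.get(key_r))
    return estimates


def pipe(i, nbh, region, material, depth, year):
    return {"id": i, "neighbourhood": nbh, "region": region,
            "material": material, "depth_class": depth, "year": year}


pipes = [
    pipe(1, "N1", "R1", "concrete", "shallow", 1960),
    pipe(2, "N1", "R1", "concrete", "shallow", 1964),
    pipe(3, "N1", "R1", "concrete", "shallow", 1970),
    pipe(4, "N1", "R1", "concrete", "shallow", None),  # same neighbourhood group
    pipe(5, "N2", "R1", "concrete", "shallow", None),  # falls back to region R1
    pipe(6, "N3", "R2", "clay", "deep", None),         # no matching group
]
print(impute_by_area(pipes))  # {4: 1964, 5: 1964, 6: None}
```

The median rather than the mean keeps the estimate robust to a few much older or younger pipes in a group.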
The second method uses the widely known k-nearest neighbour approach (Altman ) and attributes to each pipe the median construction year of the five closest adjacent pipes with the same material and diameter. Like the first method, this approach assumes that adjacent pipes have been constructed together during urban planning projects. The hypothesis generally holds as long as the pipes have not since been rehabilitated or replaced by a new pipe. Since only few pipes have been rehabilitated so far, the neighbouring pipes provide a fair estimate of the missing construction years. The accuracy of the two methods has been assessed by predicting the construction years of the pipes with known age and analysing the difference between the predicted and the actual ages.
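The nearest-neighbour estimate of the second method can be sketched as below. This is a minimal sketch, assuming that "closest" is measured as the Euclidean distance between pipe midpoints (the hypothetical field `xy`) rather than along the network topology, and that fewer than five neighbours are used when fewer are available; the field names are illustrative.

```python
import heapq
import math
from statistics import median


def knn_year(target, pipes, k=5):
    """Median year of the k closest pipes with known year, same material and diameter.

    Distance is Euclidean between pipe midpoints (an assumption of this sketch).
    Returns None when no eligible neighbour exists, i.e. the method is not applicable.
    """
    candidates = [
        p for p in pipes
        if p["year"] is not None
        and p["id"] != target["id"]
        and p["material"] == target["material"]
        and p["diameter"] == target["diameter"]
    ]
    if not candidates:
        return None
    nearest = heapq.nsmallest(k, candidates, key=lambda p: math.dist(p["xy"], target["xy"]))
    return median([p["year"] for p in nearest])


target = {"id": "t", "xy": (0.0, 0.0), "material": "concrete", "diameter": 300, "year": None}
pipes = [target] + [
    {"id": f"p{i}", "xy": (float(d), 0.0), "material": "concrete", "diameter": 300, "year": y}
    for i, (d, y) in enumerate([(1, 1970), (2, 1971), (3, 1972), (4, 1973), (5, 1974), (50, 1990)])
] + [{"id": "clay", "xy": (0.5, 0.0), "material": "clay", "diameter": 300, "year": 1900}]

print(knn_year(target, pipes))  # 1972
```

The clay pipe is the closest one but is ignored because its material differs, and the distant 1990 pipe falls outside the five nearest neighbours.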

Simulation module: deterioration and rehabilitation models
The simulation module is based on (1) a statistical deterioration model and (2) a rehabilitation model that considers the influence of annual rehabilitation strategies on a series of indicators. Both models are implemented in the software prototype RELIABLE SEWER. The simulation module predicts the impacts of fixed rehabilitation strategies on a set of network output indicators. These indicators give a strategic overview of the rehabilitation strategy and its financial and technical impact on the network: annual condition distribution, mean age and remaining life of the network, number of annual rehabilitation actions, and annual CAPEX and OPEX.
The deterioration model relies on the statistical model GompitZ, which is based on the theory of non-homogeneous Markov chains, with transition probabilities derived from the Gompertz distribution (Le Gat ; Caradot et al. ). The model can be applied to simulate the evolution of condition probabilities for all pipes in the network (i.e. the probability for a pipe to be in a given condition in a given simulation year). The survival curves have been calculated using age as the only covariate, because the inspected network length in Sofia (31.4 km) is too small to create groups of pipes with different ageing behaviours (e.g. pipes of the same material and diameter). Several authors have highlighted that a randomly selected subset of at least 40-50 km of the network is required to simulate network condition with relatively good accuracy (refer to Ahmadi et al. ; Tran ; Caradot et al.  for sensitivity analyses of deterioration models to sample size). Considering the homogeneity of the network characteristics in Sofia (mainly combined sewers made of concrete), the available subset of 31.4 km appears to be a good starting point for a first overview of the network deterioration behaviour.
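The principle of a non-homogeneous Markov chain with Gompertz-derived transitions can be illustrated with a simplified sketch. This is not GompitZ's actual parameterisation or calibration: it assumes a Gompertz hazard h(t) = a·exp(b·t) for each transition between consecutive condition classes, at most one class change per year, an absorbing worst class, and hypothetical parameter values.

```python
import math
from typing import List, Sequence, Tuple


def gompertz_cum_hazard(t: float, a: float, b: float) -> float:
    """Cumulative Gompertz hazard H(t) = (a / b) * (exp(b * t) - 1)."""
    return (a / b) * (math.exp(b * t) - 1.0)


def transition_prob(t: int, a: float, b: float) -> float:
    """Probability of moving to the next (worse) class between age t and t + 1."""
    dh = gompertz_cum_hazard(t + 1, a, b) - gompertz_cum_hazard(t, a, b)
    return 1.0 - math.exp(-dh)


def condition_probabilities(max_age: int,
                            params: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Class probabilities of a new pipe (class 1) from age 0 to max_age.

    params[i] = (a, b) for the transition from class i+1 to class i+2.
    Transition probabilities depend on age, hence 'non-homogeneous'.
    """
    n_states = len(params) + 1
    probs = [1.0] + [0.0] * (n_states - 1)
    history = [probs]
    for t in range(max_age):
        new = [0.0] * n_states
        new[-1] = probs[-1]  # worst class is absorbing
        for i, (a, b) in enumerate(params):
            p = transition_prob(t, a, b)
            new[i] += probs[i] * (1.0 - p)
            new[i + 1] += probs[i] * p
        probs = new
        history.append(probs)
    return history


# Hypothetical parameters for the transitions 1->2, 2->3 and 3->4
PARAMS = [(0.01, 0.04), (0.01, 0.04), (0.01, 0.04)]
hist = condition_probabilities(80, PARAMS)
print([round(x, 3) for x in hist[60]])  # class 1..4 probabilities at age 60
```

Applied to every pipe with its (known or estimated) age, such curves give the expected network condition distribution for each simulation year.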
The rehabilitation model attributes rehabilitation actions for each simulation year. First, pipes are ranked from worst to best predicted condition. The model then (1) selects the first pipe of the list, (2) allocates a rehabilitation action (replacement, liner or repair) using a probability distribution that follows the annual budget given as input for each rehabilitation action and (3) calculates the corresponding costs using a locally calibrated cost model. If the cumulative costs for the selected rehabilitation action exceed the available annual budget for that action, another rehabilitation action is chosen.
For each year, the procedure is repeated until no budget remains. At this stage of development, the selection of pipes does not consider any feasibility or 'good practice' criteria. The focus is not on the exact identification of the pipes to be rehabilitated in a given year but on the estimation of the number of pipes that can be rehabilitated with the available annual budget. The utility might need to plan additional budget for the rehabilitation of other infrastructure, but also for identifying the actual candidates for rehabilitation through CCTV inspections.
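The annual allocation loop described above can be sketched as follows. This is a minimal sketch under several assumptions: the action is drawn with probability proportional to its sub-budget, "choosing another action" is implemented by drawing only among actions whose remaining sub-budget still covers the cost, and the loop stops as soon as the next pipe cannot be funded by any sub-budget. The function name, the cost callback and the unit costs in the example are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple


def allocate_year(pipes: List[Tuple[str, float]],
                  budgets: Dict[str, float],
                  cost_fn: Callable[[str, str], float],
                  rng: random.Random):
    """Allocate rehabilitation actions for one simulation year.

    pipes:   (pipe_id, predicted probability of poor condition) pairs
    budgets: annual sub-budget per action, e.g. replacement / lining / repair
    cost_fn: cost of applying `action` to `pipe_id`
    """
    remaining = dict(budgets)
    plan = []
    # Rank pipes from worst to best predicted condition
    for pipe_id, _ in sorted(pipes, key=lambda p: p[1], reverse=True):
        costs = {a: cost_fn(pipe_id, a) for a in budgets}
        affordable = [a for a in budgets if costs[a] <= remaining[a]]
        if not affordable:
            break  # assumption: stop once no sub-budget can fund the next pipe
        # Draw an action with probability proportional to its annual sub-budget
        action = rng.choices(affordable, weights=[budgets[a] for a in affordable])[0]
        remaining[action] -= costs[action]
        plan.append((pipe_id, action, costs[action]))
    return plan, remaining


COST = {"replacement": 400.0, "lining": 300.0, "repair": 50.0}  # illustrative unit costs
pipes = [("a", 0.9), ("b", 0.5), ("c", 0.7)]
budgets = {"replacement": 1000.0, "lining": 500.0, "repair": 100.0}
plan, remaining = allocate_year(pipes, budgets, lambda pid, a: COST[a], random.Random(1))
print(plan)
```

Because the draw is random, repeated runs (or different seeds) give different plans; the quantity of interest is the number of pipes treated per action, not which pipe receives which action.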
The input data for the deterioration model are (1) the pipe characteristics and (2) a set of inspected pipes for model calibration; refer to Caradot et al.  for more details about the calibration procedure. The input data for the rehabilitation model are (3) the annual rehabilitation budget over the simulation period, split into three sub-budgets, one for each rehabilitation action (replacement, liner and repair), and (4) the current local prices for construction and repair activities.

Cost function
The cost function calculates rehabilitation costs within the simulation module. In its simplest form, it attributes a constant specific cost (in €/m) to a given rehabilitation action. The prototype gives the user great flexibility to configure more complex functions, for example using pipe properties (such as material, diameter or traffic load) to estimate rehabilitation costs more accurately. The cost functions can also account for the inflation of construction prices over the simulation period. Their parameters are estimated by regression between the pipe characteristics and the real local costs of previous rehabilitation actions.
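The regression and inflation steps can be sketched in a few lines. This is a minimal sketch, assuming an ordinary least-squares fit of cost against length only and a constant annual inflation rate; the function names are hypothetical and the example points are synthetic values generated from a known line (cost = 100 · length + 50), not Sofia data.

```python
def fit_linear_cost(lengths, costs):
    """Ordinary least-squares fit of cost = slope * length + intercept."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def inflated_cost(base_cost, inflation_rate, years_ahead):
    """Escalate a cost with a constant annual construction-price inflation rate."""
    return base_cost * (1.0 + inflation_rate) ** years_ahead


# Synthetic points lying exactly on cost = 100 * length + 50
lengths = [10.0, 20.0, 40.0, 80.0]
costs = [100.0 * x + 50.0 for x in lengths]
slope, intercept = fit_linear_cost(lengths, costs)
print(round(slope, 6), round(intercept, 6))  # 100.0 50.0
print(round(inflated_cost(1000.0, 0.02, 2), 2))  # 1040.4
```

Further covariates such as diameter or traffic load would turn this into a multiple regression, or, as done for replacement in Sofia, into separate fits per diameter class.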

Replacement cost
Based on historical replacement costs, a linear regression model is built to estimate the replacement cost from the length (in m) and the diameter of the pipes. For each pipe, the replacement cost (BGN) is defined as:
• for pipes with diameter < 500 mm: cost_replacement = 1,176 * length + 281
• for pipes with diameter ≥ 500 mm: cost_replacement = 1,973 * length + 37,586

Lining cost
Only the costs of the last two lining rehabilitations carried out in Sofia are available. Based on these two values, a specific lining cost is calculated. For each pipe, the lining cost (BGN) is defined as:
• cost_lining = 1,020 * length

Repair cost
Historical repair costs are available for around 55 pipes. Repair costs depend on the characteristics of the pipes and on the length of pipe to be repaired. Since this information is unknown for future predictions, a constant cost of 5,940 BGN per pipe, derived from the historical repair costs in Sofia, is used.
Costs in BGN can easily be converted to EUR: at the time of the study, 1 EUR equalled 1.96 BGN. For readability, costs and expenses in the following sections are given in EUR. CAPEX is the sum of replacement and lining costs, whereas OPEX is the sum of repair costs. OPEX includes inspection and flushing costs linked to repair actions but no additional costs for the operation and maintenance of the network.
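The three Sofia cost functions and the CAPEX/OPEX split can be combined in a short sketch. It assumes lengths in metres, assigns a diameter of exactly 500 mm to the second replacement branch, and uses a hypothetical action list as input; the coefficients, the 5,940 BGN repair cost and the 1.96 exchange rate are those given above.

```python
BGN_PER_EUR = 1.96      # exchange rate at the time of the study
REPAIR_COST_BGN = 5940  # constant cost per repaired pipe


def replacement_cost_bgn(length_m, diameter_mm):
    # Two regression lines split at 500 mm (500 mm itself uses the second line here)
    if diameter_mm < 500:
        return 1176 * length_m + 281
    return 1973 * length_m + 37586


def lining_cost_bgn(length_m):
    return 1020 * length_m


def capex_opex_eur(actions):
    """actions: iterable of (action, length_m, diameter_mm) tuples.

    CAPEX = replacement + lining costs, OPEX = repair costs, both converted to EUR.
    """
    capex_bgn = opex_bgn = 0.0
    for action, length_m, diameter_mm in actions:
        if action == "replacement":
            capex_bgn += replacement_cost_bgn(length_m, diameter_mm)
        elif action == "lining":
            capex_bgn += lining_cost_bgn(length_m)
        elif action == "repair":
            opex_bgn += REPAIR_COST_BGN
        else:
            raise ValueError(f"unknown action: {action}")
    return capex_bgn / BGN_PER_EUR, opex_bgn / BGN_PER_EUR


capex, opex = capex_opex_eur([("replacement", 50, 300), ("lining", 40, 600), ("repair", 30, 400)])
print(round(capex), round(opex))
```

Keeping each cost function separate makes it easy to swap in a more detailed function (e.g. one using material or traffic load) without touching the budget accounting.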

Deterioration of liners
The deterioration model cannot easily be calibrated for liners since no reliable data are available to assess the structural condition of liners from CCTV inspections (Alam et al. ). Despite the massive public investment in this technique, little quantitative evaluation has been conducted on whether liners perform as expected and whether lining is indeed cost-effective compared to replacement. One of the most relevant studies on this topic was performed by the US Environmental Protection Agency (Allouche et al. ) through a six-year project documenting the in-service performance of trenchless pipe rehabilitation techniques. The authors analysed a set of liners with up to 34 years in service using destructive techniques and mechanical analysis of parameters such as flexural modulus and strength, tensile modulus and strength, and liner thickness. The results showed no visible evidence of progressive deterioration of the liners: while some defects were noted in the samples or the associated CCTV inspections, most of these defects are believed to have been created at the time of installation and do not represent a degradation of the liner over time.
Current datasets do not allow conclusions to be drawn about the average expected lifetime of CIPP liners or their main degradation factors. However, a series of studies in Germany, France and the USA suggests that the original design life of 50 years should be met and could potentially be exceeded. As for other materials, the survival curves of liners follow a Gompertz distribution. The main assumption is that liners reach the end of their service life at a median age of 60 years, i.e. the probability of being in very poor condition is 50% at 60 years (Figure 3). The prototype allows the user to tune the median life of liners by setting this assumption to lower or higher values. In particular, this can be used for sensitivity analysis to highlight the influence of this assumption on the modelling outcomes.
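How a Gompertz curve can be pinned to a 60-year median can be sketched as follows. This minimal sketch models only the probability of reaching very poor condition, with survival S(t) = exp(-(a/b)(e^(bt) - 1)); the shape parameter b = 0.08 is a hypothetical choice, and the scale a is then solved so that S(60) = 0.5.

```python
import math


def gompertz_rate_for_median(median_years, shape):
    """Scale a such that S(median) = 0.5 for S(t) = exp(-(a/b) * (exp(b*t) - 1))."""
    return shape * math.log(2.0) / (math.exp(shape * median_years) - 1.0)


def prob_very_poor(age, median_years=60.0, shape=0.08):
    """Probability that a liner of the given age is in very poor condition (1 - S)."""
    a = gompertz_rate_for_median(median_years, shape)
    cum_hazard = (a / shape) * (math.exp(shape * age) - 1.0)
    return 1.0 - math.exp(-cum_hazard)


print(round(prob_very_poor(60), 3))                    # 0.5
print(round(prob_very_poor(60, median_years=50), 3))   # sensitivity: shorter median life
```

Changing `median_years` reproduces the kind of sensitivity analysis mentioned above, while `shape` controls how steeply the curve rises around the median.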

Deterioration of short repairs
The deterioration of repaired pipes is also a complex topic. First, there is a wide range of very different repair techniques, such as short lining or short sewer replacement (EPA ). Second, the life of each technique might depend on the characteristics of the pipe but also on the quality and period of installation (WERF ). Finally, no data are available for a statistical analysis of the deterioration of the different types of repair techniques. As a result, the simplified approach proposed in the project focuses on short lining as the main repair technique, with the following assumptions:
• After rehabilitation with short lining, the pipe is back to good or perfect condition (i.e. no need for rehabilitation). This assumption is reasonable since most repairs would be done on pipes with few defects that can be handled with the repair technique (Oelmann et al. ). No (major) defect should remain after a repair.
• At the end of the life of the repair, the repaired pipe cannot be in a worse condition than if it had not been repaired: the pipe returns to the condition it would have had without repair. The life of repair techniques is set to a standard value of 15 years based on operator experience but can easily be tuned in the prototype.
Based on these assumptions, the prototype creates interpolated survival curves for repaired pipes. As for liners, the approach is relatively simple and based on operator experience due to the lack of reliable data about the life of rehabilitation techniques. In particular, it allows the user to run flexible sensitivity analyses to identify the impacts of the assumptions on the predicted strategies.
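One way to build such an interpolated curve from the two assumptions is sketched below. The interpolation scheme used in the prototype is not specified here, so this sketch assumes linear interpolation of the probability of poor condition from 0 just after the repair to the unrepaired value at the end of the 15-year repair life, capped so that the repaired pipe is never worse than the unrepaired one. The baseline curve and its parameters are hypothetical placeholders.

```python
import math


def unrepaired_prob_poor(age):
    """Placeholder deterioration curve of the unrepaired pipe (illustrative Gompertz CDF)."""
    a, b = 0.002, 0.06
    return 1.0 - math.exp(-(a / b) * (math.exp(b * age) - 1.0))


def repaired_prob_poor(years_since_repair, age_at_repair, repair_life=15.0,
                       baseline=unrepaired_prob_poor):
    """Probability of poor condition for a pipe repaired at `age_at_repair`."""
    age_now = age_at_repair + years_since_repair
    if years_since_repair >= repair_life:
        # Repair expired: back to the condition the pipe would have had without repair
        return baseline(age_now)
    # Linear interpolation (assumed scheme) from 'good' right after the repair
    # to the baseline value at the end of the repair life
    end_value = baseline(age_at_repair + repair_life)
    interpolated = end_value * years_since_repair / repair_life
    # A repaired pipe is never worse than the same pipe without repair
    return min(interpolated, baseline(age_now))


for t in (0, 5, 10, 15, 20):
    print(t, round(repaired_prob_poor(t, age_at_repair=40), 3),
          round(unrepaired_prob_poor(40 + t), 3))
```

The `repair_life` argument plays the role of the tunable 15-year value, so a sensitivity analysis only needs to vary that one parameter.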

Estimation of missing construction years
In a first step, the two methodologies were applied to estimate missing construction years. The accuracy of the two approaches was assessed by predicting the construction years of the pipes with known age and analysing the difference between the predicted and the actual ages (Figure 4). Methods 0-1, 0-2, 0-3, 1-1, 1-2 and 1-3 represent different variations of the first methodology: the construction year is attributed using the median construction year of pipes in the same neighbourhood (or same region, a geographical entity larger than the neighbourhood) and/or with the same material and/or depth category. Method 2 is the second methodology, in which the construction year is estimated as the median year of the five closest adjacent pipes with the same material and diameter.
Figure 4: The 80% confidence line indicates the maximal error obtained with a given model for 80% of the data. Note: 'regions' are larger geographical areas than 'neighbourhoods'.
For 80% of the pipes, method 0-3 (median by neighbourhood of pipes with the same material and depth) estimates the construction year with an error lower than 7 years, whereas method 2 (neighbours with the same material and diameter) gives an error of 0 years. This means that method 2 reproduces the missing years exactly for 80% of the pipes. The error grows for the remaining 20%. Figure 4 shows the evolution of the error (maximal difference between known and simulated years) for the two methods as well as for the other alternative methods.
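The "80% confidence" error reported above can be computed as a quantile of the absolute errors. This minimal sketch assumes a nearest-rank quantile (the paper does not state which quantile definition was used); the function name and example data are hypothetical.

```python
import math


def error_at_quantile(predicted, actual, q=0.8):
    """Absolute error not exceeded by a fraction q of the pipes (nearest-rank quantile)."""
    errors = sorted(abs(p - a) for p, a in zip(predicted, actual))
    if not errors:
        raise ValueError("no pipes to evaluate")
    # round() guards against float noise such as 0.8 * n = 7.999999...
    rank = max(1, math.ceil(round(q * len(errors), 9)))  # 1-based nearest rank
    return errors[rank - 1]


# Hypothetical example: 8 exact estimates and 2 poor ones
actual = [1960] * 10
predicted = [1960] * 8 + [1975, 1990]
print(error_at_quantile(predicted, actual))         # 0
print(error_at_quantile(predicted, actual, q=1.0))  # 30
```

The example mirrors the pattern described for method 2: zero error for 80% of the pipes, with larger errors concentrated in the remaining 20%.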
Method 2 is highly accurate but can be applied only to pipes whose adjacent pipes have known construction years (13% of the pipes with missing construction years). The method cannot be used when the construction years of the neighbouring pipes are also missing, which is the case for many pipes, for example when the construction year is missing for an entire catchment. For these remaining pipes, the alternative method 0-3 can be applied with an uncertainty lower than 7 years (80% confidence interval).
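The resulting two-step strategy (method 2 where neighbours allow it, method 0-3 otherwise) can be sketched as a simple fallback chain. The two estimator functions below are hypothetical stubs standing in for the neighbour-based and neighbourhood-median estimators; each returns `None` when it cannot produce an estimate.

```python
from typing import Callable, Dict, Optional, Tuple

Estimator = Callable[[dict], Optional[float]]


def fill_missing_years(pipes, primary: Estimator,
                       fallback: Estimator) -> Dict[str, Tuple[Optional[float], str]]:
    """Estimate missing years with `primary`, using `fallback` where it returns None."""
    result = {}
    for p in pipes:
        if p["year"] is not None:
            continue  # known construction year: nothing to estimate
        year = primary(p)
        if year is not None:
            result[p["id"]] = (year, "method 2")
        else:
            result[p["id"]] = (fallback(p), "method 0-3")
    return result


def neighbour_estimate(p):  # stub for method 2: only pipe "a" has dated neighbours
    return 1972 if p["id"] == "a" else None


def area_estimate(p):       # stub for method 0-3
    return 1965


pipes = [{"id": "a", "year": None}, {"id": "b", "year": None}, {"id": "c", "year": 1980}]
print(fill_missing_years(pipes, neighbour_estimate, area_estimate))
# {'a': (1972, 'method 2'), 'b': (1965, 'method 0-3')}
```

Recording which method produced each estimate makes it possible to attach the corresponding uncertainty (0 or less than 7 years at 80%) to every reconstructed age.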
This combined approach has been shown to estimate the missing ages of the pipes in the network accurately. It could be used by other operators to reconstruct incomplete databases and provide the necessary starting point for applying deterioration models.

Simulation of rehabilitation scenarios
The simulation module has been deployed to simulate the current rehabilitation scenario of Sofiyska Voda (SV) over the next 20 years. This scenario allocates an annual replacement budget of 2 million € and an annual repair budget of 250,000 €. These values have been estimated as the average of the annual CAPEX and OPEX planned for the coming years. In the period 2012-2017, the average annual investment budget for sewer network rehabilitation ranged between 1 and 2 million euros. The business plan for the period 2018-2021 foresees an average annual investment of 2 million euros. Figure 5 shows simulation results for the reference scenario (S1).
In 2019, 28% of the pipes are in poor condition and require rehabilitation, and the mean age of the network is 36 years (28% also corresponds to the proportion of pipes in poor condition in the inspection dataset, since the inspected pipes were randomly selected). Between 2019 and 2040, the network condition deteriorates: the number of pipes in poor condition will increase by 12% and the mean age of the network will increase by 18 years, while the annual rehabilitation rate is 0.15%. With an annual budget of around 2 million €, Sofiyska Voda can replace around 2.5 km of pipes every year, which corresponds to about 0.15% of the network length (2.5/1,729 km). The simulated replaced length is consistent with the annual replacement length of the previous 5 years.
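The rehabilitation rate quoted above follows directly from the annually replaced length and the total network length:

```latex
\text{rehabilitation rate}
  = \frac{\text{replaced length per year}}{\text{network length}}
  = \frac{2.5\ \text{km}}{1{,}729\ \text{km}}
  \approx 0.00145 \approx 0.15\,\%
```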
In a further experiment, several scenarios with various increases of the replacement budget were tested to identify the funding required to tackle network deterioration. The results indicate that a massive increase of the annual replacement budget, from 2 to 20 million €, would be necessary to avoid network degradation and keep the network condition stable (Figure 5, S2).
Additional scenarios show that this budget could be significantly reduced by increasing the share of alternative rehabilitation techniques such as lining and repair, a behaviour already observed in other studies such as Oelmann et al. (). The last scenario (S3) corresponds to scenario 2 with the total budget reduced from 20 million € to 15 million € and an increased share of liners. Results are displayed in Figure 5. With this third scenario, the condition and the age of the network remain relatively stable with a smaller budget than in scenario 2. However, the rehabilitation rate is lower (1.27% for scenario 3 versus 1.30% for scenario 2) due to the reduced replacement budget and thus the reduced replaced length (liner and repair actions are not accounted for in the calculation of the rehabilitation rate).

CONCLUSIONS
A new simulation module has been proposed to support long-term sewer rehabilitation planning and deployed on the sewer network of Sofia, Bulgaria. The module is based on (1) a statistical deterioration model to simulate the evolution of the network condition and (2) a rehabilitation model to assess the influence of annual rehabilitation strategies on a series of indicators. The modelling outcomes provide tangible evidence of the investment needed for sewer rehabilitation and support the utility in negotiating rehabilitation budgets with the municipality. Since age is a key variable for sewer deterioration modelling and was missing for 30% of the pipes, a new methodology has been proposed to estimate missing construction years in the utility database with a prediction error of less than 7 years for 80% of the pipes. This methodology could help other cities to establish an exhaustive sewer database as a starting point for simulating asset management strategies.
Water Science & Technology | in press | 2020 Uncorrected Proof
The scenarios simulated for Sofia highlight some of the trends captured by the simulation module. The reference scenario quantifies the influence of the current rehabilitation strategy applied in Sofia on the evolution of the network. It indicates that the current rehabilitation rate (0.15%) is very low and insufficient to prevent the deterioration of the network. A massive increase in replacement CAPEX would have a positive impact on the condition, age and rehabilitation rate of the network. However, an increase from 2 million to 20 million € would be necessary to prevent the deterioration and aging of the network and to stabilise the condition distribution. The network condition could remain stable with an annual budget reduced by 5 million € and a larger share of liners (see scenario 3). In this case, network aging is tackled but the rehabilitation rate is lower due to the reduced replacement budget. This scenario highlights the relevance of liners and other alternative techniques for sewer rehabilitation. The median life of liners is similar to that of sewers (62 years for sewers in Sofia vs. an assumed 60 years for liners), but the rehabilitation costs are lower. Note that this outcome depends strongly on the assumptions made for the survival curves of liners and does not reflect any of the drawbacks of liners compared with replacement. For example, liners cannot be used or are not relevant when manholes and house connections also need rehabilitation. Further studies are needed to carefully quantify each source of uncertainty and assess their cumulative propagation in deterioration models. In particular, survival bias appears to be a critical issue for the future development of deterioration models (Ouellet & Duchesne ).
Current models are expected to be optimistic because the observed pipes used for model calibration are only those that 'survived' until the date of inspection, i.e. pipes that were not replaced before reaching their current degradation state (Le Gat ). Caradot et al. () also showed that uncertainties in sewer condition assessment are not negligible and that their propagation into deterioration models needs to be carefully considered to correct systematic uncertainties.
The simulated strategies highlight the influence of various rehabilitation scenarios on the structural condition of the network. For the planning of sustainable strategies, this outcome has to be analysed in the light of other indicators such as the future performance of the network and the associated risk levels. In particular, further work should investigate how improving the structural condition of the network might reduce future expenses for sewer maintenance and cleaning aimed at eliminating blockages, sewer flooding and overflows.

DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.