## Abstract

Given the continuous aging of pipeline infrastructure buried underground, water utilities need to make a strategic plan on how to maintain the entire water distribution systems (WDSs) to ensure the required standard of supply. This paper investigates the nexus of three key factors that have a significant impact on the decision-making process of the rehabilitation plan for WDSs. The factors considered include the problem formulations, the pipe selection methods for identifying critical components of WDSs, and the multi-objective evolutionary algorithms (MOEAs). The nexus was revealed by considering all the combinations of two variants for each factor. The optimal rehabilitation problem of the Exeter network was used as a case study. Results exhibit that the problem formulation determined the range of Pareto fronts essentially, which should give the highest priority in the decision-making process. The pipe selection method played a secondary role, mainly affecting the shape of Pareto fronts. Optimization algorithms also had a considerable impact on the optimality of solutions, subject to their characteristics and parameter settings. This paper also highlights that breakthroughs need to focus on these key factors to facilitate a more cost-effective solution to the rehabilitation of large WDSs.

## HIGHLIGHTS

A hierarchical framework is presented for the strategic rehabilitation planning of WDSs.

The nexus among three key factors related to the rehabilitation of WDSs is investigated.

The problem formulation is the most significant factor.

The pipe selection method for identifying the vulnerable pipes of WDSs plays a secondary role.

Advanced MOEAs that alleviate the parameterization issues are needed in practice.

### Graphical Abstract

## INTRODUCTION

The aging of water distribution systems (WDSs) is a common issue worldwide, especially in large and mega cities, which can deteriorate the service performance provided by utilities. As a result, WDSs need to be regularly maintained to cope with the deterioration of pipes and other components and meet the increasing demand for sustainable urban development (Sitzenfrei *et al.* 2013). The rehabilitation plan, either short- or long-term, is a vital resource since it usually involves substantial investment and has significant impacts on the operation of WDSs and citizens’ wellbeing.

Taking the United States as an example, more than two decades ago, the first report to Congress by the US Environmental Protection Agency found that the total infrastructure investment need was $138.4 billion and over 50% (about $77 billion) would be dedicated to transmission and distribution pipelines over the next 20 years (USEPA 1997). In the sixth report, the total 20-year need rises to $472.6 billion, with over 60% expenditure (about $312.6 billion) on water transmission and distribution projects. The need for the rehabilitation, replacement, and upgrade of existing infrastructure continuously escalates and is the cause of the overall increase in total national needs (USEPA 2018). Another outlook report agreed with the same situation in the OECD member states. Infrastructure investment will be challenged by the growing importance of maintenance, upgrading, and rehabilitation of existing infrastructure (OECD 2006).

Rehabilitation of WDSs is a challenging task, as it needs to ensure that substantial yet limited investment will improve the service performance to the required level in the most cost-effective way. As such, many methods and strategies have been proposed in the past decades. Readers are referred to the following review papers for insightful opinions and comments (Engelhardt *et al.* 2000; Bubtiena *et al.* 2012; Scheidegger *et al.* 2015). The rehabilitation task can be even more complicated by taking water demand uncertainty and staged construction into consideration. The system demand may continuously increase due to the effect of urbanization or occasionally decrease due to depopulation. Therefore, this intrinsic uncertainty will have significant impacts on the decisions of WDS rehabilitation.

On the other hand, the engineering construction involved in rehabilitation usually happens sequentially at different intervals, thus being prone to be affected by factors such as macro policies, fluctuations of inflation rates, and labor costs. Several strategic methodologies have been proposed to cope with difficulties mentioned above under long-term, unknown future conditions, including the scenario-based method (Kang & Lansey 2012), the flexible design method (Basupi & Kapelan 2015), the real options-based approach (Marques *et al.* 2015), and that using multi-criteria decision analysis (Scholten *et al.* 2014). Compared with the traditional, deterministic design methods, these strategic approaches provide decision-makers with a set of more cost-effective alternatives over the long-term horizons.

From the perspective of the nature of analytical methods, previous studies concerning the decision models for rehabilitation can be generally divided into two categories: (1) prioritization models based and (2) optimization models based. The former usually rely on various types of predictive models of pipe deterioration/failure rates, combining either statistical models or data-driven (mining) techniques with multi-criteria decision-making methods to generate a list of pipes which are of top priority in the rehabilitation program (Yoo *et al.* 2010; Choi *et al.* 2015). The latter apply various optimization algorithms, especially those based on multi-objective evolutionary algorithms (MOEAs), to dealing with where system components (mainly focused on pipes) should be considered for rehabilitation (Halhal *et al.* 1999; Cheung *et al.* 2003; Bozorg-Haddad *et al.* 2008; Olsson *et al.* 2009; Rahmani *et al.* 2016; Muhammed *et al.* 2017). Most recently, a combination of both types of models mentioned above has also been reported to be a promising alternative for planning the rehabilitation of WDSs (Vertommen *et al.* 2018).

The common issues associated with the previous methods include: (1) the prioritization models based methods usually require sufficient and accurate historical data about the leakage or burst of pipes, which are sometimes unavailable for many utilities, especially in developing countries. Also, the statistical or data-driven models without any linkage with hydraulic simulations may lead to missing really critical pipes within WDSs; (2) the optimality of solutions obtained by various MOEAs heavily relies on the settings of associated parameters, which is usually case by case and should be identified/fine-tuned through preliminary runs. This requirement could be time-intensive for larger WDSs and presents a challenging task to practitioners who lack the expertise of MOEAs; (3) most cases for testing/comparative purposes were hypothetical or small networks, which implies that these methods might not be easily expanded to cope with larger, real-world WDSs.

This paper focuses on the deterministic optimization models based method for the rehabilitation task, assuming that utilities have their hydraulic models of WDSs that are well maintained as key assets. We aim to answer the following two questions: (1) what are key factors of the strategic decision-making for the rehabilitation of large WDSs and (2) what is the nexus between these factors? The main contributions of this work are threefold: (1) proposing an analytical framework to facilitate a rational decision-making process, (2) exhibiting the impacts of key factors on the rehabilitation planning, and (3) suggesting the future work directions toward more informed decision-making leading to cost-effective, sustainable, and energy-efficient management of WDSs.

## METHODS

### Overview

No matter the long- or short-term, some fundamental questions need to be considered during the strategic planning on the rehabilitation of WDSs. As shown in Figure 1, those questions generally include (but not limited to) the following three: (i) How to define the problem, i.e., what are the goals of system rehabilitation? (ii) Where are the bottleneck components located? (iii) What kinds of analytical methods are effective and efficient in dealing with the planning task? Those questions can be further organized as a top-down structure, containing three levels of thinking. At the goal level, decision-makers need to determine what objectives are really of concern, and this level implicitly influences the lower levels. A well-maintained WDS should consistently meet its customers’ demand in sufficient quantity (i.e., satisfying minimum pressure heads) and good quality (e.g., no risks of unpleasant smell, discoloration, and recontamination). Importantly, this performance should be achieved in a cost-effective manner, reducing the energy cost and greenhouse gas emissions. The rehabilitation planning often targets the areas with poor performance (e.g., frequent customer complaints) due to aging infrastructure or excessive residence times. At the component level, decision-makers want to identify all the vulnerable components, such as the pipes with high unit head loss or a higher possibility of breakage, as these are the real bottlenecks of the system, which will be repaired, duplicated, or replaced. This level sets the scope of the rehabilitation work, in which only critical elements are taken into account. At the tool level, decision-makers have to rely on some analytical methods to establish a satisfactory solution to the rehabilitation problem defined at the previous two levels. Various simulation and optimization packages are usually necessary and helpful tools in making this task efficient.

More relevant questions, which may have a great impact, can be raised during the stage of brainstorming. However, in this paper, we only focus on three common factors (questions) mentioned above and make an effort to demonstrate how these three factors are interconnected with each other. For each factor, we choose two representative variants (i.e., two kinds of problem formulations, two types of selection methods for the identification of critical pipes, and two optimization algorithms) to restrict the analyses in an affordable range. In particular, we consider the tradeoffs between capital costs and two surrogate indicators of system performance, one is the number of nodes with pressure deficiencies (*NNPD*) (Muhammed *et al.* 2017), and the other is the network resilience index (*NRI*) (Prasad & Park 2004). A recently proposed, cluster-based technique for identifying the critical pipes within a WDS is compared with the engineers’ knowledge-based approach (Muhammed *et al.* 2017). Regarding the optimization methods, we consider the well-known NSGA-II (Deb *et al.* 2002) and a brand-new, hybrid MOEA (Wang *et al.* 2017). A brief introduction to these variants of each factor is provided below. Due to space limitations, readers are referred to the corresponding references for more details.

### Problem formulations: *Cost* vs. *NNPD* and *Cost* vs. *NRI*

*NNPD*. In a narrow sense, only pipe sizes for duplication are taken as decision variables, which can be chosen from a list of discrete options available in the market. This multi-objective formulation is a relaxed form of the least-cost design of WDSs, in which the constraints regarding the minimum nodal pressure heads are converted as the second objective.where

*C*is the capital cost;

*U*(

*D*) is the unit cost of pipe

_{i}*i*which mainly depends on its corresponding diameter

*D*;

_{i}*L*is the length of pipe

_{i}*i*;

*np*is the number of pipes considered for duplication;

*PD*is a Boolean value (i.e., 0/1) depending on the relationship between the actual head

_{j}*H*and the minimum required head at node

_{j}*j*;

*nn*is the number of nodes within the network; and

*ns*is the number of commercially available pipe sizes.

The benefits of using this kind of formulation are twofold. Firstly, it eliminates the troublesome task (often time-consuming) of setting the penalty factor to discriminate against the infeasible solutions from feasible ones. Secondly, in case of an insufficient budget, this formulation can provide some alternative solutions that only a few nodes suffer from pressure deficiencies but with significant investment savings. The latter point is probably a great advantage in dealing with practical projects. However, an improved formulation, which measures the magnitude of pressure deficiencies, should be considered to ensure the deficiencies are within an acceptable scale (i.e., marginally violating the head requirements).

*NRI*. The

*NRI*is a widely used surrogate indicator of system reliability (Raad

*et al.*2010; Wang

*et al.*2015; Liu

*et al.*2017; Bin Mahmoud & Piratla 2018). It is derived from the law of energy conservation, where the total available power injected into a given WDS is dissipated in pipes (due to pipe wall friction resistance) and delivered to demand nodes (users). Besides the surplus power at demand nodes, the

*NRI*also takes the pipe size uniformity into account. The increase of

*NRI*can improve the energetic redundancy of the network system and ensure reliable loops around demand nodes, which in turn increases the mechanical reliability to overcome potential failures of components (e.g., pipe breakage). However, improving

*NRI*may also introduce an adverse impact on water quality, and this is particularly of concern in the extremities of networks due to longer residence time.where

*U*is the uniformity of node

_{j}*j*;

*q*is the demand at node

_{j}*j*;

*np*is the number of pipes that connected to node

_{j}*j*;

*d*is the diameter of pipe

_{i}*i*that connected to node

*j*;

*Q*and

_{r}*H*are the discharge and head at reservoir

_{r}*r*, respectively;

*P*is the energy provided by pump

_{k}*k*; is the specific weight of water; and

*nr*and

*npu*are the numbers of reservoirs and pumps, respectively.

Compared with the first problem formulation, minimization of cost and maximization of the *NRI* simultaneously pose a constrained optimization problem and pay attention to the tradeoff between capital cost and system reliability. This will help decision-makers meet the regulations of supply service strictly and identify more reliable solutions.

### Pipe selection methods: knowledge-based empirical method vs. cluster-based approach

As the aging of a WDS, the hydraulic performance of system components deteriorates accordingly. Therefore, identifying the most critical/vulnerable components plays a crucial role in making a cost-effective rehabilitation plan. In the past, such a plan mainly relied on the expertise of local engineers who were familiar with the topological and operational characteristics of their WDSs. The drawback of the knowledge-based empirical method lies in the fact that the value (accuracy) of engineering judgment may diminish as the system layout changes or expands due to urban development. Also, water utilities may lose such expertise in case of staff turnover. Therefore, the analytical methods that can replace engineering judgment are demanded.

Recently, a novel method based on the graph theory clustering concept was developed to identify the critical pipes within a large WDS (Muhammed *et al.* 2017). The methodology starts with dividing the entire network into several clusters according to its connectivity properties. Then, pipes that may have a significant impact on system performance are identified and considered as decision variables for the rehabilitation problem through multi-objective optimization. Those pipes include the ones located inside clusters resulting in nodal pressure deficiencies and the ones existing as inter-cluster transmission pipelines or linking paths between water source(s) and clusters. The application of the cluster-based approach mentioned above was applied to the EXNET rehabilitation problem (Farmani *et al.* 2005) and successfully found better cost-effective solutions than those discovered by engineering judgment.

### Optimization algorithms: NSGA-II vs. hybrid MOEAs

The rehabilitation problems of WDSs described in Equations (1) and (2) are extremely challenging due to the combinatorial nature, which is NP-hard and computationally intensive. MOEAs are suitable for this type of problem due to the characteristics of metaheuristics. Below some key features of NSGA-II and the hybrid MOEA used are described.

NSGA-II is acknowledged as an ‘industry standard’ algorithm that has been successfully applied to a variety of optimization problems related to WDSs (Nicklow *et al.* 2010; Maier *et al.* 2014, 2019). The main driving force of this algorithm is the fast non-dominated sorting approach for ranking solutions based on the Pareto-dominance relationship. Solutions with higher rankings survive and are selected as parents for reproduction. In case solutions are non-dominated to each other (i.e., with the same ranking), a secondary sorting criterion known as the crowding distance is used. Solutions with larger crowding distances are preferred.

NSGA-II approximates the Pareto-optimal front of a given problem by generating better offspring iteratively. It implements a binary tournament selection to establish the mating pool (i.e., parents) and applies the simulated binary crossover (SBX) and the polynomial mutation (PM) with designated probabilities to create children from parents (Deb *et al.* 2002). The implicit elitism strategy ensures that the best solutions ever found in the search history are always retained in the population. This allows a new population to be derived from the combination of parents and their offspring via the fast non-dominated sorting approach. NSGA-II also provides an effective constraint-handling technique to deal with constrained problems.

Compared with NSGA-II, GALAXY, which stands for a genetically adaptive leaping algorithm for approximation and diversity, is a brand-new hybrid optimizer for dealing with the discrete, combinatorial, multi-objective design of WDSs (Wang *et al.* 2017). It follows the generational framework of MOEAs and employs six search operators for reproduction simultaneously. These operators are selected based on their leaping ability in the objective space from the global and local search perspectives. Three search operators (i.e., Turbulence Factor, Differential Evolution, and Simulated Binary Crossover for Integers) guide the search at the early stage due to excellent leaping abilities in the global sense; in contrast, the other search operators (i.e., Uniform Mutation, Gaussian Mutation, and Dither Creeping) intensify the local search by fine-tuning the solutions found in the near-optimal region.

GALAXY also implements several important strategies that guarantee its performance on the combinatorial optimization problems. The hybrid replacement strategy combines the power of Pareto-dominance and *ε*-dominance concepts (Laumanns *et al.* 2002), thus maintaining a good balance between exploration and exploitation and preventing global optima from being lost. A subtle modification to the original genetically adaptive strategy (Vrugt & Robinson 2007) adjusts the best portfolio of six operators and ensures a consistently robust search capability. The global information strategy (Vrugt & Robinson 2007) increases the probability of not-so-good operators to contribute high-quality solutions subsequently. The duplicates handling strategy periodically helps further to improve the convergence and diversity of final solutions.

Another advantage of GALAXY over existing MOEAs is the alleviation of the parameterization issues to a great extent. In practice, users only need to specify the population size and the computational budget (i.e., times of evaluating the objective functions). This significantly simplifies the fine-tuning process of parameters and improves its robustness intrinsically.

### Case study

The same EXNET water distribution system, as used in Muhammed *et al.* (2017), is considered in this study. The EXNET was set up by the Centre for Water Systems at the University of Exeter as a realistic and challenging rehabilitation problem (Farmani *et al.* 2004). The network serves a population of approximately 400,000 and needs to be reinforced to meet the projected demand in 2020. It consists of a great number of small pipes and few transmission mains, with a large head-loss range at the extremities of the system. Initially, this model was built as an extended period simulation model. It is simplified here as per Farmani *et al.* (2005) by replacing two tanks with fixed-head reservoirs and turning the extended period simulation model into a steady-state (snapshot) model, considering the peak demand of the system over 24 h.

The layout of the EXNET is shown in Figure 2, which contains 2,465 pipes, 5 valves, 1,891 junction nodes, and 7 water sources (blue dots). The total consumption is 3,245.81 l/s. Two major reservoirs (node 3001 and 3002) supply water to the system at fixed heads of 58.4 and 62.4 m, respectively. The system is also fed by its neighbor systems via node 3003 to 3007 at fixed rates. Three non-return valves (also known as check valves) are connected to node 3001 and 3002 to control the flow direction into and out of the system. One pressure reducing valve is located in the downstream of node 3004 to maintain the downstream pressure below 58.4 m. One throttle control valve is also in the link downstream of node 3004 to control the system's flow and pressure.

The minimum pressure required at each demand node is 15 m (Muhammed *et al.* 2017). As seen from Figure 2, one-third of nodes have pressure deficiencies (larger gray dots) evaluated by EPANET 2.0 (Rossman 2000), mainly concentrated in the west and southeast boundaries of the network. There are also some low-pressure nodes scattered near the sources and in the middle part of the system. In particular, some demand nodes suffer from negative pressure (red dots), which are located at the left and right terminals of the network. The reasons for pressure deficiencies in EXNET mainly include a great number of pipes with insufficient diameters, limited transmission mains, and significant differences in the elevations of demand nodes. Therefore, it is necessary to propose a cost-effective rehabilitation plan to improve the performance of this network and meet the projected demand.

Figure 2 depicts the layout of candidate pipes selected for duplication by the two methods described previously. The green and blue lines show the set of pipes identified by the cluster-based approach (denoted as Muhammed_{349}) and the engineering judgment method (denoted as Engineer_{567}), respectively. The numbers ‘349’ and ‘567’ at the subscript positions denote the numbers of critical pipes selected by Muhammed_{349} and Engineer_{567}. The total length considered for rehabilitation by Muhammed_{349} and Engineer_{567} are about 101 and 167 km, which account for 17 and 28% of the pipe length of the entire network (i.e., 594 km), respectively. As shown in Figure 2, the former method has a clear pattern, including the pipes that link water sources to *NNPD*. In contrast, the pipes considered by the latter method seem to be scattered across the network. More data related to the rehabilitation of EXNET are provided in CWS (2020), including the Colebrook–White friction factors and unit costs of new pipes, which depend on the importance of roads (i.e., major roads or footpaths) and internal diameters.

### Experimental setup

The Exeter network's optimal rehabilitation problem was solved using different combinations of problem formulations, pipe selection methods, and MOEAs to investigate the nexus among the three factors. For simplicity, we considered only two cases for each factor, leading to eight groups of experiments. Ten independent runs were implemented for each experiment to eliminate the impact of randomness imposed at the initialization stage of MOEAs. The same computational budget was applied to each run of optimization; that is, the number of function evaluations (NFEs) was equal to 5 million, and the population size of 100 was fixed between NSGA-II and GALAXY. This computational budget allowed two MOEAs to evolve 50,000 generations, which was observed to achieve a satisfactory convergence in the preliminary runs.

As mentioned before, the GALAXY method has no extra parameters besides the population size and the NFEs. Therefore, to ensure the excellent performance of NSGA-II, a fine-tuning procedure of parameters was conducted, as described in Wang *et al.* (2019). In particular, the probabilities of SBX and PM were set to 0.9 and the inverse of the number of decision variables, respectively. The distribution indices of SBX and PM were fixed to 20 and 1, respectively.

We performed *a posteriori* reliability assessment of each rehabilitated network as a complementary and straightforward indicator to the *NRI* for some representative solutions obtained by different groups of experiments. For clarity, the average demand satisfaction (*ADS*) hereafter refers to a network system's capability to fulfill the nodal demands under mechanical failures (e.g., pipe breakage). In a narrow sense, we only considered the single-pipe failure scenarios due to the relatively low probabilities of simultaneous breaks of two or more pipes (Liu *et al.* 2017).

*ADS*, is calculated by Equation (3). The nodal demands were evaluated by a pressure-driven analysis (PDA) model built into the recently released EPANET Toolkit 2.2 (McDonnell

*et al.*2019). The required pressure and the minimum pressure were set to 15 and 0 m, respectively. This means that the nodal demand can be supplied only if the pressure is above 0 m and is fully satisfied if the pressure head is greater than 15 m. This is because PDA is recognized as providing a more realistic picture of the hydraulic behavior of WDSs during pressure-deficient conditions than the traditional demand-driven analysis (Giustolisi & Walski 2012). Specifically, each pipe considered failed was first closed in the EPANET model. This was followed by the hydraulic simulation and the retrieval of the actual demand at each node. This iterative process continued until all candidate pipes were processed, one at a time. Without losing the generality, every single pipe is assumed to be out of service with equal probability. Thus, the

*ADS*exhibits the overall satisfaction rate of nodal demand across all the possible pipe failures.where

*nf*is the number of potential single-pipe isolation scenarios; is the actual demand at node

*j*evaluated by the PDA when pipe

*i*is closed; and is the required demand at node

*j*.

*j*when pipe

*i*is out of service.

## RESULTS AND DISCUSSION

### Impact of problem formulations

Figure 3 shows how the problem formulations impact the Pareto fronts, no matter which pipe selection method and MOEA were used. There exists a considerable gap between the solutions obtained under two formulations in terms of capital costs, despite a small number of overlaps around £3 M. This gap suggests the different landscapes of the search space established by the two problem formulations. In particular, minimizing both the capital cost and *NNPD* makes the rehabilitation task an unconstrained optimization problem. Only the solutions lying on the *x*-axis are fully feasible and meet the minimum pressure head requirements across the network. In contrast, minimizing the capital cost and maximizing the *NRI* transform the rehabilitation of the Exeter network as a constrained problem; thus, each solution on the Pareto front is strictly feasible. Consequently, more investment is required to ensure sufficient pressure heads at demand nodes across the network.

Assessment criteria . | Cost vs. NNPD. | Cost vs. NRI. | |||||||
---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | ||

Cost (£M) | 1.6434 | 1.7158 | 2.5004 | 2.5079 | 2.8540 | 2.9713 | 3.0021 | 4.2809 | |

NRI (–) | 0.1915 | 0.1949 | 0.1846 | 0.1808 | 0.2152 | 0.2139 | 0.1900 | 0.2388 | |

ADS (–) | 0.9991 | 0.9991 | 0.9991 | 0.9991 | 0.9993 | 0.9993 | 0.9993 | 0.9993 | |

No. of duplications | 40 | 42 | 61 | 65 | 111 | 92 | 109 | 186 | |

Duplication length (m) | 8,903 | 9,993 | 12,160 | 14,033 | 21,303 | 15,279 | 22,187 | 37,365 | |

Avg. diam. (mm) | 231.7 | 215.5 | 248.0 | 233.2 | 180.4 | 233.2 | 193.0 | 139.0 | |

(%) | = 0 | 365 | 365 | 376 | 430 | 367 | 486 | 442 | 432 |

(0,5] | 1,479 | 1,482 | 1,475 | 1,420 | 1,496 | 1,380 | 1,417 | 1,428 | |

(5,10] | 34 | 33 | 31 | 30 | 16 | 14 | 23 | 21 | |

(10,15] | 5 | 3 | 1 | 3 | 5 | 2 | 0 | 4 | |

(15,20] | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | |

(20,25] | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | |

(25,30] | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |

(30,35] | 1 | 2 | 1 | 1 | 0 | 0 | 2 | 0 |

Assessment criteria . | Cost vs. NNPD. | Cost vs. NRI. | |||||||
---|---|---|---|---|---|---|---|---|---|

. | . | . | . | . | . | . | . | ||

Cost (£M) | 1.6434 | 1.7158 | 2.5004 | 2.5079 | 2.8540 | 2.9713 | 3.0021 | 4.2809 | |

NRI (–) | 0.1915 | 0.1949 | 0.1846 | 0.1808 | 0.2152 | 0.2139 | 0.1900 | 0.2388 | |

ADS (–) | 0.9991 | 0.9991 | 0.9991 | 0.9991 | 0.9993 | 0.9993 | 0.9993 | 0.9993 | |

No. of duplications | 40 | 42 | 61 | 65 | 111 | 92 | 109 | 186 | |

Duplication length (m) | 8,903 | 9,993 | 12,160 | 14,033 | 21,303 | 15,279 | 22,187 | 37,365 | |

Avg. diam. (mm) | 231.7 | 215.5 | 248.0 | 233.2 | 180.4 | 233.2 | 193.0 | 139.0 | |

(%) | = 0 | 365 | 365 | 376 | 430 | 367 | 486 | 442 | 432 |

(0,5] | 1,479 | 1,482 | 1,475 | 1,420 | 1,496 | 1,380 | 1,417 | 1,428 | |

(5,10] | 34 | 33 | 31 | 30 | 16 | 14 | 23 | 21 | |

(10,15] | 5 | 3 | 1 | 3 | 5 | 2 | 0 | 4 | |

(15,20] | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | |

(20,25] | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | |

(25,30] | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | |

(30,35] | 1 | 2 | 1 | 1 | 0 | 0 | 2 | 0 |

*Note*: represents the least-cost solution obtained by an MOEA and a pipe selection method. The letters *‘N*’ and *‘G*’ at the superscript positions denote ‘NSGA-II’ and ‘GALAXY’, respectively. The numbers ‘349’ and ‘567’ have the same meanings as mentioned before, which denote the numbers of critical pipes selected by Muhammed_{349} and Engineer_{567}, respectively. The total number of demand nodes considered for calculating is 1,885.

From the decision-making perspective, the two problem formulations show conflicting information, which would yield significantly increased investment. For instance, the capital costs of the least-cost solutions (marked as red dots in Figure 3) obtained under *Cost* vs. *NNPD* and *Cost* vs. *NRI* are around £1.64 M and £2.85 M, respectively. The *NRI* values of the two solutions are about 0.192 and 0.215. This means that increasing the capital cost by 74% leads to only a 12% improvement in the surrogate reliability indicator, which seems to be not cost-effective. Table 1 compares the eight least-cost solutions identified in each experiment. The capital costs of these solutions range from £1.64 M to £4.28 M, implying a sharp increase (more than 160%) in capital costs influenced by problem formulations, pipe selection methods, and optimization algorithms. The solutions discovered under *Cost* vs. *NNPD* are always cheaper than their counterparts under *Cost* vs. *NRI*, which correspond to the gap between the two problem formulations, as shown in Figure 3.

Additionally, and under *Cost* vs. *NNPD* can save 34.5 and 31.4% compared with and , respectively. Meanwhile, the *NRI* values of and under *Cost* vs. *NNPD* are slightly better than and . In the Pareto sense, the former two solutions dominate the latter ones, which may suggest that identifying the critical components by local engineers are more accurate than those by the cluster-based method under *Cost* vs. *NNPD*.

To further validate to what extend the network demands would be affected during the pipe failure scenarios, we calculated the *ADS* and the following Equations (3) and (4), respectively (see Table 1). The *ADS* of the original network before rehabilitation is 0.9651. No matter which formulation was used, the *ADS* of the EXNET was improved (about 3.5%) after critical pipes were duplicated. However, the differences in terms of *ADS* are not as noteworthy as appeared in *NRI*. Specifically, from under *Cost* vs. *NNPD* to under *Cost* vs. *NRI*, the *ADS* has a small increase (0.02%) despite the fact that the capital cost and the *NRI* are raised by 160 and 24.7%, respectively. Furthermore, from the distribution of across the eight least-cost solutions, solutions obtained under *Cost* vs. *NRI* are generally more reliable in the single-pipe isolation events than those under *Cost* vs. *NNPD*. However, this advantage is negligible. In particular, 19.4–25.8% of nodes are free of risk, no matter which pipe is failed. 73.2–79.4% of nodes have a very low chance (less than 5%) of facing low-pressure supply. Only 1–2.2% of nodes have to bear pressure deficiencies with a probability higher than 5%. In the absence of information on the importance of different nodes, the improved reliability cannot compensate for the massive increase in capital costs. In other words, the solutions obtained under *Cost* vs. *NNPD* are better choices due to considerable savings in expenditure. Without the comparison between two types of problem formulations, decision-makers who follow the multi-objective optimization involving *Cost* and *NRI* may incur far less efficient investment with marginal benefit.

Table 1 also shows the number, length, and average diameter of pipes rehabilitated for each least-cost solution obtained in this study. The number (or length) of pipes duplicated plays a key role in rehabilitation cost. Specifically, the rehabilitation cost has a clear positive correlation with the number and length of pipes duplicated. In contrast, the rehabilitation cost seems to have an adverse trend against the average diameter of pipes duplicated. No matter which MOEA and pipe selection method was used, the number and length of pipes rehabilitated under *Cost* vs. *NNPD* are much less than those under *Cost* vs. *NRI*. This is straightforward because the former formulation tends to meet the minimum pressure heads at the lowest cost. Therefore, duplicating a minimum set of bottleneck pipes with smaller pipes in diameter turns out to be a cost-effective decision. In this sense, the lengths of pipe rehabilitated are within 8.9–14.0 km under *Cost* vs. *NNPD*, accounting for less than 2.5% of the total length of the EXNET (i.e., 594 km). When solving the rehabilitation problem from the perspective of *Cost* vs. *NRI*, more investment is required to strengthen the reliable loops across the EXNET.

### Impact of pipe selection methods

Figure 4(a) compares the Pareto fronts obtained by NSGA-II using different pipe selection methods under *Cost* vs. *NNPD* formulation. The Pareto fronts identified by Engineer_{567} nearly dominate those by Muhammed_{349}, especially when the capital costs are beyond £0.6 M. The gaps between the two Pareto fronts become larger as the capital costs increase, indicating that the pipe selection method suggested by Engineer_{567} leads to better converged Pareto front. After optimization, only duplicates 40 existing pipes (7.1% of the 567 candidate pipes), while suggests 65 of 349 old pipes (18.6%) are rehabilitated. This proves Engineer_{567} effectively covers more critical components compared with those by Muhammed_{349}; thus, money can be spent where it matters most.

Figure 4(b) compares the Pareto fronts obtained by NSGA-II using different pipe selection methods under *Cost* vs. *NRI* formulation. Unlike Figure 4(a), the Pareto fronts obtained by two pipe selection methods have their advantages. The Pareto front obtained by Muhammed_{349} marginally dominates that by Engineer_{567} in the range of costs between £3 M and £13 M. Beyond £13 M, the Pareto fronts obtained by Engineer_{567} outperform those by Muhammed_{349}. The cross of the two Pareto fronts implies that each pipe selection method captures some more critical pipes than its counterpart. In other words, when the capital costs are limited to less than £13 M, Muhammed_{349} identifies more cost-effective solutions by duplicating the pipes not covered by Engineer_{567}, and this situation reverses when more expenditure is available.

Figure 5(a) shows the comparison of Pareto fronts obtained by GALAXY using different pipe selection methods under *Cost* vs. *NNPD* formulation. A very similar pattern can be observed, as seen in Figure 4(a). This confirms that no matter what kind of MOEAs is used, the Pareto fronts identified by Engineer_{567} largely outperforms those by Muhammed_{349}. In contrast, Figure 5(b) demonstrates minor differences from Figure 4(b). Specifically, the Pareto fronts obtained by Muhammed_{349} intersect twice with those by Engineer_{567}. In the range of costs between £10 M and £20 M, the solutions discovered by Muhammed_{349} dominate those by Engineer_{567}. However, if the expenditure is out of this range, the solutions found by Engineer_{567} are superior to those by Muhammed_{349}, especially when capital costs are over £20 M. The differences between Figures 4(b) and 5(b) are probably attributed to the search capability of two MOEAs, which will be discussed in detail in the subsection below.

### Impact of optimization algorithms

Figure 6 presents the comparisons of Pareto fronts discovered by GALAXY and NSGA-II under different combinations of the other two factors, i.e., the problem formulations and the pipe selection methods. Figure 6 is derived from the same data, as shown in Figures 4 and 5, but organized differently. This is to differentiate the performance of two MOEAs under different scenarios. As shown in Figure 6, the Pareto fronts identified by NSGA-II consistently outperform those by GALAXY, except for a small number of solutions observed in the boundary regions in Figure 6(c). In other words, NSGA-II, after a fine-tuning procedure, managed to discover better converged and distributed solutions than GALAXY regardless of problem formulations and pipe selection methods. However, it is worth noting that the performance of NSGA-II varied significantly if different parameter settings were used (see Supplementary Appendix for details). This means that using default settings as specified in Deb *et al.* (2002) or less effective parameter combinations could result in much worse Pareto fronts compared with those by GALAXY. In contrast, GALAXY is capable of providing robust solutions due to the alleviation of the parameterization issues of MOEAs to a great extent.

As a larger network than other commonly used benchmark networks for the rehabilitation purpose in the literature (e.g., New York Tunnel), EXNET requires more computational time for MOEAs to converge. In particular, we carried out the numerical experiments on a desktop computer with Intel^{®} Core™ i7-6700 CPU @ 3.4 GHz with a RAM of 8 GB. GALAXY and NSGA-II took on average 9.4 and 5.2 h to solve the *Cost* vs. *NNPD* problems for 5 million NFEs. Under *Cost* vs. *NRI* formulation, the required CPU runtime for GALAXY and NSGA-II increased to 18.1 and 21.7 h, respectively, because calculating *NRI* is more computationally intensive than *NNPD*. Admittedly, applying the proposed approach to a real-world rehabilitation problem, including hundreds of thousands of nodes and pipes in a hydraulic model, would be hindered by the computational overhead. A smart workaround will be necessary in such a case. Researchers and practitioners can either resort to high-performance computing techniques and facilities (e.g., using GPU-accelerated computing on a supercomputer) or using a skeletonized network model, eliminating a significant number of components that have minor impacts.

On the other hand, since the comparison of two MOEAs was not the target of this study, we did not pay particular attention to the impact of population size on the quality of Pareto fronts. Using a small population size (i.e., 100) to deal with the optimal rehabilitation of the Exeter network, which contains 349 or 567 decision variables, may not be efficient to steer the exploration out of the massive local optima existing in the landscape of search space. Therefore, by increasing the population size, both NSGA-II and GALAXY may demonstrate improved (or degraded) performance. However, it is out of the scope of this paper.

## CONCLUSIONS

Many factors can have significant impacts on the strategic decision-making process of the rehabilitation of aging WDSs. This paper establishes a three-layer hierarchical framework of decision-making by abstracting some key factors from the technical perspective. The nexus of these key factors is investigated using the EXNET case study.

The three key factors, including the problem formulations, the pipe selection methods, and the MOEAs, have significantly different impacts on the strategic rehabilitation planning, which should not be overlooked by decision-makers. Specifically, the problem formulation has the highest priority in determining where and how the rehabilitation measures (e.g., pipe duplications in this study) should be implemented. This is because the way that how the optimization problem is formulated affects the landscape of search space in essence. Furthermore, this landscape controls where an effective and efficient optimization algorithm can approach, which in turn influences the optimality of the solutions obtained. Second, the pipe selection method plays a secondary role, which mainly dominates the shape of the Pareto fronts obtained. If more bottleneck pipes within a WDS can be identified in a specific pipe selection method, it is more likely to generate better converged and more diversified Pareto fronts than other pipe selection methods. Third, optimization algorithms also have a considerable impact on the optimality of results, despite not as important as the pipe selection methods. The performance of MOEAs is subject to their characteristics and parameter settings provided that the computational budget is chosen appropriately.

The implications of the current study are threefold. First, how to properly define the optimization problem is vital to decision-makers. In particular, many-objective optimization has been demonstrated in some relevant studies (Fu *et al.* 2013; Reed *et al.* 2013) to provide a more comprehensive perspective than traditional multi-objective optimization methodologies, with only two or three objectives optimized concurrently. What kind of problem formulation is more convincing and reasonable than most used ones (as considered in this paper) in the rehabilitation of WDSs turns out to be a salient and promising research direction. For instance, the optimal rehabilitation problem can be formulated to discover the tradeoff of diversified objectives at both the design and operation stages (i.e., capital cost, operational cost, network leakage, and water age). This may facilitate the identification of a more sustainable and energy-efficient solution to network rehabilitation, which ensures a better benefit over the long-term horizons.

Second, a more theoretical-sound methodology to identify the bottleneck pipes within a given WDS is practical and beneficial to decision-makers. Some researchers rely on the clustering methods to screen out candidate pipes, which is generally under the philosophy of ‘divide-and-conquer’. However, with the continuously increasing volume of monitoring data and the aid of modern techniques for dealing with big data, for instance, deep learning, it is possible to build up a data-driven model that is more sophisticated in identifying critical pipes within a broad set of candidate ones. The static data such as pipe length, diameter, material, or even length of service, and simulated data based on a calibrated model, such as pressure heads of adjacent nodes, unit head loss, and flow rates within pipes, are valuable assets which can be used to train the data-driven model. This will probably substitute the empirical methods (e.g., engineering judgment).

Last but not least, more in-depth investigations on the existing MOEAs, as shown in Zecchin *et al.* (2012), Zheng *et al.* (2016, 2017a, 2017b), and Wang *et al.* (2020), can provide more insights into the behavior of these optimization tools and improve the search efficiency by reducing the computational budget and/or alleviating the parameterization issues to a great extent.

The progress in all these above research directions will further advance the decision-making toward more cost-effective, sustainable, and energy-efficient WDSs. Such technological breakthroughs probably lead to user-friendly software, which will address the challenging rehabilitation issues facing most water companies worldwide.

## ACKNOWLEDGEMENTS

The authors disclosed receipt of the following financial support for the research, authorship, and publication of this article: This work was sponsored by the Open Research Fund of the Yellow River Sediment Key Laboratory affiliated to the Ministry of Water Resources, P.R. China (Grant No. 201904).

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories (http://dx.doi.org/10.17632/f2ydwz75nh.1).