This paper presents an extensive analysis of the sensitivity of multi-objective algorithm parameters and objective function scaling, tested on a large number of parameter setting combinations for a water distribution system optimisation problem. The optimisation model comprises two operational objectives minimised concurrently: the pump energy costs and the deviations of constituent concentrations as a water quality measure. This optimisation model is applied to a regional non-drinking water distribution system, and solved using the optimisation software GANetXL, which incorporates the NSGA-II linked with the network analysis software EPANet. The sensitivity analysis employs a set of performance metrics, which were designed to capture the overall quality of the computed Pareto fronts. The performance and sensitivity of the NSGA-II parameters are evaluated using those metrics. The results demonstrate that the NSGA-II is sensitive to different parameter settings and that, unlike in single-objective problems, a range of parameter setting combinations appears to be required to reach a Pareto front of optimal solutions. Additionally, inadequately scaled objective functions cause the NSGA-II to be biased towards the second objective. Lastly, the methodology for performance and sensitivity analysis may be used for the calibration of algorithm parameters.

## NOTATION

• ai – area determined by the solutions in PFknown and the reference point
• bmi – binary variable describing the status of pump m during a time interval i
• c – constituent (water quality parameter)
• C – number of constituents (water quality parameters) c
• ccdj – actual (modelled) concentration of a constituent c at the customer node d during a time interval j
• maximum allowed concentration of a constituent c at the customer node d
• minimum required concentration of a constituent c at the customer node d
• CRS – crossover operator
• d – customer demand node
• D – number of customer demand nodes d
• maximum allowed volume deficit in storage tank s at the end of the simulation period T
• E – metric error ratio
• ECi – electricity tariff during a time interval i
• EX – metric extent
• FI(b) – economic objective function as pump energy costs, over the simulation period T
• FII(b) – water quality objective function as squared deviations of actual constituent concentrations from required values, over the simulation period T
• GD – metric generational distance
• GEN – number of generations
• Hdi – pressure at customer demand node d during a time interval i
• minimum required pressure at customer demand node d
• Hmi – total dynamic head supplied by pump m during a time interval i
• i – time interval for hydraulic simulation
• IE – metric ɛ-indicator
• j – time interval for water quality simulation
• k – unit conversion coefficient
• L – number of equal time intervals i for a hydraulic simulation within the simulation period T
• LL – number of equal time intervals j for water quality simulation within the simulation period T
• m – pump
• M – number of pumps
• MUT – mutation operator
• n – non-demand node
• N – total number of nodes within the network
• NN – metric non-dominated number
• NN′ – number of non-dominated solutions in PFknown
• ONVG – metric overall non-dominated vector generation
• p – pipe
• P – pump station
• PFknown – known (i.e. computed) Pareto front
• PFtrue – true Pareto front
• POP – population size
• Qmi – flow through pump m during a time interval i
• r – reservoir
• s – storage tank
• S – number of storage tanks s
• S1P – simple one point (crossover)
• SC – metric spacing
• SEG – sensitivity group
• SPs – sensitivity points
• SXP – simple multi point (crossover)
• SM – S-metric
• TN – metric true number
• UN – metric unique non-dominated number
• Vs – volume of storage tank s constrained by minimum and maximum water levels
• ysi – water level in storage tank s during a time interval i
• minimum water level in storage tank s
• maximum water level in storage tank s
• value of the ith objective in PFknown
• value of the ith objective in PFtrue
• δi – Euclidean distance between the ith solution in PFknown and the nearest solution in PFtrue
• Δti – length of a time interval i
• ΔVsT – volume deficit (difference between the initial and final volume) in storage tank s at the end of the simulation period T
• ɛi – Euclidean distance between the ith solution and its closest neighbour in PFknown
• mean of all ɛi
• ɛouter – Euclidean distance between the two outer solutions in PFknown
• Euclidean distance between the two outer solutions in PFtrue
• ηi – solution in PFknown
• ηm – efficiency of pump m

## INTRODUCTION

Optimisation of water distribution systems (WDSs), which generally covers optimum system design, rehabilitation and operation, has been a subject of intensive research over the past several decades. Although these problems inherently involve multiple objectives, they were initially approached as single-objective due to the computational complexity involved. Over the past two decades, nonetheless, multi-objective approaches have been increasingly applied to analyse trade-offs between competing objectives; for example system and operational costs, service quality, system reliability and water quality. Regarding optimum design and rehabilitation, the minimisation of the network costs is often defined as the first objective, with other competing objectives being the maximisation of network reliability (Prasad & Park 2004; Alvisi & Franchini 2006; Farmani et al. 2006, Farmani et al. 2005b; Giustolisi & Berardi 2009; Raad et al. 2010), network robustness (Kapelan et al. 2005; Babayan et al. 2007) or network performance (Halhal et al. 1999; Walters et al. 1999; Vamvakeridou-Lyroudia et al. 2005). Additionally, network design and/or rehabilitation costs were traded off against service quality measures such as pressure head deficit (Farmani et al. 2005a; Atiquzzaman et al. 2006; Keedwell & Khu 2006; Jin et al. 2008). Optimum operation of WDSs has been mainly concerned with the minimisation of energy costs for pumping as the first objective and pump maintenance as the second objective (Savic et al. 1997; Sotelo & Baran 2001; Kelner & Leonard 2003; Lopez-Ibanez et al. 2005). Most recently, tradeoffs between pump energy costs and greenhouse gas emissions (Wu et al. 2010) or water quality measures (Kurek & Ostfeld 2013, 2014) have also been investigated.

Scaling of the objective and constraint functions is important, because it may have a significant influence on the performance of optimisation algorithms. It represents down-scaling or up-scaling of these functions, so that they cover a similar range of numerical values. However, the majority of the above literature did not consider scaling of the objective functions. Inadequate scaling of the objective functions may lead, according to Bhattacharjya (2007), to the algorithm being biased towards one objective, specifically an objective with larger objective function values. It may affect spacing between the solutions with concentrations of solutions either increasing or decreasing along the Pareto front and may cause discontinuity of the Pareto front (Bhattacharjya 2007). It may also impact the location of the Pareto front with solutions being away from the true Pareto front (Bhattacharjya 2007). More detailed information on scaling of optimisation problems can be found in Gill et al. (1981). Work presented within this paper addresses those scaling issues by comparing a broad range of results with non-scaled and scaled objective functions for a multi-objective problem of the optimal operation of WDSs.

To solve multi-objective optimisation problems in WDSs, multiple objective evolutionary algorithms (MOEAs) have mostly been applied. Those MOEAs include the non-dominated sorting genetic algorithm (NSGA) (Srinivas & Deb 1994), NSGA-II (Deb et al. 2002), the strength Pareto evolutionary algorithm (SPEA) (Zitzler & Thiele 1999), SPEA2 (Zitzler et al. 2001) and others. These algorithms rely on a number of parameters, namely population size and the probabilities of mutation and crossover, which need to be set up (i.e. calibrated) for each individual problem so that the algorithm delivers optimal solutions (Gibbs et al. 2010; Younis & Dandy 2012; Marchi et al. 2014). Nevertheless, very limited attention has been dedicated to date to analysing the sensitivity of algorithm parameters and their performance in multi-objective optimisation of WDSs. In the majority of the literature reviewed, the selection of parameter settings is either not discussed (Halhal et al. 1999; Walters et al. 1999; Kelner & Leonard 2003; Farmani et al. 2006; Babayan et al. 2007; Raad et al. 2010) or is based on literature values (Farmani et al. 2005a; Alvisi & Franchini 2006; Alfonso et al. 2010) with no further rationale provided. There are also cases where the same algorithm with the same parameter settings is applied to different test networks (Farmani et al. 2005a; Keedwell & Khu 2006), which contradicts the importance of calibrating the algorithm parameter values for each individual problem as highlighted by Gibbs et al. (2010). Although some authors tested a range of parameter settings (Kapelan et al. 2005; Vamvakeridou-Lyroudia et al. 2005; Kurek & Ostfeld 2014), they did not present such analysis as part of their research. This current work explores the sensitivity of multi-objective algorithm parameters, evaluates the performance of a wide range of parameter settings and establishes the influence of these settings on solutions for the optimal operation of WDSs with two objectives. It utilises a set of performance metrics to do so.

Performance metrics were proposed in Van Veldhuizen (1999) and Zitzler (1999) and are normally used to compare performances of different multi-objective algorithms (Van Veldhuizen & Lamont 2000b; Zitzler et al. 2000; Kollat & Reed 2006; Kollat et al. 2008). According to Zitzler et al. (2003), the comparison of solutions in multi-objective optimisation is substantially more complex than in single-objective, because there is no single performance metric which is both compatible and complete. Zitzler et al. (2000) suggest that such a comparison evaluates (i) distance of the obtained Pareto front from the Pareto front of optimal solutions (i.e. true Pareto front), (ii) uniformity of distribution of the solutions in the Pareto front and (iii) the extent of the obtained Pareto front to ensure that a wide range of objective values is covered. In WDS optimisation, performance metrics were applied in optimal system design (Keedwell & Khu 2006; Raad et al. 2010; Kanta et al. 2012) and operation (Baran et al. 2005) to compare performances of multiple MOEAs. To the authors' knowledge, there are no WDS-focused studies which have analysed the sensitivity of algorithm parameters within one MOEA. Early studies were applied to general test cases (Hadka & Reed 2012) or general water resources applications (Reed et al. 2012), and used performance metrics and Sobol's global variance decomposition to evaluate parameter sensitivity.

The purpose of this current work is to comprehensively explore the sensitivity of multi-objective algorithm parameters and objective function scaling. This work is believed to be the first dealing with the performance and sensitivity of algorithm parameters within one MOEA in WDS optimisation. Algorithm parameters considered include population size, number of generations, probability of crossover (subsequently referred to as crossover) and probability of mutation (subsequently referred to as mutation). The test network adopted from the literature (Ostfeld & Salomons 2004; Ostfeld et al. 2011) represents a regional non-drinking WDS, where two operational objectives, economic and water quality, are formulated to be optimised. It was identified in both urban drinking WDSs (Kurek & Ostfeld 2014) and regional non-drinking WDSs (Mala-Jetmarova et al. 2014) that a relationship between the economic objective (i.e. pump energy costs) and water quality objective is conflicting. The reason for this relationship is that, with more relaxed water quality requirements, increased flexibility exists for pumps to operate more effectively over a broader objective space (Mala-Jetmarova et al. 2014).

## OPTIMISATION MODEL

### Objective functions

#### Economic objective

The economic objective is represented by pump energy costs and is often referred to in the literature as a pump scheduling problem (Lopez-Ibanez et al. 2008). Explicit pump schedules are used, which specify the time when a pump operates. For this purpose, simulation period T is divided into equal time intervals i, in which the pump m adopts a binary variable bmi of either 0 or 1 to describe its status as being off or on, respectively. The economic objective function for the pump scheduling problem is written as:
$$F_I(b) = \sum_{m=1}^{M} \sum_{i=1}^{L} EC_i \, \frac{k \, H_{mi} \, Q_{mi}}{\eta_m} \, \Delta t_i \, b_{mi} \tag{1}$$

where M (–) is the number of pumps, m = 1, …, M; L (–) is the number of equal time intervals i for a hydraulic simulation within the simulation period T, i = 1, …, L; ECi ($/kWh) is the electricity tariff during a time interval i; k (–) is a unit conversion coefficient; Hmi (m) is the total dynamic head supplied by pump m during a time interval i; Qmi (L/s) is the flow through pump m during a time interval i; ηm (–) is the efficiency of pump m; Δti (s) is the length of a time interval i; bmi (–), bmi = 0, 1 is the binary variable describing the status of pump m as being off or on, respectively, during a time interval i.

#### Water quality objective

The water quality objective aims to meet the water quality requirements of various customer groups. This objective is defined as deviations of actual constituent concentrations from required values, which are normally prescribed by health and other regulatory guidelines. Water quality requirements of various customer groups, who are represented as demand nodes, are specified by lower and upper bounds of constituent concentrations. Each of those customer groups has different water quality requirements. The constituents considered are conservative (i.e. non-reactive). The objective function is defined as the square of deviations of actual (modelled) constituent concentrations from required minimum and maximum values as follows (adapted from Sakarya & Mays (2003)):

$$F_{II}(b) = \sum_{c=1}^{C} \sum_{d=1}^{D} \sum_{j=1}^{LL} \left[ \max\!\left(0, \, c^{\min}_{cd} - c_{cdj}\right)^2 + \max\!\left(0, \, c_{cdj} - c^{\max}_{cd}\right)^2 \right] \tag{2}$$

where C (–) is the number of constituents (water quality parameters) c, c = 1, …, C; D (–) is the number of customer demand nodes d, d = 1, …, D; LL (–) is the number of equal time intervals j for water quality simulation within the simulation period T, j = 1, …, LL; $c^{\min}_{cd}$ and $c^{\max}_{cd}$ (mg/L) are the minimum required and maximum allowed concentrations of a constituent c at the customer node d, respectively; ccdj (mg/L) is the actual (modelled) concentration of a constituent c at the customer node d during a time interval j.
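A minimal sketch of the two objective functions, assuming plain Python lists for the simulated time series (all argument names are hypothetical and not part of the original model):

```python
def pump_energy_cost(EC, k, H, Q, eta, dt, b):
    """Economic objective F_I: pump energy cost over the simulation period.

    EC[i] -- electricity tariff ($/kWh); H[m][i], Q[m][i] -- head (m) and
    flow (L/s) of pump m in interval i; eta[m] -- pump efficiency (-);
    dt[i] -- interval length (s); b[m][i] -- pump status (0 off, 1 on).
    """
    M, L = len(eta), len(EC)
    return sum(EC[i] * k * H[m][i] * Q[m][i] / eta[m] * dt[i] * b[m][i]
               for m in range(M) for i in range(L))


def wq_deviation(c, c_min, c_max):
    """Water quality objective F_II: squared deviations of modelled
    concentrations c[cst][d][j] outside the bounds [c_min, c_max] set
    per constituent cst and customer node d."""
    total = 0.0
    for cst, conc_d in enumerate(c):
        for d, conc_j in enumerate(conc_d):
            lo, hi = c_min[cst][d], c_max[cst][d]
            for conc in conc_j:
                # penalise only concentrations below the minimum or above
                # the maximum required value
                total += max(0.0, lo - conc) ** 2 + max(0.0, conc - hi) ** 2
    return total
```

In the actual study these quantities are produced by EPANet simulations rather than supplied as lists; the sketch only shows the arithmetic of the two objectives.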
### Constraints and decision variables

The constraints to the optimisation problem are, firstly, explicit system constraints such as conservation of mass of flow, conservation of energy and conservation of mass of constituent, which are controlled by the network analysis software EPANet. These can be found in Rossman (2000), thus are not listed here. Secondly, there are implicit bound constraints, which include the minimum pressure at customer nodes (3), the water level limits at storage tanks (4) and the volume deficit at storage tanks at the end of the simulation period (5) as follows:

$$H_{di} \ge H^{\min}_{d}, \quad d = 1, \dots, D; \; i = 1, \dots, L \tag{3}$$

$$y^{\min}_{s} \le y_{si} \le y^{\max}_{s}, \quad s = 1, \dots, S; \; i = 1, \dots, L \tag{4}$$

$$\frac{\Delta V_{sT}}{V_s} \times 100\% \le \Delta V^{\max}_{sT}, \quad s = 1, \dots, S \tag{5}$$

where Hdi (m) is the pressure at demand node d during a time interval i; $H^{\min}_{d}$ (m) is the minimum required pressure at demand node d; ysi (m) is the water level in storage tank s during a time interval i; $y^{\min}_{s}$ (m) and $y^{\max}_{s}$ (m) are the minimum and maximum water levels in storage tank s, respectively; ΔVsT (m³) is the volume deficit (difference between the initial and final volume) in storage tank s at the end of the simulation period T; Vs (m³) is the volume of storage tank s constrained by minimum and maximum water levels; $\Delta V^{\max}_{sT}$ (%) is the maximum allowed volume deficit in storage tank s at the end of the simulation period T.

The optimisation problem includes one type of decision variable, the binary variable bmi ∈ {0, 1} describing the status of pump m (off or on, respectively) during a time interval i.

### Multi-objective optimisation problem

The multi-objective optimisation problem for the minimisation of pump energy costs and optimisation of water quality is written as:

$$\min \left\{ F_I(b), \; F_{II}(b) \right\} \tag{6}$$

subject to conservation of mass of flow and constituent, conservation of energy and (3)–(5). Conservation of mass of flow and constituent, conservation of energy and constraint (4) are managed by EPANet, while constraints (3) and (5) are included in the optimisation problem using penalty functions.
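The penalty handling of the bound constraints can be sketched as an aggregate violation measure that is zero for a feasible pump schedule (a sketch only; argument names are hypothetical and the actual study uses GANetXL's built-in penalty mechanism):

```python
def infeasibility(H, H_min, y, y_min, y_max, dV_frac, dV_max):
    """Aggregate violation of the bound constraints (3)-(5).

    H[d][i] -- node pressures with required minima H_min[d];
    y[s][i] -- tank water levels with bounds [y_min[s], y_max[s]];
    dV_frac[s] -- end-of-period volume deficit of tank s as a fraction
    of its volume, limited by dV_max. Returns 0.0 for feasible schedules.
    """
    v = 0.0
    for d, row in enumerate(H):                       # constraint (3)
        v += sum(max(0.0, H_min[d] - h) for h in row)
    for s, row in enumerate(y):                       # constraint (4)
        v += sum(max(0.0, y_min[s] - lv) + max(0.0, lv - y_max[s])
                 for lv in row)
    v += sum(max(0.0, f - dV_max) for f in dV_frac)   # constraint (5)
    return v
```

A schedule with `infeasibility(...) > 0` would be penalised so that it cannot outrank a feasible one.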
These penalty functions are implemented using an 'infeasibility cell' within the Excel spreadsheet of the software GANetXL (CWS 2011), described further in the Solution scheme section.

## MODEL APPLICATION

The optimisation model is applied to the network shown in Figure 1, which represents a regional WDS where non-drinking water is supplied to various customer groups, for example agricultural, domestic and industrial. Requirements for water quality vary across these customer groups. This network was adapted from Ostfeld & Salomons (2004) with minor modifications from Ostfeld et al. (2011). It consists of 11 pipes (p1–p11), six customer demand nodes (d1–d6), three non-demand nodes (n1–n3), eight pumps (m1–m8) located at three pump stations (P1–P3) and an elevated storage tank (s1). The system is supplied from three distinct water sources (reservoirs) (r1–r3) of different qualities.

Figure 1 Network layout (adapted from Ostfeld & Salomons 2004; Ostfeld et al. 2011).

Water quality is represented by one non-specified conservative (i.e. non-reactive) water quality parameter with constituent concentration (mg/L), which differs from reservoir to reservoir, but is constant over time. Each customer group (d1–d6) has different water quality requirements characterised by the maximum value of the concentration. Data for the network can be found in Mala-Jetmarova et al. (2013).

The optimisation is undertaken for an extended period simulation (EPS) with the length of the simulation period being 24 hours, and the simulation time step for both hydraulic and water quality analysis being 1 hour. Because the emphasis of this work is not on detailed system operations, but rather on the sensitivity of the NSGA-II parameters, these settings are deemed acceptable for the purpose of this work.
## SOLUTION SCHEME

The solution methodology integrates a network analysis simulator with an optimisation model (Figure 2). Network analysis software EPANet (Rossman 2000) version 2.0 is used to perform hydraulic and water quality extended period simulations (EPS). EPANet is a widely accepted open-source software package, which is frequently used in conjunction with optimisation research for WDSs.

Figure 2 Solution methodology.

### GANetXL

Software GANetXL (CWS 2011), available from the Centre for Water Systems (CWS), University of Exeter, UK, is used as the optimisation tool. GANetXL, which was developed by the CWS as a Microsoft Excel add-in, is a generic optimisation engine with a spreadsheet-based interface for solving both single- and multi-objective optimisation problems (Savic et al. 2011). The foremost advantage of GANetXL is that it enables easy integration with EPANet via Visual Basic for Applications (VBA) code and also enables direct access to evolutionary single- and multi-objective algorithms. From the family of MOEAs, GANetXL incorporates the NSGA-II (Deb et al. 2002). This work confirmed that GANetXL is well suited for solving multi-objective optimisation problems in WDS operation which incorporate water quality aspects with conservative constituents.

### NSGA-II

The NSGA-II is a population-based multi-objective optimisation algorithm, which was introduced by Deb et al. (2000). Since then, this algorithm has been successfully applied to multi-objective optimisation problems of the design and rehabilitation (Prasad & Park 2004; Farmani et al. 2005b; Atiquzzaman et al. 2006; Jin et al. 2008; Kanta et al. 2012), and operation (Baran et al. 2005) of WDSs. The NSGA-II principally works as follows. An initial population P0 of random solutions of size N is generated and each solution is assigned a fitness (or rank).
Applying binary tournament selection, and crossover and mutation operators, the offspring population Q0 of size N is created. Populations P0 and Q0 are combined into the population R0 = P0 ∪ Q0 of size 2N. Non-dominated sorting is then applied, where the population R0 is sorted into non-dominated fronts, i.e. sets of solutions which share the same domination rank. From these sets F1, F2, F3, …, Fn of the combined population R0, the set F1 includes the best and Fn the worst solutions. If the size of the set F1 is smaller than N, all members of this set are selected for the new population Pt. The remaining places in the population Pt are filled subsequently from sets F2, F3 and so on until the size N is reached, using both the rank and the crowding distance. The solutions in the last set contributing to the new population Pt are sorted using the crowded-comparison operator, and the best solutions are selected to fill the last places in Pt. The offspring population Qt is then created from Pt applying selection, crossover and mutation operators, where selection is based on the crowded-comparison operator. The process is repeated by creating the population Rt = Pt ∪ Qt of size 2N, forming non-dominated sets F1, F2, F3, …, Fn and creating the new population Pt+1 and so on, until the stopping criteria are met. A detailed description of the NSGA-II can be found in Deb et al. (2000, 2002).

## METHODOLOGY FOR PERFORMANCE AND SENSITIVITY ANALYSIS

The value of this work lies chiefly in the extensive performance and sensitivity analysis of the NSGA-II parameters and objective function scaling, which is tested on a wide range of parameter settings. The NSGA-II parameters considered are: (i) population size, (ii) the number of generations, (iii) crossover and (iv) mutation. A reasonable range for each of these parameters was identified based on recommended values from the literature and also preliminary testing, and is included in Table 1.
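The fast non-dominated sorting step at the heart of the NSGA-II, described above, can be sketched as follows (a minimal illustration for minimisation objectives, not the GANetXL implementation):

```python
def fast_nondominated_sort(objs):
    """Sort a population of objective vectors (minimisation) into
    non-dominated fronts F1, F2, ..., returned as lists of indices."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    n = len(objs)
    dominated_by = [0] * n                 # count of solutions dominating i
    dominates_set = [[] for _ in range(n)]  # solutions that i dominates
    first = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominates_set[i].append(j)
            elif dominates(objs[j], objs[i]):
                dominated_by[i] += 1
        if dominated_by[i] == 0:
            first.append(i)                 # member of the best front F1
    fronts = [first]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominates_set[i]:
                dominated_by[j] -= 1
                if dominated_by[j] == 0:
                    nxt.append(j)           # all its dominators are placed
        fronts.append(nxt)
    return fronts[:-1]                      # drop the trailing empty front
```

In the full algorithm these fronts are then filled into the next population using rank and crowding distance, as described above.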
Mutation and crossover operators can also be of different types. The type selected for mutation is 'simple by gene', which is set as the default within GANetXL. Two types were tested for crossover, being 'simple one point' (S1P) and 'simple multi-point' (SXP). These mutation and crossover types are described in more detail in CWS (2011).

Table 1 NSGA-II parameter settings

| Parameter | Parameter setting | Parameter type | Comment |
| --- | --- | --- | --- |
| Population size | 50; 100; 150; 200; 300; 400 | N/A | Values based on preliminary testing |
| Number of generations | 500; 1,000; 2,000 | N/A | Values based on preliminary testing |
| Mutation operator | 0.002; 0.005*; 0.01; 0.1 | Simple by gene | *0.005 ≈ 1/192, where the value of 192 represents the number of decision variables (8 pumps × 24 1-hour intervals) (Deb et al. 2002) |
| Crossover operator | 0.6; 0.7; 0.8; 0.9; 1.0 | Simple one point; Simple multi point | Values based on the recommendation of Goldberg (1989) |

Regarding scaling of the objective functions, two alternatives are considered. Firstly, objective functions are not scaled. Secondly, the water quality objective function is linearly scaled down to ensure that the two objective functions have a similar range of numerical values. As an example, the range of the water quality objective as deviations of constituent concentrations is 0–100,000 (mg/L)² and the range of the economic objective as pump energy costs is 150–210 $/day.
Hence, the water quality objective is multiplied by 0.01 to obtain a similar order to the economic objective. The first case is referred to subsequently in the text as non-scaled objective functions, and the second case as scaled objective functions.
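As an illustration of the linear down-scaling described above (the factor of 0.01 and the ranges follow the example quoted in the text; the function name is hypothetical):

```python
# Linear down-scaling of the water quality objective before it is passed
# to the optimiser; the economic objective is left unchanged.
SCALE_WQ = 0.01

def scaled_objectives(energy_cost, wq_deviation):
    """Return the objective pair supplied to NSGA-II.

    energy_cost  -- F_I in $/day (e.g. in the range 150-210)
    wq_deviation -- F_II in (mg/L)^2 (e.g. in the range 0-100,000)
    """
    return energy_cost, wq_deviation * SCALE_WQ
```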

NSGA-II parameter settings were systematically arranged into 76 parameter setting combinations with either non-scaled or scaled objective functions (Table 2), so the total number of analyses performed was 152 (i.e. 2 × 76). Due to the large number of modelling combinations, the optimisation runs were undertaken as a two-stage process, being sensitivity analysis 1 (SA1) and sensitivity analysis 2 (SA2). In the first stage (SA1), the sensitivity of mutation and crossover was tested with a constant population size and constant number of generations. Those constant values for the population size and number of generations were cautiously selected based on preliminary testing, which was used initially to identify a reasonable range for the algorithm parameters (as previously reported in Table 1). The results obtained were visually compared, and the well-performing mutation and crossover operators were then selected. In the second stage (SA2), these operators were applied for the sensitivity analysis of the population size and the number of generations. Additionally, all parameter setting combinations were analysed with both non-scaled and scaled objective functions.

Table 2 Parameter setting combinations for sensitivity analysis

| Analysis name | Sensitivity analysis of | Setting of constant parameters | Setting of variable parameters* | No. of parameter setting combinations | No. of analyses conducted |
| --- | --- | --- | --- | --- | --- |
| SA1 | Crossover; Mutation | POP: 100; GEN: 2,000 | MUT: 0.002; 0.005; 0.01; 0.1. CRS: 0.6; 0.7; 0.8; 0.9; 1.0 (all both S1P and SXP) | 40 | 40 (non-scaled**); 40 (scaled**) |
| SA2 | Population size; Number of generations | MUT: 0.005; CRS: 0.7 SXP; 1.0 S1P | POP: 50; 100; 150; 200; 300; 400. GEN: 500; 1,000; 2,000 | 36 | 36 (non-scaled**); 36 (scaled**) |
| Total | | | | 76 | 152 |

POP = population size, GEN = number of generations, MUT = mutation, CRS = crossover.

*All possible combinations were analysed.

**Objective functions either non-scaled or scaled.
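The 76 parameter setting combinations in Table 2 can be enumerated as follows (parameter values as reported; the dictionary structure of the run list is an assumption of this sketch):

```python
from itertools import product

# Parameter grids from Tables 1 and 2
MUT = [0.002, 0.005, 0.01, 0.1]
CRS = [0.6, 0.7, 0.8, 0.9, 1.0]
CRS_TYPE = ["S1P", "SXP"]
POP = [50, 100, 150, 200, 300, 400]
GEN = [500, 1000, 2000]

# SA1: vary mutation and crossover with POP = 100 and GEN = 2,000 fixed.
sa1 = [dict(pop=100, gen=2000, mut=m, crs=c, crs_type=t)
       for m, c, t in product(MUT, CRS, CRS_TYPE)]

# SA2: vary population size and number of generations with MUT = 0.005
# fixed and the two crossover settings selected from SA1.
sa2 = [dict(pop=p, gen=g, mut=0.005, crs=c, crs_type=t)
       for p, g, (c, t) in product(POP, GEN, [(0.7, "SXP"), (1.0, "S1P")])]

combinations = sa1 + sa2   # 40 + 36 = 76 parameter setting combinations
# Each combination is run once with non-scaled and once with scaled
# objective functions, giving 2 x 76 = 152 analyses in total.
```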

A single run was performed for each parameter setting combination from an initial population of random solutions. Therefore, the sensitivity of seed value (i.e. convergence of an algorithm from different initial populations of random solutions) was not investigated as part of this work and has been left for further research. A seed value of 1 was used for all analyses conducted.

Generally, to evaluate the sensitivity of a particular model parameter (i.e. model input), the following question needs to be answered: how sensitive is the solution (i.e. model output, such as the Pareto front) to a change in this particular parameter when the other parameters are kept constant? In analysing the sensitivity of individual parameters, the authors strive to answer this question. It is important to note that the sensitivity analysis was undertaken as a performance evaluation. For example, if a particular parameter (adopting various values while the other parameters are kept constant) delivers 'good' solutions for some of its values and 'poor' solutions for others, the solution is very sensitive to this parameter, and consequently the parameter is very sensitive. Vice versa, when a parameter keeps delivering 'good' solutions (while it adopts various values and the other parameters are kept constant), it is not very sensitive.

Two measures were used for the performance evaluation of parameter setting combinations: first, a visual comparison and, second, a set of performance metrics. Due to the large volume of results, in which the Pareto fronts are often non-parallel and intersecting, visual comparison in isolation would be a very time-consuming and difficult task. For this reason, the set of metrics supplemented the visual comparison to provide a faster, less subjective, and more quantitative means of performance evaluation.

Only Pareto fronts, in which all solutions are feasible (i.e. comply with constraints and thus are useful for operators), can be used for performance evaluation. The exclusion of Pareto fronts containing infeasible solutions is to ensure that a parameter setting combination, which delivered a Pareto front with infeasible solutions, cannot score better than another parameter setting combination, which produced only feasible solutions.

### Performance metrics

Performance metrics can measure algorithm effectiveness and efficiency (Van Veldhuizen 1999). In this work, effectiveness (or quality of the solution) is the main focus, as the aim is to evaluate how parameter settings affect the final solution. In other words, the quality of the Pareto front is of foremost interest, thus only objective function values are evaluated. In order to keep terminology consistent with previous research, the following terms are adopted from Van Veldhuizen & Lamont (2000a):

• A known Pareto front (PFknown), which is the final computed Pareto front returned by the NSGA-II at termination, for the particular parameter setting combination.

• The true Pareto front (PFtrue), similarly described as a true Pareto set, ‘is not explicitly known for problems of any difficulty … is implicitly defined by the functions composing an MOP (multi-objective optimisation problem) – it is fixed and does not change’ (Van Veldhuizen & Lamont 2000a).

Similar to Baran et al. (2005), a very good approximation to the PFtrue is obtained with the union of all computed PFknowns. More specifically, PFtrue used for this work is the union of the best solutions found in all analyses conducted for all parameter setting combinations, including both alternatives for objective function scaling.
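The approximation of PFtrue as the union of all computed PFknowns described above can be sketched as follows (minimisation of both objectives assumed; function names are hypothetical):

```python
def approximate_pf_true(pf_knowns):
    """Approximate PF_true as the non-dominated subset of the union of
    all computed PF_knowns (lists of objective-value tuples)."""
    union = [sol for pf in pf_knowns for sol in pf]

    def dominates(a, b):
        # a dominates b if it is at least as good in every objective and
        # differs from b (minimisation)
        return all(x <= y for x, y in zip(a, b)) and a != b

    return [s for s in union
            if not any(dominates(o, s) for o in union)]
```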

No single metric can entirely capture total algorithm performance (Van Veldhuizen 1999; Deb et al. 2002). For this reason, a set of eight metrics is applied, which were adapted mainly from Van Veldhuizen (1999), Zitzler (1999) and Zitzler et al. (2000, 2003). Metrics were selected with the intent of being able to capture the overall quality of the PFknown. Some of these metrics compare similarity and proximity of a PFknown with the PFtrue (i.e. how distant they are), whereas others evaluate the quality of a PFknown (i.e. the number of non-dominated solutions) or the quality of its individual solutions (i.e. spacing between these solutions). As PFknown is a result of a parameter setting combination, it is worth noting that by assessing the quality of the PFknown, the performance of the parameter setting combination can be evaluated.

Note that two major modifications were made to the original performance metrics of Van Veldhuizen (1999), Zitzler (1999) and Zitzler et al. (2000, 2003). Firstly, the original Van Veldhuizen and Zitzler metrics were designed for n-dimensional objective space. As two objectives are considered in this work, the metrics were converted to the (simpler) form relevant to two-dimensional objective space. Secondly, some metrics were transformed, where possible, into a normalised or relative value (i.e. per cent) which is suited better for the subsequent analysis.

#### Non-dominated number

The metric non-dominated number (NN) expresses the total number of non-dominated solutions in a PFknown as a percentage of the population size. It is a modification of Van Veldhuizen's metric overall non-dominated vector generation (ONVG), with its original definition as (Van Veldhuizen 1999):
$$\mathrm{ONVG} = \left| PF_{known} \right| \qquad (7)$$
where | PFknown| is the cardinality of PFknown, expressing the total number of solutions in PFknown.
It is obvious that ONVG is dependent upon the population size, because it is expected that a certain percentage of the initial random solutions, if not all of them, will evolve into final non-dominated solutions. If all analyses were run with the same population size, ONVG would give a fair comparison. However, this is not the case, as different population sizes are used. To achieve a fair comparison, NN is converted into a relative value as follows:
$$\mathrm{NN} = \frac{\left| PF_{known} \right|}{POP} \times 100\% \qquad (8)$$
where POP is the population size used to obtain the PFknown.

It is expected that the NSGA-II with calibrated parameter settings is able to evolve the entire initial population of random solutions into the same number of non-dominated optimal solutions. It is, therefore, desirable that NN = 100%, which means that the final number of non-dominated solutions equals the population size.
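As an illustrative sketch (not the authors' code; the function and variable names are assumed), NN per Equation (8) reduces to a one-line calculation:

```python
def non_dominated_number(pf_known, population_size):
    """NN (Equation 8): number of non-dominated solutions in PF_known,
    expressed as a percentage of the population size used to obtain it."""
    return 100.0 * len(pf_known) / population_size

# A front of 80 non-dominated solutions from a population of 100 gives NN = 80%.
print(non_dominated_number(list(range(80)), 100))  # -> 80.0
```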

#### Unique non-dominated number

The metric unique non-dominated number (UN) is introduced in this work to report the percentage of unique non-dominated solutions in a PFknown. A unique solution represents a solution which has a different pump schedule from other solutions. The metric UN is defined as:
$$\mathrm{UN} = \frac{\left| \{ \eta_i \in PF_{known} : \eta_i \ \text{is unique} \} \right|}{NN'} \times 100\% \qquad (9)$$

where $\eta_i$ is a solution in PFknown and $NN'$ is the number of non-dominated solutions in PFknown.

The reason for this metric is that NSGA-II may not always preserve the diversity of the population, thus it may return some solutions in PFknown which are identical (i.e. having the same pump schedules). This means that some members of the initial population of random solutions evolved into identical final solutions. Having identical solutions in the PFknown is nonetheless undesirable, as diverse solutions need to be available to a decision maker. In an ideal case, all solutions in PFknown are unique (i.e. UN = 100%).
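A minimal sketch of UN, under the assumption that each solution is represented by its pump schedule (e.g. a hashable tuple of binary pump statuses) and that uniqueness means counting each distinct schedule once:

```python
def unique_number(pump_schedules):
    """UN (Equation 9): percentage of distinct pump schedules among the
    non-dominated solutions in PF_known (hashable schedules assumed)."""
    return 100.0 * len(set(pump_schedules)) / len(pump_schedules)

# Four solutions, two of which share a schedule -> 3 distinct of 4.
print(unique_number([(1, 0), (1, 0), (0, 1), (1, 1)]))  # -> 75.0
```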

#### True number

The metric true number (TN) measures the percentage of solutions in PFknown which are members of the PFtrue. This metric is a variation of the original Van Veldhuizen's metric error ratio (E), reporting the proportion of solutions in PFknown which are not members of the PFtrue (Van Veldhuizen 1999) as:
$$E = \frac{\sum_{i=1}^{n} e_i}{n} \qquad (10)$$

where $n$ is the number of solutions in PFknown, and $e_i = 0$ if solution $i$ is a member of the PFtrue, $e_i = 1$ otherwise.
A result E = 0 (an ideal situation) indicates that all solutions in PFknown are contained in the PFtrue, while E = 1 indicates that none are. For easier interpretation of how many solutions in PFknown are part of the PFtrue, TN simply reports the percentage of such solutions as:
$$\mathrm{TN} = (1 - E) \times 100\% \qquad (11)$$

In an ideal situation, when all solutions in PFknown are members of the PFtrue, TN = 100%. The higher the percentage of PFknown solutions that are members of the PFtrue, the better the PFknown.
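TN (and the underlying error ratio E) can be sketched as a membership test; testing membership by objective-value pair is an assumption here, as solutions could equally be compared by pump schedule:

```python
def true_number(pf_known, pf_true):
    """TN (Equation 11): percentage of PF_known solutions that are also
    members of PF_true, i.e. TN = (1 - E) * 100."""
    members = sum(1 for s in pf_known if s in pf_true)
    return 100.0 * members / len(pf_known)

# Two of the four PF_known solutions appear in PF_true.
print(true_number([(1, 9), (2, 5), (3, 4), (4, 3)], {(1, 9), (2, 5)}))  # -> 50.0
```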

#### Generational distance

The metric generational distance (GD) evaluates how close a PFknown is to the PFtrue by measuring the distance between those two Pareto fronts. It is adapted from Van Veldhuizen (1999) as follows:
$$\mathrm{GD} = \frac{\left( \sum_{i=1}^{n} \delta_i^2 \right)^{1/2}}{n} \qquad (12)$$
where $\delta_i$ is the Euclidean distance between the ith solution in PFknown and the nearest solution in PFtrue, and $n$ is the number of solutions in PFknown. For example, the Euclidean distance between two points $A = (x_A, y_A)$ and $B = (x_B, y_B)$ in two-dimensional x–y space is calculated as $\delta_{AB} = \left[ (x_A - x_B)^2 + (y_A - y_B)^2 \right]^{1/2}$.

The GD metric is kept as an absolute value consistent with the original Van Veldhuizen's formula (Van Veldhuizen 1999). It is obvious that the smaller the distance, the closer the PFknown is to the PFtrue. A result GD = 0 indicates that PFknown = PFtrue in terms of location; any other value indicates that PFknown deviates from the PFtrue (Van Veldhuizen 1999), so zero is the desired value.
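GD per Equation (12) can be sketched as follows, with solutions represented as (f1, f2) tuples in the objective space (`math.dist` computes the Euclidean distance):

```python
import math

def generational_distance(pf_known, pf_true):
    """GD (Equation 12): root of the summed squared distances from each
    PF_known solution to its nearest PF_true solution, divided by n."""
    n = len(pf_known)
    deltas = [min(math.dist(p, q) for q in pf_true) for p in pf_known]
    return math.sqrt(sum(d * d for d in deltas)) / n

# A front identical to PF_true has the ideal value GD = 0.
print(generational_distance([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (3.0, 4.0)]))  # -> 0.0
```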

#### ɛ-indicator

The metric ɛ-indicator (IE) measures ‘the smallest distance that an approximation set [PFknown] must be translated in order to completely dominate a reference set [PFtrue]’ (Kollat et al. 2008). This metric is adapted from Zitzler et al. (2003) as:
$$I_\varepsilon = \max_{\eta' \in PF_{true}} \; \min_{\eta \in PF_{known}} \; \max_{i \in \{1,2\}} \frac{f_i(\eta)}{f_i(\eta')} \qquad (13)$$

where $f_i(\eta)$ is the value of the ith objective of a solution $\eta$ in PFknown, and $f_i(\eta')$ is the value of the ith objective of a solution $\eta'$ in PFtrue.

The IE metric adopts values equal to or greater than 1. A result IE = 1 indicates that PFknown = PFtrue in terms of location; the larger the IE value, the greater the distance from the PFtrue, so 1 is the desired value.
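A sketch of the multiplicative ɛ-indicator of Zitzler et al. (2003) for two minimised objectives; strictly positive objective values are an assumption of this implementation:

```python
def epsilon_indicator(pf_known, pf_true):
    """I_E (Equation 13): smallest factor eps >= 1 such that every PF_true
    solution is weakly dominated by some PF_known solution divided by eps."""
    return max(
        min(max(fk / ft for fk, ft in zip(known, true)) for known in pf_known)
        for true in pf_true
    )

# PF_known must be 'translated' by a factor of 2 to cover PF_true here.
print(epsilon_indicator([(2.0, 2.0)], [(1.0, 2.0), (2.0, 1.0)]))  # -> 2.0
```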

#### S-metric

The S-metric (SM) is defined for 2-dimensional objective space (Zitzler & Thiele 1998), an equivalent metric for n-dimensional objective space is called hypervolume (Fleischer 2003; Zitzler et al. 2003). This metric measures the area or volume, respectively, covered by the PFknown from the worst possible solution (the reference point). The SM is defined as follows:
$$\mathrm{SM} = \sum_{i=1}^{n} a_i \qquad (14)$$
where ai is the area determined by the solutions in PFknown and the reference point.

The SM is kept as an absolute value. A PFknown with the largest SM value is closest to the PFtrue. Therefore, a larger value of SM, expressing closer proximity to the PFtrue, is desirable.
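For two minimised objectives, the SM can be sketched as a sweep over the front sorted by the first objective, summing the rectangular areas $a_i$ bounded by the reference (worst) point; names are illustrative:

```python
def s_metric(pf_known, ref):
    """SM (Equation 14): area dominated by PF_known and bounded by the
    reference (worst) point, for two minimised objectives."""
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(pf_known):  # ascending f1, hence descending f2
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area

# Three-point front measured against the reference point (4, 4).
print(s_metric([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)], (4.0, 4.0)))  # -> 6.0
```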

#### Extent

The metric extent (EX) represents the spread or extent of PFknown across the objective space. This metric is a modification of Zitzler's metric (M3) calculated as (Zitzler 1999; Zitzler et al. 2000):
$$\mathrm{EX} = \varepsilon_{outer} \qquad (15)$$
where ɛouter is the Euclidean distance between the objective function values of two outer solutions in PFknown.
To enable better comparison for the purpose of sensitivity analysis, Equation (15) is modified to express the percentage of the extent of a PFknown in relation to the PFtrue as follows:
$$\mathrm{EX} = \frac{\varepsilon_{outer}}{\varepsilon_{outer}^{true}} \times 100\% \qquad (16)$$

where $\varepsilon_{outer}^{true}$ is the Euclidean distance between the objective function values of the two outer solutions in the PFtrue.

The extent of PFknown ‘should be maximised, i.e. for each objective a wide range of values should be covered by the non-dominated solutions’ (Zitzler et al. 2000), so that a wide range of solutions is available to a decision maker. Equally, a greater value of EX, expressing a larger extent of the PFknown, is desirable. It is worth noting that an extent of a PFknown > 100% indicates that the PFknown has a larger extent than the PFtrue (i.e. EXPFtrue = 100%).
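EX per Equation (16) can be sketched as the ratio of outer-solution distances; taking the extremes of the first objective as the ‘outer’ solutions is an assumption that holds for a two-objective non-dominated front:

```python
import math

def extent(pf_known, pf_true):
    """EX (Equation 16): Euclidean distance between the two outer solutions
    of PF_known as a percentage of the same distance for PF_true."""
    def outer_distance(front):
        return math.dist(min(front), max(front))  # extremes of the first objective
    return 100.0 * outer_distance(pf_known) / outer_distance(pf_true)

# PF_known spans half the distance spanned by PF_true.
print(extent([(0.0, 8.0), (2.0, 5.0), (3.0, 4.0)], [(0.0, 8.0), (6.0, 0.0)]))  # -> 50.0
```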

#### Spacing

The metric spacing (SC) evaluates the spread or distribution of solutions in PFknown and is adapted from Van Veldhuizen (1999) as follows:
$$\mathrm{SC} = \left[ \frac{1}{n-1} \sum_{i=1}^{n} \left( \bar{\varepsilon} - \varepsilon_i \right)^2 \right]^{1/2} \qquad (17)$$

where $\varepsilon_i$ is the Euclidean distance between the ith solution and its closest neighbour in PFknown, calculated in the objective space; $\bar{\varepsilon}$ is the mean of all $\varepsilon_i$; and $n$ is the number of solutions in PFknown.

The SC metric is kept as an absolute value according to the original Van Veldhuizen's (1999) formula. To ensure that the results are not affected by identical solutions in the PFknown, only unique solutions were included in the SC calculations. The more uniformly the solutions within the PFknown are distributed, the better the PFknown. A result of SC = 0 indicates equidistant spacing between solutions in the PFknown, which is an ideal situation.
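SC per Equation (17) can be sketched as follows, assuming duplicates have already been removed from the front (as the text requires):

```python
import math
import statistics

def spacing(pf_known):
    """SC (Equation 17): spread of nearest-neighbour distances within
    PF_known; 0 means all solutions are equidistant from their neighbours."""
    eps = [min(math.dist(p, q) for q in pf_known if q is not p) for p in pf_known]
    mean = statistics.fmean(eps)
    return math.sqrt(sum((mean - e) ** 2 for e in eps) / (len(eps) - 1))

# Equidistant solutions give the ideal value SC = 0.
print(spacing([(0.0, 8.0), (3.0, 4.0), (6.0, 0.0)]))  # -> 0.0
```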

An overview of the performance metrics, including their expected range and desired values, is shown in Table 3. Some of these metrics require knowledge about the PFtrue, which is also indicated in Table 3.

Table 3

Metric overview

| Metric | Abbreviation | Expected range | Desired value | PFtrue required? | Comment |
|---|---|---|---|---|---|
| Non-dominated number | NN | 0–100% | 100% | No | Percentage of the population size |
| Unique non-dominated number | UN | 0–100% | 100% | No | |
| True number | TN | 0–100% | 100% | Yes | Percentage of the total number of non-dominated solutions in PFknown |
| Generational distance | GD | ≥0 | 0 | Yes | Absolute value |
| ɛ-indicator | IE | ≥1 | 1 | Yes (no)* | Absolute value |
| S-metric | SM | ≥0 | ≫0 | No | Absolute value |
| Extent | EX | ≥0% | ≫0% | Yes (no)** | Percentage of the PFtrue extent |
| Spacing | SC | ≥0 | 0 | No | Absolute value |

*Individual PFknowns could be compared without using PFtrue (see Zitzler et al. 2003).

**If an absolute value was used (i.e. EX = εouter), PFtrue would not be required.

### Performance and sensitivity evaluation

There are two possible approaches to evaluate performance using the metrics. The first approach examines the results which were directly obtained from the metrics and compares them. On the one hand, this approach provides very detailed information about the performance of a parameter setting combination in each aspect of the quality of the Pareto front. On the other hand, the quantity of results (eight results for each of the 152 parameter setting combinations) would cause difficulties in analysing and presenting them. Hence, this approach was not used, as it is more suitable for small quantities of results. It was used, for example, by Baran et al. (2005) to compare six multi-objective evolutionary algorithms.

The second approach, which is developed in the following paragraphs and applied in this work, combines the metrics together with the aim of obtaining an overall performance for a parameter setting combination. This approach requires, besides setting the performance criteria for each of the metrics with the corresponding performance scores, the assignment of a weighting factor to each metric (Table 4). Based on the performance criteria, the value for each metric is replaced by means of linear interpolation with a corresponding performance score of 0.0–5.0 to provide a common basis for comparison. This step would not be required if all metrics were assessed as normalised or relative values. Weighting factors are then determined to provide a desired level of balance across all metrics according to the user's preference. These weighting factors (Table 4) combine the individual performance scores of the metrics into a combined performance score of all metrics (Table 5). A disadvantage of this approach is the subjectivity of the weighting factors. However, the clear advantage is the significantly more succinct set of results obtained, which are more easily interpreted and presented.

Table 4

Metric weighting factors and performance criteria

| Metric | NN (%) | UN (%) | TN (%) | GD | IE | SM** | EX (%) | SC |
|---|---|---|---|---|---|---|---|---|
| Weighting factor | 0.120 | 0.050 | 0.050 | 0.250 | 0.250 | 0.250 | 0.005 | 0.025 |

| Individual performance | Performance score | NN (%) | UN (%) | TN (%) | GD | IE | SM** | EX (%) | SC |
|---|---|---|---|---|---|---|---|---|---|
| Very good | 〈5.0–4.0) | 100–80 | 100–80 | 100–80 | 0–50 | 1–1.05 | 37.3*–34 | 313*–80 | 0–500 |
| Good | 〈4.0–3.0) | 80–60 | 80–60 | 80–60 | 50–100 | 1.05–1.1 | 34–31 | 80–60 | 500–1,000 |
| Average | 〈3.0–2.0) | 60–40 | 60–40 | 60–40 | 100–200 | 1.1–1.3 | 31–28 | 60–40 | 1,000–1,500 |
| Poor | 〈2.0–1.0) | 40–20 | 40–20 | 40–20 | 200–500 | 1.3–1.5 | 28–25 | 40–20 | 1,500–2,000 |
| Very poor | 〈1.0–0.0〉 | 20–0 | 20–0 | 20–0 | 500–12,816* | 1.5–3.34* | 25–21.3* | 20–0 | 2,000–48,838* |

*Maximum or minimum values obtained.

**SM metric performance criteria × 10⁶.

Table 5

Overall metrics performance

| Overall performance of a parameter setting combination | Combined performance score |
|---|---|
| Very good | 〈5.0–4.0) |
| Good | 〈4.0–3.0) |
| Average | 〈3.0–2.0) |
| Poor | 〈2.0–1.0) |
| Very poor | 〈1.0–0.0〉 |

The development of the metric performance criteria and weighting factors was undertaken by a trial-and-error approach. Initially, the performance criteria for all metrics were distributed evenly, with the same weighting factors assigned so that all factors summed to 1.0. A process was then undertaken whereby the overall performance was assessed against a visual comparison, where the proximity of the PFknown to the PFtrue was considered of main importance. Basically, the visual comparison served as a means of ‘calibration’ of the metric performance criteria and weighting factors, so that the combined performance scores returned (see Table 5) were (more or less) in alignment with the distance of the PFknown from the PFtrue. For that reason, the metrics GD, IE and SM were assigned a higher weighting factor than the other metrics, reflecting their higher importance. Similarly, the metric TN was assigned a low weighting factor because the bulk of the results (mainly SA1) for the TN metric did not score well due to the lower population size applied. Fine tuning the performance criteria and weighting factors was an iterative task requiring careful consideration. Furthermore, these considerations will vary for different problems.

The following paragraphs illustrate a practical example of establishing the sensitivity of a particular parameter of interest using the performance metrics. Firstly, the metrics for the parameter setting combination are calculated and, using linear interpolation, assigned a performance score from 0.0 to 5.0 (as per Table 4). For example, the results NN = 100%, UN = 94%, TN = 5%, GD = 117, IE = 1.06, SM = 37.3 × 10⁶, EX = 99%, SC = 2,319 are replaced with the performance scores NN = 5.0, UN = 4.7, TN = 0.3, GD = 2.8, IE = 3.8, SM = 5.0, EX = 4.1, SC = 1.0. Secondly, the combined performance score, which can adopt values from 0.0 to 5.0, is established as the weighted sum of the individual metric performance scores, i.e. each score is multiplied by its weighting factor and the products are added together. Accordingly, the combined performance score is obtained as 5.0 × 0.120 + 4.7 × 0.050 + 0.3 × 0.050 + 2.8 × 0.250 + 3.8 × 0.250 + 5.0 × 0.250 + 4.1 × 0.005 + 1.0 × 0.025 = 3.8, with the overall performance of the parameter setting combination being ‘good’ (as per Table 5).
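The interpolation and weighting steps can be sketched as follows (not the authors' code); the band boundaries are taken from Table 4, and the GD example reproduces the score of 2.8 used in the worked example:

```python
# Weighting factors from Table 4, in the metric order NN, UN, TN, GD, IE, SM, EX, SC.
WEIGHTS = [0.120, 0.050, 0.050, 0.250, 0.250, 0.250, 0.005, 0.025]

def interp_score(value, band_lo, band_hi, score_hi, score_lo):
    """Linearly interpolate a metric value within one Table 4 criteria band
    (e.g. GD 'average': values 100-200 map to scores 3.0-2.0)."""
    frac = (value - band_lo) / (band_hi - band_lo)
    return score_hi - frac * (score_hi - score_lo)

def combined_score(performance_scores):
    """Combined performance score: weighted sum of the eight 0.0-5.0 scores."""
    return sum(w * s for w, s in zip(WEIGHTS, performance_scores))

print(round(interp_score(117, 100, 200, 3.0, 2.0), 1))  # -> 2.8 (GD = 117)
scores = [5.0, 4.7, 0.3, 2.8, 3.8, 5.0, 4.1, 1.0]
print(round(combined_score(scores), 1))  # -> 3.8, i.e. 'good' as per Table 5
```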

Finally, an additional step is undertaken to obtain the sensitivity of a particular parameter of interest. This involves grouping together parameter setting combinations, in which the parameter of interest is systematically changing (i.e. incrementally increasing) while the other parameters are held constant. These groups are further referred to as sensitivity groups (SEG). The sensitivity of the parameter of interest is then expressed in sensitivity points (SPs), which is the difference between the maximum and minimum combined performance scores within the SEG. For example, combined performance scores in the SEG ranging from 0.6 to 3.9 give 3.3 SPs (i.e. 3.9–0.6 = 3.3), indicating the high sensitivity of the parameter. Note that the maximum possible SPs = 5.0 (as per Table 5). Conversely, combined performance scores in the SEG ranging from 2.8 to 3.2 give 0.4 SPs, indicating low sensitivity of the parameter.
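The SP calculation is then a simple range over each sensitivity group, reproducing the two examples above:

```python
def sensitivity_points(seg_scores):
    """SPs: difference between the maximum and minimum combined performance
    scores within a sensitivity group (SEG); a larger value means a more
    sensitive parameter."""
    return max(seg_scores) - min(seg_scores)

print(round(sensitivity_points([0.6, 1.8, 3.9]), 1))  # -> 3.3 (high sensitivity)
print(round(sensitivity_points([2.8, 3.0, 3.2]), 1))  # -> 0.4 (low sensitivity)
```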

The methodology for performance and sensitivity analysis of algorithm parameters is summarised as a flowchart in Figure 3.

Figure 3

Flowchart for performance and sensitivity analysis of algorithm parameters.


## RESULTS

All analyses were conducted using an Intel Core i3 CPU M 380 processor with 2.0 GB RAM. Due to the limited computer capacity for the required analyses, only about 30,000 function evaluations were able to be run at once, whereas the number of function evaluations required for one analysis ranged from 25,000 (i.e. a population size of 50 for 500 generations) up to 800,000 (i.e. a population size of 400 for 2,000 generations). Advantageously, GANetXL's capability to conduct one analysis in a sequence of interrupted runs was used, where the previous run is resumed using the ‘resume’ function in the application's toolbar (CWS 2011).

The results for all 152 analyses conducted are displayed in Figure 4. Note that only feasible solutions are shown in Figure 4, as only feasible solutions were obtained from the optimisation model (6) subject to constraints (3)–(5). SA1 (solid grey squares in Figure 4), which tested the sensitivity of mutation and crossover for a constant population size of 100 and number of generations of 2,000, includes 80 Pareto fronts (i.e. PFknowns). SA2 (solid black triangles in Figure 4), which tested the sensitivity of the population size and number of generations for a constant mutation of 0.005 and crossover of 0.7 SXP and 1.0 S1P, includes 72 Pareto fronts (i.e. PFknowns). It can be seen that SA1 (solid grey squares) consists of two isolated clouds of Pareto fronts. This is caused by a particular mutation value, which is discussed further in the Mutation section.

Figure 4

Results of all analyses conducted.


### True Pareto front

As previously described, the PFtrue is the union of the best solutions found in all analyses conducted. To be more exact, 152 Pareto fronts (i.e. PFknowns) from both SA1 and SA2 were joined together (solid grey dots in Figure 5) and the optimal solutions identified (solid black dots as PFtrue in Figure 5). Because some of these solutions were identical (i.e. had the same pump schedules), the duplicates were eliminated from the PFtrue. Those identical solutions were mainly gained from the same combinations of NSGA-II parameter settings but different objective function scaling. The resulting PFtrue, which was used for subsequent calculations of some of the metrics, consists of a total of 333 non-dominated and unique solutions.

Figure 5

Pareto fronts of all analyses conducted and true Pareto front.


Some interesting observations were made regarding the PFtrue (Figure 6, Table 6), which relate to the NSGA-II parameter setting combinations and population sizes creating the PFtrue. The first observation was that the PFtrue was reached by several NSGA-II parameter setting combinations, more specifically 13 combinations. Interestingly, five of those combinations (i.e. 38%) represented 91% of all solutions in the PFtrue. Conversely, eight combinations (i.e. 62%) represented only 9% of solutions in the PFtrue. So, the minority of the parameter setting combinations which reached the PFtrue formed the vast majority of all PFtrue solutions, and vice versa. The second observation was that, as the solutions within the PFtrue progress from the top left to the bottom right, they are derived from sections (more or less) dedicated to a particular, increasing population size. This last observation may indicate that population size performs inconsistently across the search space for this particular problem, which is discussed in the Discussion section.

Table 6

NSGA-II parameter setting combinations which reached the PFtrue

| NSGA-II parameter setting combination | | | | Solutions in PFtrue | |
|---|---|---|---|---|---|
| POP | GEN | MUT | CRS | Number | % |
| 50 | 2,000 | 0.005 | 1.0 S1P | 1 | 0.3 |
| 100 | 2,000 | 0.005 | 1.0 S1P | 10 | 3.0 |
| 100 | 2,000 | 0.01 | 0.7 S1P | 15 | 4.5 |
| 100 | 2,000 | 0.005 | 1.0 SXP | 2 | 0.6 |
| 150 | 2,000 | 0.005 | 0.7 SXP | 47 | 14.1 |
| 150 | 1,000 | 0.005 | 0.7 SXP | 4 | 1.2 |
| 200 | 2,000 | 0.005 | 0.7 SXP | 2 | 0.6 |
| 300 | 2,000 | 0.005 | 0.7 SXP | 133 | 39.9 |
| 300 | 2,000 | 0.005 | 1.0 S1P | 30 | 9.0 |
| 300 | 1,000 | 0.005 | 0.7 SXP | 3 | 0.9 |
| 300 | 1,000 | 0.005 | 1.0 S1P | 2 | 0.6 |
| 400 | 2,000 | 0.005 | 1.0 S1P | 79 | 23.7 |
| 400 | 1,000 | 0.005 | 1.0 S1P | 5 | 1.5 |
| Total | | | | 333 | 100 |
Figure 6

True Pareto front.


### Objective function scaling

The results for the sensitivity of objective function scaling (Figure 7) include two sets of 76 Pareto fronts (i.e. PFknowns) each gained from the same NSGA-II parameter setting combinations, with the difference that only one set had its objective functions scaled. Figure 7 indicates that there may be a considerable number of identical solutions in the two Pareto front sets, because black dots representing scaled objective functions appear to lie on top of grey crosses representing non-scaled objective functions. Comparing individual solutions from the two Pareto front sets confirmed that about 70% of them are indeed identical, resulting in the majority of Pareto fronts (64 of 76) from the two sets being also identical. Accordingly, 64 NSGA-II parameter setting combinations (84%) delivered the same Pareto fronts regardless of the objective functions being or not being scaled. Whereas only 12 NSGA-II parameter setting combinations (16%) delivered varying Pareto fronts for scaled and non-scaled objective functions.

Figure 7

Pareto fronts for sensitivity of objective function scaling.


Differences between those two sets of 12 Pareto fronts were examined using the metrics. Results from the SC metric showed that the uniformity of spacing between solutions in the Pareto fronts with scaled objective functions improved in five instances and deteriorated in seven instances. Consequently, it was not confirmed that adequate scaling of the objective functions assures more uniformly distributed solutions, for the optimisation model and network used. Similarly, the improvement in quality of Pareto fronts for scaled objective functions regarding the other performance metrics was inconclusive, as some of those metrics gave better results and some worse, with the overall performance being the same for scaled and non-scaled objective functions. An exception was the EX metric, which for scaled objective functions improved in nine instances and deteriorated in only three instances, so scaled objective functions generally delivered Pareto fronts with larger extents.

Considering the bias of the NSGA-II towards the water quality objective when objective functions are inadequately scaled, the results for non-scaled and scaled objective functions were compared in terms of concentrations of solutions along the Pareto front and the location of the Pareto front. It can be seen from Figure 7 that the concentration of solutions for minimum pump energy cost values improved (for example ellipses A and B in Figure 7) for scaled objective functions. Moreover, scaled objective functions tended to find better solutions for minimum water quality values (for example ellipse E in Figure 7), with an example provided in Table 7. It was thus confirmed that non-scaled (i.e. inadequately scaled) objective functions caused the NSGA-II bias towards the water quality objective for the optimisation model and network used. However, the discontinuity of Pareto fronts is still obvious (for example ellipses C and D in Figure 7), with the reasons discussed in the Discussion section.

Table 7

Example of minimum solution for economic and water quality objective

| Analysis ID | POP | GEN | MUT | CRS | Objective functions scaled | Economic objective | Water quality objective |
|---|---|---|---|---|---|---|---|
| 104 | 100 | 2,000 | 0.005 | 1.0 S1P | No | 148.7 | 98,149.0 |
| | | | | | | 204.8 | 1,815.9 |
| 140 | 100 | 2,000 | 0.005 | 1.0 S1P | Yes | 148.7 | 98,149.0 |
| | | | | | | 204.5 | 1,521.8* |

POP = population size, GEN = number of generations, MUT = mutation, CRS = crossover.

*Note that the water quality objective improved when scaled.

### NSGA-II parameters

Because the two sets of results for non-scaled and scaled objective functions were very similar, with scaled objective functions proving to deliver slightly better results, the subsequent sensitivity analysis of NSGA-II parameters was undertaken using the results with scaled objective functions only.

The graphical results for exploring the sensitivity of mutation, crossover (both types S1P and SXP), population size and the number of generations are displayed in Figures 8–12. In those figures, each set of results representing a single parameter value (the mutation value of 0.002 as white triangles in Figure 8, or the crossover value of 0.6 as solid black squares in Figure 9, and so on) exists as a ‘cloud’ rather than a single front. The reason for this is that a single parameter value was always combined with other parameters, some as constants but some as a range, as noted in the figure titles. For example, the data set for the aforementioned mutation value of 0.002 (white triangles in Figure 8) includes results for a population size of 100, a number of generations of 2,000 and five crossover values from 0.6 to 1.0, both S1P and SXP, so this data set contains 10 Pareto fronts.

Figure 8

Pareto fronts for sensitivity of mutation (POP 100, GEN 2,000, CRS 0.6-1.0 both S1P and SXP).


Figure 9

Pareto fronts for sensitivity of S1P crossover (POP 100, GEN 2,000, MUT 0.002-0.1).


Figure 10

Pareto fronts for sensitivity of SXP crossover (POP 100, GEN 2,000, MUT 0.002-0.1).


Figure 11

Pareto fronts for sensitivity of population size (MUT 0.005, CRS 0.7 SXP and 1.0 S1P, GEN 500-2,000).


Figure 12

Pareto fronts for sensitivity of the number of generations (MUT 0.005, CRS 0.7 SXP and 1.0 S1P, POP 50-400).


#### Mutation

The observation gleaned from Figure 8 is that the best and most competitive results were obtained from the mutation values of 0.005 and 0.01. The mutation value of 0.002 gave poorer results and the value of 0.1 provided Pareto fronts very distant from the optimal solutions, creating an isolated cloud. This indicates that the mutation value of 0.1 is too high for this particular problem, which caused the search to degenerate into a random process as previously described by Savic & Walters (1997). In contrast, the mutation value of 0.002 is too low for this particular problem, which caused the premature convergence of the algorithm to a local optimum as previously described by Srinivas & Patnaik (1994).

The sensitivity of mutation was investigated using the metrics. Table 8 lists two sample SEGs from the 10 available SEGs for the mutation. The performance of mutation varies from ‘very poor’ to ‘good’ for both SEGs, with the corresponding 3.3 and 2.9 SPs, respectively. Because of the wide range of scores across all SEGs, with an average of 2.9 SPs, the sensitivity of mutation is high for the optimisation model and network used. This means that the selection of a suitable mutation value to achieve good results requires careful attention.

Table 8

Sensitivity of mutation (two sample SEGs)

| ID | Mutation value | Constant parameters | NN (%) | UN (%) | TN (%) | GD | IE | SM (×10⁶) | EX (%) | SC | Combined performance score | Overall performance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | 0.002 | POP: 100, GEN: 2,000, CRS: 0.6 S1P | 100 | 51 | | 93 | 1.10 | 31.9 | 38 | 1,071 | 3.0 | Average |
| 51 | 0.005 | | 100 | 86 | | 125 | 1.10 | 32.3 | 63 | 1,914 | 2.9 | Average |
| 61 | 0.01 | | 100 | 91 | | 61 | 1.06 | 35.4 | 45 | 1,335 | 3.9 | Good |
| 71 | 0.1 | | 18 | 18 | | 4,113 | 1.74 | 21.3 | 149 | 14,416 | 0.6 | Very poor |
| 48 | 0.002 | POP: 100, GEN: 2,000, CRS: 0.9 SXP | 100 | 57 | | 83 | 1.17 | 28.7 | 45 | 1,759 | 2.7 | Average |
| 58 | 0.005 | | 100 | 100 | | 71 | 1.10 | 32.0 | 58 | 1,551 | 3.2 | Good |
| 68 | 0.01 | | 100 | 100 | | 55 | 1.09 | 32.7 | 66 | 3,615 | 3.3 | Good |
| 78 | 0.1 | | 14 | 14 | | 11,085 | 2.36 | 22.0 | 219 | 17,899 | 0.4 | Very poor |

POP = population size, GEN = number of generations, CRS = crossover.

#### Crossover

It can be observed from Figures 9 and 10 that particular crossover values are very widely spread and, unlike mutation, do not form designated performance areas. This makes the identification of the best performing crossover value less straightforward than for the mutation. Indeed, the metrics and the subsequent sensitivity analysis confirmed the low sensitivity to different crossover values, meaning that a change of the crossover setting does not affect the final solution significantly.

Table 9 presents two sample SEGs from the eight available SEGs for the crossover. The performance of crossover varies from ‘average’ to ‘good’ for the first SEG and is ‘very poor’ across the second SEG, with the corresponding 0.9 SPs and 0.4 SPs, respectively. Because the range of scores across all SEGs is narrow, with an average of 0.7 SPs, the sensitivity of crossover is low for the optimisation model and network used. The selection of a suitable crossover value for this particular problem is still important to achieve good results, yet it does not seem as crucial as the identification of a suitable mutation value.

Table 9

Sensitivity of crossover (two sample SEGs)

| ID | Crossover value | Constant parameters | NN (%) | UN (%) | TN (%) | GD | IE | SM (×10⁶) | EX (%) | SC | Combined performance score | Overall performance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 51 | 0.6 S1P | POP: 100, GEN: 2,000, MUT: 0.005 | 100 | 86 | | 125 | 1.10 | 32.3 | 63 | 1,914 | 2.9 | Average |
| 53 | 0.7 S1P | | 100 | 96 | | 59 | 1.13 | 33.0 | 52 | 1,539 | 3.2 | Good |
| 55 | 0.8 S1P | | 100 | 97 | | 58 | 1.08 | 33.8 | 40 | 1,245 | 3.3 | Good |
| 57 | 0.9 S1P | | 100 | 70 | | 45 | 1.14 | 29.1 | 46 | 2,344 | 2.9 | Average |
| 59 | 1.0 S1P | | 100 | 94 | 5 | 117 | 1.06 | 37.3 | 99 | 2,319 | 3.8 | Good |
| 72 | 0.6 SXP | POP: 100, GEN: 2,000, MUT: 0.1 | 16 | 16 | | 5,655 | 1.84 | 22.9 | 166 | 14,186 | 0.6 | Very poor |
| 74 | 0.7 SXP | | 18 | 18 | | 12,816 | 3.34 | 24.0 | 313 | 48,838 | 0.4 | Very poor |
| 76 | 0.8 SXP | | 17 | 17 | | 3,775 | 1.42 | 22.8 | 123 | 7,396 | 0.8 | Very poor |
| 78 | 0.9 SXP | | 14 | 14 | | 11,085 | 2.36 | 22.0 | 219 | 17,899 | 0.4 | Very poor |
| 80 | 1.0 SXP | | 16 | 16 | | 7,259 | 1.83 | 23.6 | 167 | 13,545 | 0.6 | Very poor |

POP = population size, GEN = number of generations, MUT = mutation.

#### Population size

The performance of different population sizes (Figures 6 and 11) was discussed in relation to the PFtrue in previous sections, so only the sensitivity is examined here. Table 10 lists two sample SEGs from the six available SEGs for the population size. The performance of population size varies from ‘average’ to ‘very good’ for the first SEG and is ‘good’ across the second SEG, with the corresponding 1.4 SPs and 0.7 SPs, respectively. Because of the intermediate range of scores across all SEGs, with an average of 1.2 SPs, the sensitivity of population size is moderate for the optimisation model and network used. The selection of a suitable population size to achieve good results needs careful attention, which is emphasised by the finding that a particular population size appears to perform better at some locations within the search space and worse at others, as demonstrated by the PFtrue (Figure 6). This inconsistent performance is discussed in the Discussion section.

Table 10

Sensitivity of population size (two sample SEGs)

| ID | Population size | Constant parameters | NN (%) | UN (%) | TN (%) | GD | IE | SM (×10⁶) | EX (%) | SC | Combined performance score | Overall performance |
|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 118 | 50 | GEN: 1,000; MUT: 0.005; CRS: 0.7 SXP | 100 | 100 | 202 | 1.13 | 32.1 | 83 | 4,119 | | 2.8 | Average |
| 121 | 100 | | 100 | 100 | 23 | 1.08 | 30.6 | 22 | 927 | | 3.2 | Good |
| 124 | 150 | | 100 | 86 | 46 | 1.05 | 36.5 | 64 | 1,239 | | 4.1 | Very good |
| 127 | 200 | | 100 | 51 | 42 | 1.12 | 35.1 | 54 | 867 | | 3.7 | Good |
| 130 | 300 | | 100 | 48 | 36 | 1.12 | 35.8 | 71 | 1,219 | | 3.7 | Good |
| 133 | 400 | | 100 | 43 | 102 | 1.15 | 33.4 | 115 | 2,007 | | 2.7 | Average |
| 137 | 50 | GEN: 2,000; MUT: 0.005; CRS: 1.0 S1P | 100 | 98 | 67 | 1.10 | 31.8 | 55 | 2,474 | | 3.2 | Good |
| 140 | 100 | | 100 | 94 | 117 | 1.06 | 37.3 | 99 | 2,319 | | 3.8 | Good |
| 143 | 150 | | 100 | 99 | 24 | 1.13 | 33.7 | 38 | 1,139 | | 3.3 | Good |
| 146 | 200 | | 100 | 100 | 29 | 1.11 | 31.3 | 48 | 988 | | 3.5 | Good |
| 149 | 300 | | 100 | 34 | 24 | 1.07 | 34.6 | 58 | 822 | | 3.9 | Good |
| 152 | 400 | | 100 | 83 | 22 | 1.08 | 32.9 | 53 | 1,017 | | 3.5 | Good |

GEN = number of generations, MUT = mutation, CRS = crossover.

#### Number of generations

The first observation gleaned from Figure 12 is that, generally, the results show steady improvement with an increasing number of generations. Hence, the best results were obtained with 2,000 generations, which was confirmed by the PFtrue, where 319 (i.e. 96%) of the total 333 solutions in the PFtrue were obtained with this number of generations.

The sensitivity of the number of generations was assessed using the metrics. Table 11 lists two sample SEGs from the 12 available SEGs for the number of generations. The performance of the number of generations varies from ‘average’ to ‘good’ across the first SEG and is consistently ‘good’ across the second SEG, corresponding to 0.4 SPs and 0.8 SPs, respectively. Because the range of scores across all SEGs is narrow, with an average of 0.7 SPs, the sensitivity of the number of generations is low for the optimisation model and network used. In spite of the steady improvement in results with an increased number of generations, the additional associated computational cost may be considered to negate the perceived benefits.

Table 11

Sensitivity of the number of generations (two sample SEGs)

| ID | Number of generations | Constant parameters | NN (%) | UN (%) | TN (%) | GD | IE | SM (×10⁶) | EX (%) | SC | Combined performance score | Overall performance |
|----|----|----|----|----|----|----|----|----|----|----|----|----|
| 120 | 500 | POP: 100; MUT: 0.005; CRS: 0.7 SXP | 100 | 71 | 72 | 1.14 | 29.5 | 52 | 2,764 | | 2.8 | Average |
| 121 | 1,000 | | 100 | 100 | 23 | 1.08 | 30.6 | 22 | 927 | | 3.2 | Good |
| 122 | 2,000 | | 100 | 100 | 28 | 1.07 | 30.8 | 22 | 526 | | 3.2 | Good |
| 147 | 500 | POP: 300; MUT: 0.005; CRS: 1.0 S1P | 100 | 33 | 43 | 1.14 | 32.5 | 40 | 675 | | 3.1 | Good |
| 148 | 1,000 | | 100 | 28 | 51 | 1.08 | 34.3 | 59 | 1,001 | | 3.6 | Good |
| 149 | 2,000 | | 100 | 34 | 24 | 1.07 | 34.6 | 58 | 822 | | 3.9 | Good |

POP = population size, MUT = mutation, CRS = crossover.

## DISCUSSION

### Performance metrics and their application

The UN metric (9) reports the percentage of unique non-dominated solutions in a PFknown (i.e. solutions with unique pump schedules). Some solutions were found which had different pump schedules, yet nearly the same objective function values for both objectives, for example, differing only in the third decimal place for one objective. Note that solutions with the same objective function values but different decision variables (i.e. pump schedules) are called alternate solutions (Bhattacharjya 2007). In this work, such solutions were considered ‘unique’ for the following reason: if a solution has desirable objective function values but is impractical, the decision maker would probably be interested in a different solution with similar objective function values.
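As a minimal sketch (the data and names are illustrative, not from the paper), the UN computation amounts to counting distinct pump schedules (decision vectors) among the non-dominated solutions:

```python
def unique_nondominated(solutions):
    """UN sketch: percentage of non-dominated solutions whose pump
    schedules (decision vectors) are distinct."""
    schedules = {tuple(schedule) for schedule, _objectives in solutions}
    return 100.0 * len(schedules) / len(solutions)

# (schedule, objectives) pairs assumed mutually non-dominated; the first
# two are alternate solutions: same objective values, different schedules.
front = [
    ((1, 0, 1, 1), (120.4, 0.531)),
    ((1, 1, 0, 1), (120.4, 0.531)),
    ((0, 1, 1, 0), (98.7, 0.642)),
]
print(unique_nondominated(front))  # → 100.0: all three schedules differ
```

Counting by schedule rather than by objective values is what makes alternate solutions count as unique, consistent with the reasoning above.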

There is a limitation to the GD metric (12), which measures the distance between a PFknown and the PFtrue. This metric may give misleading results if the length (i.e. extent) of the PFknown is significantly greater than the extent of the PFtrue, as demonstrated in Figure 13. Although the proximity of PFknownA and PFknownB to the PFtrue is the same, the GD metric returns a larger value for PFknownA due to the inclusion of the points A1–A4 in the equation, whose distances are calculated from the outer PFtrue point T1. This limitation was recognised, but the metric was considered adequate for the purpose of the sensitivity analysis; it is overcome by using the ɛ-indicator metric.
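The limitation can be reproduced with a small sketch. The form below (Van Veldhuizen's definition with p = 2, averaged over the front) is one common version of GD; the points are synthetic, chosen so that one front overhangs the extent of the PFtrue:

```python
import math

def generational_distance(pf_known, pf_true):
    """GD sketch (p = 2): root of the summed squared distances from each
    PFknown point to its nearest PFtrue point, averaged over the front."""
    sq_dists = [min(math.dist(p, t) for t in pf_true) ** 2 for p in pf_known]
    return math.sqrt(sum(sq_dists)) / len(pf_known)

pf_true = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
front_b = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]              # same proximity, same extent
front_a = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (8.0, 0.9)]  # extends beyond the PFtrue

print(generational_distance(front_b, pf_true))  # → 0.0
print(generational_distance(front_a, pf_true))  # > 1, driven only by the overhanging point
```

The overhanging point (8.0, 0.9) lies close to an extrapolation of the true front, yet its distance is measured to the outer PFtrue point, inflating GD exactly as described for A1–A4.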

Figure 13

GD metric limitation.


In addition, both the GD metric and the ɛ-indicator metric require knowledge of the PFtrue, which is not normally available. A metric with a similar function, which measures the closeness of a PFknown to the PFtrue yet does not require the PFtrue, is the S-metric for a two-dimensional objective space (Zitzler & Thiele 1998), or the hypervolume for an n-dimensional objective space (Fleischer 2003; Zitzler et al. 2003). The disadvantage of this metric, however, is that the choice of the reference point can influence the evaluation of the PFknowns (Knowles & Corne 2002).
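For the two-objective minimisation case, the S-metric reduces to the area dominated by the front and bounded by the reference point, which can be sketched as follows (synthetic points, arbitrary reference point):

```python
def hypervolume_2d(front, ref):
    """Two-objective (minimisation) hypervolume sketch: area dominated by
    `front` and bounded above by the reference point `ref`."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):        # ascending in the first objective
        if f1 >= ref[0] or f2 >= prev_f2:
            continue                    # dominated or outside: adds no area
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))  # → 11.0
```

Because every area term is measured against `ref`, moving the reference point rescales the contributions of the extreme points differently from the interior ones, which is the reference-point dependence noted by Knowles & Corne (2002).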

The methodology for performance and sensitivity analysis using metrics may be applied to the calibration of multi-objective algorithm parameters for any WDS optimisation problem. It needs to be noted, nevertheless, that the metric performance criteria, performance scores and weighting factors (see Table 4) were developed specifically for the purpose of this work, and that these parameters need to be developed and fine-tuned to suit the particular problem. Furthermore, the selection of the metrics and their performance evaluation also depend on the purpose of their application. In the case of calibration, the metrics GD, IE, SM and TN would likely be assigned higher weighting factors than the other metrics. For applications comprising more than two objectives, the metrics can easily be adjusted to the required n-dimensional format.
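As one plausible aggregation (assumed here, since Table 4 is not reproduced in this section), a combined performance score can be formed as a weighted average of per-metric scores, with GD, IE, SM and TN weighted more heavily for calibration; all numbers below are placeholders, not the paper's values:

```python
# Placeholder weights: closeness/size metrics (GD, IE, SM, TN) weighted
# twice as heavily as the remaining metrics, as suggested for calibration.
weights = {"GD": 2, "IE": 2, "SM": 2, "TN": 2, "NN": 1, "UN": 1, "EX": 1, "SC": 1}
scores  = {"GD": 4, "IE": 3, "SM": 4, "TN": 3, "NN": 5, "UN": 4, "EX": 3, "SC": 2}

combined = sum(weights[m] * scores[m] for m in weights) / sum(weights.values())
print(round(combined, 1))  # → 3.5
```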

The disadvantage of the methodology used is the large computational effort required. This effort may be partially reduced by using only those metrics which do not require knowledge of the PFtrue (refer to Table 3). Alternatively, algorithms with adaptive parameter settings, such as the Borg multi-objective evolutionary algorithm (Hadka & Reed 2013), or algorithms with few parameters, such as Pareto archived dynamically dimensioned search (PA-DDS) with only one parameter (Asadzadeh & Tolson 2011), could be considered. It is expected that these algorithms may well require less computational effort to calibrate their parameters.

### Objective function scaling

Despite the adequate scaling of the objective functions, the Pareto fronts obtained were discontinuous, which is possibly due to the following:

1. Discrete decision variables.

2. Model simplifications, for example, in the water quality analysis (a time step of 1 hour rather than 1–10 minutes, and a simulation period of 1 day rather than 1 week or more).

3. Only a single run was performed from one initial population of random solutions.

### NSGA-II parameters

#### Parameter sensitivity

Performance metrics enabled the sensitivity of the NSGA-II parameters to be evaluated on a quantitative and more objective basis than visual comparison alone. In particular, the sensitivity of the NSGA-II parameters was expressed as SPs out of the maximum possible SPs, which assisted in understanding the sensitivity hierarchy (Table 12). Mutation was identified as the most sensitive parameter and crossover as the least sensitive, with the limitation to the optimisation model and network used. Population size was more sensitive than the number of generations for this problem. Note that the more sensitive a parameter is, the greater the influence of a change in its value on the results.

Table 12

Sensitivity of NSGA-II parameters

Sensitivity points (SPs)*

| Parameter | Average | Median | Minimum | Maximum | Parameter sensitivity | Number of sensitivity groups (SEGs) |
|----|----|----|----|----|----|----|
| Mutation | 2.9 | 2.9 | 2.7 | 3.3 | High | 10 |
| Population size | 1.2 | 1.3 | 0.6 | 1.8 | Moderate | 6 |
| Number of generations | 0.7 | 0.8 | 0.0 | 2.1 | Low | 12 |
| Crossover | 0.7 | 0.7 | 0.2 | 1.4 | Low | |

*Difference between the maximum and minimum combined performance scores within SEGs.
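The SP computation itself is simple enough to sketch directly. The two groups below use the combined performance scores for population size reported in Table 10, and reproduce the 1.4 SPs and 0.7 SPs quoted earlier:

```python
def sensitivity_points(combined_scores):
    """SP sketch: spread (max minus min) of the combined performance
    scores within one SEG, i.e. one group of runs differing only in the
    parameter under study."""
    return max(combined_scores) - min(combined_scores)

# Combined performance scores from Table 10 (population size SEGs):
seg1 = [2.8, 3.2, 4.1, 3.7, 3.7, 2.7]  # GEN 1,000, MUT 0.005, CRS 0.7 SXP
seg2 = [3.2, 3.8, 3.3, 3.5, 3.9, 3.5]  # GEN 2,000, MUT 0.005, CRS 1.0 S1P

print(round(sensitivity_points(seg1), 1))  # → 1.4
print(round(sensitivity_points(seg2), 1))  # → 0.7
```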

It needs to be noted, though, that the identification of a reasonable range of settings for each parameter was important in this exercise. The authors acknowledge that the value of 0.1 for mutation may be a little too high, and had this value not been used, the sensitivity of mutation would not have been evaluated as high. However, quite a wide range of settings for each parameter was identified purposely, so that the impact of those settings on the final results could be clearly demonstrated.

#### Parameter interdependency

The performance of mutation in relation to crossover, and vice versa, is now examined by exploring charts detailing specific mutation and crossover values, and crossover types. Figure 14 indicates that an increase in mutation from 0.005 to 0.01 with a crossover of 0.6 improved the solution (arrow 1a in Figure 14); conversely, the solution deteriorated for the same increase in mutation with a crossover of 1.0 (arrow 1b in Figure 14). Similarly, when crossover increased from 0.6 to 1.0 with a mutation of 0.005, the solution improved (arrow 2a in Figure 14); conversely, the solution deteriorated for the same increase in crossover with a mutation of 0.01 (arrow 2b in Figure 14). Figure 15 displays Pareto fronts for varying mutation and crossover values and crossover types. It demonstrates that changing the crossover type from S1P to SXP for a crossover value of 0.6 and a mutation value of 0.005 produced better results (arrow 3a in Figure 15); conversely, changing the crossover type from S1P to SXP for a crossover value of 1.0 and a mutation value of 0.01 produced worse results (arrow 3b in Figure 15). The likely conclusion drawn from these observations is that the performance of a particular mutation value depends on the crossover value and crossover type, and that the performance of a particular crossover value and type is probably dependent upon the mutation value. In other words, it appears that mutation and crossover are interdependent. If this is the case, they ought to be evaluated as an interdependent pair rather than in isolation. More research is needed in this area. Plotting the values of the computed metrics for different combinations of crossover and mutation operators on a surface or contour chart is suggested as one way of visualising the combined effects of these two parameters.
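Such a contour chart could be produced along the following lines; the grid uses the crossover and mutation values tested in this work, but the metric values themselves are synthetic placeholders standing in for the computed metrics (here labelled GD):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend, so the script runs without a display
import matplotlib.pyplot as plt

# Parameter grid matching the settings tested; the GD values are random
# placeholders, where the real values would come from the computed metrics.
crossover = np.array([0.6, 0.7, 0.8, 0.9, 1.0])
mutation = np.array([0.005, 0.01, 0.05, 0.1])
gd = np.random.default_rng(1).uniform(1.0, 3.5, size=(len(mutation), len(crossover)))

fig, ax = plt.subplots()
cs = ax.contourf(crossover, mutation, gd)  # filled contours over the parameter grid
ax.set_xlabel("crossover probability")
ax.set_ylabel("mutation probability")
fig.colorbar(cs, label="GD")
fig.savefig("gd_contour.png")
```

A low-GD valley running diagonally across such a chart, rather than parallel to either axis, would be direct visual evidence of the interdependency suggested above.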

Figure 14

Pareto fronts for varying mutation and crossover values (POP 100, GEN 2,000).


Figure 15

Pareto fronts for varying mutation and crossover values, and crossover type (POP 100, GEN 2,000).


#### Performance inconsistency

Because the Pareto fronts for different parameter setting combinations are not parallel and often intersect (Figures 14–16), some parameter setting combinations may perform better at some locations of the search space and worse at others. In Figures 14 and 15, for example, a mutation value of 0.005 and a crossover value of 0.6 S1P (solid grey dots) perform well towards the minimum water quality values, but diverge quite irregularly from the minimum pump energy cost values. Another example is in Figure 16, where all population sizes were able to capture (more or less) the minimum water quality values, but their level of performance (or accuracy) over the large search space differs greatly. Hence, certain parameter setting combinations seem to perform with various degrees of inconsistency across the search space. This inconsistent performance was also demonstrated by the PFtrue (Figure 6), which was reached not by one but by several parameter setting combinations. This is thought to reflect the complexity of the search space, and requires further research.

Figure 16

Pareto fronts for varying population size (GEN 2,000, MUT 0.005, CRS 1.0 S1P).


Unlike single-objective approaches, where the calibration process results in the ‘best’ algorithm parameter values for finding the optimal solution, as demonstrated in Younis & Dandy (2012), the identification of those ‘best’ values in a multi-objective approach does not seem as straightforward. On the contrary, it appears that the multi-objective approach requires a range of parameter values (i.e. a range of parameter setting combinations) to reach or identify all optimal solutions within the (true) Pareto front. This may be because the single-objective approach seeks one particular solution, whereas the multi-objective approach searches for an entire front of optimal solutions.

### Limitations

There are three limitations to this work. The first is that, due to the large number of parameter setting combinations evaluated, only a single run of the NSGA-II was performed for each parameter setting combination. This approach is contrary to the common application of genetic algorithms (GAs), where multiple runs are used because of the stochastic nature of GAs. The reader therefore needs to keep in mind that the analyses of the generated results and the conclusions are limited to a single run only (i.e. they are not based on a statistical assessment of multiple runs). The second limitation is that the number of parameter setting combinations to be evaluated needs to be limited (i.e. based on literature values and/or preliminary testing), otherwise the time required to run the analyses may become unacceptable. The third limitation is that only one example network was used. These limitations need to be addressed in future research.

## CONCLUSION

In this work, performance metrics were applied to comprehensively analyse the sensitivity of the NSGA-II parameters and their settings, and the objective function scaling, for a multi-objective optimisation problem in WDSs. The algorithm parameters considered were population size, number of generations, crossover and mutation. The two-objective optimisation model involved the competing operational objectives of pump energy costs and water quality, and was applied to a test network adopted from the literature representing a regional non-drinking WDS.

Performance metrics enabled the sensitivity of the NSGA-II parameters to be evaluated on a quantitative and more objective basis than visual comparison alone. Moreover, they assisted in determining the sensitivity hierarchy of the NSGA-II parameters for the particular optimisation model and network used. Performance metrics were thus demonstrated to be suitable for the sensitivity analysis of NSGA-II parameters for multi-objective optimisation problems in WDSs. Additionally, the methodology for performance and sensitivity analysis may be used by researchers and practitioners, after converting the metrics to the adequate n-dimensional objective space, for the calibration of multi-objective algorithm parameters for any WDS optimisation problem. To enable the process to be repeated, it is summarised in the form of a flowchart in Figure 3.

Adequate scaling of the objective functions to cover similar ranges of numerical values was confirmed to be important for the optimisation model and network used, because inadequately scaled objective functions caused the NSGA-II to be biased towards the water quality objective. However, adequately scaled objective functions did not assure the continuity of the Pareto fronts. In fact, the discontinuity of the Pareto fronts is likely attributable to other factors, for example, the discrete nature of the problem.

The NSGA-II parameters were found to be sensitive for the optimisation model and the network used; the performance of the algorithm and the quality of the solutions were thus dependent on the parameter settings. Furthermore, some parameters (i.e. mutation and crossover) appeared interdependent, and some parameter setting combinations performed with various degrees of inconsistency across the search space. It appears that, unlike a single-objective approach, a multi-objective approach requires a range of parameter values (i.e. a range of parameter setting combinations) to cover the entire search space and to reach or identify all optimal solutions. However, additional future research is required to confirm the results and conclusions of this work, by including different seed values in the performance and sensitivity analysis of the algorithm parameters, and by using different applications in multi-objective optimisation problems of WDSs.

## ACKNOWLEDGEMENT

This research was supported by the Australian Research Council as Project LP0990908.

## REFERENCES

Alfonso L., Jonoski A., Solomatine D. 2010. J. Water Resour. Plann. Manage. ASCE 136(1), 48–58.
Alvisi S., Franchini M. 2006. Civil Eng. Environ. Syst. 23(3), 143–160.
Asadzadeh M., Tolson B. 2011. J. Hydroinform. 14(1), 192–205.
Atiquzzaman M., Liong S.-Y., Yu X. 2006. J. Water Resour. Plann. Manage. ASCE 132(2), 122–126.
Babayan A. V., Savic D. A., Walters G. A. 2007. Multiobjective optimisation of water distribution system design under uncertain demand and pipe roughness. In: Topics on System Analysis and Integrated Water Resources Management (Castelletti A., Soncini-Sessa R., eds). Elsevier, Amsterdam, The Netherlands, pp. 161–172.
Baran B., von Lucken C., Sotelo A. 2005. 36(1), 39–47.
Bhattacharjya R. K. 2007. J. Water Resour. Plann. Manage. ASCE 133(6), 565–566.
CWS 2011. GANetXL – User Manual. University of Exeter, Centre for Water Systems (CWS), Exeter, UK.
Deb K., Agarwal S., Pratap A., Meyarivan T. 2000. A Fast Elitist Non-dominated Sorting Algorithm for Multi-objective Optimization: NSGA-II. Parallel Problem Solving from Nature VI (PPSN-VI). Springer, France, pp. 849–858.
Deb K., Pratap A., Agarwal S., Meyarivan T. 2002. IEEE Trans. Evol. Comput. 6(2), 182–197.
Farmani R., Savic D. A., Walters G. A. 2005a. Eng. Optimiz. 37(2), 167–183.
Farmani R., Walters G. A., Savic D. A. 2005b. J. Water Resour. Plann. Manage. ASCE 131(3), 161–171.
Farmani R., Walters G., Savic D. 2006. Evolutionary multi-objective optimization of the design and operation of water distribution network: total cost vs. reliability vs. water quality. J. Hydroinform. 8(3), 165–179.
Fleischer M. 2003. The Measure of Pareto Optima – Applications to Multi-Objective Metaheuristics. Evolutionary Multi-Criterion Optimization. Springer-Verlag, Faro, Portugal, pp. 519–533.
Gibbs M. S., Maier H. R., Dandy G. C. 2010. J. Water Resour. Plann. Manage. ASCE 136(2), 288–291.
Gill P. E., Murray W., Wright M. H. 1981. Practical Optimization. London.
Giustolisi O., Berardi L. 2009. J. Water Resour. Plann. Manage. ASCE 135(6), 484–492.
Goldberg D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. MA.
Hadka D., Reed P. 2012. Evol. Comput. 20(3), 423–452.
Hadka D., Reed P. 2013. Evol. Comput. 21(2), 231–259.
Halhal D., Walters G. A., Savic D. A., Ouazar D. 1999. Evol. Comput. 7(3), 311–329.
Jin X., Zhang J., Gao J.-l., Wu W.-Y. 2008. J. Zhejiang Uni. Sci. A 9(3), 391–400.
Kanta L., Zechman E., Brumbelow K. 2012. J. Water Resour. Plann. Manage. ASCE 138(2), 144–152.
Kapelan Z. S., Savic D. A., Walters G. A. 2005. Multiobjective design of water distribution systems under uncertainty. Water Resour. Res. 41(11), W11407.
Keedwell E., Khu S.-T. 2006. Eng. Optimiz. 38(3), 319–333.
Kelner V., Leonard O. 2003. Optimal pump scheduling for water supply using genetic algorithms. In: International Congress on Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems (Bugeda J. A. D. C., Periaux J., Schoenauer M., Winter G., eds). EUROGEN 2003, Barcelona.
Knowles J., Corne D. 2002. On metrics for comparing non-dominated sets. In: Proceedings of the 2002 Congress on Evolutionary Computation, CEC'02 (Volume 1). IEEE, Piscataway, NJ, pp. 711–716.
Kollat J. B., Reed P. M. 2006. 29(6), 792–807.
Kollat J. B., Reed P. M., Kasprzyk J. R. 2008. 31(5), 828–845.
Kurek W., Ostfeld A. 2013. J. Environ. Manage. 115, 189–197.
Kurek W., Ostfeld A. 2014. J. Water Resour. Plann. Manage. ASCE 140(2), 184–193.
Lopez-Ibanez M., Prasad T., Paechter B. 2005. Multi-objective optimisation of the pump scheduling problem using SPEA2. In: IEEE Congress on Evolutionary Computation. IEEE, Edinburgh, Scotland, pp. 435–442.
Lopez-Ibanez M., Prasad T. D., Paechter B. 2008. J. Water Resour. Plann. Manage. ASCE 134(4), 337–346.
Mala-Jetmarova H., Bagirov A., Barton A. 2013. Pumping costs and water quality in the battlefield of optimal operation of water distribution networks. In: 35th IAHR World Congress (IAHR 2013), Chengdu, China, 8–13 September 2013. Tsinghua University Press, Beijing.
Mala-Jetmarova H., Barton A., Bagirov A. 2014. J. Water Resour. Plann. Manage. 141(6), 04014077.
Marchi A., Dandy G., Wilkins A., Rohrlach H. 2014. J. Water Resour. Plann. Manage. ASCE 140(1), 22–31.
Ostfeld A., Salomons E. 2004. Eng. Optimiz. 36(3), 337–359.
Ostfeld A., Salomons E., Lahav O. 2011. J. Water Resour. Plann. Manage. ASCE 137(6), 531–541.
Prasad T. D., Park N.-S. 2004. J. Water Resour. Plann. Manage. ASCE 130(1), 73–82.
D., Sinske A., Van Vuuren J. 2010. J. Water Resour. Plann. Manage. ASCE 136(5), 592–596.
Reed P. M., Hadka D., Herman J. D., Kasprzyk J. R., Kollat J. B. 2012. 51, 438–456.
Rossman L. A. 2000. EPANET 2 Users Manual. EPA/600/R-00/057. EPA United States Environmental Protection Agency, Cincinnati, Ohio.
Sakarya A. B. A., Mays L. W. 2003. J. Water Resour. Plann. Manage. ASCE 129(1), 82.
Savic D. A., Walters G. A. 1997. J. Water Resour. Plann. Manage. ASCE 123, 67.
Savic D. A., Walters G. A., Schwab M. 1997. Multiobjective Genetic Algorithms for Pump Scheduling in Water Supply. Evolutionary Computing, AISB International Workshop, Selected Papers. Springer, Berlin, Heidelberg.
Savic D. A., Bicik J., Morley M. S. 2011. Environ. Modell. Softw. 26(5), 551–561.
Sotelo A., Baran B. 2001. Optimizacion de los costos de bombeo en sistemas de suministro de agua mediante un algoritmo evolutivo multiobjetivo combinado (Pumping cost optimization in water supply systems using a multi-objective evolutionary combined algorithm). In: XV Chilean Conference on Hydraulic Engineering. University of Concepcion, Concepcion, Chile, pp. 337–347 (in Spanish).
Srinivas N., Deb K. 1994. Evol. Comput. 2(3), 221–248.
Srinivas M., Patnaik L. M. 1994. IEEE Trans. Syst. Man Cybernet. 24(4), 656–667.
Vamvakeridou-Lyroudia L. S., Walters G. A., Savic D. A. 2005. J. Water Resour. Plann. Manage. ASCE 131(6), 467–476.
Van Veldhuizen D. A. 1999. Multiobjective Evolutionary Algorithms: Classifications, Analyses, and New Innovations. PhD Thesis, AFIT/DS/ENG/99-01, Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio.
Van Veldhuizen D. A., Lamont G. B. 2000a. Evol. Comput. 8(2), 125–147.
Van Veldhuizen D. A., Lamont G. B. 2000b. On measuring multiobjective evolutionary algorithm performance. In: The 2000 Congress on Evolutionary Computation. IEEE, Piscataway, NJ, pp. 204–211.
Walters G. A., Halhal D., Savic D., Ouazar D. 1999. Urban Water 1(1), 23–38.
Wu W., Maier H. R., Simpson A. R. 2010. J. Water Resour. Plann. Manage. ASCE 136(5), 555–565.
Younis M., Dandy G. 2012. A comparative study between three genetic algorithm software applications for optimizing water distribution systems. In: Hydrology & Water Resources Symposium 2012 (HWRS 2012), Sydney, NSW, Australia, 19–22 November 2012. Engineers Australia, pp. 1306–1316.
Zitzler E. 1999. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD Thesis, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, TIK-Schriftenreihe Nr. 30, Diss. ETH No. 13398, Shaker Verlag, Germany.
Zitzler E., Thiele L. 1998. Multiobjective Optimization Using Evolutionary Algorithms – A Comparative Case Study. Parallel Problem Solving from Nature V (PPSN-V). Springer, Amsterdam, September 1998, pp. 292–301.
Zitzler E., Thiele L. 1999. Evol. Comput. 3(4), 257–271.
Zitzler E., Deb K., Thiele L. 2000. Evol. Comput. 8(2), 173–195.
Zitzler E., Laumanns M., Thiele L. 2001. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. TIK Report 103, Institut fur Technische Informatik und Kommunikationsnetze (TIK), ETH Zurich, Switzerland.
Zitzler E., Thiele L., Laumanns M., Fonseca C. M., da Fonseca V. G. 2003. Evol. Comput. 7(2), 117–132.