## Abstract

Applications of evolutionary algorithms (EAs) to real-world problems are usually hindered due to parameterisation issues and computational efficiency. This paper shows how the combinatorial effects related to the parameterisation issues of EAs can be visualised and extracted by the so-called compass plot. This new plot is inspired by the traditional Chinese compass used for navigation and geomantic detection. We demonstrate the value of the proposed compass plot in two scenarios with application to the optimal design of the Hanoi water distribution system. One is to identify the dominant parameters in the well-known NSGA-II. The other is to seek the efficient combinations of search operators embedded in Borg, which uses an ensemble of search operators by auto-adapting their use at runtime to fit an optimisation problem. As such, the implicit and vital interdependency among parameters and search operators can be intuitively demonstrated and identified. In particular, the compass plot revealed some counter-intuitive relationships among the algorithm parameters that led to a considerable change in performance. The information extracted, in turn, facilitates a deeper understanding of EAs and better practices for real-world cases, which eventually leads to more cost-effective decision-making.

## HIGHLIGHTS

A new tool, called the compass plot, is proposed for visualising the combinatorial effects related to evolutionary algorithms.

The compass plot helps identify key factors and corresponding values for a given problem.

Applications to the optimal design of water distribution systems provide new insights into NSGA-II and Borg.

### Graphical Abstract

## INTRODUCTION

Using evolutionary algorithms (EAs), especially multi-objective evolutionary algorithms (MOEAs), for the optimal design of water distribution systems (WDSs) is a prominent and active field in the domain of hydroinformatics (Mala-Jetmarova *et al.* 2018). The root of the complexity associated with a WDS design problem is mainly due to its highly combinatorial nature; thus, it is impossible to visit the entire search space systematically. There exist varying types of problem formulations, including single-objective, also known as the least-cost design problems (Savic & Walters 1997); multi-objective, i.e., concerning the trade-off among a set of goals usually competing with each other (Halhal *et al.* 1997); and even many-objective, i.e., considering four or more objectives simultaneously (Fu *et al.* 2013). In recent years, MOEAs have received increasing interest from the hydroinformatics community (Kollat & Reed 2006; Reed *et al.* 2013; Sweetapple *et al.* 2014; Mortazavi-Naeini *et al.* 2015; Wang *et al.* 2015; Salazar *et al.* 2016; Liu *et al.* 2017) because they provide a viable solution to complex design problems in a heuristic way and ensure a good approximation to the Pareto-optimal front.

Despite the success of MOEAs widely reported in the literature, the level of understanding of these methods, especially for scholars and practitioners who simply take MOEAs as the search engine, has generally lagged. This is particularly the case when facing the highly sensitive parameterisation issues associated with MOEAs (Mala-Jetmarova *et al.* 2015; Walker & Craven 2018b). An MOEA is a complex methodology, also known as a black-box. Its behaviour is significantly influenced by the settings of parameters, which interact in highly non-linear ways (De Jong 2007; Phan *et al.* 2020). Identification of significant parameters and their efficient values play a key role in successful applications of MOEAs to challenging problems in the real world, which are typically associated with complex constraints and huge search spaces. In such a case, it is beneficial to understand which parameters are dominant factors requiring fine-tuning and in which directions (or scales) those parameters should be adjusted. Note that the fine-tuning process is usually problem-specific, which implies that a group of efficient settings of parameters for one case may be less efficient for another. Consequently, a sophisticated method of analysing parameters' sensitivity is necessary, especially when it is computationally intensive to solve a problem. However, this aspect is beyond the scope of this paper.

Currently, researchers in the hydroinformatics community rely mostly on the trial-and-error approach to analyse the sensitivity of the parameters or simply follow the recommended settings (Wang *et al.* 2019b). One reason for such a gap lies in the lack of effective tools, preferably a visualisation method, for understanding the hidden mechanism of MOEAs responsible for their behaviour and success. That is, how their parameterisation can impact on the success of solving the problem at hand.

Visualisation tools for diagnosing whether or not an evolutionary algorithm works are available in the literature (Hart & Ross 2001). These tools are useful in choosing operators and parameters. In contrast, evidence of relevant work related to the visualisation of MOEAs is limited. Previous studies mainly focus on visualising the population in the objective space (Fu *et al.* 2013) and tracking the dynamic variations of the population by performance indicators (Zheng *et al.* 2016). The former enables a set of non-dominated solutions to be presented to decision-makers, followed by the cost-benefit analysis to determine the final design for implementation. The latter depicts the real-time execution of MOEAs from the convergence and diversity perspectives of the population. Most recently, some attempts to visualise algorithm performance, especially considering the design optimisation of WDSs, have been reported (Johns *et al.* 2018; Walker & Craven 2018a, 2018b), in which the exploration and exploitation aspects of MOEAs can be visually conveyed to non-expert users. Later, an extended and promising visualisation approach has been presented to benchmark different parameterisations for the challenging multi- and many-objective suites of test functions (Walker & Craven 2020).

On the other hand, since around 2007, there is a trend of embedding a multi-operator search strategy in advanced MOEAs, especially in an adaptive manner (Vrugt & Robinson 2007; Moral & Dulikravich 2008; Hadka & Reed 2013; Wang *et al.* 2017). Those algorithms, usually known as hybrid MOEAs, can mitigate the curse of the ‘No Free Lunch Theorem’ (Wolpert & Macready 1997) to some extent by effectively utilising the advantages of diversified search operators. Among hybrid MOEAs, Borg is probably the most famous and successful method, which has been tested on many challenging optimisation problems associated with water resources management (Reed *et al.* 2013). However, it is still unclear why several search operators are used together and how their combinations impact on optimisation performance. In essence, it is another form of parameterisation issue – deciding whether a specific operator is employed as a parameter.

This paper seeks to narrow the knowledge gap between experts of MOEAs and non-expert users by developing a visualisation method that can intuitively present the complex combinatorial effects related to MOEAs. The key question is how a visualisation tool can assist in identifying the dominant factor(s) and appropriate setting of algorithm parameters for a given problem. It will be shown that the compass plot, which stems from the ancient Chinese ingenuity, can help deal with the above question.

## METHODOLOGY

### Conceptual origin

Feng Shui, also known as Chinese geomancy, is a Chinese art that is based on the belief that how things are arranged within a room can affect one's life (e.g., luck and fortune). The creation of the compass plot was mainly inspired by the device widely used in this domain. Figure 1(a) shows the Chinese Feng Shui compass, which is a tool used by geomancers to confirm the right directions and positions in a room. It also serves as a protection appliance to stop bad luck or potential harm. A simpler but similar idea also exists in the 40-Year calendar, as shown in Figure 1(b). This delicate widget can correctly indicate the days, weeks, and months for 40 years. Rotating the brass dial tells the corresponding dates given the year and month.

A Chinese Feng Shui compass is usually composed of Tianchi (the centric compass), an inner disc, and an outer square tray (deliberately omitted from Figure 1(a)). The inner disc can be rotated to align with the centric compass. There are many concentric annuli carved on the inner disc, one annulus is called one layer, and it represents a specific dimension of information. Each layer is divided into a number of equal-sized slices, and each slice is filled with a unique character. There may be varying numbers of slices in different layers. Since the principle of Feng Shui has a close connection with the eight diagrams (eight combinations of three whole or broken lines formerly used in divination in ancient China), a layer is divided into at least eight slices. Due to different types of the Feng Shui compass, the number of layers can be as few as five annuli or as many as 52 annuli. Thus, the number of slices increases rapidly, as new layers are added. In the extreme case, the outmost layer of the Feng Shui compass contains 384 slices. When a geomancer confirms the direction of things in the room being analysed with the Feng Shui compass, he/she can interpret the situation more easily (hierarchically and quantitatively) and provide comprehensive explanations to guide his/her client(s) in furniture placement.

As mentioned above, Feng Shui containing multi-dimensional information can be recorded and presented in a series of concentric annuli. In other words, we can use such an annular object to illustrate any type of phenomena that are influenced by different factors. Below, we will explain the development of the compass plot based on the ideas that originated from the ancient Chinese way of thinking.

### Functional design

Inspired by the Chinese Feng Shui compass, we developed a six-step procedure to generate and utilise the proposed compass plot for analysing the combinatorial effects associated with multiple factors. Without loss of generality, here we refer to the independent variables considered under a specific situation as ‘impact factors,’ and the dependent variables influenced by these factors as ‘outcome performance.’ As shown in Figure 2, the procedure has six steps that are categorised into three stages (shown in rounded rectangles). At the first stage, the analytical approach (e.g., optimisation or simulation) is chosen when a well-structured problem is formulated (Step 1). This involves investigating the effects of different combinations of ‘impact factors’ on the ‘outcome performance.’ Then, the key impact factors are identified, and their candidate values are determined (Step 2). At the second stage, the problem is solved by the analytical approach specified in Step 1, and the corresponding results are obtained for each combination of impact factors (Step 3). At the last stage, in which the compass plot is generated and interpreted, results gained at the previous step are evaluated either based on the raw result values or by using a suitable performance indicator (Step 4). Next, the input for producing the compass plot is prepared, mainly including the specification of colours for ‘impact factors’ and ‘outcome performance’ and the layout (i.e., number and order of annuli) for the compass plot (Step 5). Finally, the compass plot is generated via the pseudo-code detailed below, and the important information is extracted based on the patterns observed in both the radial and annular directions of the compass plot (Step 6).

Function: Generation of the Proposed Compass Plot via Concentric Pie Charts | |

Input: colour data for impact factors and their corresponding outcome performance | |

Output: compass plot for a given problem | |

1 retrieve the key information: the number of combinations and the number of annuli; | |

2 for each annulus (from big to small) do | |

3 create an axis and set its position and size; | |

4 draw the pie chart and divide it into a number of equal-sized slots; | |

5 fill each slot with colours specified in the data for the current annulus; | |

6 adjust the position and size of the next concentric axis; | |

7 save the compass plot to the specified directory; | |

8 return the success or failure message; |

Function: Generation of the Proposed Compass Plot via Concentric Pie Charts | |

Input: colour data for impact factors and their corresponding outcome performance | |

Output: compass plot for a given problem | |

1 retrieve the key information: the number of combinations and the number of annuli; | |

2 for each annulus (from big to small) do | |

3 create an axis and set its position and size; | |

4 draw the pie chart and divide it into a number of equal-sized slots; | |

5 fill each slot with colours specified in the data for the current annulus; | |

6 adjust the position and size of the next concentric axis; | |

7 save the compass plot to the specified directory; | |

8 return the success or failure message; |

### Software implementation

For the sake of rapid prototyping, the implementation of the proposed compass plot (i.e., Steps 4–6) was carried out on the MATLAB^{®} platform. Before generating the compass plot, it is necessary to process the raw data and yield the data for the colour specification. Firstly, the results are ranked, i.e., various combinations of impact factors and the corresponding performance, in the descending order of the outcome performance. Secondly, the colour settings are specified for each annulus and the slices denoting different candidate values. As a general rule-of-thumb, we prefer the vivid colours for inner annuli that indicate different impact factors and the shades of grey colours for the outmost annulus that illustrate the outcome performance (darker grey means better performance). For different candidate values, varying shades of the same colour are used. The colour data for each annulus are stored in an individual cell array, and the colour data for each slice are expressed in the RGB colour values, which is a vector of 1-by-3 with elements encoding the shades of the red, green, and blue colours, respectively. Finally, the proposed compass plot is rendered via a series of concentric annuli (pie charts in MATLAB^{®}) with tailored positions and sizes (see the above pseudo-code for detail).

### Interpretation of the compass plot

The merit of the proposed compass plot lies in the fact that the patterns identified in both the radial and annular directions reveal essential messages. In the radial direction, the first few slices (depending on the scope of the top grey level in the outermost annulus) in the counter-clockwise direction (the default order in a pie chart produced by MATLAB^{®}) demonstrate the combinations of factors that have a great impact on outcome performance. Furthermore, within these slices, the most significant factor(s) and its corresponding value(s) can be visualised if one candidate value appears frequently in a certain annulus, especially if it happens consecutively. Conversely, the insensitive factor(s) and/or its corresponding value(s) are more likely to be scattered around the annulus.

## APPLICATIONS

### Case study

Despite a wide range of benchmark networks available in the literature (Wang *et al.* 2015), we simply chose the Hanoi network (HAN), which is a widely used case study in the research community of WDS for demonstration purposes (Fujiwara & Khang 1990). Figure 3 depicts the layout of the HAN. It consists of 34 pipes, which form three basic loops. Water is supplied to a flat region (elevation equal to zero) comprised of 31 demand nodes from one reservoir with a fixed head of 100 m. The required minimum pressure head at each node is 30 m. There are six commercially available pipe sizes, and their corresponding unit costs are listed in Table 1. As a result, the search space of the HAN problem contains 6^{34} ≈ 2.87 × 10^{26} discrete combinations.

Diameter (mm) | 304.8 | 406.4 | 508.0 | 609.6 | 762.0 | 1,016.0 |

Cost ($/m) | 45.73 | 70.40 | 98.39 | 129.33 | 180.75 | 278.28 |

Diameter (mm) | 304.8 | 406.4 | 508.0 | 609.6 | 762.0 | 1,016.0 |

Cost ($/m) | 45.73 | 70.40 | 98.39 | 129.33 | 180.75 | 278.28 |

### Problem formulation

Two kinds of problem formulations for WDS design purposes were considered in this paper. In the first definition, we converted the least-cost design to the bi-objective optimisation by transforming the constraints of the minimum required heads at nodes into the second objective, so that an MOEA can be applied directly to handle the least-cost design problem. Such a relaxed form of definition may be practical in dealing with real-world problems under a tight budget (Muhammed *et al.* 2017), as a marginal violation of the minimum head requirement can be much cheaper than the strictly feasible solutions. In the second definition, we took both the capital cost and the network reliability into consideration, which is a widely used formulation adopted by many studies (Raad *et al.* 2010; Wang *et al.* 2015; Liu *et al.* 2017). The network resilience index (Prasad & Park 2004), denoted as *I _{n}*, was selected as a surrogate indicator for the network reliability because

*I*has been proven to have a strong correlation with the mechanical and hydraulic reliability of WDSs among a wide range of surrogate reliability measurements (Raad

_{n}*et al.*2010; Liu

*et al.*2017; Wang

*et al.*2019a).

#### Definition 1: a relaxed form of the least-cost design of WDSs

*np*is the total number of pipes in a network;

*nn*is the total number of nodes in a network;

*C*(

*D*) is the unit cost of pipe

_{i}*i*with a diameter size of

*D*;

_{i}*L*is the length of pipe

_{i}*i*;

*H*is the total head at node

_{j}*j*; is the minimum head required at node

*j*;

*H*is the total head deficit across nodes; and

_{d}*A*is the set of available pipe sizes in the local market.

#### Definition 2: the multi-objective design of WDSs

*I*, approaching the trade-off between cost and reliability. The mathematical definition of the bi-objective design is given in Equation (2):where

_{n}*C*is the uniformity of node

_{j}*j*which is defined in Equation (3);

*Q*is the demand at node

_{j}*j*;

*nr*is the number of reservoirs;

*Q*and

_{k}*H*are outflow and head at reservoir

_{k}*k*, respectively;

*npu*is the number of pumps;

*P*is the power supplied by pump

_{m}*m*;

*γ*is the specific weight of water;

*np*is the number of pipes connected to node

_{j}*j*; and

*D*is the diameter of pipe

_{i}^{j}*i*connected to node

*j*.

### Scenarios

Two scenarios were demonstrated in this paper to show how the proposed compass plot can assist in exploring the combinatorial effects associated with the optimal design of WDSs by MOEAs. The first scenario considers how the parameterisation of the well-known NSGA-II (Deb *et al.* 2002) impacts on the quality of least-cost solutions under Definition 1. The second scenario investigates how the composition of search operators within Borg, which is an advanced MOEA (Hadka & Reed 2012, 2013), impacts on the convergence and diversity of the Pareto fronts under Definition 2. Note that the applications can also be extended to other fields in which MOEAs are used as the optimisation tools.

#### Scenario 1: parameterisation of NSGA-II

NSGA-II is arguably the most popular MOEA in the water research community, which can effectively handle a wide range of multi-objective optimisation problems. After a random initialisation process, the standard NSGA-II implements a fast non-dominated sorting approach to ranking solutions primarily based on the Pareto dominance relationship. Solutions with higher rankings survive and are selected to reproduce. For those solutions that are non-dominated to each other (i.e., with the same ranking), a secondary sorting criterion, known as the crowding distance, is used to discriminate solutions further (i.e., solutions that reside in a less crowded area are preferred). During the selection process, a binary tournament selection is carried out to establish the mating pool (i.e., parents). NSGA-II applies the Simulated Binary Crossover (SBX) and the Polynomial Mutation (PM) with designated probabilities to create children from parents (Deb *et al.* 2002). The implicit elitism strategy ensures that the best solutions ever found in the search history are always retained in the population. This allows a new population to be derived from the combination of parents and their offspring via the fast non-dominated sorting approach.

The literature contains many papers applying NSGA-II for WDS design tasks (Wang *et al.* 2013; Basupi & Kapelan 2015; Creaco *et al.* 2016; Zheng *et al.* 2016; Beygi *et al.* 2019). However, most previous studies followed the trial-and-error approach to determine the proper settings of NSGA-II's parameters (Wang *et al.* 2019b). This approach may not be sufficient to reveal the complex relationships among various parameters. Therefore, we selected NSGA-II as the representative MOEA to study the relationship between its parameters and the quality of solutions under Definition 1.

Specifically, we chose five key parameters of NSGA-II, including the population size (*PS*), the probability of SBX (*P _{c}*), the probability of PM (

*P*), and the distribution indices of SBX and PM (denoted as

_{m}*DI*and

_{c}*DI*, respectively). For the sake of simplicity, we picked two candidate values for each parameter, so that there were 32 parameter combinations in total (i.e., 2

_{m}^{5}= 32). Those two values for each parameter were contrasting enough to show the impact of the parameter value on the results. For

*PS*, the minimum and the maximum values were 60 and 300, respectively. The recommended values for

*P*and

_{m}*P*were adopted (Deb

_{c}*et al.*2002), which were 1/34 (i.e., the inverse of the numbers of decision variables) and 0.9, respectively. A much higher

*P*and a much lower

_{m}*P*were also selected, and they were 1/17 (i.e., two times the recommended value for

_{c}*P*) and 0.45 (i.e., half the recommended value for

_{m}*P*), respectively. The boundary values of 1 and 20 were set for both

_{c}*DI*and

_{m}*DI*. A distribution index of 1 is small enough to guarantee the diversity of the offspring, and a distribution index of 20 is large enough to make the offspring focus on the area near their parents.

_{c}Parameterisation of NSGA-II was investigated by solving the HAN problem with the 32 combinations of NSGA-II's key parameters. For each combination of parameters, 100 individual runs with different random seeds for initialisation were carried out. For each run, the number of function evaluations (NFEs) was 600,000, as it ensured a sufficient convergence for the HAN problem.

Although the trade-off between *Cost* and *H _{d}* can be obtained, the quality of solutions obtained by each combination of parameters was evaluated via the frequency of locating the best-known least-cost solution for the HAN problem in 100 runs (denoted as

*Freq*). For example, if a specific parameter combination manages to identify the best-known solution in 50 runs out of 100, the corresponding

*Freq*is 0.5. According to Zheng

*et al.*(2017), the currently best-known least-cost solution for the HAN problem is $6.0811M.

#### Scenario 2: synergistic effect of the multi-operator search strategy in Borg

Borg is based on the steady-state MOEA framework in which only one solution is generated per iteration (Laumanns *et al.* 2002). Besides the population, it stores the best solutions found so far in an external archive. In each iteration of the main loop, Borg generates one child using randomly selected parents. For a recombination operator requiring *k* parents, one parent is chosen uniformly at random from the archive. The remaining *k**−* 1 parents are discovered from the population using tournament selection. The child resulting from the recombination process is then evaluated by the objective functions, then it is considered for acceptance in both the population and the archive.

Borg consists of seven search operators, including SBX, Differential Evolution (DE), Parent-Centric Crossover (PCX), Simplex Crossover (SPX), Unimodal Normal Distribution Crossover (UNDX), Uniform Mutation (UM), and PM (Hadka & Reed 2013). During each iteration, Borg implements the so-called auto-adaptive multi-operator recombination process. In particular, it uses only one group of search operators for offspring reproduction, by either UM only or one crossover operator (including DE) followed by PM. The successful operators, which contribute more solutions to the archive, are rewarded by increasing their probabilities of being selected for recombination. On the other hand, no operators are completely discarded during the execution of optimisation.

Borg also implements several advanced strategies concurrently, including *ɛ*-dominance, *ɛ*-progress (a measure of convergence speed), adaptive population sizing, and randomised restart (Hadka & Reed 2013). It proves to be a flexible and robust hybrid MOEA by covering a large area of high-performing parameterisations (Hadka & Reed 2012; Reed *et al.* 2013). However, previous studies did not focus on how the different combinations of search operators can affect its performance. Therefore, in this scenario, we focused on the synergistic effect of the multi-operator search strategy in Borg under Definition 2.

We considered all the possible combinations of the default seven operators. As mentioned above, Borg uses only one group of search operators for recombination per iteration and PM plays a subordinate role. Besides, UM is essential during the restart process and thus cannot be removed. Consequently, Borg has a total of 32 different combinations, that is using only UM, all default operators, and all the possible combinations of five crossover alike operators (i.e., + + + = 30). For each combination of operators, 50 individual runs were implemented with the same NFEs as used for Scenario 1.

The quality of solutions obtained by each combination of search operators was assessed concerning diversity and convergence of the Pareto fronts. The hypervolume (Zitzler & Thiele 1999) indicator (denoted as *HV*) was used to measure the quality of Pareto fronts in a comprehensive sense. The range of *HV* is between zero and one, and a larger value of *HV* indicates better performance. The median value of *HV* over 50 runs was selected to represent the performance of a particular combination of search operators. Also, the usage of each operator on average over the optimisation history was monitored.

## RESULTS AND DISCUSSION

### Scenario 1: parameterisation of NSGA-II

Figure 4 shows the compass plot that demonstrates the impact of the parameterisation of NSGA-II on the least-cost design of the HAN problem. As mentioned above, there are 32 combinations of the five key parameters; thus, the compass plot has six annuli (five parameters and one *Freq* value) and is divided into 32 slices accordingly. Each inner annulus represents a specific parameter. We use red, yellow, green, blue, and purple to denote *PS*, *DI _{c}*,

*DI*,

_{m}*P*, and

_{c}*P*, respectively. A lighter colour means the smaller candidate value for a specific parameter. The

_{m}*Freq*values, which appear in the outermost annulus, are illustrated in shades of grey, and the values are categorised into five levels, as shown in the legend of Figure 5.

Additionally, the whole compass plot is sorted in the descending order of the *Freq* values counter-clockwise. Each slice contains five coloured slots and one grey slot illustrating a particular combination of parameters and its corresponding performance. For instance, the first slice shows that the combination with *PS* = 300, *DI _{c}* = 1,

*DI*= 20,

_{m}*P*= 0.9, and

_{c}*P*= 1/34 gained the best performance with a

_{m}*Freq*of 68%.

As mentioned in the last subsection of Methodology, a significant parameter has particular patterns in the compass plot. Specifically, a colour patch (slice) repeating in an annulus aligned with the scope of the high-performing grey levels (problem-specific), preferably in a consecutive manner, represents an efficient candidate value selected for the corresponding parameter. This pattern indicates that the parameter with the selected value can ensure superior performance no matter how the other parameters are set. On the contrary, a secondary parameter tends to disperse, encircling its annulus.

*PS* appears to have a dominant impact on the performance of NSGA-II. Larger values of *PS* can significantly improve the chance of identifying the best-known solution for the HAN problem in a single run. Within the top half combinations (16 in total) that managed to identify the best-known solution at a *Freq* higher than 50%, there exist 13 combinations with a *PS* of 300. All 16 combinations with a *PS* of 300 ensure a *Freq* higher than 40%. In contrast, using a *PS* of 60 is more likely to result in a *Freq* of less than 40%.

*DI _{m}* and

*DI*turn out to be more critical parameters compared with

_{c}*P*and

_{c}*P*. This is probably surprising and inspiring as most previous studies using NSGA-II did not fine-tune

_{m}*DI*and

_{m}*DI*(Wang

_{c}*et al.*2019b). When both parameters were set to 1, all the related combinations (eight in total) found the best-known solution at a

*Freq*higher than 40%. Seven combinations out of eight were able to achieve a

*Freq*higher than 50%, and three combinations had a

*Freq*higher than 60%. In particular, for the best combination with

*DI*and

_{m}*DI*set to 1 (i.e., the second slice counter-clockwise),

_{c}*P*and

_{c}*P*were equal to 0.45 and 1/17, which are far from the recommended values in the literature (Deb

_{m}*et al.*2002).

### Scenario 2: synergistic effect of the multi-operator search strategy in Borg

Figure 5 shows the compass plot that demonstrates the impact of the auto-adaptive multi-operator strategy in Borg on the multi-objective design of the HAN problem. We considered 32 combinations of the six groups of search operators (UM alone and five crossover alike operators followed by PM); thus, the compass plot has seven annuli and is divided into 32 slices accordingly. Each inner annulus represents a group of operators (see the legend in Figure 5). We use red, orange, yellow, green, blue, and purple to denote UM, UNDX + PM, SPX + PM, PCX + PM, DE + PM, and SBX + PM, respectively. A slot shown in colour means a specific group of operators was used; otherwise, it is shown as blank. As mentioned above, UM plays an essential role in Borg; thus, it was included in all combinations of operators (i.e., the complete red annulus). The *HV* values, which appear in the outermost annulus, are illustrated in shades of grey, and the colour values are scaled to show the differences more clearly. Similarly to Figure 4, the whole compass plot is sorted in the descending order of the *HV* values counter-clockwise. Each slice containing six coloured/blank slots and one grey slot illustrates a particular combination of search operators and its corresponding performance. For instance, the first slice shows the combination of DE + PM, UNDX + PM, and UM gained the best performance with an *HV* of 0.89.

Although the variations of *HV* values are not significant for the HAN problem, the best performance was not gained by Borg with the default combination of all seven operators. Quite the opposite: this combination ranks 21st among the 32 combinations. This observation is somewhat counter-intuitive as users who are non-experts in MOEAs are inclined to follow the default settings with application to problems they are interested in. Note that the differences in terms of *HV* can be increased when larger design problems are considered. This implies that using the default combination of operators in Borg can be suboptimal or even misleading. Figure 5 also reveals that UM, UNDX + PM, and SPX + PM have a more profound impact on Borg's performance compared with the other groups of operators.

On the other hand, it is necessary to investigate the overall usage of each group of operators over 50 independent runs because Borg applies only one group per iteration for the recombination process. It is interesting to know whether a frequently used operator is an essential ingredient in Borg. As given in Supplementary Table S1, SBX + PM and SPX + PM were frequently used, with the overall usage across 50 runs and all possible combinations equal to 54 and 40%, respectively. In contrast, UM and UNDX + PM were less used (both around 16%). This observation generally corresponds to the situation when the default combination of operators in Borg was employed (21st in the *HV* ranking). This reveals that SBX + PM and SPX + PM were able to contribute more solutions to the archive compared with UM and UNDX + PM.

However, the best performance was achieved by the combination of UM, UNDX + PM, and DE + PM. Even in this combination, DE + PM was superior to its counterparts. It seems to be paradoxical that the combination relying only on UM and UNDX + PM ranks 9th among the 32 combinations, but the combination of UM, SBX + PM, and SPX + PM ranks only 27th. Moreover, in the top half of combinations, only four cases rely on the group of SBX + PM, but nine cases rely on the groups of UM and UNDX + PM. The other groups of operators consistently dominated the usage of UM and/or UNDX + PM. In other words, strong groups of operators in Borg are not as essential as expected, while the weak groups of operators are indispensable.

The phenomenon that operators with a minor contribution to the archive have a significant impact on the performance of Borg can be attributed to the mechanism of the auto-adaptive multi-operator strategy. Because no operators are lost during the execution of Borg, the contribution from the weak groups, probably at the early stage of optimisation, can still boost the search by other operators. The weak groups of operators, which are employed occasionally afterwards, can also take advantage of the contributions by other groups. Additionally, the paradox mentioned above emphasises that the tailored selection of search operators for a given type of problem (e.g., design of WDSs) can be beneficial as proved in Wang *et al.* (2017).

## CONCLUSIONS

This paper proposes an innovative visualisation tool for investigation of the challenging combinatorial effects commonly found in many fields. The tool is called the compass plot and inspired by the traditional Chinese Feng Shui compass. We demonstrate how the compass plot can be used to extract the key messages effectively by identifying the patterns in both the radial and annular directions. This new approach can help identify the impact factors and the corresponding values that have a more significant impact on a given problem in an intuitive way. Two scenarios related to the applications of MOEAs to the optimal design of the HAN are considered, including the parameterisation of NSGA-II and the auto-adaptive multi-operator strategy in Borg.

The compass tool applied to the least-cost design of the Hanoi problem reveals that *PS* is the most critical parameter of NSGA-II, followed by *DI _{m}* and

*DI*. In contrast,

_{c}*P*and

_{c}*P*, which are generally regarded as key parameters by most previous studies, play a secondary role. More interestingly, when applied to investigate the synergistic effect of the auto-adaptive multi-operator strategy in Borg, this tool identifies the paradox hidden inside the advanced Borg framework. That is, weak operators with a minor contribution to the archive have a significant impact on the performance of Borg. In contrast, strong operators are not as essential as expected. As a result, for a specific type of problem such as the multi-objective design of WDSs, using the default combination of operators can be suboptimal or even misleading. The key messages identified by the proposed compass plot contribute new knowledge to the research community; thus, it can guide users who rely on these MOEAs for coping with more challenging problems in the real world.

_{m}Note that the compass plot is not restricted to EAs that are closely connected with optimisation. It can also be useful to demonstrate the hidden and complex interactions among various parameters associated with the calibration of simulation models where a set of best calibration parameters is needed. Despite the value of the compass plot, it is worth noting that users can still suffer the ‘dimensional explosion,’ in which many impact factors and their candidate values (e.g., more than two) are involved. It can lead to computational overheads that are too high to bear or far more complex patterns in the compass plot to interpret. In such a case, users have to resort to alternatives for eliminating less important factors and limiting the number of candidate values. The compass plot is envisaged, however, to benefit the hydroinformatics community in deciding the effective values for key parameters for a given type of problem. An online version of such a tool, which provides real-time feedback on the efficiency of parameterisation, may also contribute to the dynamic or interactive parameter settings to further boost the power of MOEAs.

## DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories (http://dx.doi.org/10.17632/cgh6bpzjfs.1).