Principal component analysis for decision support in integrated water management

A general methodology for holistic sustainability assessment of measures in integrated water management based on principal component analysis (PCA) was developed. Application to data from three cases demonstrated that PCA could be used to rank alternatives, to assess differences between groups of alternatives and the main properties responsible for them, and to account for the impacts of measures on different dimensions of sustainability. The results demonstrated the general applicability of the method. In all cases, a combination of measures/options yielded the most sustainable solution. The absence of a single, clearly optimal solution highlights the need for a transparent and systematic analysis, which can be obtained with the presented methodology.


INTRODUCTION
Aquatic ecosystems worldwide are subjected to pressures from agricultural intensification, pollution from industry and transport, and urban development. Climate change will exacerbate these pressures by changing rainfall patterns and temperature regimes (IPCC ). These multiple pressures, which threaten achievement of the United Nations Sustainable Development Goals (SDGs) (UN ), will also be influenced by political and cultural changes. Growing knowledge of the complex interactions between them has led to calls for integrated approaches in water management (UN Water ). In this context, a methodology for holistic evaluation of the sustainability of alternative mitigation and adaptation measures is needed.
Sustainability assessments (SA) should cover the environmental, economic and social dimensions. To include results from different disciplines and to manage potentially conflicting issues within and between different SDGs or policy areas, an integrated assessment of measures is required. Different outputs, including the priorities of different stakeholders, can be structured in a transparent and objective manner in sustainability assessment frameworks (SAF), which can be used to compare alternatives against selected criteria. In general, a case with n alternatives and m criteria can be represented as an n × m table of ratings. However, complex comparison tables with detailed ratings can hinder full understanding of the alternatives. Moreover, commonly used multi-criteria decision analysis (MCDA) methods fail to address correlations between criteria, which may result in sub-optimal decisions. Principal component analysis (PCA) offers a way of dealing with highly correlated criteria: many correlated variables can often be reduced to two or three principal components, allowing the merits and demerits of the alternatives to be visualised in scatter diagrams or bar charts.
In this study, a common PCA-based method for MCDA was applied to data sets from three different studies. The purpose was to develop a general methodology for holistic SA in integrated water management, suitable for a range of cases, from assessments at the strategic level to detailed assessments of technological solutions.

Principal component analysis
PCA is a widely used multivariate data analysis method. It is particularly useful for data with collinearity and more variables than samples. In the context discussed here, the criteria in the SAF are the variables and the alternatives to be compared are the samples.
Based on the original variables, PCA calculates a set of new variables that describe as much as possible of the variance in the data. These new variables are called principal components (PCs). The PCs are ranked according to how much of the original variance they explain: PC1 explains the most variance, PC2 the second most, and so on. PCs can be calculated with several methods. Here, the singular value decomposition method was used, as implemented in the commercially available software Unscrambler X 10.4 (Camo Analytics). The number of PCs to include in a given case can be based on a criterion for the explained variance. To ensure an equal contribution from each observation and variable, the data are normally standardised by subtracting the mean of all observations for each variable, i.e. mean-centring, and dividing by the standard deviation of the same variable (Martens & Naes ).
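As an illustration of the standard procedure described above, the following is a minimal Python/NumPy sketch of PCA via singular value decomposition with mean-centring and standardisation, together with one possible cumulative explained-variance criterion for choosing the number of PCs. It is not the Unscrambler X implementation; the function names `pca_svd` and `n_pcs_for`, the 95% threshold and the random example data are hypothetical, and the sketch assumes that no criterion is constant (zero standard deviation).

```python
import numpy as np


def pca_svd(X, center=True, scale=True):
    """PCA of an (alternatives x criteria) matrix via SVD.

    Returns scores T (alternatives x PCs), loadings P (criteria x PCs) and
    the fraction of the total sum of squares explained by each PC (equal to
    the explained variance when the data are mean-centred).
    """
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)            # mean-centring of each variable
    if scale:
        X = X / X.std(axis=0, ddof=1)     # divide by each variable's standard deviation
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # PCs come out sorted, PC1 first
    return U * s, Vt.T, explained


def n_pcs_for(explained, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(explained))


# Hypothetical random data: 8 alternatives (samples) x 11 criteria (variables)
X = np.random.default_rng(0).normal(size=(8, 11))
T, P, explained = pca_svd(X)
print(np.round(explained, 3), n_pcs_for(explained))
```

With all PCs retained, `T @ P.T` reproduces the standardised data exactly, which is a convenient check that scores and loadings are consistent.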
In the SA presented here, the observations are not mean-centred but transformed so that the optimum value of each variable is 0. The data in a SAF may also be normalised to a common scale, e.g. 0-10. In addition, for the PCA each variable is standardised by dividing by its standard deviation.
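A minimal sketch of this optimum-referenced preprocessing, assuming that each criterion is converted to its absolute distance from a known optimum value (so that 0 is best whether the optimum is a minimum or a maximum) and then divided by its standard deviation without mean-centring. The helper names `optimum_referenced` and `uncentred_pca` and the example numbers are hypothetical.

```python
import numpy as np


def optimum_referenced(X, optimum):
    """Distance of each value from its criterion's optimum (0 = best), divided
    by the criterion's standard deviation. No mean-centring is applied."""
    X = np.asarray(X, dtype=float)
    D = np.abs(X - np.asarray(optimum, dtype=float))   # assumption: absolute distance
    return D / D.std(axis=0, ddof=1)


def uncentred_pca(Z):
    """SVD-based PCA on data that have NOT been mean-centred."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    share = s**2 / np.sum(s**2)   # share of total sum of squares, not variance
    return U * s, Vt.T, share


# Hypothetical data: 4 alternatives x 2 criteria. Criterion 1 is best when low
# (optimum 1.0), criterion 2 is best when high (optimum 10.0).
X = np.array([[1.0, 10.0],
              [2.0, 8.0],
              [4.0, 5.0],
              [3.0, 9.0]])
Z = optimum_referenced(X, optimum=[1.0, 10.0])
T, P, share = uncentred_pca(Z)
print(np.round(T[:, 0], 3))   # alternative 0 sits at the optimum, so its score is 0
```

Because the data are not centred, an alternative that meets every optimum scores 0 on every PC, consistent with defining the optimal score as 0. The sign of each PC returned by an SVD is arbitrary, so a sketch like this would compare score magnitudes or distances rather than signed scores.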
Different weights can be given to the variables by dividing each variable by a user-defined factor, e.g. to include the priorities of decision makers. However, this is not discussed here.
The contribution of each variable to the score for a given PC and observation is found by multiplying the loading of that variable on the PC by the variable value for that observation; the score itself is the sum of these products over all variables. The relative contribution of each variable, i.e. as a percentage, may also be calculated.
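The decomposition can be sketched as follows, assuming `Z` holds the preprocessed data (alternatives x criteria) and `P` the loadings (criteria x PCs), e.g. from an SVD. The function name `score_contributions` and the random example data are hypothetical. Percentages are taken relative to the score, so with loadings of mixed sign individual values can be negative or exceed 100%.

```python
import numpy as np


def score_contributions(Z, P, pc=0):
    """Contribution of each criterion to each alternative's score on one PC.

    C[i, k] = Z[i, k] * P[k, pc], so that C.sum(axis=1) equals the scores Z @ P[:, pc].
    Relative contributions are returned as percentages of the score
    (set to 0 where the score itself is 0).
    """
    Z = np.asarray(Z, dtype=float)
    C = Z * P[:, pc]                   # broadcast loading k over column k
    scores = C.sum(axis=1, keepdims=True)
    pct = 100.0 * np.divide(C, scores, out=np.zeros_like(C), where=scores != 0)
    return C, pct


# Hypothetical example: loadings from an SVD of non-negative, optimum-referenced data
rng = np.random.default_rng(1)
Z = np.abs(rng.normal(size=(6, 4)))
P = np.linalg.svd(Z, full_matrices=False)[2].T
C, pct = score_contributions(Z, P, pc=0)
print(np.round(pct, 1))
```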
When several PCs are needed, the Euclidean distance may be used, i.e. the square root of the sum of squared scores from all contributing PCs:

$$d_i = \sqrt{\sum_{a=1}^{j} t_{ia}^{2}}$$

where $d_i$ is the Euclidean distance for alternative $i$, $t_{ia}$ is the score of alternative $i$ on PC $a$, and $j$ is the number of contributing PCs. The relative contribution of each variable to the Euclidean distance can be derived from its relative contributions to the scores on the individual PCs.
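A sketch of the distance and of one possible way of splitting it between variables. Exactly how the per-PC relative contributions are combined is not specified above; the weighting used here (each PC's relative contributions weighted by its share $t_{ia}^2/d_i^2$ of the squared distance) is an assumption, chosen because the resulting percentages add up to 100% of $d_i^2$. It requires the scores to satisfy T = Z P, as in an SVD-based PCA. Function names and data are hypothetical.

```python
import numpy as np


def euclidean_distance(T, j):
    """d_i = sqrt(sum over the first j PCs of t_ia**2)."""
    return np.sqrt(np.sum(T[:, :j] ** 2, axis=1))


def distance_contributions(Z, P, T, j):
    """Percentage contribution of each criterion k to d_i**2 (assumed weighting).

    Uses d_i**2 = sum_a t_ia * t_ia = sum_k z_ik * sum_a t_ia * p_ka,
    which holds when T = Z @ P.
    """
    Tj, Pj = T[:, :j], P[:, :j]
    num = Z * (Tj @ Pj.T)                     # z_ik * sum_a t_ia p_ka
    d2 = np.sum(Tj ** 2, axis=1, keepdims=True)
    return 100.0 * np.divide(num, d2, out=np.zeros_like(num), where=d2 != 0)


# Hypothetical example
rng = np.random.default_rng(2)
Z = np.abs(rng.normal(size=(7, 5)))
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
T, P = U * s, Vt.T
d = euclidean_distance(T, j=2)
pct = distance_contributions(Z, P, T, j=2)
print(np.round(d, 3))
print(np.round(pct, 1))
```

With all PCs included, $d_i$ equals the length of the preprocessed data row, so the distance to the optimum is fully recovered.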

SAF for the cases
The data sets used in this study all originate from SAFs that were developed to assess the sustainability of the current situation and alternative water management options for sources. … In addition, three combinations of these measures were included, giving seven strategies in total (a, b, c, d, …). The seven strategies were assessed with respect to 11 criteria that described the impacts, included the priorities of the decision makers and compared the foreseen situation in …

Data set 2/Accra
Thirty-six alternative designs for roof rainwater harvesting (RWH) to meet the demands of households of different sizes were analysed. The designs were divided into three groups: 'Basic', including only collection and storage; 'Intermediate', also including a water distribution system; and 'Advanced', additionally including a water disinfection system. Technical … Figure 1, which is based on the first PC, which accounted for 99.6% of the variation in the data.

Ranking of alternatives
Compared with an 8 × 11 matrix of 88 individual values, Figure 1 gives a clearer overview of the alternatives and a better basis for decision making. The different criteria are integrated into the sustainability score for each strategy. The high explained variation (>99%) indicates that the main differences between the strategies were well accounted for. That this was achieved with only one PC indicates very high covariation between the criteria.
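The link between strong covariation and a dominant first PC can be illustrated with hypothetical data (not the data of this case): 8 alternatives whose 11 criteria are all driven by one latent performance factor, compared with 8 alternatives whose criteria vary independently. The data are treated as distances to the optimum and standardised without mean-centring; the alternatives are then ranked by the magnitude of their PC1 score. The helper `pc_shares` is hypothetical.

```python
import numpy as np


def pc_shares(D):
    """Standardise (no mean-centring) and return PC1 scores and each PC's share."""
    Z = D / D.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0], s**2 / np.sum(s**2)


rng = np.random.default_rng(42)

# Strongly co-varying criteria: one latent factor (1 = closest to optimum) plus small noise
latent = np.arange(1.0, 9.0)
weights = rng.uniform(0.5, 2.0, size=11)
D_corr = np.outer(latent, weights) + rng.normal(scale=0.01, size=(8, 11))

# Independent criteria on a similar range
D_indep = rng.uniform(0.5, 16.0, size=(8, 11))

t1, share = pc_shares(D_corr)
_, share_indep = pc_shares(D_indep)
ranking = np.argsort(np.abs(t1))          # smallest |score| = closest to the optimum

print(f"PC1 share, co-varying criteria:  {share[0]:.4f}")
print(f"PC1 share, independent criteria: {share_indep[0]:.4f}")
print("ranking, most sustainable first:", ranking)
```

Because the data are not mean-centred, PC1 also captures the common offset of all alternatives from the optimum, so even independent criteria give a sizeable PC1 share in this sketch; it is the gap between the two shares that reflects covariation.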
The PCA-based ranking indicated that the combination of measures (a + b + c) to reduce water loss and improve energy efficiency in the existing water supply would be the most sustainable. This was also reflected in the relative importance of the criteria: energy per m³ supplied was the most important criterion, followed closely by water supplied per capita, leakage percentage, chemical use per m³ supplied and energy use per capita. All of these favour solutions with reduced water loss and less energy for water transfer from additional sources.

Grouping of alternatives and identification of main properties
The methodology can also be used with complex data sets where more PCs are required, as in Accra, where detailed technical designs for RWH were compared. Figure 2 shows a scatter plot of the scores of the RWH designs on the first two PCs. The score plot showed clear differences in sustainability score between the three main groups of designs (Basic, Intermediate and Advanced), as well as differences within each group. These differences were related to, e.g., the choice of storage tank material, where ferro-cement (FC) tanks were evaluated as more sustainable than the commonly used plastic (PP) tanks. The first PC accounted for 85% of the variation between the designs, and the second PC for an additional 8%.
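Group differences in a score plot can also be summarised numerically, for instance by the centroid of each design group and the spread of the designs around it. The sketch below assumes scores `T` from a PCA and one group label per design; the function name `group_summary` and the four example scores are hypothetical.

```python
import numpy as np


def group_summary(T, groups, n_pcs=2):
    """Centroid and RMS distance to the centroid for each group, using the first n_pcs PCs."""
    T = np.asarray(T, dtype=float)[:, :n_pcs]
    groups = np.asarray(groups)
    summary = {}
    for g in np.unique(groups):
        members = T[groups == g]
        centroid = members.mean(axis=0)
        spread = np.sqrt(np.mean(np.sum((members - centroid) ** 2, axis=1)))
        summary[str(g)] = (centroid, spread)
    return summary


# Hypothetical PC1/PC2 scores for four designs in two groups
T = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
groups = ["Basic", "Basic", "Advanced", "Advanced"]
for name, (centroid, spread) in group_summary(T, groups).items():
    print(name, centroid, round(float(spread), 3))
```

The distance between two centroids (e.g. with `np.linalg.norm`) then gives a single number for the difference between groups, which can be set against the within-group spread.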
With PCA, it is of interest to evaluate the number of PCs required, which can be done by assessing the incremental increase in explained variance for each added PC. If the analysis indicates that one PC is sufficient, the PC1 scores can be used. If two PCs are required, the distance from the origin to a given data point can be calculated, as illustrated in the left-hand part of Figure 2. With more significant PCs, this can be generalised to the Euclidean distance in an n-dimensional space.
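The incremental criterion can be sketched as a simple rule: keep adding PCs while the next PC still adds at least a chosen share of the variance. The function name `n_pcs_by_increment`, the 5% cut-off and the explained-variance values below are hypothetical.

```python
import numpy as np


def n_pcs_by_increment(explained, min_gain=0.05):
    """Number of PCs to keep: PC1 plus every following PC, in order, that adds
    at least min_gain to the explained variance (min_gain is a user choice)."""
    explained = np.asarray(explained, dtype=float)
    k = 1
    while k < len(explained) and explained[k] >= min_gain:
        k += 1
    return k


explained = [0.85, 0.08, 0.04, 0.02, 0.01]   # hypothetical fractions per PC
print(n_pcs_by_increment(explained), np.cumsum(explained))
```

A stricter cut-off (e.g. `min_gain=0.02`) keeps more PCs; the choice is a trade-off between completeness and the readability of the resulting plots.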
In the case from Accra, four PCs explained 99% of the variance. A combined score based on four PCs could be computed as described above to rank the designs. However, several designs had overlapping, similar scores, which indicated that household preferences would be important. The differences in sustainability between the main design groups, and the main reasons for them, were therefore the more relevant questions.
These questions could be assessed with only two PCs, which reduced the complexity and improved the understanding of the data. … be assessed. However, Figure 3 shows only one alternative compared with the current situation, and only a limited number of additional alternatives can be included before the spider diagram becomes unreadable.
In general, a full analysis will therefore require comparing many spider diagrams, even with a limited number of alternatives.
How the different dimensions of sustainability are influenced by alternative measures can also be determined from the PCA by assessing the contributions from the criteria in each dimension. Considering that the optimal score is defined as 0, the PCA indicated that alternative F would be the most sustainable, but also that the differences between alternatives could be small.
With one PC, most of the variation (95%) was described.
The contributions to the sustainability score were largest from the economic, asset and environmental criteria. With two (data not shown) and three PCs, additional variation (3% and 2%, respectively) was described. The contributions from the social and governance criteria increased, and the contribution to the scores from the social criteria became as important. A detailed evaluation revealed that this was mainly related to two criteria: the share of increased water availability to the community (PC2) and the acceptability of the strategic alternative (PC3). Since the PCA reflects variance and correlations, this indicates that the differences between the alternatives were largest for the economic, asset and environmental criteria, and that these criteria were considerably correlated. However, a more detailed assessment would also need to account for differences in the social and governance criteria.
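Contributions per sustainability dimension can be obtained by summing the criterion contributions (value × loading) within each dimension. The sketch assumes preprocessed data `Z`, the loadings `p` of one PC and a hypothetical assignment of criteria to dimensions; the function name `dimension_contributions` is also hypothetical. For several PCs, the same grouping could be applied to contributions to the Euclidean distance.

```python
import numpy as np


def dimension_contributions(Z, p, dimension_of):
    """Sum of criterion contributions z_ik * p_k to the PC score within each dimension.

    Z: alternatives x criteria, p: loadings of the criteria on one PC,
    dimension_of: one dimension label per criterion.
    """
    C = np.asarray(Z, dtype=float) * np.asarray(p, dtype=float)
    labels = np.asarray(dimension_of)
    return {str(d): C[:, labels == d].sum(axis=1) for d in np.unique(labels)}


# Hypothetical: 2 alternatives, 3 criteria assigned to two dimensions
Z = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
p = np.array([0.5, 0.5, 1.0])
by_dim = dimension_contributions(Z, p, ["economic", "economic", "social"])
print(by_dim)
```

Because each criterion belongs to exactly one dimension, the dimension totals add up to the PC score of each alternative.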
To understand the scores on a given PC in terms of the original criteria, the contribution of a criterion to the score can be found using the loading of the criterion on that PC. … The results demonstrated the general applicability of the method. In all cases, the best solution depended on the local situation and the preferences of decision makers. The common absence of a single, clearly optimal solution highlights the need for a transparent and systematic analysis, as obtained with the presented methodology.