The use of Frontier Analysis to assess the technical rigor of water loss performance indicators

The American Water Works Association (AWWA) has developed and disseminated advanced methods and performance indicators for assessing and reducing water losses in North America, based in large part on the methods and indicators developed by the International Water Association (IWA). However, many utilities and regulators still use the old, inaccurate %NRW indicator. A robust, quantitative assessment of the technical rigor of water loss indicators was needed but could not be found in the literature. Therefore, an innovative approach was developed using Frontier Analysis, which provides such a score of ‘technical rigor’. This paper presents this method, applied to three datasets from North America, assessing 15 candidate indicators for total water losses, apparent losses and real losses. The results provide quantitative ‘scores’ of the technical rigor of the candidate indicators. Indicators with relatively high scores align with indicators used in the IWA best practices. Other indicators, such as the %NRW indicator, were found to have low technical rigor. The conclusion of the paper summarizes the rigorous indicators and suggests areas for further application of this method and for further research.


INTRODUCTION
Water utilities in North America face significant challenges from climate change, population growth, aging infrastructure, increasing non-revenue water, and declining consumption and revenues. The reduction and control of water losses can mitigate these challenges. In order to conduct a successful water loss control program, water utilities need 'technically rigorous' performance indicators to assess and benchmark performance, set targets, plan programs, and monitor and refine their work.
The American Water Works Association (AWWA) has developed and disseminated advanced methods and performance indicators for managing water losses, based in large part on the methods and indicators developed by the International Water Association (IWA). These methods encourage the use of indicators based on unit volumes of real loss and apparent loss. However, many utilities and regulators still use the old, inaccurate %NRW indicator, which has been shown by many authors to be misleading (Liemberger et al. 2007; Lambert et al. 1999, 2014; AWWA 2016). The use of this flawed indicator leads to inaccurate target setting, poor planning, and inferior results of water loss reduction and control programs.
A robust method was needed to demonstrate the technical rigor of each of the AWWA/IWA indicators, but there had not been any studies to justify the adoption of the more accurate indicators. Therefore, the author developed a method based on Frontier Analysis to create a water loss efficiency score and compare indicators to that score, using simple regression. Indicators which had a high regression fit would be technically rigorous and those with a low regression fit would not be. This method was conducted for 15 different indicators, across three datasets from different parts of North America. The results showed that common percentage-based indicators had low technical rigor, but the unit volume and most AWWA/IWA recommended indicators had high technical rigor. This paper is organized as follows. The section Background to the methodology provides background material including the methods used to collect and analyse water loss data in utilities in North America, the minimal amount of previous work on the assessment of water loss indicators and the general approach used in this paper to obtain an accurate 'technical score' of indicators. The section entitled Datasets used and indicators assessed reviews the attributes of the datasets used. The Methodology section provides a detailed explanation of the methodology and the Results and discussion section provides the results. Finally, the Conclusions section outlines the conclusions of the work, and an Appendix provides detailed statistical results.

BACKGROUND TO THE METHODOLOGY
Water loss assessment in North America
In 2010, the AWWA developed a spreadsheet-based tool known as the Free Water Audit Software (FWAS), consistent with the AWWA/IWA methods and indicators as detailed in the AWWA Manual M36 (AWWA 2009, 2016). The software prepares a (water loss) water balance and a set of indicator values. The software also provides a 'data validity score' for the audit, based on practices used to determine the input parameters. There is also a standard AWWA validation process where third party reviewers work with utilities to conduct QC efforts to improve audit accuracy (Andrews et al. 2016a, 2016b). Since the initial launch with a small number of 'volunteer' utilities, the software has been improved and its use greatly expanded. Currently there are many hundreds of annual FWAS audits, many of which have been required by State regulators. An analysis of basic results of the use of FWAS from 2010 to 2015 was published in 2016 (Sayers et al. 2016).

Previous research on the accuracy of water loss indicators
There have been very few studies which have assessed the accuracy or suitability of different water loss indicators. There is a body of literature on the flaws of the %NRW indicator, including those cited in the Introduction. More detailed assessments were carried out in France in response to new regulations instituted in 2007, which stipulated the use of the %NRW and a Linear Leakage Index (LLI) in m³/km/day. Renaud (2009) described a linear regression analysis of the LLI, found it to be highly sensitive to connection density, and proposed a Customer Leakage Index, in m³/connection/day, as a better indicator of unit real losses.

Principal concept of the methodology used in this paper
The key premise of the methodology used in this paper is as follows. If there was an accurate method to analyze a group of water loss audits to obtain a series of quantitative water loss efficiency scores, then comparison and correlation of various candidate indicators to those water loss efficiency scores would reveal the technical rigor of those candidate indicators. A strong correlation between the efficiency score and the indicator would demonstrate a high technical rigor of the candidate indicator. If the relationship is highly scattered and has a weak correlation, the candidate indicator would have low technical rigor.

Overview of Frontier Analysis and data envelopment analysis for utility efficiency estimation
There are two options for methods to accurately 'analyze a group of water loss audits to obtain a series of quantitative water loss efficiency scores': Frontier Analysis (FA) and Data Envelopment Analysis (DEA). As outlined in Abbot & Cohen (2009), FA is a parametric approach that provides a single output of efficiency based on a series of input parameters. FA has the advantages of successfully analyzing data with some uncertainty and handling economies of scale in utilities, but it requires a mathematical form to be selected. FA lends itself to the data available from water loss audits and can be conducted in a spreadsheet, which aligns with the capabilities of many utilities and regulators in North America.
The alternative, DEA, is a non-parametric linear programming technique, which has the advantages of not requiring any specific formulaic structure and of being able to use multiple inputs and provide multiple outputs. It is generally best used with large datasets, as it is sensitive to outliers. It also requires careful selection of input and output variables. In practical terms, it requires linear programming skills and software and is therefore more complex for many North American water utilities and regulators to implement, especially small organizations.
Frontier Analysis has been used in the water utility sector for many years to assess overall utility performance/efficiency. Abbot & Cohen (2009) Murwirapachena et al. (2019). Some of these and other studies have compared the results of FA to those of DEA. Generally, these analyses examine the overall performance/efficiency of water utilities, often measured in unit cost of service, or output (customers served, water produced) based on multiple explanatory variables. It is interesting to note that nearly all these studies use NRW as an input but use the flawed %NRW volume indicator.

Frontier Analysis for water loss performance assessment
Use of Frontier Analysis for assessment of NRW began with the WRc report for the UK regulatory agencies (EA & OFWAT 2008). The approach developed a model that best predicted annual real losses using a number of explanatory factors. By comparing actual, observed real losses to the predicted real losses, a ratio can be calculated to rank the performance of the individual areas.
Pearson & Trow (2012) conducted a Frontier Analysis to assess comparative real loss performance in 33 district metered zones (DMZs) in a large UK water utility. The effort concluded that Frontier Analysis was very useful in comparing performance in different DMZs to prioritize zones for interventions.
Sandraz et al. (2014) describe a study of the determinants of annual real losses, using multivariate regressions with the number of connections, mains length, pressure, pipe diameter, and pipe breaks. Comparisons of modeled and observed real loss showed unsatisfactory correlations. The number of connections was the most significant explanatory variable, and, curiously, pressure was essentially not significant at all.
Wyatt et al. (2015) conducted a Frontier Analysis on real loss efficiency in 31 utilities in North America, based on standard water loss audit results and additional detailed distribution pipe network data, including mains burst rates, length and average age of existing pipe materials. Initial regression model inputs included those data and length of mains, connection density, and average operating pressure, with a logarithmic model form. Initial analyses showed that parameters such as the percentage of cast iron pipe, and the product of age and % cast iron pipe, were statistically insignificant, and they were dropped from the analysis. The final regression model had a good fit (r² = 0.732) and all independent variables had statistical significance above the 95% level.

DATASETS USED AND INDICATORS ASSESSED
This study used three datasets of validated water loss audits from different areas of North America. These datasets were selected because they are all composed of only validated water loss audits, were readily available, and portray different cohorts in terms of climate, utility size, connection density, water consumption, pressure, and financial parameters. None of these datasets are fully representative of conditions across North America but represent the best available data at the time of this study. These datasets were assembled and subjected to further validation during the projects described in  and . Those reports provide extensive discussion of technical parameters and performance indicators in each dataset. The principal attributes of each dataset are summarized in Table 1.  Additional notes on the three datasets are provided below.
1. The WADI Plus dataset, which is an expansion of an earlier AWWA dataset known as WADI, contains weighted averages of multiple annual water loss audits in 66 utilities across USA and Canada. The dataset includes a broad variety of locations, from cool wet locations in the North East region to hot dry locations in the Southwestern USA. While the range of utility sizes is wide, WADI Plus contains a relatively higher number of larger Eastern cities with older infrastructure, such as Halifax, NS, Philadelphia, PA, Pittsburgh, PA, Region of Peel, ON, Washington, DC, and Wilmington, DE. Many sites in WADI Plus have 5 consecutive years of data, which is considered to result in more accurate water loss audits.
2. The 2016 Georgia dataset is the most recent of five annual datasets of utilities across that State, so all locations had several years of water loss audit experience. With the exception of the City of Atlanta and utilities in suburbs around Atlanta, the dataset mostly includes small water supply systems with low connection density, low water consumption, and wider use of non-metallic piping than the other datasets.
3. The 2016 California dataset is a larger dataset (272 audits, filtered down from 365 validated water loss audits). There is considerable variety in the dataset, from the cool, wet northern part of the State to the hot, dry southern part. In addition, the dataset includes a mix of large utilities such as Sacramento, San Diego, San Francisco and Los Angeles, as well as many smaller suburban or rural Water Districts. Importantly, the dataset includes just one year of data, which was the first year that utilities prepared validated water loss audits.

METHODOLOGY
Components of the methodology
The main components of the methodology are outlined in Figure 1, showing the sequence of analyses that result in a 'technical score' for indicators for total water losses, apparent losses, and real losses. The components are explained below.
Component 1. The first component is to prepare a multivariate regression model that predicts the annual volume of total water losses, apparent losses or real losses, based solely on relevant utility attributes or other factors that are mostly out of the direct control of the utility, such as the number of connections, average operating pressure, unit variable production cost of water, etc. Variables related to water loss control practices are not included. Table 2 lists the input and output variables and the regression model form used, the Cobb-Douglas model. That model was used for its simplicity, which aligns well with the numerical results of water loss audits. Future studies could use a translog model form if additional parameters are introduced. Also, an error term could be added to the model, allowing error to be distinguished from inefficiency from other factors.
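As an illustration of Component 1, the sketch below fits a Cobb-Douglas form by ordinary least squares on logarithms, i.e. ln(loss) = b0 + b1 ln(x1) + b2 ln(x2). It is a minimal example under stated assumptions, not the paper's actual model: the two inputs (number of connections and average pressure), the synthetic data, and the function names `fit_cobb_douglas` and `predict_loss` are all hypothetical, since the variables of Table 2 are not reproduced here.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_cobb_douglas(attributes, losses):
    """Least-squares fit of ln(loss) = b0 + sum_k b_k * ln(attribute_k).

    attributes: one row of positive utility attributes per utility.
    Returns [b0, b1, ..., bk] via the normal equations (X'X) b = X'y.
    """
    rows = [[1.0] + [math.log(v) for v in row] for row in attributes]
    y = [math.log(v) for v in losses]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_loss(coeffs, row):
    """Predicted annual loss volume for one utility from fitted coefficients."""
    log_loss = coeffs[0] + sum(b * math.log(v) for b, v in zip(coeffs[1:], row))
    return math.exp(log_loss)


# Synthetic utilities (assumed): [number of connections, average pressure].
attributes = [[1000, 40], [5000, 60], [20000, 55],
              [80000, 70], [300000, 45], [12000, 80]]
# Synthetic losses generated exactly from 0.5 * N^0.9 * P^0.6 for the demo.
losses = [0.5 * n ** 0.9 * p ** 0.6 for n, p in attributes]
coeffs = fit_cobb_douglas(attributes, losses)
```

Because the synthetic losses follow the assumed form exactly, the fit recovers the exponents used to generate them. With real audit data the residuals would be non-zero, and those residuals are what Component 2 uses.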
Component 2. The second component is to compare the annual observed total, apparent or real losses of each utility to the predicted total, apparent or real losses for that utility. The observed water loss in each utility is plotted versus the predicted water loss, as shown in Figure 2. If a given utility has an observed water loss that is lower than the predicted value, then it is below the 'average' line and is a relatively 'better' performer within the dataset. On the other hand, a utility with a very high observed volume of losses relative to the predicted volume will be far above the average line and will be a relatively poor performer. It is possible to construct a (low) frontier line, which is parallel to the average line, from the observed volumes of water losses of the best performer(s). A relative water loss efficiency index for each utility can be found from the relative distance to the low frontier, which accurately reflects the relative water loss performance. In this paper, the relative water loss efficiency index is also called a 'Frontier Score'.
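The paper does not give an explicit formula for the Frontier Score, so the following is only one plausible reading, offered as a sketch. It assumes that the low frontier is set by the smallest observed/predicted ratio in the dataset (a line parallel to the average line on a log-log plot) and that the score is each utility's ratio divided by that frontier ratio, so the best performer scores 1 and larger values lie farther from the frontier. The function name and the sample volumes are hypothetical.

```python
def frontier_scores(observed, predicted):
    """Relative distance of each utility from the low frontier.

    Assumption: score = (observed / predicted) / min(observed / predicted),
    so the best performer scores 1.0 and poorer performers score higher.
    """
    ratios = [obs / pred for obs, pred in zip(observed, predicted)]
    frontier_ratio = min(ratios)  # best performer defines the low frontier
    return [ratio / frontier_ratio for ratio in ratios]


# Hypothetical annual loss volumes for four utilities (same units throughout).
observed = [80.0, 150.0, 400.0, 95.0]
predicted = [100.0, 100.0, 200.0, 250.0]
scores = frontier_scores(observed, predicted)
```

Under this convention, the fourth utility, with the lowest observed loss relative to its prediction, defines the frontier.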
Component 3. The third component is to compare the value of the efficiency index (Frontier Score) to the value of a candidate indicator for each utility, using a log-log plot. Figure 3 provides an example assessment of one real loss indicator. The plot is examined for any bias, large scatter or other graphical signal of a weak relationship. From there, the analysis continues with a simple regression of the Frontier Score against the indicator value, usually using a power function, and the computation of the regression fit, r². If the regression fit is high and no large scatter or skew is present, then that indicator can be considered technically rigorous in that dataset. Figure 3 shows a strong relationship between indicator value and the Frontier Score and a high regression fit, r² = 0.845.
Component 4. The last component of the methodology is to determine an overall score for technical rigor of the indicator. The three components above will have been carried out separately for each indicator and for each dataset. Two additional criteria can be reviewed, especially when the graphical view and the regression fit (r²) lead to uncertainty on the technical rigor. Those additional quantitative criteria include the following:
1. The correlation coefficient between the Frontier Score and the indicator. A high correlation coefficient indicates a close relationship between the variables.
2. The standard error of the Frontier Score at the mean indicator value. A low standard error, in simple terms, indicates a low 'spread' of data points on the graph.
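The simple regression in Component 3 can be sketched as a power-function fit, score = a * indicator^b, estimated as a straight line in log-log space, with r² computed on the logged values. This is an illustrative sketch only: the function name, the choice to compute r² in log space, and the sample values are assumptions rather than details taken from the paper.

```python
import math


def fit_power_law(indicator, frontier_score):
    """Fit frontier_score = a * indicator**b by least squares on logarithms.

    Returns (a, b, r_squared), where r_squared is the regression fit of the
    straight line in log-log space.
    """
    x = [math.log(v) for v in indicator]
    y = [math.log(v) for v in frontier_score]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Hypothetical indicator values and Frontier Scores that follow an exact
# power law (score = 0.02 * indicator), so the fit should be perfect.
indicator = [50.0, 100.0, 200.0, 400.0]
frontier_score = [1.0, 2.0, 4.0, 8.0]
a, b, r_squared = fit_power_law(indicator, frontier_score)
```

With real data, a high `r_squared` together with a plot free of large scatter or skew would mark the indicator as technically rigorous in that dataset.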
The use of these additional factors provides a more thorough and quantitative examination of the relationship between the Frontier Score and the indicator. Note that the standard error is influenced by the dataset sample size, so comparison of standard errors across datasets is inaccurate, but comparison between indicators for a given dataset is acceptable. An example of this process with two indicators in the California dataset is shown in Figure 4. The unit volume indicator has a higher correlation coefficient and a lower standard error than the unit value indicator. In addition, the regression fit is considerably higher for the unit volume indicator.
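The two additional criteria can be sketched as follows. The paper does not state exactly how they are computed, so this sketch assumes both are taken in log-log space: the Pearson correlation coefficient of the logged values, and the standard error of the fitted Frontier Score at the mean (logged) indicator value, which for simple regression is the residual standard deviation divided by the square root of n. That dependence on n is consistent with the caution above about comparing standard errors across datasets. The function name and sample data are hypothetical.

```python
import math


def correlation_and_se_at_mean(indicator, frontier_score):
    """Pearson correlation and standard error of the fitted value at mean x.

    Both are computed on natural logarithms (an assumption of this sketch).
    """
    x = [math.log(v) for v in indicator]
    y = [math.log(v) for v in frontier_score]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    sse = max(syy - slope * sxy, 0.0)        # residual sum of squares
    residual_sd = math.sqrt(sse / (n - 2))   # n - 2 degrees of freedom
    se_at_mean = residual_sd / math.sqrt(n)  # fitted value at x = mean(x)
    return r, se_at_mean


# Two hypothetical indicators scored against the same utilities.
indicator = [50.0, 100.0, 200.0, 400.0]
tight_scores = [1.0, 2.1, 3.9, 8.2]   # closely follows a power law
loose_scores = [1.0, 6.0, 2.0, 8.0]   # widely scattered
r_tight, se_tight = correlation_and_se_at_mean(indicator, tight_scores)
r_loose, se_loose = correlation_and_se_at_mean(indicator, loose_scores)
```

Within a single dataset, the indicator with the higher correlation and lower standard error (here the "tight" one) would be judged the more rigorous of the two.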
In many cases, reviewing the correlation coefficient and standard error will not be needed, especially if the regression fit is very strong. Also, the correlation coefficient and standard error reflect characteristics which are mostly 'captured' in the regression fit. Therefore, Table 3 was prepared as simplified scoring of Technical Rigor, based solely on the regression fit. This simplified scoring facilitates a quick 'grasp' of the rigor of an indicator and easy comparison between indicators within a dataset, and also across datasets for a given indicator.

RESULTS AND DISCUSSION
The analysis resulted in technical rigor scores for each of the 15 water loss indicators in each of the three datasets. This section of the paper provides example graphs of the observed and predicted annual volumes of total water losses, apparent losses and real losses, one for each dataset. Tables are provided here on the regression fits (r²) and technical rigor scores for all indicators in all datasets. The Appendix provides analysis of variance (ANOVA) tables for each FA regression, including coefficients, standard errors and statistical significance of input parameters.

Frontier Analysis of total water losses in California
Figure 5 provides a graph of observed and predicted total annual water losses in California. The FA has a very good regression fit (r²) of 0.843, a standard error of 0.431 and an F Statistic close to 240. The spread of observations is quite even, but there is some skew for very large utilities in the State. The span from low frontier to high frontier is moderate, from about 0.38 to 3.9 times the average. Table 4 provides a summary of results for total unit water losses across the three datasets. The volume-based indicator has a fairly high technical rigor, while the technical rigor of the value-based indicator is lower and varies considerably across datasets.

Frontier Analysis of apparent losses in Georgia
Figure 6 presents the plot of observed and predicted apparent losses in Georgia, which has a good regression fit (r² = 0.733), a standard error of 0.72 and an F Statistic of 138. The spread of observations is fairly even, but the span from low frontier to high frontier is quite wide, from about 0.2 to 8.7 times the average. The relatively wide spread of Frontier Scores suggests more variability in performance, possible influence of factors not in the regression model, or more error in apparent loss estimation in Georgia. Atlanta could be thought of as an outlier, but Atlanta is well known to be facing many water loss challenges.
Table 5 presents a summary of results for apparent loss indicators for the three datasets. Both the volume-based and value-based indicators had noticeable variation across the datasets, presumably due to additional factors not included in the regression model or variability in error. However, the regression fit and technical rigor of apparent loss/(billed authorized consumption + apparent loss) is consistently very high. In fact, this indicator has the highest rigor of all those assessed in this study. Recently, IWA Guidance Documents on Apparent Loss favored this indicator (Lambert et al. 2016).

Frontier Analysis of real losses using the WADI Plus dataset
Figure 7 presents the plot of observed and predicted real losses using the WADI Plus dataset, which had a very good regression fit (r²) of 0.897, a standard error of 0.628 and an F Statistic of 133. The spread of observations is fairly even, with some skew for the larger utilities in older cities. The span from the low frontier to the high frontier is relatively narrow, from about 0.23 to 3.4 times the average.
Table 6 presents the results for real loss indicators in each dataset. The technical rigor of many real loss volume indicators is quite high and also reasonably consistent across datasets. But there is one exception, the real loss volume per kilometer per hour, which varies greatly between the datasets. The reasons for this variation are not fully clear, but anecdotal information suggests that the materials used for mains and for connections are different in the different datasets, changing the principal locations of leakage, which in turn affects the variability of the real loss indicator values.
The technical rigor of the real loss value indicator is low because of its mathematical form. It is derived from the product of the real loss unit volume indicator and the unit variable production cost (VPC) of water. However, the VPC is often out of the control of the utility, given regulations/restrictions on water sources, raw water quality for surface water systems, the pumping head for groundwater systems, and other factors. Therefore, as a water loss volumetric performance indicator, a high value of the real loss unit value does not necessarily mean a high real loss volume. The real loss unit value can be very useful in assessing the cost benefit of different real loss interventions, but it is not an accurate indicator of utility real loss performance.
Table 7 presents the regression fits and technical rigor scores for four percentage-based indicators, including the commonly used %NRW volume indicator. The regression fits are weak and the technical rigor scores are considerably lower than other indicators. This finding reinforces the concerns regarding the well-known flaws of percentage-based NRW indicators.
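A small worked example may help show why a percentage-based NRW indicator can mislead. It uses the standard water balance relation that system input volume equals billed authorized consumption plus non-revenue water. The two utilities, their volumes and the function names are hypothetical, chosen only to isolate the effect of consumption on the percentage.

```python
def percent_nrw(nrw_m3, billed_consumption_m3):
    """%NRW by volume: non-revenue water as a share of system input volume."""
    system_input_m3 = billed_consumption_m3 + nrw_m3
    return 100.0 * nrw_m3 / system_input_m3


def nrw_litres_per_connection_per_day(nrw_m3, connections, days=365):
    """Unit volume indicator: litres of NRW per service connection per day."""
    return nrw_m3 * 1000.0 / connections / days


# Two hypothetical utilities with identical annual NRW and connection counts;
# only billed consumption differs.
nrw_m3 = 1_000_000
connections = 20_000
pct_a = percent_nrw(nrw_m3, billed_consumption_m3=4_000_000)
pct_b = percent_nrw(nrw_m3, billed_consumption_m3=9_000_000)
unit_a = nrw_litres_per_connection_per_day(nrw_m3, connections)
unit_b = nrw_litres_per_connection_per_day(nrw_m3, connections)
```

Here the second utility appears to perform twice as well on %NRW purely because its customers consume more, while the unit volume indicator is identical for both.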

CONCLUSIONS
Based on the data available and analysed in this study, the following conclusions can be reached:
• The combination of Frontier Analysis and regression analysis of Frontier Scores with candidate indicators is considered to be a very suitable method of assessing the technical rigor of indicators.
• This method is, to the author's knowledge, the first detailed quantitative assessment of water loss and water loss component indicators.
• The commonly-used %NRW indicator has very low technical rigor, confirming the long-held critique of this indicator, which stems from the influence that consumption has on it. Other percentage-based indicators also have low technical rigor. Given that there are alternatives with high technical rigor, ongoing use of these percentage-based indicators is considered to be not useful and misleading. The one exception is real loss volume/system input volume, which has moderate technical rigor in two of the datasets studied. This indicator could be useful in water resource planning, but more investigation is suggested.
• It is also noteworthy that of the scores of journal articles describing applications of Frontier Analysis and Data Envelopment Analysis only two were found to use the more rigorous water loss indicators such as unit volumes of losses.
• The unit real loss volume and unit apparent loss volume, per connection per day, appear to be the key performance indicators, given their high technical rigor, simplicity, broad understandability, ease of calculation, and usefulness in planning interventions for water loss reduction and control. This conclusion is consistent with indicator recommendations by the IWA.
• The unit real loss value and unit apparent loss value indicators have low technical rigor for representing utility water loss performance but are very useful for financial evaluation of water loss control activities and investments. These indicators could be more rigorous if data were separated into cohorts based on the type of water resource and scale.
• Several ratio-based indicators were found to have high technical rigor and are considered useful, especially for utilities. Those include the Infrastructure Leakage Index (ILI), the Pressure Management Index (PMI), the product of those ratios (ILI*PMI), and the ratio of Apparent Loss/(Billed Authorized Consumption + Apparent Loss).
• For some of the indicators, technical rigor varies considerably from one dataset to another. On the other hand, some indicators have high (or low) technical rigor in all three datasets. Such variation is to be expected, probably due to variations in data quality and technical parameters. More assessment, and the use of stochastic frontier analysis, should help clarify these variations.
• Follow-up research to apply this method with other water loss audit datasets in different locations or regions of North America will help to refine results, but also determine if locational factors favor one indicator or another.
• The use of stochastic frontier analysis (with the introduction of an error term in the model formulation) would likely provide useful additional information on the Frontier Score values, including the magnitude of error, influence of other variables not included in the regression formulation, and perhaps provide information of the effectiveness of different types and extents of water loss control practices. However, a stochastic approach will require more sophisticated analysis software which may inhibit its use by regulators in North America.
• The use of these indicators and follow-up analyses will be enhanced by an emphasis on regular ongoing water loss audits, and on continual improvement of the accuracy of the data inputs, in support of improving water loss audit accuracy.
• The collection of additional utility parameters, along with the water loss audit, will be helpful for ongoing research, especially for more complex regression model forms, and for controlling for various environmental or institutional factors.
The results of the water loss performance indicator assessments are summarized in Table 8, which provides information on the range of technical rigor scores, suitable purposes/uses and principal users of indicators. For guidance on the reliable use of indicators, a preliminary threshold of a technical rigor score of 3 or larger was assumed. (This threshold corresponds to a regression fit between the indicator and the Frontier Score greater than 0.55.) Those cells with two check marks are deemed technically rigorous, because the Technical Rigor Score is 3 or higher in all three datasets. The absence of check marks indicates low technical rigor. But some of the 'cells' in Table 8 have a single check mark in parentheses, which indicates that fewer than three of the datasets reached the threshold of a technical rigor of 3 or more. More analysis and interpretation of the results from these datasets is warranted, as well as application of this method to other datasets to gain a refined sense of the appropriate threshold.
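The check-mark convention described for Table 8 can be expressed as a short rule. The sketch below is an interpretation, not the author's procedure: it assumes that two check marks require a Technical Rigor Score of at least 3 in every dataset, that a parenthesized single check mark applies when some but not all datasets reach the threshold, and that no mark is shown when none do. The function name and the sample scores are hypothetical.

```python
def rigor_marker(scores_by_dataset, threshold=3):
    """Return the Table 8 style marker for one indicator.

    scores_by_dataset: Technical Rigor Scores, one per dataset.
    """
    passing = sum(1 for score in scores_by_dataset if score >= threshold)
    if passing == len(scores_by_dataset):
        return "✓✓"   # threshold met in every dataset
    if passing > 0:
        return "(✓)"  # threshold met in some datasets only (assumed reading)
    return ""         # threshold met in no dataset


# Hypothetical Technical Rigor Scores across three datasets.
marker_all = rigor_marker([4, 3, 5])
marker_some = rigor_marker([3, 2, 1])
marker_none = rigor_marker([1, 2, 1])
```

Changing `threshold` makes it easy to see how sensitive the markers are to the preliminary cut-off of 3.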
These indicators form a 'suite' of tools which, when considered together, can provide an improved benchmarking and understanding of the magnitude and characteristics of the water losses at a utility or group of utilities. They also provide a strong, quantitative basis for identifying priority water loss challenges, setting targets, planning and implementing programs, monitoring results, and refining strategies and tactics to achieve appropriate reduction and control of water losses.