Non-Revenue water ratio prediction with serial triple diagram model

EŞ, 0000-0003-3696-9967

ABSTRACT
Water administrations attempt to control Non-Revenue Water Ratio (NRWR) values in order to keep water distribution infrastructures sustainable and well-performing. In this respect, predicting the NRWR value with appropriate models built on a small number of controllable variables is significant. Collecting, monitoring, and predicting data on all the variables used in NRWR calculations is not practical and requires significant time as well as financial resources. In this study, NRWR predictions are made with the suggested method using three parameters. The prediction accuracies obtained in the literature with the Triple Diagram Model (TDM) using two parameters are increased by the Serial Triple Diagram Model (STDM) suggested in this study. The method shows that better predictions are possible in NRWR modeling. Thanks to the model applications developed in this study, water administrations can make predictions with the least error (less than 10% relative error) and certain variables, according to the characteristics of network, water management

HIGHLIGHTS
• The variables and combinations that have an impact on the prediction of NRW ratios have been investigated by using Serial Triple Diagram Models.
• The Kriging method has been used in this study for NRW ratio prediction.
• This model yields the best predictions reported in the literature.


INTRODUCTION
The International Water Association (IWA) makes important contributions to the non-revenue water (NRW) issue through numerous conferences, workshops, and publications every year. Despite all these efforts, the water loss rate remains high, and the awareness level of water utilities, especially in most developing countries, has not yet reached the expected level. NRW studies therefore need considerable support. In order to meet increasing water demand around the world, water utilities should focus on present and future climate change impacts and on NRW calculation, prediction, monitoring, and reduction. The water infrastructure is therefore very important for supplying water of continuous quantity and quality to consumers (Shuang et al. 2017). Water distribution system infrastructure is composed of treatment units, transmission and distribution lines, tanks, pumping stations, valves, drains, and air release valves. Any failure in these distribution components can cause significant water losses (WL), which increase production and operating costs. The consequences of such problems range from service disruption to water quality deterioration. For this reason, studies on WL in water distribution systems (WDSs) are quite important (Kanakoudis & Tsitsifli 2010; Kanakoudis et al. 2015; Kanakoudis et al. 2016).
The distribution system losses can be separated into real and apparent losses (Alegre et al. 2016). The water loss quantity and the unbilled authorized consumption together make up the NRW (Kingdom et al. 2006; Kanakoudis & Muhammetoglu 2014; Alegre et al. 2016; Boztaş et al. 2019; Sisman & Kizilöz 2020a, 2020b). The global NRW amount is estimated at about 126 billion m³, which corresponds to 39 billion dollars (Liemberger & Wyatt 2019). Water authorities must keep the NRW at low levels (under 10%), as in developed countries, in order to continue their investments and to operate water distribution infrastructures properly. Where feasibility, design, and project application studies are insufficient, the NRW amount reaches high levels in a short time and causes significant budget waste. Recently, Turkey has paid more attention to legal regulations and practices on water loss reduction in WDSs, in line with the above-mentioned issues. In this context, the regulation on the control of water losses in water supply and distribution systems (Sisman & Kizilöz 2020a, 2020b), most recently revised and published in the Official Gazette dated 8 May 2014, has entered into force. According to the amendment to the existing regulation, metropolitan and provincial municipalities are responsible for reducing water losses to 30% by 2023 and to 25% by 2028; for the other municipalities, the level should be at most 35% until 2023, 30% until 2028, and 25% until 2033 (RTMAF 2019). In parallel with these developments, NRW applications have gathered speed in Turkey. In addition, the reduction in the quantity of water resources due to climate change impacts has had negative effects across the country, especially in the last 20 years, which has brought NRW studies to the forefront.
Attempts have been made to improve the current methodologies with the support of new approaches. Güngör-Demirci et al. (2018) used panel regression analysis to identify the components that affect NRW. Tabesh et al. (2018) applied fuzzy logic risk evaluation and Bayesian network methods to three principal components of a WDS: apparent losses, real losses, and non-revenue authorized consumption. The NRW amount has been predicted with alternative Artificial Neural Network (ANN) models by using physical and operational variables such as water supply quantity per demand junction, deteriorated pipe ratio, demand energy ratio, average pipe diameter, pipe length per demand junction, and number of leaks (Jang & Choi 2017). The network performances in Kocaeli have been analyzed with a new risk approach (Kızılöz & Sisman 2021). The first model predictions of the NRW rate with the TDM and Kriging methods were obtained by Sisman & Kizilöz (2020a, 2020b). In the literature, there are various studies in which the TDM and Kriging methods are successfully applied (Sen 2009). The contour diagrams are obtained through the Kriging method, in which the relationship between two independent variables and a single dependent variable is predicted through the most appropriate interpolation approach (Özger & Sen 2007). The method is highly flexible in fitting an accurate model to scattered data points. Even when the points are intensively scattered, the TDM charts help to understand the behavior of the variables and to interpret them easily.
In this paper, the NRW ratio (NRWR) is predicted through the Serial Triple Diagram Modeling (STDM) method for the first time. In this approach, two independent variables that are effective on the NRWR are first taken into consideration, and an initial TDM with these two inputs is used for the NRWR prediction. Thereafter, a second and final TDM is built from the errors of the initial TDM and an additional variable. This sequential model structure for NRWR prediction is called the STDM.

STUDY AREA AND DATA DESCRIPTION
The suggested model application is carried out in Kocaeli, one of the biggest industrial cities of Turkey, in accordance with the data obtained from the water distribution infrastructure (Figure 1).
A total of 163,627,918 m³ of water was supplied to consumers in Kocaeli City in 2018 (ISU 2018); the total water loss quantity was 50,422,227 m³ and the NRW was 52,865,847 m³. The water loss rate was reduced year by year between 2014 and 2019, to 39.3%, 37%, 35.6%, 33%, 31%, and 30.2%, respectively. The investments made after 2015 have a significant role in these reductions. Dividing a WDS into district metered areas (DMAs) is an important and often troublesome task for water utility managers on the way to pressure management for leakage reduction (Creaco & Haidar 2019). Within the framework of these investments, certain DMAs and pressure metered areas (PMAs) were installed in the WDSs by taking hydraulic model results as a reference, starting from the districts with high water supply and water losses, in order to reduce water losses. Studies on reducing leakages at the ideal operating pressure were carried out by controlling the network pressures (Creaco et al. 2019; Güngör et al. 2019). At the end of 2020, the total network length in Kocaeli was 10,656 km and the isolated network length was 1,488 km. DMA and PMA isolated-region studies are continuing in order to reduce the water losses to 25% in 2023. In addition, water meters that have reached or are about to reach the end of their economic life are being replaced with smart meters to reduce the apparent-loss component of NRW. As a result of the ongoing studies, the total water loss of Kocaeli was measured as 27.95% at the end of 2020, and the NRW ratio was 29.86%. In the last five years, the studies have focused mainly on reducing leakages in fittings and building service connections and on pressure management system applications in water distribution networks.
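As a quick check of the 2018 figures quoted above, the ratios can be recomputed directly. This is a minimal sketch that assumes the supplied volume is the system input volume and that NRW equals water losses plus unbilled authorized consumption; the variable and function names are illustrative only.

```python
# 2018 volumes for Kocaeli as quoted in the text (m3).
# Assumption: the supplied volume is treated as the system input volume (SIV).
SIV_M3 = 163_627_918
WATER_LOSS_M3 = 50_422_227
NRW_M3 = 52_865_847


def ratio_percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


nrwr_2018 = ratio_percent(NRW_M3, SIV_M3)               # non-revenue water ratio
loss_rate_2018 = ratio_percent(WATER_LOSS_M3, SIV_M3)   # water loss rate
# Assumption: NRW = water losses + unbilled authorized consumption.
unbilled_authorized_m3 = NRW_M3 - WATER_LOSS_M3

print(f"NRWR: {nrwr_2018:.1f}%")
print(f"Water loss rate: {loss_rate_2018:.1f}%")
print(f"Unbilled authorized consumption: {unbilled_authorized_m3:,} m3")
```

Under these assumptions the water loss rate comes out at roughly 30.8%, which agrees with the rounded 31% reported for 2018, and the NRWR is about 32.3%.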
Each of the 12 districts in Kocaeli has been planned as a separate principal DMA region. In these regions, the system input volume (SIV), network length (NL), number of failures (NF), pipe diameters, pipe ages, number of water meters, and NRWR values are monitored in detail. The significant numerical characteristics of the Kocaeli water distribution system for 2018 are given in Table 1.
The NRWR model inputs are selected from variables that are effective on the NRWR. These inputs were studied by Sisman & Kızılöz (2020a, 2020b), who considered network pressure, network age, network length, water meters, pipe diameter, failure ratio, system input volume, number of junctions, and service connection length. In addition, the statistical information on all model parameters is presented in Table 2.

METHODOLOGY
The practical benefit of this method is that it treats multiparameter variables to reach a better solution, one that is not easily obtainable with a fixed mathematical formulation such as multiple regression, which provides a smoothed surface through overall error minimization. Purely mathematical models rest on a set of restrictive assumptions, whereas the method used here requires no such assumptions.

Uncorrected Proof
On the other hand, the 170 performance indicators (PIs) suggested by the IWA are monitored using the IWA Water Balance tables over 232 variables (Kanakoudis et al. 2013). Kanakoudis et al. (2013) explained the formulas of 75 IWA PIs and the 98 IWA variables required for the calculations, such as system input volume (SIV) (m³), water losses (m³), real losses (m³), apparent losses (m³), non-revenue water (NRW) (m³), main (network) lengths (NL, km), service connections, average service connection length (SCL, m), meter replacement (WM), and average operating pressure (kPa). They also indicated that the 170 PIs suggested by the IWA do not cover all the issues that water administrations encounter and that various parameters in water losses cannot be explained by the existing IWA PIs. The international literature clearly states that various factors and parameters affect pipe failures and leakages, for example pipe characteristics (diameter, age, etc.), environmental and climatic conditions (soil temperature, traffic load, etc.), and operational and maintenance factors (water quality, maintenance characteristics, etc.). Jang & Choi (2018) developed NRW ratio prediction models through multiple regression analysis (MRA) and artificial neural network (ANN) methods using operational parameters (demand energy ratio and number of leaks) and physical parameters (average pipe diameter, pipe length per demand junction, water supply quantity per demand junction, and deteriorated pipe ratio). Sisman & Kızılöz (2020a, 2020b) identified the parameters affecting the NRW ratio and predicted this ratio by using the ANN and Kriging methods in their first study. The model inputs selected in this study are the same as the independent variables suggested by the IWA, EPA (2006), Kanakoudis et al. (2013), Alegre et al. (2016), Jang & Choi (2018), and Sisman & Kızılöz (2020a, 2020b).
As for modeling, the change of two variables with respect to each other can be seen easily in the cartesian coordinate system, whereas three variables can be viewed together only through contour plots (Sırdaş & Şen 2003). In classical geostatistics applications the X and Y axes indicate spatial coordinates; here, each axis is instead defined in terms of one of the variables used in the study. The NRWR values are predicted through the selected variables. The Surfer program is employed for the preparation of the TDM prediction charts. In this study, the TDM charts are established by means of the point Kriging interpolation method. The procedures for establishing model charts through the Kriging method are explained in detail by Matheron (1963) and Journel & Huijbregts (1978).
Preparing the variograms that reveal the structural form of the data and generating the Kriging model are important steps in solving the prediction problem. Kriging is one of the most important geostatistical methods for prediction based on stochastic regional change (Matheron 1963; Krige 1966). The geostatistical method provides information on the magnitude of the investigated quantity for given values of the variables that define the XY coordinate system. It relies on variograms, which describe how the quantity changes with the independent variables, to make the best use of the available information. In this method, the dependent variable at given values of the independent variables is predicted as a weighted mean of the available data, with weights that change according to the distances between the data points in the XY coordinate system. The prediction model is given in a simple way in Equation (1), where n is the number of measuring points and ε is the error term in accordance with the model chart.
In the cartesian coordinate system, let two random variables that are close to each other, positioned at $x$ and $x + h$, take the values $Z(x)$ and $Z(x + h)$. The distance vector $h$ is then the geometric distance between the two points in the plane of the two input variables. NRW contour maps can be drawn by using the Kriging method with two input variables. The variance of $Z(x)$ can be defined as the mean of the squared deviations from the mean value $m(x)$:

$$\mathrm{Var}\{Z(x)\} = E\{[Z(x) - m(x)]^2\}$$

where $E$ is the expected value (mean) operator. The variance of $Z(x + h)$ can be expressed similarly.
On the other hand, the covariance indicates the joint change of $Z(x)$ and $Z(x + h)$, which is expressed as follows:

$$C(h) = E\{[Z(x) - m(x)]\,[Z(x + h) - m(x + h)]\}$$

The correlation structure of the semi-variogram is expressed as follows (Matheron 1963):

$$\gamma(h) = \frac{1}{2} E\{[Z(x + h) - Z(x)]^2\}$$
The relationship between $\gamma(h)$ and $C(h)$ is defined as

$$\gamma(h) = \sigma^2 - C(h)$$

where $\sigma^2$ is the constant variance. Before prediction, the semi-variogram (SV) is obtained from the available data. For this purpose, the data pairs are divided into groups according to the distance between their positions. The experimental semi-variogram is given for each distance group as

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [Z(x_i) - Z(x_i + h)]^2$$

In this equation, $\gamma(h)$ is the semi-variogram for distance $h$; $h$ is the mean distance (lag size) between the data pairs grouped in a distance interval; $N(h)$ is the total number of data pairs falling in that interval; $Z(x_i)$ is the value at point $x_i$; and $Z(x_i + h)$ is the measured value at point $x_i + h$.
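The grouping of data pairs by distance described above can be sketched as follows. This is a minimal illustration, assuming isotropic (direction-independent) binning with equal-width lags; the function name, the binning rule, and the four sample points are hypothetical and are not taken from the study.

```python
import math


def experimental_semivariogram(points, values, lag_size, n_lags):
    """Return (mean distance, gamma, pair count) for each non-empty lag bin.

    gamma is half the mean squared difference of all value pairs whose
    separation distance falls in the bin.
    """
    sq_diff_sum = [0.0] * n_lags   # sum of squared value differences per bin
    dist_sum = [0.0] * n_lags      # sum of pair distances per bin
    counts = [0] * n_lags          # number of pairs per bin, N(h)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            k = int(h // lag_size)  # index of the distance bin
            if k < n_lags:
                sq_diff_sum[k] += (values[i] - values[j]) ** 2
                dist_sum[k] += h
                counts[k] += 1
    return [
        (dist_sum[k] / counts[k], sq_diff_sum[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Hypothetical data: four points on a line with made-up values.
demo_points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
demo_values = [1.0, 2.0, 4.0, 7.0]
for mean_h, gamma, n_pairs in experimental_semivariogram(
        demo_points, demo_values, lag_size=1.2, n_lags=3):
    print(f"h = {mean_h:.2f}, gamma = {gamma:.3f}, pairs = {n_pairs}")
```

In a TDM setting the two coordinates would be the two input variables rather than map coordinates, so rescaling the axes to comparable ranges before computing distances is a sensible precaution.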
This semi-variogram function should be calculated in each direction and for each pair of measurements. It is then fitted with a model function such as the linear, Gaussian, exponential, or spherical model. The mathematical model selected for the experimental semi-variogram provides the required input parameters (weight coefficients) for Kriging interpolation, which can then be applied to calculate the values at unknown points. The expected value of the variable $Z$ at position $x_0$ can be predicted as the weighted average of the measured values $[Z(x_1), Z(x_2), Z(x_3), \ldots, Z(x_n)]$. The basic relation of the Kriging method in its most general form is

$$Z_v(x_0) = \sum_{i=1}^{n} u_i \, Z(x_i)$$

where $Z_v(x_0)$ is the estimated value of $Z$ at $x_0$; $u_i$ is the weight corresponding to each $Z(x_i)$; and $n$ is the number of points inside the area of influence for the prediction. In this modeling, the contour maps have been formed by using the Kriging interpolation technique, and the intermediate values have been estimated.
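To make the weighted-average relation concrete, the sketch below solves a small ordinary Kriging system with a linear semi-variogram. It is an illustration under stated assumptions, not the Surfer implementation used in the study: the ordinary Kriging form (weights constrained to sum to one), the variogram slope, the helper names, and the sample points are all assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(points, values, target, slope=1.0):
    """Estimate the value at target; return (estimate, weights).

    Assumes a linear semi-variogram gamma(h) = slope * h with no nugget.
    """
    n = len(points)

    def gamma(p, q):
        return slope * math.dist(p, q)

    # Kriging system: semi-variogram matrix bordered by the unbiasedness
    # constraint (weights sum to one, enforced with a Lagrange multiplier).
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights


# Hypothetical sample: (input 1, input 2) pairs with made-up output values.
demo_points = [(1.0, 40.0), (2.0, 55.0), (3.0, 45.0), (1.5, 60.0)]
demo_values = [0.25, 0.31, 0.28, 0.34]
estimate, weights = ordinary_kriging(demo_points, demo_values, (2.0, 50.0))
print(f"estimate = {estimate:.4f}, sum of weights = {sum(weights):.4f}")
```

Because no nugget is used, this variant reproduces the measured value exactly when the target coincides with a data point.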

Serial triple diagram model (STDM)
In this study, the STDM, a structure developed for the first time in this research, is used to predict the NRWR. The Serial Triple Diagram Model (STDM) has a scientific and philosophical basis, although it does not involve many mathematical equations. This is not a flaw in the methodology, because any model, whether conceptual, logical, statistical, or mathematical, needs numerical data for its application, and the literature contains plenty of models that cannot be applied in the absence of measurements. The STDM procedure reproduces the details of the spatial mapping better than regression models, which smooth the data through a crisp mathematical model. The first NRWR prediction is made with a TDM chart for two selected independent variables. This initial TDM chart, with two inputs and one output, is established by considering the correlation between the selected variables, and the resulting model can be evaluated according to certain performance criteria. If this model structure cannot predict the dependent variable sufficiently, a new TDM chart is established to improve the performance by reducing the errors. For this next prediction chart, the error terms of the initial model (the difference between the prediction and the calculated value) and another independent variable that is effective on the NRWR change are taken as inputs, and the NRWR prediction is repeated with the new TDM chart. In total, a model structure with four inputs is thus established for the NRWR prediction: three different independent input variables and the error terms of the initial-stage model. In this way, it is possible to continue with sequential models until the desired prediction accuracy is achieved. This structure is called the Serial Triple Diagram Model for the prediction of the Non-Revenue Water Ratio (NRWR) (Figure 2).
The equations used for the STDM and the sample model charts are given below:
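One possible reading of the serial structure described in this section is sketched below. It is not the authors' implementation: leave-one-out inverse-distance weighting (IDW) is swapped in for the Kriging charts to keep the example short, the error is taken as measured minus predicted, and all variable names and data values are hypothetical.

```python
def normalise(values):
    """Scale a list of numbers to [0, 1] so both chart axes are comparable."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def loo_idw(xs, ys, zs, power=2.0):
    """Leave-one-out IDW: predict each z_i from all the other points."""
    preds = []
    for i in range(len(zs)):
        num = den = 0.0
        for j in range(len(zs)):
            if j == i:
                continue
            d2 = (xs[i] - xs[j]) ** 2 + (ys[i] - ys[j]) ** 2
            w = 1.0 / (d2 ** (power / 2.0) + 1e-12)  # guard against d = 0
            num += w * zs[j]
            den += w
        preds.append(num / den)
    return preds


def stdm(x1, x2, x3, nrwr):
    """Two-stage chain: (x1, x2) -> NRWR, then (stage-1 error, x3) -> NRWR."""
    stage1 = loo_idw(normalise(x1), normalise(x2), nrwr)
    errors = [m - p for m, p in zip(nrwr, stage1)]  # assumed sign convention
    stage2 = loo_idw(normalise(errors), normalise(x3), nrwr)
    return stage1, errors, stage2


# Hypothetical data for eight districts (values are made up for illustration).
x1 = [0.8, 1.1, 1.5, 0.9, 1.3, 1.7, 1.0, 1.4]        # first input variable
x2 = [12.0, 18.0, 25.0, 15.0, 22.0, 30.0, 14.0, 20.0]  # second input variable
x3 = [110.0, 125.0, 140.0, 115.0, 130.0, 150.0, 120.0, 135.0]  # third input
nrwr = [0.24, 0.29, 0.35, 0.26, 0.32, 0.38, 0.27, 0.33]

stage1, errors, stage2 = stdm(x1, x2, x3, nrwr)
for m, p1, p2 in zip(nrwr, stage1, stage2):
    print(f"measured = {m:.2f}, stage 1 = {p1:.3f}, stage 2 = {p2:.3f}")
```

Note that the second stage takes the first-stage errors as an input, so it can only be evaluated where the calculated NRWR is already known; the sketch therefore illustrates the chaining of charts rather than an out-of-sample forecast.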

Evaluation of STDM model performance
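The performances of the models in this research are evaluated according to the mean square error (MSE) and the coefficient of determination (R²):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (M_i - P_i)^2$$

$$R^2 = \left[ \frac{\sum_{i=1}^{n} (M_i - \bar{M})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (M_i - \bar{M})^2 \, \sum_{i=1}^{n} (P_i - \bar{P})^2}} \right]^2$$

In these equations, $M_i$ is the measured data, $P_i$ is the estimated data, and $n$ is the number of data. $\bar{M}$ and $\bar{P}$ are the averages of the measured and estimated data.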
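The two criteria can be computed as in the sketch below. It assumes that R² is the squared correlation between measured and estimated values, which is the form suggested by the use of both averages; the function names and sample numbers are illustrative.

```python
def mse(measured, predicted):
    """Mean square error between measured and estimated values."""
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n


def r_squared(measured, predicted):
    """Coefficient of determination as the squared Pearson correlation.

    Assumption: this correlation-based form is the one intended in the text.
    """
    n = len(measured)
    m_bar = sum(measured) / n
    p_bar = sum(predicted) / n
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    var_m = sum((m - m_bar) ** 2 for m in measured)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


# Hypothetical NRWR values (fractions) for a handful of districts.
measured = [0.24, 0.29, 0.35, 0.26, 0.32]
predicted = [0.25, 0.28, 0.33, 0.27, 0.33]
print(f"MSE = {mse(measured, predicted):.5f}")
print(f"R2  = {r_squared(measured, predicted):.3f}")
```

With this form, R² measures how closely the points follow a straight line but not whether that line is the 1:1 line, which is why it is useful to read it together with the MSE and the scatter plots.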

RESULTS AND DISCUSSION
When the suggested model prediction charts are established, the independent variables are placed on the x and y axes and the dependent variable appears on the z axis. The NRWR prediction charts are constructed with the Surfer software by using linear semi-variograms.
The NRWR values of Kocaeli City were predicted for the first time by Sisman & Kizilöz (2020a, 2020b) through TDM prediction charts, with the performance results given in Table 3. When the performance indicators are analyzed, it is observed that the R² values vary between 0.74 and 0.95 and the MSE values between 0.0004 and 0.0021. In this study, to obtain better model performance, the next TDM chart is built from the errors of the initial model and a third variable that is effective in NRWR prediction. Thus, the TDM structure with two inputs and one output is converted by the proposed approach into a new model with four inputs in total, including the error terms of the initial model. The model performances of the TDM and STDM models are given in Table 3. The analysis of the model performances shows that the STDM predictions are better than the TDM predictions in all ten serial models.
The R² performance criterion, which has its minimum of 0.74 for the SCL/MPN-MAP input parameters in the TDM structure, reaches 0.94 for the SCL/MPN-MAP and initial TDM errors-MDP inputs in the STDM structure. The STDM yields significant improvements over the TDM model predictions in each of the ten combinations taken into consideration.

The TDM and STDM model prediction charts are given in Figure 3. The TDM prediction results for the SCL/MPN-MAP input combination are plotted against the calculated NRWR values in Figure 3(b). The 1:1 straight line (45°) in red is the reference for comparing calculations and model predictions. Comparison of Figure 3(b) and 3(d) against this reference line shows clearly that the STDM predictions in Figure 3(d) are better than the TDM predictions in Figure 3(b). In Figure 3(b), the TDM predictions generally fall below or above the calculations, whereas the STDM predictions in Figure 3(d) mostly overlap with them. In addition, the prediction charts for model number ten can be evaluated in Figure 3(a) and 3(c). The model chart in Figure 3(a) shows that the NRWR increases when the service connection length per unit average pressure and the network age increase. The model prediction performance is improved by capturing the NRWR behavior and change for the current sample application with the model chart based on the errors and the average pipe diameter, as in Figure 3(c).
The second worst model performance for the TDM is that of the NL/MPN-MPD combination, whose performance scatter plot is given in Figure 4(b). According to the model chart, the NRWR increases in general when the network length per unit average operating pressure increases. On the other hand, there seems to be a much more complex relationship between the average pipe diameter and the NRWR change (Figure 4(a)). Figure 4(b) indicates that it is not possible to explain the NRWR behavior sufficiently by referring to these parameters only. The points above the red line in Figure 4(b) show that the model predictions are high, and the points below this line show that the predictions are low. In this study, the next prediction model chart in Figure 4(c) is established by using the error values of the initial prediction model and the MAP variable. This chart reflects that the NRWR increases as the age increases. After the establishment of the next prediction model chart, the performance increased and the R² value rose to 0.99 (Figure 4(d)). The measured NRWR values and the STDM model predictions fall mostly on the 45° line. The STDM structure thus enables NRWR prediction with three variables at high accuracy.
The best TDM chart according to the initial model results for NRWR prediction is given in Figure 5(a) (Sisman & Kizilöz 2020a, 2020b). The chart shows that the NRWR fluctuates with changes in the average network pressure. The NRWR change with the system input volume per network junction at average network pressures of 44-64 bar can be interpreted from this figure. In general, if the average operating pressure of the network is at the 50-54 mSS (metres of water column) level, it is possible to keep the NRWR below 30%. The model performance in Figure 5(b) shows clearly that the prediction performance of the TDM is quite good. When the STDM structure suggested in this study is applied to the same NRWR output data set, the predictions improve slightly and the R² value increases from 0.95 to 0.97. Comparing the STDM chart in Figure 5(d) with the TDM chart in Figure 5(b) makes it possible to evaluate the differences between the model results. The STDM predictions are much more successful at low NRWR values. On the other hand, higher prediction performance at high NRWR values is achieved by the initial TDM chart in Figure 6(a) for the model application and data combination.
The tendency of the NRWR to increase with age is clearly seen in Figure 6(a), the initial prediction chart for the other combination. In some small areas of the chart, however, a slight decrease in NRWR is observed even though the age increases. This indicates either that the network is in good condition despite its aging or that the network operating efficiency (pressure management system applications) is sufficient. The performance chart for this model is given in Figure 6(b). Judged against the 1:1 line of the calculated NRWR values, the calculations and the model predictions are generally compatible, and the R² value is 0.94. The next model prediction chart in Figure 6(c) is established through the STDM approach by using the errors of the initial model predictions and the average network pressure variable. The model performance is visibly improved in the final model chart, and the R² value increases from 0.94 to 0.99. In addition, the compatibility between the calculations and the model predictions can be seen in the scatter plot in Figure 6(d).
In all ten model applications, the predictions of the suggested STDM structure are better than the TDM results. Additionally, the previous model studies and the highest performances in this model study are summarized in Table 4.
Finally, the best prediction results of the TDM by Sisman & Kizilöz (2020a, 2020b) and of the STDM are compared in Figure 7. The significant improvement in model performance can be seen clearly from the scatter of the red points when the model predictions are interpreted against the 1:1 straight line.

CONCLUSIONS
In this study, the geostatistical approach is used in order to model the prediction uncertainties and to reveal the correlation structure between the selected dependent variable and the independent variables. The non-linear regression between the input and output data is expressed geometrically through the charts. The NRWR variable is predicted by using the parameters suggested by the IWA for characterizing the WDS. Models with two inputs and one output are developed through the TDM for NRWR predictions. With only two input variables, it is not always possible to find a high-accuracy solution to complicated problems such as NRWR prediction. It is observed that much more successful results can be achieved through the STDM structure suggested in this research, in which a third variable is added to the TDM prediction models.
Water Supply Vol 00 No 0