A new innovative method for model efficiency performance

In every aspect of scientific research, model predictions need calibration and validation to establish how well they represent the recorded measurements. In the literature, there is a myriad of formulations, empirical expressions, algorithms and software for model efficiency assessment. In general, model predictions are curve fitting procedures with a set of assumptions that are not treated carefully in many studies; instead, only a single-value comparison between the measurements and predictions is taken into consideration, and the researcher then decides on the model efficiency. Among the classical statistical efficiency formulations, the most widely used ones are bias (BI), mean square error (MSE), correlation coefficient (CC) and Nash-Sutcliffe efficiency (NSE), all of which are embedded within the visual inspection and numerical analysis (VINAM) square graph as a scatter diagram of measurements versus predictions. The VINAM provides a set of verbal interpretations and then numerical improvements embracing all the previous statistical efficiency formulations. The fundamental criterion in the VINAM is the 1:1 (45°) main diagonal, along which all visual, science philosophical, logical, rational and mathematical procedures boil down to model validation. The application of the VINAM approach is presented for artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) model predictions.


INTRODUCTION
Models are tools that reflect reality for simulation, prediction, automation and optimum management studies in the service of humankind, and they are required to produce outputs as close as possible to the measurements in an efficient manner. Whatever the model type (analytical, probabilistic, statistical, stochastic or numerical), in practical studies there are two sequences for comparison: the measurement series and the corresponding model prediction series.
In general, the model predictions are related to measurements through a curve fitting methodology based on some type of least squares analysis. There are also other versions, including empirical relationships and stochastic and more complex numerical solution algorithms. All these techniques have a visual basis, which can be appreciated by means of shapes in the form of mathematical functions, flow charts, geometry, algorithms and block diagrams. Any idea based on a geometrical shape allows visual inspections, examinations and inferences, perhaps only verbally at early stages, but such statements can be converted to mathematical expressions after the science philosophical, logical and rational fundamentals are understood. Human philosophical thinking and the logical, rational trimming of raw ideas lead to a set of logical rule bases, which are the precursors of mathematical equations and expressions written with a set of convenient symbols.
In scientific research, one is interested in relating measurements to model prediction outputs, which may take the form of single-input single-output (SISO) models or of various other versions such as multi-input multi-output (MIMO) models. In the literature, there is a set of standard coefficients that quantify the agreement between measurements and predictions through single parameter values. In most cases, authors report that their model is suitable based on a comparison of statistical parameters computed from the measurement and model output data series, considering one or a few of the well-established agreement or association metrics. The most commonly used, accepted and recommended ones are bias (BI), percent bias (PBI), coefficient of determination ($R^2$), mean square error (MSE) or root mean square error (RMSE), correlation coefficient (CC), Nash-Sutcliffe efficiency (NSE) and index of agreement (d) (Pearson 1895; Nash & Sutcliffe 1970; Willmott 1981; Santhi et al. 2001; Gupta et al. 2002; Moriasi et al. 2007; Van Liew et al. 2007; Özger & Kabataş 2015; Tian et al. 2015; Zhang et al. 2016; Dariane & Azimi 2018).
As stated by McCuen & Snyder (1975) and Willmott (1981), almost all models have elusive predictions, which cannot be captured by the model efficiency measures or even, in general, by significance tests. Freedman et al. (1978) mentioned that statistical significance tests are concepts that must be viewed with skepticism. Along the same line, Willmott (1981) stated that it may be appropriate to test an agreement measure and report the value of a test statistic at a significance level, but that the distinction between significant and insignificant levels is completely unjustified. For instance, if the significance level is adopted as 0.05, then what are the differences among, say, 0.049, 0.048 and 0.047 on one side and 0.051, 0.052 and 0.053 on the other? Additionally, such significance levels depend on the number of data used to identify the most suitable theoretical probability distribution function (PDF).
Even though ASCE (1993) emphasized the need to explicitly define model evaluation criteria, no widely accepted guidance has been established; only a few performance ratings and specific statistics have been used (Saleh et al. 2000; Santhi et al. 2001; Bracmort et al. 2006; Van Liew et al. 2007).
For a more objective assessment of model efficiency, calibration and validation, visual inspection of the association between measurements and predictions must be a preliminary step, because it gives better insights, interpretations and model modification possibilities. The basic statistical parameters, such as the arithmetic averages, the standard deviations and the regression relationship between the measurement (independent variable) and model prediction (dependent variable) data shown on the scatter diagram, are very important ingredients even for visual inspection to identify systematic and random components. Unfortunately, the model efficiency measure is most often obtained from available software, which does not provide any detailed, informative visual inspection and assessment.
The main purpose of this paper is to present the visual inspection and numerical analysis (VINAM) methodology for effective model efficiency assessment and ideal validation and, if necessary, for modification or calibration of the model predictions to comply with the measurements. The visual inspection and validation are possible by means of a square template. It presents all the basic information clearly and objectively, first for verbal, science philosophical, logical and rational inferences, which are then translatable into symbolic mathematical equations. The proposed VINAM methodology improves all the model efficiency metrics that are available in the literature.

METHODOLOGY

Statistical efficiency formulations
In the literature, all standard model efficiency indicators depend on three groups of basic statistical parameters: the arithmetic averages of the measurements, $\bar{M}$, and of the model predictions, $\bar{P}$; the standard deviations, $S_M$ and $S_P$; and the cross-correlation, $C_{M,P}$, between the measurement and model prediction sequences. Additionally, the regression line between the measurements and predictions has two parameters: the intercept, $I$ (or, equivalently, the regression line central point with coordinates $\bar{M}$ and $\bar{P}$), and the slope, $S$.
The simplest necessary, but not sufficient, mathematical efficiency measure is the bias, BI, which measures the distance between the measurements, $M_i$, and model predictions, $P_i$ ($i = 1, 2, \ldots, n$), as,

$$\mathrm{BI} = \frac{1}{n}\sum_{i=1}^{n}(P_i - M_i) = \bar{P} - \bar{M} \qquad (1)$$

The ideal value for model efficiency is BI = 0, although this condition is necessary but not sufficient. The second measure is the mean square error (MSE),

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(P_i - M_i)^2 \qquad (2)$$

The ideal value is MSE = 0, but this condition is never met in hydro-meteorological modeling, because there are always natural random errors. This is the main reason why the best model should have the minimum MSE among all alternatives. Equation (2) implicitly includes the standard deviations and the cross-correlation between the measurements and predictions. The Nash-Sutcliffe efficiency (NSE) measure combines the MSE with the standard deviation of the measurement data, $S_M$, as a ratio,

$$\mathrm{NSE} = 1 - \frac{\mathrm{MSE}}{S_M^2} \qquad (3)$$
The second term on the right-hand side can be greater than 1, in which case NSE takes negative values. The ideal value of NSE is 1, but this is never attained in practical applications; therefore, the closer the value is to 1, the better the model efficiency.
The cross-correlation, CC, between the measurements and predictions can be calculated as,

$$\mathrm{CC} = \frac{\frac{1}{n}\sum_{i=1}^{n}(M_i - \bar{M})(P_i - \bar{P})}{S_M S_P} \qquad (4)$$

On the other hand, the straight-line regression intercept, $I$, and slope, $S$, can be calculated according to the following expressions, respectively.

$$I = \bar{P} - S\,\bar{M} \qquad (5)$$

$$S = \mathrm{CC}\,\frac{S_P}{S_M} \qquad (6)$$

Apart from the above model efficiency measures, there are others, which have been suggested for their rectification. One of the first versions is due to Willmott (1981), who gave the agreement index d as,

$$d = 1 - \frac{\sum_{i=1}^{n}(P_i - M_i)^2}{\sum_{i=1}^{n}\left(|P'_i| + |M'_i|\right)^2} \qquad (7)$$

where $P'_i$ and $M'_i$ are the deviations from the respective arithmetic averages. The expression in the denominator is referred to as the potential error (PE). The significance of d is that it measures the degree to which model predictions are error free. Its value varies between 0 and 1, where 1 represents perfect agreement between the measurements and predictions. This is never possible in practical applications, and therefore researchers take the value closest to d = 1 as the basis for accepting the model efficiency. However, there is no criterion that objectively indicates the limit between acceptance and rejection, and hence subjectivity appears, as in other efficiency measures. Equation (7) can be rewritten in terms of the MSE as follows.

$$d = 1 - \frac{n\,\mathrm{MSE}}{\mathrm{PE}} \qquad (8)$$
Equations (1)-(8) include all the necessary numerical quantities that are useful in the construction of the VINAM template as will be explained and applied in the following sections.
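As an illustration of how these quantities fit together, the following is a minimal Python sketch, not the software of Appendix A. The function name `efficiency_metrics` and the returned dictionary keys are hypothetical. It assumes the bias is taken as predictions minus measurements, that standard deviations use the 1/n form, and that the index d follows Willmott (1981), where both deviations are taken about the measurement average.

```python
from math import sqrt


def efficiency_metrics(m, p):
    """Classical efficiency measures for measurements m and predictions p.

    Hypothetical helper; assumes len(m) == len(p) and non-constant series.
    """
    n = len(m)
    m_bar = sum(m) / n
    p_bar = sum(p) / n
    # Variances and covariance in the 1/n (population) form
    var_m = sum((x - m_bar) ** 2 for x in m) / n
    var_p = sum((y - p_bar) ** 2 for y in p) / n
    cov = sum((x - m_bar) * (y - p_bar) for x, y in zip(m, p)) / n

    bi = p_bar - m_bar                                   # bias (assumed sign: P - M)
    mse = sum((y - x) ** 2 for x, y in zip(m, p)) / n    # mean square error
    cc = cov / sqrt(var_m * var_p)                       # correlation coefficient
    nse = 1.0 - mse / var_m                              # Nash-Sutcliffe efficiency
    slope = cov / var_m                                  # regression slope S
    intercept = p_bar - slope * m_bar                    # regression intercept I
    # Potential error, Willmott (1981): deviations about the measurement mean
    pe = sum((abs(y - m_bar) + abs(x - m_bar)) ** 2 for x, y in zip(m, p))
    d = 1.0 - n * mse / pe                               # index of agreement
    return {"BI": bi, "MSE": mse, "CC": cc, "NSE": nse,
            "S": slope, "I": intercept, "d": d}


if __name__ == "__main__":
    # Predictions that over-estimate every measurement by exactly one unit
    print(efficiency_metrics([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

For this toy input the correlation is perfect (CC = 1) and yet BI = 1, MSE = 1 and NSE = 0.5, which shows why a single metric cannot characterize the model on its own.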

Square graph for visual inspection and numerical analysis method (VINAM)
For visual inspection of the model predictions associated with the measurements, one can regard the measurements as independent variables and the predictions as dependent variables, and plot them on a coordinate system. In the case of an ideal match between the two series, one expects the points to fall on the 1:1 (45°) straight line, which appears as the diagonal of the square template graph in Figure 1. This straight line divides the square area into two triangles, with the upper (lower) one representing the complete model over-estimation (under-estimation) domain, provided that all scatter points fall completely in one of these triangles. It is also possible for the scatter points to lie partially in each triangle, in which case the model over-estimates at some points and under-estimates at others. The mathematical expression of the ideal model efficiency case as the 1:1 line is,

$$P = M \qquad (9)$$

In Figure 1, the coefficients a and b correspond to the minimum and maximum values among the measurements and predictions.
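The template limits and the over- or under-estimation domain described above can be sketched in a few lines of Python. This is only an illustration: the function name `square_template` and the domain labels are hypothetical, and points lying exactly on the 1:1 line are assumed to belong to neither triangle.

```python
def square_template(m, p):
    """Limits of the square template and position of the scatter points.

    Hypothetical helper: m are measurements (horizontal axis),
    p are predictions (vertical axis).
    """
    a = min(min(m), min(p))  # lower limit of both axes
    b = max(max(m), max(p))  # upper limit of both axes
    over = sum(1 for x, y in zip(m, p) if y > x)   # points above the 1:1 line
    under = sum(1 for x, y in zip(m, p) if y < x)  # points below the 1:1 line
    if over > 0 and under == 0:
        domain = "over-estimation"
    elif under > 0 and over == 0:
        domain = "under-estimation"
    elif over == 0 and under == 0:
        domain = "ideal"
    else:
        domain = "partial"
    return {"a": a, "b": b, "over": over, "under": under, "domain": domain}


if __name__ == "__main__":
    print(square_template([1, 2, 3], [2, 3, 5]))      # all points above the diagonal
    print(square_template([1, 2, 3], [2, 1.5, 3.5]))  # points on both sides
```

Using the same limits a and b on both axes is what makes the graph square, so that the 1:1 line is exactly the diagonal.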
The most significant feature of a square template is in its ability to reflect almost all the previously defined efficiency criteria properties in a single graph. For instance, in Figure 2(a), the scatter of points is shown in the upper triangular area (over-estimation), but there is no linear trend between the measurements and prediction scatter points, which are randomly distributed. This provides the message that the model is not capable of representing the measurements at all. It is necessary to try and model the measurements with another suitable model, which must yield at least some consistency among the scatter points.
In Figure 2(b), by contrast, the scatter points have a linear tendency, which is the first indication that the model is suitable for predictions, because the points scatter around a regression line. The following features are the most important pieces of information in this figure.
(1) The centroid point ($\bar{M}$, $\bar{P}$) on the regression line is at a distance, D, from the ideal prediction line.
(2) The same straight line has a slope, S, with the horizontal axis, the value of which can be calculated from Equation (6), which is exactly the reflection of the regression line in Figure 2(b).
In the literature, S is referred to as the scale error and D as the constant or displacement error, but in this paper they are referred to as the rotational error and the shift error, respectively. It is obvious from Figure 2(b) that each of these is a systematic deviation from the ideal prediction line, and their sum is the total systematic error. They play significant roles, as will be explained in Section 4. Needless to say, Figure 2(c) is the under-estimation alternative of Figure 2(b), and the same quantities are also available in this figure. Equation (10) is also valid for this case, but with the opposite shift error. Figures 2(d) and 2(e) are for the partial model over-estimation (under-estimation) cases, where the regression line centroid coincides with the ideal prediction line (D = 0) and lies away from it, respectively. In the former case, there is no shift error; the latter case is self-explanatory in the light of the above explanations.
According to the suggested template and algorithm, the measurement data are taken as fixed, and the predictions are moved systematically towards them; in other words, a recalibration operation is defined between the model results and the measurements to obtain the best and optimum model efficiency. When the measurement data, accepted as the independent variable, are shown on the horizontal axis and the prediction data, accepted as the dependent variable, on the vertical axis, the rotation and translation can be achieved by performing mathematical operations sequentially. In this way, reducing the systematic errors gives model predictions, or a calibration, closer to the actual measurement values. These operations change the vertical distances of the points from the ideal line, so a distortion occurs; this is unavoidable in making better predictions. With the new approach in this study, the total vertical change is optimized by taking all available data into account. The positive contribution of the suggested method is seen by checking the obtained results against the six different performance indicators.

Model modification
After all explanations in the previous section, an important question is 'is it possible to improve the model performance, and how to increase its efficiency?'. The best and optimum efficiency is possible after shift and rotation operations on the VINAM template regression line. The following steps are necessary for arriving at the best solution.
(1) Shifting operation of the central regression point vertically such that it sits on the ideal prediction line (1:1). Only vertical shifts are possible for keeping the measurements as they are.
(2) After the shifting, the regression line is rotated according to the rotation angle (1 - S), so that the regression line coincides with the ideal prediction line (1:1).
(3) These two operations are preferable if there is no other choice to get the VINAM regression line to coincide with the ideal prediction line.
The shifting operation poses no problem, because all the scatter points are moved downwards or upwards by the amount D. As for the rotation operation, the horizontal location of each scatter point must remain the same, so as not to disturb the measurement values. The rotation is therefore applied only to the vertical coordinates of the shifted points, and its result, $P''_i$, is the final data.
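The two operations can be sketched in Python as follows. This is a hedged illustration rather than the authors' software: the function name `vinam_adjust` is hypothetical, the shift is assumed to be the signed difference between the prediction and measurement averages, and the rotation is assumed to be the vertical correction (1 - S) times the deviation of each measurement from its average. This assumed form leaves the horizontal positions untouched and brings the regression line onto the 1:1 line; strictly speaking it is a vertical shear, not a rigid rotation.

```python
def vinam_adjust(m, p):
    """Shift and rotate predictions p towards the 1:1 line; m stays fixed.

    Hypothetical sketch of the two-step correction under the assumptions
    stated in the text above; requires non-constant measurements.
    """
    n = len(m)
    m_bar = sum(m) / n
    p_bar = sum(p) / n
    s_mm = sum((x - m_bar) ** 2 for x in m)
    s_mp = sum((x - m_bar) * (y - p_bar) for x, y in zip(m, p))
    slope = s_mp / s_mm        # least-squares slope S of p on m
    shift = p_bar - m_bar      # signed shift error D (assumed definition)

    # Step 1: move every prediction vertically so the centroid sits on 1:1
    shifted = [y - shift for y in p]
    # Step 2: tilt about the centroid; each point keeps its horizontal position
    final = [y1 + (1.0 - slope) * (x - m_bar) for x, y1 in zip(m, shifted)]
    return shift, slope, shifted, final


if __name__ == "__main__":
    m = [1.0, 2.0, 3.0, 4.0]
    p = [3.0, 3.5, 4.5, 5.0]
    shift, slope, shifted, final = vinam_adjust(m, p)
    print(shift, slope)
    print(final)
```

Under these assumptions only the scatter of the points around the regression line is left after the two steps, so the corrected predictions in the toy example lie much closer to the measurements than the original ones.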
The method suggested in this study aims to improve model performance by reducing the systematic errors between the model predictions and the measurements. When the accuracy and reliability levels are analyzed, it is seen that there are both systematic and random errors between the predictions and measurements. These errors vary depending on factors such as the experience of the modeler, the data quality and the selected methodology. In addition to the various performance indicators, the error evaluation can be made according to the ideal line on the square template described in Figure 1, which is frequently preferred in studies. When the first results of model studies are evaluated, it is seen that different alternatives may arise (Figure 2). The vertical differences of the prediction results are composed of three components: the consistent difference between the mean values of the measurements and predictions; the angle between the 1:1 ideal line expected between the measurements and predictions and the least-squares regression line actually obtained between them; and, finally, random differences. These three components, which reflect the quality, accuracy and reliability of the model, vary depending on the established model, the data used, the selected method, etc. The model performance depending on the first two components can be improved significantly through the suggested method; this is why the two steps of shift and rotation were described in this study. To reduce the third, random component, the model design itself needs to be revised.
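The three components can be made explicit with a standard least-squares identity. The derivation below is a sketch under assumed notation, not an equation taken from the paper: $D = \bar{P} - \bar{M}$ is the signed shift error, $S$ is the least-squares slope of the predictions on the measurements, $S_M^2$ and $S_P^2$ are variances in the 1/n form, and $e_i$ is the regression residual.

```latex
% Assumed notation: D = \bar{P} - \bar{M}, S = least-squares slope,
% S_M^2 = (1/n) \sum (M_i - \bar{M})^2, e_i = regression residual.
\begin{align}
P_i - M_i &= \underbrace{(\bar{P} - \bar{M})}_{\text{shift}}
  + \underbrace{(S - 1)(M_i - \bar{M})}_{\text{rotation}}
  + \underbrace{e_i}_{\text{random}},
  \qquad e_i = P_i - \bar{P} - S\,(M_i - \bar{M}), \\
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}(P_i - M_i)^2
  = D^2 + (S - 1)^2 S_M^2 + \frac{1}{n}\sum_{i=1}^{n} e_i^2 .
\end{align}
```

The cross terms vanish because least-squares residuals have zero mean and are uncorrelated with $M_i - \bar{M}$. If the shift and rotation remove the first two terms exactly, the remaining error is $\frac{1}{n}\sum e_i^2 = S_P^2(1 - \mathrm{CC}^2)$, which is the part that only a revised model design can reduce.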

RESULTS AND DISCUSSION
In Appendix A, the necessary software is given for the application of all VINAM steps. The applications of the VINAM procedure are presented for two well-known models, namely the artificial neural network (ANN) and the adaptive network-based fuzzy inference system (ANFIS). These applications are based on water loss measurements in potable water distribution systems, for which water loss predictions are among the most important issues of water stress control (Sisman & Kizilöz 2020a, 2020b). The most important component in the evaluation of a water distribution system with regard to water losses is the non-revenue water (NRW) (Kanakoudis & Muhammetoglu 2014; Boztaş et al. 2019; Sisman & Kizilöz 2020a, 2020b; Kizilöz & Sisman 2021). Jang & Choi (2017) built a model to calculate the NRW ratio of Incheon, Republic of Korea, by means of the ANN methodology; for their best model, they obtained $R^2$ = 0.397. Models can be improved when the scatter plot of the measurement values and model predictions appears along a regression line, as already explained for Figure 2.
The NRW ratio estimates are modeled through ANN and ANFIS for the Kocaeli district, Turkey, and the suggested VINAM method is applied to the outputs of these models for further improvement.
A total of eight models (four ANN and four ANFIS) with nine input measurements are developed through the modeling procedures. The model input parameters are water demand quantity, domestic water storage tank, number of network failures, number of service connection failures, failure ratio, network length, water meter, number of junctions and mean pipe diameter. All models are validated by the VINAM approach, and the model efficiency evaluations are carried out through the statistical values of BI, MSE, CC, $R^2$, d and NSE.
For the ANN models, 55% of the available data is used for training, 35% for validation and 10% for testing. These models are developed with one hidden layer of four neurons and a feed-forward back-propagation training procedure supported by the Levenberg-Marquardt back-propagation algorithm (Coulibaly et al. 2000; Kermani et al. 2005; Kızılöz et al. 2015; Rahman et al. 2019; Sisman & Kizilöz 2020a, 2020b). For the ANFIS models, 66% of the data are taken for training and the remaining 34% for validation (testing). For these models, various membership functions (MFs) are considered, namely triangular (trimf), generalized bell-shaped (gbellmf) and trapezoidal (trapmf), with 'low', 'medium' and 'high' linguistic terms. The statistical properties of the input components and model outputs are given in Table 1 for the ANN and ANFIS models.
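A data partition with the ANN percentages quoted above can be sketched as follows. The function name `split_indices`, the random shuffling and the fixed seed are assumptions made only for illustration; the text does not state how the records were assigned to each subset.

```python
import random


def split_indices(n, fractions=(0.55, 0.35, 0.10), seed=0):
    """Split record indices 0..n-1 into training, validation and testing sets.

    Hypothetical helper: shuffles with a fixed seed (an assumption) and
    gives the remainder after rounding to the last subset.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]  # whatever is left after the first two subsets
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(100)
    print(len(train), len(val), len(test))
```

Passing `fractions=(0.66, 0.34, 0.0)` would give the two-way partition described for the ANFIS models, with an empty third subset.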
The resultant VINAM graphs are presented in Figure 3 for the ANN model versions, with the classical model efficiencies and the VINAM improvements given in Table 2.
In this study, the NRW rate was predicted from nine different parameters through the ANN and ANFIS methodologies. The performance indicators of the NRW rate predictions made with three different combinations of input parameters are given in Table 2, and the model results are not at the desired level. On the other hand, when the NRW rate predictions are analyzed through the square template described in Figure 1, it is seen that the combinations, although affected by certain systematic errors, can make good predictions. A considerable improvement over the classical approaches has therefore been achieved by calibrating the models according to the ideal line through the suggested methodology. It is possible to predict the NRW rates at an acceptable level with only three parameters, and to evaluate the network losses through three parameters such as water demand quantity (WDQ), water meter (WM) and failure ratio (FR). The second application, the VINAM diagram of the measurements against the ANFIS model predictions, is given in Figure 4. The classical efficiencies and the VINAM improvements are available in Table 3.
The NRW rate, which is predicted through certain parameters in this study, actually varies depending on many variables, and it is also affected by many uncertainties in the water distribution infrastructure. Leaks that cause physical losses in the system, the number of failures, network age, network pressure, meter ages that cause apparent losses, meter measurement errors, illegal water use, etc. show the importance of these uncertainties. The effects of these uncertainties and their management are especially important for administrations with high NRW rates. Developing a model in which all the uncertainties that increase NRW rates are evaluated together is possible only with significant time and cost. Since the parameter changes in the water distribution system are partially related to the aforementioned issues, the water distribution system can be managed through these parameters by building the prediction models with much fewer parameters through the suggested methodology. Evaluating the NRW rate performance through parameter combinations provides much faster and more economical solutions. In conclusion, an improvement in predictive power could help in designing investment incentives more effectively and in a more targeted way; determining the effects on the NRW rate of the parameters that give the best predictions therefore makes it possible to revise field practices and investment programs with the advantages of the effective models suggested in this study.

CONCLUSIONS
The existing model efficiency criteria are statistical mathematical expressions, each of which yields a single value for the association between the measurement and prediction sequences. Accordingly, the researcher may subjectively adopt one of the classical efficiency metrics, on the grounds that the closer the efficiency measure is to its ideal value, the better the model represents the measurement data. In assessments with these criteria, single significance tests are inappropriate in many cases, because they do not provide preliminary visual information. Rather than depending on such expressions without visual impressions, this paper presents an effective model efficiency evaluation methodology by means of the visual inspection and numerical analysis (VINAM) square template concept. It allows all the methodological details to be visualized first by eye for science philosophical, logical and rational inferences, which lead explicitly to the fundamentals of the mathematical model efficiency expressions. The main idea is to assess the scatter diagram of measurements versus model predictions. In the case of random scatter, the model is not suitable at all. On the other hand, if the scatter points appear along an acceptable regression line on the VINAM square template, then the shift and rotation procedures can bring the scatter points around the 1:1 (45°) straight line, with improvements in the classical statistical model efficiency results. The application of the VINAM procedure is checked with artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) models. It is observed that the VINAM method improves all the cases by very significant percentages.