Abstract
In this study, 41 models used for the prediction of daily failure rates in water distribution networks have been designed via the Serial Triple Diagram Model (STDM) and artificial neural network (ANN) methods. For this purpose, daily failure data measured coordinately in the water distribution system network in Gebze and transferred to the geographic information system (GIS) has been used. The data has been normalized through the min-max technique to scale it at regular intervals and develop the model prediction performance. In this study, certain meteorological variables such as temperature and precipitation have been taken into account as model input for the first time. According to the increasing values of these two variables, it is observed in the model results that daily failure rates tend to increase. The expected model accuracies in failure rate prediction could not be obtained through the suggested ANN models. The higher prediction performances have been obtained through the STDM, a structure that enables visualization of the model results by making inferences. The STDM method is a significant alternative approach to determine the relationship of the variables on the failure and to predict the failure rate. It is predicted that the suggested STDM charts will contribute to the decision-makers, experts, and planners to determine effective infrastructure management. Also, investment planning prioritization will be able to reduce failure rates by interpreting prediction charts.
HIGHLIGHTS
Counter maps have been used for predicting and interpreting failure rates.
Temperature and precipitation effect has been directly taken into consideration for the first time.
The highest accuracy has been achieved with the least input in models.
Graphical Abstract
LIST OF ABBREVIATIONS
The following symbols are used in this paper:
- Acronym
Definition
- AC
Asbestos-cement
- ANN
Artificial neural network
- APP
Average pressure of pipe
- AWWA
American Water Works Association
- BI
Bias
- CI
Cast iron
- DFR
Daily failure rate
- DI
Ductile iron
- DMA
District metered area
- DMT
Daily mean temperature
- DTR
Daily total rainfall
- FFBP
Feed forward back propagation
- FR
Failure rate
- GIS
Geographic information system
- GRP
Glass reinforced plastic
- HDPE
High-density polyethylene
- IWA
International Water Association
- LM
Levenberg-Marquardt algorithm
- LS-SVM
Least-squares supportive machine
- MAE
Mean absolute error
- MAP
Mean age of pipe
- MARS
Multivariate adaptive regression splines
- MDP
Mean diameter of the pipe
- MGM
Turkish Meteorological Service
- MLP
Mean length of pipe
- MSE
Mean square error
- NRWR
Non-revenue water ratio
- PMS
Pressure management system
- PRV
Pressure reducing valve
- PVC
Polyvinyl chloride
- R2
Coefficient of determination
- RF
Random forest
- SCADA
Supervisory control and data acquisition system
- SI
Scatter index
- ST
Steel
- STDM
Serial triple diagram model
- SV
Semi-variogram
- SVM
Support vector machines
- TDM
Triple diagram model
- WDN
Water distribution network
INTRODUCTION
Water utilities are responsible for delivering quality purified drinking water to consumers without any interruption through water distribution networks (WDNs). Failures that may occur in WDNs and their components (such as valves, fire hydrants, pumps, service connection, etc.) bring significant problems for the water utility as they cause a decrease in network pressures, deterioration of water quality, increase in consumer dissatisfaction, and excessive water loss. For this reason, rapid detection and appropriate repair are significant in terms of reducing these problems. Water utilities, domain experts, and decision-makers need to review their current management and operating approaches on WDNs due to environmental, economic, and social impacts of failures. So it is important to determine the rates of failures, their frequencies and densities, and the factors causing the failures (Aydogdu & Fırat 2015). In the literature, failure rate is considered as the ratio of the failure numbers to the network length.
Failures in WDNs can arise out of physical (pipe age, pipe diameter, pipe material, pipe installation, etc.), environmental (climate, pipe location, seismic activity, pipe bedding, groundwater, etc.), and operational (water quality, pressure, flow velocity, leakage, etc.) factors (Trifunović 2012). Researchers have made failure predictions by using different methodologies (Rajani & Kleiner 2001; Pelletier et al. 2003; Wilson et al. 2017; Robles-Velasco et al. 2020). Cooper et al. (2000) have made failure predictions in the WDN of London through the model based on probabilistic approach and geographic information system (GIS) by using certain variables such as pipe density function, soil corrosivity class, number of buses per hour, and pipe diameter. The authors believed that model results contribute to prioritization of the operational and investment plans of water utilities to reduce in WDN failures. Iličić (2009) has studied the factors affecting the failure frequency in the WDN of Zagreb, the capital city of Croatia, through the statistical log-linear analysis method. He stated that there are significant parameters affecting failures such as pressure regulation, pipe age, pipe diameter, and pipe material. Carrión et al. (2010) have analyzed the recorded failure data in WDNs through the non-parametric and semi-parametric methods to evaluate failure probabilities. The analysis results showed that certain pipe characteristics such as traffic conditions, diameter, material, and length have an impact on failures. Kakoudakis et al. (2017) have preferred the evolutionary polynomial regression model to predict the failures in UK WDNs by using pipe age, length, and diameter characteristics. The developed approach contributed to obtaining higher model output accuracies, reducing prediction errors in networks with excessive failures. Winkler et al. (2018) have set up the failure model for replacing plans of urban WDNs by using the decision tree learning technique. They indicated that the models can be a good alternative to traditional statistical failure approaches. Wols et al. (2019) have researched the effect of meteorological parameters such as air temperature, wind, and drought on failures in drinking WDNs of the Netherlands. At the end of the study, it is stated that temperature is a significant parameter for cast iron (CI) and asbestos-cement (AC) pipes. Failures have increased in high temperature for AC pipes and increased in low temperature for CI pipes. Also, it is stated that winds uprooting trees and drought periods cause failures.
Researchers have developed prediction models for failure rate (FR) using different methods. Wang et al. (2009) have recommended deterioration models to predict annual FR through the multiple regression method considering certain variables such as diameter, pipe age, pipe type, length, and depth of installation. According to the model results, the network length is quietly effective on FR prediction. Tabesh et al. (2009) have predicted the FRs in WDNs in Iran by using the neuro-fuzzy systems, and multivariate regression methods. In this analysis, pipe depth, diameter, age, length, and average pressure variables have been used as input. Researchers indicated that artificial neural network (ANN) models have more accuracy and realistic results compared to the neuro-fuzzy systems and multivariate regression models. Jafar et al. (2010) have developed some models through the ANNs by using materials of pipes, the diameter of pipes, length of pipes, thickness, age, soil type, location, and pressure variables to predict the failure rate in the WDN of north France. The ANN models can help decision-makers to determine the best strategy for network replacing and maintenance investments. Asnaashari et al. (2013) have modeled FRs in their studies through the multiple linear regression (MLR) and ANN approaches. The model inputs in both studies are the following variables; pipe age, length, diameter, soil type, break category, pipe material, the year of Cathodic Protection (if applicable), and the year of Cement Mortar Lining (if applicable). The ANN model that was applied by the researchers at Etobicoke WDN in Canada gave better prediction results than MLR with R2 = 0.94. On the other hand, Shirzad et al. (2014) have developed models using the ANN and support vector regression (SVR) methods to predict FRs at Mashhad WDN in Iran. Practitioners have used length, age, the hydraulic pressure of pipes, and installation depth diameter model variables as input, and model results showed that the ANN was a better predictor than the SVR. Aydogdu & Firat (2015) have modeled the FR through the least-squares supportive machine (LS-SVM) and fuzzy clustering approaches by using certain variables such as length, age, and diameter in the WDN of Malatya city. In their study, the FR and the effective factors on this rate have been analyzed. In this context, high FRs have been observed in the pipes with a length in the range of 0–200 m, pipes with a diameter of 110 m, and pipes between the ages of 15 and 20 years. The authors emphasized that using clustering analysis and LS-SVM methods together can be useful to obtain more effective performance in the prediction of FRs. In another study, annual failure rates at Scarborough WDN in Canada have been predicted by Sattar et al. (2019) through the extreme learning machine (ELM) model. For the model, 50 neurons maximum have been employed in the hidden layer. The prediction models showed that the most effective input variable was pipe diameter. Kutyłowska (2019) has predicted the FR through the support vector machines (SVMs) and non-parametric regression methods using operational parameters (distribution pipes, house connections, total network length) in WDNs of Polish cities. The linear kernel function has been recommended to obtain the best prediction model. Motiee & Ghasemnejad (2019) have applied four different regression models (Poisson regression, logistic regression, exponential regression, and linear regression) to predict the FR in the WDN of Tehran by using length, material, pressure, age, and diameter variables. It is clear according to the all-model results that the ductile iron pipes among the others types such as ductile iron, asbestos-cement, cast iron, and high-density polyethylene have a significant role reducing failure number. Shirzad & Safari (2020) have been designed FR prediction models through the random forest (RF) and multivariate adaptive regression splines (MARS) techniques by using certain variables such as hydraulic pressure, pipe diameter, pipe age, pipe installation depth, and average, pipe length in the city network of Mahabad city, located in the north-east of Iran. In the first case study, MARS model outputs had higher prediction performance than RF.
Another approach used for the modern prediction models is the triple diagram method (TDM), a mapping technique based on the Kriging method. Counter charts are used to visualize the collective behavior of two independent variables and a dependent variable, and to provide the most ideal interpolation for prediction (Özger & Sen 2007). The TDM method enables to design of the most suitable surface model and to interpret the variable behaviors despite the extreme data scattering of the variables (Kızılöz & Şişman 2021). Şişman & Kizilöz (2020) have used the TDM structure to predict the non-revenue water ratio (NRWR). In another study conducted by the researchers, the NRWR has been predicted with high accuracy through the serial triple diagram model (STDM) (Kızılöz & Şişman 2021).
The aim of this study is to predict the FR in WDNs through the mapping technique based on ANNs, an artificially intelligent application, and Kriging methods. For this purpose, (1) the precipitation, temperature, diameter, age, pressure, and length variables have been selected as model inputs to predict the FR; (2) the data set has been normalized to scale model inputs and outputs to a certain scale, and to increase the model performance; (3) ideal ANN and STDMs have been designed by analyzing separately the effect of each variable on FR; (4) the best model accuracy with the least input has been analyzed.
STUDY AREA AND DATA
The total network length of the modeling area is 1,401 km and the service connection length is 420,78 km. The existing water distribution network consists of pipes with different material properties such as 4.75% steel (ST), 1.28% glass reinforced plastic (GRP), 18.66% ductile iron (DI), 39.78% high-density polyethylene (HDPE), and 35.53% polyvinyl chloride (PVC) (Figure 1). Daily measured failures on WDNs have been recorded by the ISU, Department of Water and Wastewater Technologies, and the Branch Office of Geographical Information Systems. In this modeling study, 5117 data in total, including 731 data for each model variable (measured and recorded daily values between 2019 and 2020) have been used to analyze, map, and make inferences on FRs. Each independent variable of the WDN measured daily at failure points has been given in Table 1. Different variables are available in this modeling data such as the mean diameter of the pipe (MDP), the mean age of pipe (MAP), the average pressure of pipe (APP), mean length of pipe (MLP), and daily FR (DFR). Also, certain meteorological variables such as daily mean air temperature (DMT) and daily total rainfall (DTR) have been used for the first time as model input and their relationships with FR have been analyzed. Daily mean temperature and precipitation data have been obtained from the Turkish Meteorological Service (MGM). Statistical properties of all variables used as model inputs and output are available in Table 1.
Variable summary in the study model
Variables . | Unit . | Min . | Mean . | Max . | Median . | Std.D. . |
---|---|---|---|---|---|---|
MDP | mm | 32 | 50.79 | 441 | 39.60 | 38.15 |
MAP | year | 2 | 19.02 | 33 | 19.50 | 4.13 |
APP | m | 39.60 | 50.78 | 61.88 | 50.77 | 3.39 |
MLP | km | 10.13 | 30.17 | 55.37 | 29.76 | 4.76 |
DMT | °C | −0.90 | 15.42 | 26.40 | 15.60 | 6.99 |
DTR | mm | 0 | 1.44 | 45 | 0 | 4.25 |
DFR | – | 0.02 | 0.27 | 0.93 | 0.25 | 0.17 |
Variables . | Unit . | Min . | Mean . | Max . | Median . | Std.D. . |
---|---|---|---|---|---|---|
MDP | mm | 32 | 50.79 | 441 | 39.60 | 38.15 |
MAP | year | 2 | 19.02 | 33 | 19.50 | 4.13 |
APP | m | 39.60 | 50.78 | 61.88 | 50.77 | 3.39 |
MLP | km | 10.13 | 30.17 | 55.37 | 29.76 | 4.76 |
DMT | °C | −0.90 | 15.42 | 26.40 | 15.60 | 6.99 |
DTR | mm | 0 | 1.44 | 45 | 0 | 4.25 |
DFR | – | 0.02 | 0.27 | 0.93 | 0.25 | 0.17 |
The standard water balance budget suggested by American Water Works Association (AWWA) and International Water Association (IWA) is available in Table 2 for the region of Gebze. According to this table, the total amount of water entering the system is 30,777,924 m3/year, the amount of water loss is 6,670,870 m3/year, and the amount of unbilled unmetered consumption is 409,884 m3/year (Table 2). It is seen in the table that the water loss amount arising out of failures is at the level of 1.33%. The related water utility tries to reduce this level up to 1% and water loss rate up to 20% through DMA, PM, minimum night flow monitoring, and active leakage control methods. If failures repeat intensively at the same point despite the above-mentioned activities, old networks and service connections will be replaced.
METHODOLOGY
In this study, the DFR has been predicted via ANNs and STDMs. The stages of the model studies are given below respectively:
- 1.
The daily failure number of the WDN of Gebze that was measured and recorded coordinately for model study between the years of 2019 and 2020 has been calculated.
- 2.
The min-max normalization technique has been used to increase model prediction accuracy and measure the data on a certain scale.
- 3.
Different combinations of model have been analyzed to predict the FR with the least input with the help of model input number increase, from one to three. The best prediction model has been obtained by the combination of two inputs.
- 4.
Some statistical indicators such as R, MSE, BIAS, MAE, and SI have been used to evaluate model performances.
- 5.
Model predictions have been made on the basis of the Kriging interpolation technique.
- 6.
An STDM has been used to increase the input number and accuracy in prediction charts.
- 7.
The prediction accuracy of the STDM has been evaluated through the same statistical indicators.
- 8.
The FR prediction model with the highest accuracy has been determined by comparing the ANN and STDM model results.
Min-max normalization
Certain values on the input variable data set used for model applications can be bigger or lower than zero. Especially, extreme values affect negatively model results due to the differences between the input values. To remove any undesirable condition, the data can be normalized by scaling in the same interval. In the literature, there are different normalization methods preferred frequently such as Statistical or Z-Score, Min-Max, Median, Sigmoid, and Statistical Column.




ANNs













The Levenberg-Marquardt (LM) algorithm, a back-propagation algorithm, has been used for the training of ANN models in this study. This algorithm is a repetitive method used for non-linear least-square problems. The LM algorithm is a combination of gradient descent and Gauss-Newton methods. This algorithm is preferred frequently by researchers due to its speed and stability in the training of prediction models (Kizilöz et al. 2015; Şişman & Kizilöz 2020; Kizilöz 2021).
Determination of ideal hidden layer and neuron numbers in the ANN structure is also significant in achieving model predictions. In the literature, ANN applications with single and multi-hidden layers have been studied commonly by model designers to solve complex engineering problems (Zealand et al. 1999; Rajurkar et al. 2004; Şen et al. 2021; Kizilöz et al. 2022). As the hidden layer with a higher number preferred in most studies cannot develop the model prediction, the ANN model structure with a single hidden layer has been preferred in this study due to its easy application and prevalence (Kizilöz et al. 2015; Kizilöz 2021). On the other hand, there is no mathematical approach to determining the most effective neuron number in the hidden layer (Kizilöz 2021). This number is determined generally through the trial-and-error method (Kizilöz 2021). Also, in this modeling study, MDP, MAP, APP, MLP, daily mean temperature DMT, and DTR variables have been used as input, and DFR has been used as output.
STDM










The Kriging approach is based on the least error squares method and is known as the best linear unbiased estimator. The weights determined through this approach, which depend on the SV and the spatial position of the data, are calculated as the mean of the difference between the prediction and actual values is zero and the prediction error variance is the smallest. Researchers take advantage of various Kriging methods according to the study area and data structure such as Indicator Kriging (Armstrong 1998), Universal Kriging (Lark et al. 2014), Simple Kriging (Elbasiouny et al. 2014), Co-Kriging (Chica-Olmo et al. 2014), Ordinary Kriging (Sanusi et al. 2014), Block Kriging, and Punctual Kriging methods (Şişman & Kizilöz 2020). In this study, the Point Kriging method has been preferred to predict data at unmeasured points.
measured input data of
series as MAP, APP, and etc.
;
where n is the record length,
= measured DFR as output data,
where n is the total number of data,
model output as prediction DFR, (
)
1. model error terms
2. model result errors
= model constant
Model assessment criteria




RESULTS AND DISCUSSION
Before starting model applications, 731 raw data belonging to the MDP, MAP, APP, MLP, DMT, DTR variables that were measured and recorded daily between the 2019 and 2020 time period have been reduced in the range of [0 1] standardizing them through the min-max normalization method to scale the data in a certain range and increase model performances.
After the normalization process, various combinations have been formed through the ANNs to predict daily failure rates (DFRs). The model application has been made by using the Neural Network Fitting Toolbox in the MATLAB software. In this study, the three-layer feed-forward back propagation network structure, FFBP, with a single input, hidden, and output layer has been preferred (Figure 3). Also, the linear activation functions in the output layer and sigmoid function in the hidden layer have been used. The model data has been classified randomly; 60% training (439 data), 30% validation (239 data), and 10% testing (73 data). Before the training process, each model which has irregular weights and biases at the beginning has been initialized again (Kizilöz et al. 2015; Şişman & Kizilöz 2020; Kizilöz 2021). The Levenberg-Marquardt back propagation (trainlm) that is preferred frequently due to its speed and sensitivity has been used for the training algorithm of the ANN models in the study (Kizilöz et al. 2015; Şişman & Kizilöz 2020; Kizilöz 2021). The neuron number in the hidden layer that is effective on the model accuracy has been defined via the trial-and-error methods. Firstly, a single neuron was used in the hidden layer, and then the model performances were evaluated by increasing the neuron numbers one by one. According to the results, the best model performance was obtained when five neurons were used in the hidden layer. Therefore, it was decided to have five neurons in the hidden layer.
Accuracy of ANN models with one input variable
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-1.1 | DMT | 0.30 | 0.021 | −0.008 | 0.113 | 52.64 |
ANN-1.2 | MAP | 0.23 | 0.023 | −0.004 | 0.119 | 54.98 |
ANN-1.3 | MDP | 0.18 | 0.024 | −0.002 | 0.123 | 56.49 |
ANN-1.4 | MLP | 0.15 | 0.025 | −0.005 | 0.125 | 57.73 |
ANN-1.5 | APP | 0.12 | 0.026 | −0.002 | 0.128 | 58.70 |
ANN-1.6 | DTR | 0.10 | 0.028 | 0.004 | 0.135 | 61.13 |
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-1.1 | DMT | 0.30 | 0.021 | −0.008 | 0.113 | 52.64 |
ANN-1.2 | MAP | 0.23 | 0.023 | −0.004 | 0.119 | 54.98 |
ANN-1.3 | MDP | 0.18 | 0.024 | −0.002 | 0.123 | 56.49 |
ANN-1.4 | MLP | 0.15 | 0.025 | −0.005 | 0.125 | 57.73 |
ANN-1.5 | APP | 0.12 | 0.026 | −0.002 | 0.128 | 58.70 |
ANN-1.6 | DTR | 0.10 | 0.028 | 0.004 | 0.135 | 61.13 |
The predictions belonging to the twelve different combinations that formed model structures with single input and single output have been sorted in Table 4 according to some statistical performance indicators such as R2, MSE, BI, MAE, and SI to increase model accuracies of DFRs. According to this sorting, the best prediction model has been obtained through the MAP-DMT combination (Figure 5(b)). When the prediction model results with two inputs are compared with the model results with single input given in Table 3, a performance development is observed over 46%.
Accuracy of ANN models with two input variables
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-2.1 | MAP-DMT | 0.44 | 0.016 | 0.005 | 0.101 | 46.68 |
ANN-2.2 | DMT-MLP | 0.41 | 0.017 | 0.002 | 0.102 | 48.15 |
ANN-2.3 | MDP-DMT | 0.38 | 0.018 | 0.005 | 0.105 | 49.11 |
ANN-2.4 | MAP-NL | 0.35 | 0.019 | −0.004 | 0.108 | 50.58 |
ANN-2.5 | MDP-MAP | 0.31 | 0.020 | 0.001 | 0.111 | 51.77 |
ANN-2.6 | AAP-DMT | 0.30 | 0.021 | 0.003 | 0.111 | 52.21 |
ANN-2.7 | MDP-NL | 0.28 | 0.021 | −0.003 | 0.113 | 52.95 |
ANN-2.8 | MAP-APP | 0.28 | 0.021 | 0.009 | 0.113 | 53.33 |
ANN-2.9 | MDP-APP | 0.24 | 0.022 | −0.002 | 0.119 | 54.37 |
ANN-2.10 | MAP-DTR | 0.24 | 0.022 | 0.005 | 0.116 | 54.60 |
ANN-2.11 | APP-NL | 0.23 | 0.023 | 0.007 | 0.116 | 54.87 |
ANN-2.12 | MDP-DTR | 0.22 | 0.023 | 0.007 | 0.119 | 55.28 |
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-2.1 | MAP-DMT | 0.44 | 0.016 | 0.005 | 0.101 | 46.68 |
ANN-2.2 | DMT-MLP | 0.41 | 0.017 | 0.002 | 0.102 | 48.15 |
ANN-2.3 | MDP-DMT | 0.38 | 0.018 | 0.005 | 0.105 | 49.11 |
ANN-2.4 | MAP-NL | 0.35 | 0.019 | −0.004 | 0.108 | 50.58 |
ANN-2.5 | MDP-MAP | 0.31 | 0.020 | 0.001 | 0.111 | 51.77 |
ANN-2.6 | AAP-DMT | 0.30 | 0.021 | 0.003 | 0.111 | 52.21 |
ANN-2.7 | MDP-NL | 0.28 | 0.021 | −0.003 | 0.113 | 52.95 |
ANN-2.8 | MAP-APP | 0.28 | 0.021 | 0.009 | 0.113 | 53.33 |
ANN-2.9 | MDP-APP | 0.24 | 0.022 | −0.002 | 0.119 | 54.37 |
ANN-2.10 | MAP-DTR | 0.24 | 0.022 | 0.005 | 0.116 | 54.60 |
ANN-2.11 | APP-NL | 0.23 | 0.023 | 0.007 | 0.116 | 54.87 |
ANN-2.12 | MDP-DTR | 0.22 | 0.023 | 0.007 | 0.119 | 55.28 |
Various model combinations with three inputs have been designed increasing input number to develop prediction accuracy of ANN models. Twelve prediction models evaluated according to the five different performance criteria are available below in Table 5. When these model results are compared with model results with two inputs, there is no remarkable improvement. The determination coefficients of the models given in Table 4 change between 0.22 and 0.44. These coefficients change between 0.21 and 0.40 in Table 5. In brief, the desired predictive accuracies have not been obtained through the model results with three inputs.
Accuracy of ANN models with three input variables
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-3.1 | MDP-MAP-DMT | 0.40 | 0.018 | 0.006 | 0.104 | 48.56 |
ANN-3.2 | APP- DMT -MLP | 0.39 | 0.018 | −0.001 | 0.104 | 48.85 |
ANN-3.3 | DMT -DTR- MLP | 0.38 | 0.018 | 0.007 | 0.103 | 49.43 |
ANN-3.4 | MAP – APP – DMT | 0.38 | 0.018 | −0.012 | 0.106 | 49.36 |
ANN-3.5 | APP – DMT – DTR | 0.33 | 0.020 | −0.001 | 0.110 | 51.21 |
ANN-3.6 | MDP – APP – DMT | 0.33 | 0.020 | −0.008 | 0.111 | 51.33 |
ANN-3.7 | MAP – APP – MLP | 0.31 | 0.020 | −0.003 | 0.111 | 51.78 |
ANN-3.8 | MDP – MAP – APP | 0.27 | 0.021 | 0.003 | 0.115 | 53.26 |
ANN-3.9 | MAP- APP – DTR | 0.25 | 0.022 | 0.008 | 0.117 | 54.02 |
ANN-3.10 | MDP – MAP – DTR | 0.25 | 0.022 | 0.002 | 0.116 | 54.30 |
ANN-3.11 | MDP – APP – MLP | 0.24 | 0.022 | 0.007 | 0.114 | 54.67 |
ANN-3.12 | MDP – MAP – MLP | 0.21 | 0.023 | −0.001 | 0.119 | 55.55 |
Model no . | Model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|
ANN-3.1 | MDP-MAP-DMT | 0.40 | 0.018 | 0.006 | 0.104 | 48.56 |
ANN-3.2 | APP- DMT -MLP | 0.39 | 0.018 | −0.001 | 0.104 | 48.85 |
ANN-3.3 | DMT -DTR- MLP | 0.38 | 0.018 | 0.007 | 0.103 | 49.43 |
ANN-3.4 | MAP – APP – DMT | 0.38 | 0.018 | −0.012 | 0.106 | 49.36 |
ANN-3.5 | APP – DMT – DTR | 0.33 | 0.020 | −0.001 | 0.110 | 51.21 |
ANN-3.6 | MDP – APP – DMT | 0.33 | 0.020 | −0.008 | 0.111 | 51.33 |
ANN-3.7 | MAP – APP – MLP | 0.31 | 0.020 | −0.003 | 0.111 | 51.78 |
ANN-3.8 | MDP – MAP – APP | 0.27 | 0.021 | 0.003 | 0.115 | 53.26 |
ANN-3.9 | MAP- APP – DTR | 0.25 | 0.022 | 0.008 | 0.117 | 54.02 |
ANN-3.10 | MDP – MAP – DTR | 0.25 | 0.022 | 0.002 | 0.116 | 54.30 |
ANN-3.11 | MDP – APP – MLP | 0.24 | 0.022 | 0.007 | 0.114 | 54.67 |
ANN-3.12 | MDP – MAP – MLP | 0.21 | 0.023 | −0.001 | 0.119 | 55.55 |
When thirty different models with one, two, and three inputs that applied to predict the DFR through the ANN approach are analyzed together with R2, MSE, BI, MAE, and SI statistical performance indicators, it is seen that the best model prediction has been achieved by the MAP-DMT model combination. The best actual-prediction relationship with one and two inputs is given together in Figure 5. Model structure with single input is insufficient for predictions. On the other hand, the expected development of prediction is not also obtained through the model structure with three inputs. When ANN model results are examined all together, it is clear that some variables such as MDP, MAP, APP, MLP, DMT, and DTR are effective factors in the DFR prediction. In a conclusion, after being observed that the desired model performances cannot be obtained through the ANN model approach for the DFR predictions, it has been tried to obtain predictions models with high accuracy through the STDM.
In this study, two-dimensional contour charts with three variables have been used through the Surfer program to make predictions with the STDM structure. It is necessary to have one dependent output and two independent variables (input) to obtain a contour chart through the Surfer program for problem-solving. While the independent variables are shown on the X and Y-axis in the prediction charts, the dependent variable values are shown on the Z-axis with curves depending on independent variables. The STDM structure has been applied for the first time for model charts to predict the DFR. This structure is composed of different model prediction charts that are linked to each other and have two inputs. In the beginning, the first model prediction has been made with two independent variables that are effective on the DFR. In case the model outputs of these charts are not in the desired accuracy, a new model prediction is made through the prediction errors of the first model and with the help of a third independent variable. Thus, an STDM structure with one output and four inputs is formed through the prediction error of the first model and three independent variables. STDM structures can be developed providing forward model flows by each model prediction error being the input of the next model.
In this study, the statistical performance indicators of eleven models developed by the STDM structure are available below in Table 6. According to this table, the R2 values are between 0.93 and 0.83, MSE values are 0.002–0.005, BI values are −0.004 to 0.001, MAE values are 0.033–0.052, and SI values are in the range of 16.62–26.24%. When the forty-one model accuracies in this study are analyzed together, it is observed that the STDM predictions are higher in comparison with the ANN predictions.
Accuracy of STDMs with four input variables
Model no . | First model inputs . | Second model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|---|
STDM.1 | MAP-DTR | Error-DMT | 0.93 | 0.002 | 0.000 | 0.033 | 16.62 |
STDM.2 | MDP-APP | Error- DMT | 0.92 | 0.002 | 0.000 | 0.034 | 18.01 |
STDM.3 | APP – DTR | Error- DMT | 0.92 | 0.002 | −0.001 | 0.035 | 18.39 |
STDM.4 | MAP – APP | Error- DMT | 0.90 | 0.003 | 0.000 | 0.039 | 20.18 |
STDM.5 | MDP – MAP | Error- APP | 0.89 | 0.003 | −0.003 | 0.041 | 21.50 |
STDM.6 | MAP – MLP | Error- APP | 0.88 | 0.003 | −0.003 | 0.042 | 21.77 |
STDM.7 | MDP – MLP | Error- APP | 0.88 | 0.003 | −0.001 | 0.044 | 22.23 |
STDM.8 | DTR – MLP | Error- APP | 0.87 | 0.004 | −0.004 | 0.043 | 22.71 |
STDM.9 | MDP – DMT | Error- MAP | 0.84 | 0.004 | 0.001 | 0.050 | 25.19 |
STDM.10 | MDP – DTR | Error- APP | 0.84 | 0.005 | −0.002 | 0.049 | 26.04 |
STDM.11 | DMT -MLP | Error- APP | 0.83 | 0.005 | −0.004 | 0.052 | 26.24 |
Model no . | First model inputs . | Second model inputs . | R2 . | MSE . | BIAS . | MAE . | SI% . |
---|---|---|---|---|---|---|---|
STDM.1 | MAP-DTR | Error-DMT | 0.93 | 0.002 | 0.000 | 0.033 | 16.62 |
STDM.2 | MDP-APP | Error- DMT | 0.92 | 0.002 | 0.000 | 0.034 | 18.01 |
STDM.3 | APP – DTR | Error- DMT | 0.92 | 0.002 | −0.001 | 0.035 | 18.39 |
STDM.4 | MAP – APP | Error- DMT | 0.90 | 0.003 | 0.000 | 0.039 | 20.18 |
STDM.5 | MDP – MAP | Error- APP | 0.89 | 0.003 | −0.003 | 0.041 | 21.50 |
STDM.6 | MAP – MLP | Error- APP | 0.88 | 0.003 | −0.003 | 0.042 | 21.77 |
STDM.7 | MDP – MLP | Error- APP | 0.88 | 0.003 | −0.001 | 0.044 | 22.23 |
STDM.8 | DTR – MLP | Error- APP | 0.87 | 0.004 | −0.004 | 0.043 | 22.71 |
STDM.9 | MDP – DMT | Error- MAP | 0.84 | 0.004 | 0.001 | 0.050 | 25.19 |
STDM.10 | MDP – DTR | Error- APP | 0.84 | 0.005 | −0.002 | 0.049 | 26.04 |
STDM.11 | DMT -MLP | Error- APP | 0.83 | 0.005 | −0.004 | 0.052 | 26.24 |
CONCLUSION
In this study, the ANN and STDM approaches have been used to predict the DFR in the water distribution system in Gebze. The higher prediction performances have been obtained through the STDM, a structure that enables to visualization of the model results by making inferences. The best STDM model has been obtained through the MAP-DTR-Error-DMT combination with four inputs. It is seen that the MDP, MAP, APP, and MLP variables used as model inputs are affected on the DFR in keeping with the previous studies.
In this study, prediction models have been developed by using some meteorological variables for the first time, for instance, DMT and DTR. It is understood that each of these variables is effective on the DFR. A significant DFR increase has been observed through the STDM method in increasing temperatures. Similar increases are also available in MAP and APP variables as such in DFRs. Increases in temperature, age, and pressure values cause serious failures, especially in service connections, so undesired water losses occur. Water utilities can reduce water losses by applying various activities such as district metered area design, pressure management, and active leakage control method.
In the FR predictions, preferring models allowing the interpretation of outputs such as STDM, contributes to the network management strategies to be developed by water utilities, experts, and researchers. Thanks to the prediction model results, water utilities can prioritize their future investments and plans, and thus water losses can be reduced systematically. FR predictions can be made in further studies by taking some variables into account, for instance, depth of excavation, hydraulic radius, speed, pipe thickness, and area of failure point.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.