Leakage detection in water distribution networks using hybrid feedforward artificial neural networks

Water leakage control in water distribution networks (WDNs) is one of the main challenges facing water utilities. The present study proposes a new method to locate a leakage in WDNs using feedforward artificial neural networks (ANNs). Two ANN training cases are considered. In case 1, the ANNs are trained using the average daily water demand together with hypothetical leakages ranging from small to large. In case 2, the ANNs are trained using hourly water demand and variable hourly nodal leakages over 24 hours. The training parameters are determined with the EPANET2.0 hydraulic simulation software and the MATLAB programming language. In both cases, the ANNs are first trained using the flow rates of all pipes. Then, a sensitivity analysis is performed with hybrid ANNs that use the flow rates of only a subset of the pipes. The results of the proposed hybrid ANNs indicate that the leakage location can be determined in both cases if the flow rates of at least 10% of the pipes are known (measured by flowmeters). Although case 2 is more complex because demand and leakage vary over the 24-hour period, the proposed method detected the leakage location with high accuracy.


INTRODUCTION
Water is one of the most important necessities for human survival and development. Water distribution networks (WDNs), as critical infrastructure, play a vital role in safely delivering drinking water to customers.
However, many networks suffer from large water losses, burst pipes, and leakage due to aging, inadequate construction quality, or lack of maintenance, which further intensifies water scarcity (Fazelabdolabadi & Golestan ; Zhao et al. ). The network's overall efficiency is severely affected by the lack of proper maintenance, and leakage and illegal connections are among the prime reasons (Shahzad et al. ). Leakages in WDNs cause economic as well as social costs. There is a need for a platform to manage WDNs more efficiently by detecting, localizing, and controlling leakages before or as soon as they occur (Gupta & Kulat ). The ANN-based methods mentioned cannot detect small leakages because they require extensive ANN training data, which increases the training time and the complexity of the training process. Also, insufficient and inaccurate data in the ANN training stage lead to incorrect results (Moasheri & Jalili Ghazizadeh ).
This study proposes a new methodology to locate an hourly leakage in WDNs over 24 hours using feedforward ANNs. The main novelty is that the feedforward ANNs are trained on the hourly variations of leakage and water demand over 24 hours. These hourly variations make the training process more complex and increase the training time of the feedforward ANNs. The proposed ANNs are trained using the pipe flows, are highly efficient, and can detect both small and large leakages. Detecting the leakage location from the smallest possible number of pipes is essential to the practicality of the method in WDNs. Therefore, new hybrid ANNs are introduced that use sensitivity analysis to find the optimal number of pipes.
The proposed ANNs are trained in two cases. In case 1, the ANNs are trained using the average daily water demand together with hypothetical leakages ranging from small to large. In case 2, the leakage variations over the 24-hour period are considered. Variations in water demand cause pressure fluctuations, which in turn lead to leakage variations, so the relationship between pressure and leakage must be taken into account. Therefore, in case 2, the ANNs are trained using variable hourly water demand and nodal leakage over 24 hours. Then, in both cases, a sensitivity analysis is performed with hybrid ANNs to find the optimal number of pipes. These hybrid ANNs are trained using the flow rates of different percentages of the pipes. The training parameters are determined with the EPANET2.0 hydraulic simulation software and the MATLAB programming language.
The rest of this study is organized as follows. The next section presents the methodology, including the ANN definition and development. The structure of the selected network is then introduced. The results and discussion section presents the two cases of the ANN training process for detecting the leakage location and investigates the relationship between pressure and leakage in WDNs. Finally, the conclusions are presented.

Artificial neural network (ANN) definition
In the present study, a feedforward ANN is employed to construct a relationship between the input and output variables efficiently. Figure 1 illustrates the general structure of a feedforward ANN, including an input layer, a hidden layer, and an output layer. The number of input variables determines the number of neurons in the input layer. The hidden layer determines the neural network's main structure and has the most significant effect on the network's training and its overall performance. Therefore, the number of neurons in the hidden layer is determined by trial and error, by minimizing a predefined error function over the training data.
As shown in Figure 1, each neuron in the input layer accepts an input, and each neuron $s$ in the hidden layer or output layer accepts a set of inputs $x_{rs}$ from the neurons of the previous layer and produces an output $y_s$ as follows (Zhou et al. ):
$$y_s = \varphi\left(\sum_{r=1}^{m} w_{rs}\, x_{rs} - \theta_s\right)$$
where $m$ represents the number of inputs, $w_{rs}$ is the connection weight of the $r$th input, $\theta_s$ denotes the neuron's threshold, and $\varphi$ represents the activation function, typically selected as the hyperbolic tangent function or the sigmoid function.
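The neuron computation described above can be sketched in a few lines of Python. This is only an illustration and not the authors' MATLAB implementation: the function names `neuron_output` and `sigmoid`, the example values, and the use of NumPy are assumptions made here, and the threshold is assumed to be subtracted from the weighted sum, which is one common convention.

```python
import numpy as np

def neuron_output(x, w, theta, phi=np.tanh):
    """Output of one neuron: phi(sum_r w_r * x_r - theta).

    x     : inputs received from the previous layer (length m)
    w     : connection weights, one per input (length m)
    theta : neuron threshold (assumed here to be subtracted from the weighted sum)
    phi   : activation function, hyperbolic tangent by default
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    return phi(np.dot(w, x) - theta)

def sigmoid(z):
    """Logistic sigmoid, the other activation mentioned in the text."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values (not taken from the study)
inputs = [0.5, -1.0, 2.0]
weights = [0.2, 0.4, 0.1]
y_tanh = neuron_output(inputs, weights, theta=0.1)
y_sig = neuron_output(inputs, weights, theta=0.1, phi=sigmoid)
```

With these illustrative numbers the weighted sum minus the threshold is -0.2, so both activations are evaluated at the same argument and differ only in their output range.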
After the structure of an ANN is fixed, the parameters, including all connection weights $w_{rs}$ and all neuron thresholds (biases) $\theta_s$, are tuned to minimize the deviation of the model outputs from the actual outputs over the training samples. This deviation can be measured by the root mean squared error (RMSE) criterion:
$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{r=1}^{M}\left(y_r - o_r\right)^2}$$
where $M$ is the number of samples, $(x_r, y_r)$ is the input-output pair of the $r$th sample, and $o_r$ is the actual output of the ANN on the input $x_r$. ANN parameter optimization is usually con-
In ANNs, RMSE is used as the performance function to determine the weights and biases. The RMSE criterion sometimes reaches an acceptable value with proper convergence but still does not satisfy the problem's requirements.
Indeed, the RMSE reflects the average behavior of the network, while the trained network may still produce large errors at particular points. If good performance is also expected from the trained ANN on the worst data (those with the largest errors), the RMSE may not be an appropriate residual metric. Therefore, in addition to the RMSE, this study defines the maximum relative leakage error (MRLE) metric, which is discussed later.
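The point that an average metric can hide a poor worst case can be illustrated numerically. The sketch below is a generic example with made-up data; `max_relative_error` is a hypothetical stand-in for a worst-case metric and is not claimed to be the MRLE definition used in this study.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def max_relative_error(y_true, y_pred):
    """Largest relative error in percent (hypothetical worst-case metric)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(y_pred - y_true) / np.abs(y_true)) * 100.0)

# Made-up data: 99 samples predicted exactly, one sample off by 50%
y_true = np.full(100, 20.0)
y_pred = y_true.copy()
y_pred[0] = 30.0

overall = rmse(y_true, y_pred)              # averages the single large error away
worst = max_relative_error(y_true, y_pred)  # exposes the single large error
```

Here the RMSE is 1.0 in the units of the data, which may look acceptable, while the worst sample is wrong by 50%.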

ANN development (training and validation)
As mentioned, the present study employs an ANN to detect the leakage location. First, the ANN must be trained using a set of input and pre-specified output data, so the training data must be provided.

STRUCTURE OF THE SELECTED NETWORK
In this study, the hypothetical rectangular network proposed by Poulakis et al. () is used (Figure 2). The network consists of 50 pipes, 30 junction nodes, 20 loops, and one reservoir. An elevated reservoir supplies the water by gravity. The vertical and horizontal pipes are 2,000 and 1,000 m long, respectively, and the reservoir elevation is 52 m. The pipe diameters shown in Figure 2 range from 300 to 600 mm. A water demand is assigned to each node of the network.

RESULTS AND DISCUSSION
Case 1: ANN for leakage detection by average daily water demand and nodal leakage
According to the previous sections, the location of the leakages in a WDN is determined using the ANN. The proposed ANN can predict the location of a single leakage in a hypothetical WDN taken from Poulakis et al. ().
In this case, the hypothetical leakage together with the average daily water demand at each node is defined as the output, and the pipe flows are defined as the input to train the ANN. Figure 3 illustrates the training output matrix, named the leakage matrix. After training the ANN, the leakage location can be determined from the pipe flows.
As mentioned above under 'Artificial neural network (ANN) definition', besides the RMSE index, the relative leakage error (RLE) is also defined as below to investigate the ANN's behavior: where RLE is the relative leakage error, $\mathrm{CLD}_{\mathrm{ANN}}$ is the leakage and demand calculated by the ANN, HLD is the hypothetical leakage and demand, and $i$ is the junction number, $i = 1, 2, 3, \ldots, 30$.
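A relative error built from these symbols is commonly written in the form sketched below. The absolute value, the percentage scaling, and taking MRLE as the maximum of RLE over the junctions are assumptions made here for illustration and may differ from the authors' exact definition.

```latex
\begin{align*}
\mathrm{RLE}_i &= \frac{\left|\mathrm{CLD}_{\mathrm{ANN},i} - \mathrm{HLD}_i\right|}{\mathrm{HLD}_i} \times 100,
  \qquad i = 1, 2, \ldots, 30 \\
\mathrm{MRLE} &= \max_{1 \le i \le 30} \mathrm{RLE}_i
\end{align*}
```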
The number of neurons in the hidden layer is an essential parameter for ANN training. The optimal number of neurons in the hidden layer can be found based on the maximum of the MRLE (MaxMRLE). For this purpose, the steps explained in Figure 4 are carried out for different numbers of neurons.
The result is illustrated in Figure 5. As shown in Figure 5, the MaxMRLE value for the training data (1,650 cases) declines considerably for neuron numbers larger than 30. To verify this finding, the ANN must be applied to validation data.
For the validation data, nodal leakages in the range of 0.005% to 1.2% of the total water demand are applied. Each subsequent value is obtained by adding an increment of 0.21 L/s to the previous one (0.03 L/s up to 7.05 L/s). This procedure produces 1,050 cases for the validation data. Therefore, of all the data considered in this study (2,700 samples), about 60% is assigned to training and about 40% to validation.
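The training and validation shares can be checked with simple arithmetic. The sketch below uses only the case counts given in the text. The per-node counts additionally assume that each case places a single leak at one of the 30 nodes and that every node receives the same number of leakage magnitudes, which is an interpretation rather than something stated explicitly.

```python
N_TRAIN = 1650   # training cases stated in the text
N_VALID = 1050   # validation cases stated in the text
N_NODES = 30     # junction nodes in the network

n_total = N_TRAIN + N_VALID

# Percentage of all samples used for training and for validation
train_share = 100.0 * N_TRAIN / n_total
valid_share = 100.0 * N_VALID / n_total

# Assumed interpretation: equal number of leakage magnitudes per node
train_levels_per_node = N_TRAIN // N_NODES
valid_levels_per_node = N_VALID // N_NODES
```

The exact shares come out at roughly 61% and 39%, which the text rounds to 60% and 40%.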
As mentioned, Figure 5 shows the results of determining the optimal number of neurons. According to the above description and the results illustrated in Figure 5, the optimal number of neurons in the hidden layer is 30. Since the MaxMRLE values for neuron numbers above 30 are very small, a magnified view is included in Figure 5 for clarity.
The MRLE values for different leakages from 0.06 to 6.0 L/s, obtained with 30 neurons in the hidden layer, are presented in Figure 6 and Table 1. For clarity, the x-axis in Figure 6 is divided into five categories based on the leakage ranges listed in Table 1.

Relationship between the pressure and leakage in WDNs
The dependency between leakage and pressure is well known and can be expressed as:
$$\frac{L_1}{L_0} = \left(\frac{P_1}{P_0}\right)^{N} \qquad (5)$$
where $L_1$ and $L_0$ are the leakages at the pressures $P_1$ and $P_0$, respectively, $P$ is the pressure head of the zone, the indexes 0 and 1 denote different pressure conditions, and $N$ is the pressure exponent, which ranges from 0.5 to 2.5. Thornton & Lambert () studied the exponent $N$ for different projects and concluded that $N$ could vary from 0.5 to 1 for small leakages in PVC pipes.
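To make the effect of the pressure exponent concrete, the following Python sketch evaluates the power-law relation L1 = L0 (P1 / P0)^N for several exponents. The function name `leakage_at_pressure` and all numerical values are hypothetical and chosen only for illustration.

```python
def leakage_at_pressure(l0, p0, p1, n=1.0):
    """Leakage L1 at pressure P1, given a reference pair (L0, P0).

    Implements L1 = L0 * (P1 / P0) ** N, where N is the pressure exponent.
    """
    if p0 <= 0 or p1 < 0:
        raise ValueError("p0 must be positive and p1 non-negative")
    return l0 * (p1 / p0) ** n

# Hypothetical example: a 6 L/s leak at 40 m head, with pressure dropping to 30 m
results = {n: leakage_at_pressure(6.0, 40.0, 30.0, n) for n in (0.5, 1.0, 2.5)}
```

For the same pressure drop, a larger exponent gives a stronger reduction in leakage; with N = 1 the leakage scales linearly with pressure.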
The present study employs water demand coefficients based on the water demand variations over 24 hours (the maximum hourly coefficient) using Standard No. 117-3 (IRIVPSPS ). Based on these coefficients and the average daily water demand of 20 L/s, the hourly water demand can be calculated. The results are illustrated in Table 3.
The nodal pressures can be calculated using the hourly water demand from Table 3 and the EPANET2.0 software. Then, using the nodal pressures, the variable hourly nodal leakage can be obtained from Equation (5). The variable hourly nodal leakage is calculated under the following assumptions:
1. The first pressure investigation is performed at night (3-4 am), when the nodal leakage is at its maximum.
2. The maximum nodal leakage is assumed to be 1% of the total water demand ($L_{max} = 6$ L/s). This leakage is added to the hourly water demand. The maximum nodal pressure ($P_{max}$) is calculated using the EPANET2.0 software.
Taking $L_{max}$ and $P_{max}$ as the maximum nodal leakage and pressure, Equation (5) with $N = 1$ becomes:
$$L_t = L_{max}\,\frac{P_t}{P_{max}} \qquad (6)$$
where $t$ is the time in hours over the day ($t = 1, 2, 3, \ldots, 24$), and $L_t$ and $P_t$ are the hourly nodal leakage and nodal pressure at time $t$, respectively.
To estimate the variable hourly nodal leakage at time $t$ ($L_t$) over 24 hours, the hourly nodal pressure at that time ($P_t$) must be calculated. For this purpose, a hypothetical leakage ($L'_t$) is first assumed and added to the hourly water demand given in Table 3. Then, the hourly nodal pressure ($P_t$) is obtained using the EPANET2.0 software. Using this pressure and Equation (6), the variable hourly nodal leakage at time $t$ ($L_t$) is calculated.
This process is repeated until $L_t$ and $L'_t$ become equal. The whole procedure is then repeated for every hour and every node to obtain the variable hourly nodal leakage over 24 hours at all nodes.
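The iteration described above is a fixed-point scheme and can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: `solve_hourly_leakage` and `toy_pressure` are hypothetical names, and the linear `toy_pressure` function merely stands in for the hydraulic simulation that the study performs with EPANET2.0.

```python
def solve_hourly_leakage(pressure_fn, l_max, p_max, l_guess=0.0, tol=1e-6, max_iter=100):
    """Iterate L_t = l_max * P_t / p_max until assumed and computed leakage agree.

    pressure_fn(leak) must return the nodal pressure obtained when `leak`
    is added to the hourly demand (a hydraulic solver run in the study).
    Returns the converged leakage and the pressure from the last solver call.
    """
    leak = l_guess
    for _ in range(max_iter):
        pressure = pressure_fn(leak)
        new_leak = l_max * pressure / p_max   # leakage-pressure relation with N = 1
        if abs(new_leak - leak) < tol:
            return new_leak, pressure
        leak = new_leak
    raise RuntimeError("leakage iteration did not converge")

def toy_pressure(leak, p_no_leak=40.0, drop_per_lps=0.5):
    """Hypothetical solver stand-in: pressure falls linearly with added leakage."""
    return p_no_leak - drop_per_lps * leak

# Hypothetical values: maximum leakage 6 L/s at a maximum pressure of 45 m
l_t, p_t = solve_hourly_leakage(toy_pressure, l_max=6.0, p_max=45.0)
```

For this toy pressure model the fixed point can be found by hand (leakage 5 L/s at 37.5 m), which makes it a convenient check before coupling the loop to a real hydraulic solver.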
The entire calculation procedure was coded in the MATLAB programming language. Figure 9 depicts the 24-hour variations of the variable hourly nodal leakage. Each of its 30 diagrams shows the variation of the variable hourly nodal leakage at one node, from node 1 to node 30, over 24 hours.
Case 2: ANN for leakage detection by hourly water demand and variable nodal leakage over 24 hours
The ANN proposed in Case 1 above is trained on the average daily nodal leakage and water demand. According to Table 3 and Figure 9, the water demands and leakages at all nodes change hourly over 24 hours. Thus, in this case, the ANN should be trained by considering these variations. The first step of ANN training is to determine the training inputs and outputs. Table 3 (hourly water demand) and Figure 9 (variable hourly nodal leakage) are used to generate the leakage matrix as the training output. For this purpose, for the first hour, the hourly water demand from Table 3 and the variable hourly nodal leakage from Figure 9 are assigned to each node (nodes 1-30), which generates a (30 × 30) matrix. This process is repeated for all 24 hours. Then, the 24 generated (30 × 30) matrices are placed next to each other to create a (30 × 720) leakage matrix (similar to Figure 3). The leakage matrix is used as the output data for ANN training. Using the leakage matrix and the EPANET2.0 software, the pipe flow matrix (50 × 720) is generated as the input data for ANN training. After the input and output data have been created, the optimal number of neurons in the hidden layer should be determined. This task is illustrated in Figure 10.
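The assembly of the (30 × 720) leakage matrix can be sketched with NumPy as follows. The layout is an assumption made for illustration: rows are nodes, each column is one scenario, and in scenario j of a given hour the leak sits at node j while all other nodes carry only the hourly demand. The function name `build_leakage_matrix` and the random placeholder data are hypothetical and do not reproduce Table 3 or Figure 9.

```python
import numpy as np

N_NODES, N_HOURS = 30, 24

def build_leakage_matrix(hourly_demand, hourly_leakage):
    """Assemble the (30 x 720) training output matrix from 24 hourly blocks.

    hourly_demand  : shape (24,), nodal demand per hour in L/s
                     (assumed identical for all nodes)
    hourly_leakage : shape (30, 24), leakage at each node and hour in L/s
    """
    blocks = []
    for h in range(N_HOURS):
        # Every node carries the hourly demand in every scenario
        block = np.full((N_NODES, N_NODES), hourly_demand[h], dtype=float)
        # In scenario j the leak is added at node j only (diagonal entries)
        block[np.diag_indices(N_NODES)] += hourly_leakage[:, h]
        blocks.append(block)
    # Place the 24 (30 x 30) blocks side by side
    return np.hstack(blocks)

# Placeholder data for illustration only
rng = np.random.default_rng(0)
demand = 20.0 * rng.uniform(0.5, 1.5, size=N_HOURS)
leakage = rng.uniform(3.0, 6.0, size=(N_NODES, N_HOURS))
leakage_matrix = build_leakage_matrix(demand, leakage)
```

Running a hydraulic simulation for each of the 720 columns would then yield the corresponding (50 × 720) pipe flow matrix used as the ANN input.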
Validation data must be generated to find the optimal number of neurons in the hidden layer. For this purpose, the following assumptions are made:
1. Hourly water demand follows a normal distribution over 24 hours.
2. The hourly water demand is taken as the mean of the normal distribution.
3. The coefficient of variation is equal to 10%.
The hourly water demand for the validation data is generated using these assumptions and the data in Table 3. At each hour, 10% of the hourly water demand is added to the hourly water demand value; that is, every hourly water demand in Table 3 is multiplied by a coefficient of 1.1. This process creates the hourly water demand of the validation data. Using Equation (6) and the procedure described in the previous section, the variable hourly nodal leakages at all nodes are produced for the validation data (the maximum nodal leakage at each node is assumed to be 1.2% of the total water demand for the validation data). Then, the pipe flows for the validation data are calculated using the variable hourly nodal leakage and the EPANET2.0 software.
Finally, the inputs and outputs of the validation data are applied to the trained ANN to determine the optimal number of neurons in the hidden layer. According to Figure 10 and the above explanation, an ANN with 30 neurons can be proposed to predict the leakage over 24 hours. The MRLE value can also be calculated, as shown in Figure 11, which indicates that the MRLE is below 0.1% over 24 hours. This value is slightly larger than that of Case 1 with average daily water demand and nodal leakage (below 0.04%). As can be seen, the proposed ANN, whose inputs are the flow rates of all pipes, predicts the leakage location with acceptable accuracy (Asgari & Maghrebi ). Thus, to employ the trained ANN, the pipe flow rates should be measured by flowmeters. Then, using the proposed ANN, the RLE values are calculated at each node. Finally, the node with the maximum RLE represents the leakage location over 24 hours.
In Figure 11, the x and y axes represent the hours and

CONCLUSIONS
The present study proposes a new methodology that considers variable leakage over 24 hours to find the leakage location in WDNs using feedforward artificial neural networks (ANNs). The proposed ANNs were studied for two cases, with an optimal number of 30 neurons in the hidden layer. The relation between leakage and pressure was applied over 24 hours, and the variable hourly nodal leakage over 24 hours at all nodes was obtained from it. The MRLE of the ANNs trained on the flow rates of all pipes was below 0.04% in Case 1 and below 0.1% in Case 2. However, for the proposed method to be applicable in real WDNs, a sensitivity analysis was performed with hybrid ANNs using fewer pipes with known flow rates. The results showed that the leakage location can be found accurately if the flow rates of at least 10% of the pipes are measured by flowmeters. Although the leakage and water