Abstract
Planners are often faced with the challenge of providing crude estimates of water distribution system (WDS) infrastructure capacity and associated costs in the early phases of greenfield developments. This study investigated the relationship between the physical and hydraulic characteristics of a WDS and the corresponding serviced area. Five physical parameters (a) and two hydraulic parameters (b) describing the serviced area were identified for analysis, namely (a) total pipeline length, land area, area shape factor, terrain index, reservoir distance from area centroid and (b) peak flow rate and average static system pressure. Multiple linear regression was performed on the data. A model was compiled linking the total pipeline length of a WDS to the peak flow rate. The model is applicable to predominantly residential service zones larger than 80 hectares with a peak hour flow rate of <450 L/s. The model enables the prediction of the potable water distribution pipe infrastructure required for future development areas in the absence of basic planning information, such as cadastral layouts. Alternatively, the model can estimate the potential maximum peak flow rate that can be supplied, if the total pipeline length is known.
HIGHLIGHTS
A large dataset of water distribution systems from South Africa was analysed.
Seven parameters were evaluated using multiple linear regression.
A model was compiled linking the total pipeline length of a water distribution system to the peak flow rate.
Total pipeline length was segregated into diameter distributions.
Prediction of the water distribution pipe infrastructure required for future development areas.
INTRODUCTION
Background
A water distribution system (WDS) consists of a network of pressurised pipes of varying diameters. These networks encompass multiple parameters such as pipeline diameters and associated lengths, internal roughness coefficients, available supply pressures, consumer demand and the related peak hour flow rate. In order to manage all these variables, engineers require sophisticated computer programmes for hydraulic modelling. Master plans for WDSs are typically compiled by experienced modellers relying on accurate information. Engineers and city planners often need to crudely estimate the pipeline lengths and associated pipeline diameters in the early planning phases of future development areas, even before a cadastral layout is available.
Several tools have been proposed to predict the required water and sewer pipeline infrastructure in developed areas. Dames & Moore (1978) conducted a national survey of 455 sewer construction projects in the United States of America. One of the fundamental outcomes of the work by Dames & Moore (1978) was a table linking sewer pipe lengths and diameters to population size groups.
Sitzenfrei et al. (2010a) developed a virtual infrastructure benchmarking tool (VIBe), which algorithmically generates complex virtual case studies at a city scale for urban water systems, including sewer systems and water distribution systems. The parameters of the virtual case studies are stochastically varied in ranges extracted from real-world case studies and literature to cover a broad range of possible system properties (Sitzenfrei et al. 2010a). A module was added to this algorithm allowing sewer infrastructure to be developed for each virtual urban case study (Urich et al. 2010). VIBe is therefore used to generate the input files for the sewer network building software.
The VIBe algorithm was further enhanced by adding functionality to dynamically model time-related impacts on the urban structure, for example, changing land use and population size. This enhanced version of VIBe was named ‘Dynamic Virtual Infrastructure Benchmarking’ or DynaVIBe (Sitzenfrei et al. 2010b). DynaVIBe could be a useful model to generate dynamic virtual sewer case studies for a selected modelling scenario. The model input parameters can be varied to perform a sensitivity analysis and generate a realistic envelope of the infrastructure requirements for an area, and can be used to model various future scenarios.
Venkatesh & Brattebø (2011) analysed a dataset of pipeline lengths from 30 Norwegian municipalities to illustrate that the relationship between pipeline length and the number of pipes by length class can be described by a power law.
Kobayashi et al. (2011) developed a model to predict the distribution of water pipeline lengths based on road layouts in Japan. Kobayashi's model was intended to improve the accuracy of earthquake damage assessment.
Maurer et al. (2012) developed a generic model of length, diameter distribution and replacement costs for the sewer network in a settlement with a fixed area. The model catered for three classes of pipes, namely private connection pipes, secondary sewer pipelines and the sewer trunk main that connects the settlement to the rest of the network.
Pauliuk et al. (2014) provided a calibrated estimate of the total length, total mass of pipelines and the diameter distribution for sewer networks in cases where only area and population density of the settlement are known. A link was established between two planning parameters (urban density and settlement size) and the demand for pipes and materials in water and sewer network infrastructure.
Balaji et al. (2015) used data from completed sewerage schemes in 31 towns in India to perform regression analyses. Balaji et al. (2015) determined empirical equations relating the total installation cost of sewer networks (defined as material, equipment and labour costs for excavation, laying and jointing) to the serviced population size.
The South African Department of Water and Sanitation (DWS) developed a cost benchmark for water services (DWS 2016). The outcome was a document which provides typical unit costs of water services projects and individual infrastructure components. The relevant costs were derived from the DWS rural water supply projects completed after 1994 and from as-built project costs from various consulting engineering firms in South Africa.
Several researchers have attempted to model sewer network layouts and optimise sewer network design. Turan et al. (2019) developed a graph theory-based methodology for sewer system optimisation. The proposed method generates a viable sewer network layout that contains all sewer links and satisfies the requirements of a sewer system by using graph theory, without any additional strategies required. Hesarkazzazi et al. (2022) proposed a graph theory-based framework for sewer system layout. In addition, a generic scheme for decentralised layouts in both steep and flat terrains was suggested. Duque et al. (2022) proposed a spatial algorithm for generating simplified sewer networks which represent key characteristics of real systems, using basic topographic, demographic and urban characteristics. Three different pipe dimensioning approaches were compared and a balance between detail and computational efficiency was found. Moeini & Afshar (2018) used the ant colony optimisation algorithm in combination with non-linear programming techniques for the optimal design of sewer networks. The ant colony optimisation algorithm was used to determine pipeline diameters, while non-linear programming was used to determine the pipeline slopes.
Winter et al. (2022) used multiple linear regression to estimate the total sewer pipeline length for a service zone using basic service zone characteristics. Pipeline diameter distributions were developed for disaggregating the total pipeline length into lengths per diameter. In addition, the number of manholes required along a length of the pipeline for different types of service zones was quantified.
Estimating the required pipeline infrastructure of water and sewer networks from limited information has been attempted before. Tools for the automatic generation of water and sewer network infrastructure have the potential to estimate infrastructure quantities and support high-level costing. The benefit of direct costing methods is that minimal information is required to develop cost estimates. Tools that can be customised for specific conditions, such as the DWS cost benchmark, can provide simple yet relatively robust early-stage cost estimates. However, predicting the required water and sewer infrastructure components themselves, rather than obtaining only a cost figure, holds obvious benefits. This research proposes a new tool requiring limited information, which could be applied to future development areas to estimate the likely required WDS pipeline infrastructure. For estimating the number of valves and associated infrastructure required, Liu & Kang (2021) investigated typical approaches to valve spacing and proposed an optimised approach that reduces the number of valves required without decreasing network resilience. However, applying Liu & Kang's approach to an outcome of this study, which will not necessarily include a network layout, may prove challenging.
Various other optimisation studies fall beyond the scope of this research, for example, the investigation of the impact of problem formulations, pipe selection methods and optimisation algorithms on the rehabilitation of existing water distribution systems (Wang et al. 2020). Transient flow modelling, which has been reviewed in significant detail by Duan et al. (2020), also falls beyond the scope of this research. A summary of the tools developed through earlier research that are related to this research is provided in Table 1.
Description | Reference
--- | ---
Table linking sewer pipe lengths and diameters to population size | Dames & Moore (1978)
Virtual infrastructure benchmarking tool to generate complex case studies for urban water systems | Sitzenfrei et al. (2010a, 2010b)
Case study on 30 Norwegian municipalities showing that the relationship between pipeline length and the number of pipes by length class can be described by a power law | Venkatesh & Brattebø (2011)
A model to predict the distribution of water pipeline lengths based on road layouts | Kobayashi et al. (2011)
A generic model of length, diameter distribution and replacement costs for the sewer network in a settlement with a fixed area | Maurer et al. (2012)
A model estimating total length, total pipeline mass and diameter distribution for sewer networks where only area and population density are known | Pauliuk et al. (2014)
Empirical equations, developed through regression analyses, linking the total installation cost of sewer networks to the population size | Balaji et al. (2015)
A cost benchmark for water services which provides typical unit costs of water services projects and individual infrastructure components | DWS (2016)
A graph theory-based methodology for sewer system optimisation that generates a viable sewer network layout | Turan et al. (2019)
A graph theory-based framework for sewer system layout and a generic scheme for decentralised layouts in both steep and flat terrains | Hesarkazzazi et al. (2022)
A spatial algorithm for generating simplified sewer networks which represent key characteristics of real systems, using basic topographic, demographic and urban characteristics | Duque et al. (2022)
A multiple linear regression tool to estimate the total sewer pipeline length for a service zone using basic service zone characteristics | Winter et al. (2022)
Scope and limitations
The focus of this study was on the potable water supply infrastructure required for future development areas, based on statistical analyses of selected physical and hydraulic parameters of existing WDSs. The study was limited to pipeline infrastructure; other structures (e.g. reservoirs, water towers, pumps and control valves) were not included.
Objectives
The objectives for developing the tool were to:
Identify physical and hydraulic pipe network parameters that may influence the total pipeline length and diameter distribution of a water supply system, in terms of the known characteristics of the development area itself, which can be quantified at the early stages of a future development area project.
Obtain a suitable sample space of existing WDSs for which all said parameters are known.
Generate a regression model expressing the total pipeline length as a function of the other parameters.
Generate the pipeline diameter distribution for different types of networks, for disaggregating the total pipeline length into lengths per diameter category.
Verify and validate the model.
METHODS
This study involved applied research, in which empirical methods were employed to solve a practical research problem. The study relied on quantitative data collection. Data were extracted from existing, calibrated hydraulic models of various WDSs in South Africa. The empirical evidence was subjected to statistical analysis to develop the model for estimating WDS pipe length as a function of basic greenfield development parameters.
The methodology encompassed several steps: (i) data collection, (ii) parameter extraction, (iii) developing and testing the model through multiple linear regression techniques and, finally, (iv) segregation of the total pipeline length into pipe diameter categories.
Data collection
Data collection involved data source and sample network identification and extraction, followed by selecting and abstracting the parameters of interest from each network.
Data source and sample network extraction
All the WDS models used in this research were, at the time, also in use by professionally registered civil engineers at GLS Consulting (www.gls.co.za) to conduct water master planning for various clients across South Africa. The hydraulic models used in this study were obtained directly from collaborators at GLS Consulting (GLS). The model nodes were already populated with water demands (node outputs) and with codes for land use and zoning. All typical pipeline information, such as length, nominal diameter and roughness coefficient, was available for every pipe section. The associated land-use information allowed further sub-sectoring of the models, for example by predominantly industrial, commercial or residential land use.
From the available model data, 170 WDS models were investigated for possible inclusion in the analyses. Of these, 141 were found to be predominantly homogeneous in terms of residential land use, and these were subsequently divided into two dominant land use categories, giving 90 ‘General Residential’ and 51 ‘Low-Income Residential’ models. Separate regression models were developed for each dominant land use category.
Parameters of interest
Table 2 lists the relevant parameters and how each parameter was defined. Land use was considered by classifying each model according to its predominant land use, as type A (general residential) or type B (low-income residential).
Parameter | Unit | Definition
--- | --- | ---
Total pipeline length | km | Sum of the lengths of all individual pipes in the WDS. Total pipeline length per diameter was also recorded.
Peak flow rate | L/s | Hydraulic models were populated with the peak hour flow rate, which is derived from the average annual daily demand (AADD). The AADD is widely used in research and design in South Africa and in other Southern African countries, for example, Malawi (Makwiza & Jacobs 2016). The minimum pressure during peak hour demand is widely used when considering minimum system pressure (Ghorbanian et al. 2016).
Land area | ha | The area of each WDS was approximated by an ellipse: the major axis d_{1} is the line joining the two furthest points, and the minor axis d_{2} is the longest possible perpendicular bisector of d_{1}. The area of the ellipse is then $A = \pi d_{1} d_{2}/4$ (with d_{1} and d_{2} in metres, dividing by 10,000 converts m² to ha).
Area shape factor | – | Defined as the ratio of d_{1} to d_{2}, describing the elongation of the elliptical area.
Terrain index | – | The average of the range and standard deviation indices for the WDS, as discussed below and determined using Tables 3 and 4.
Reservoir distance from area centroid | m | Distance between the coordinates of the reservoir and the ellipse centroid.
Average static system pressure | m | Difference in height between the reservoir full supply level and the mean elevation of all WDS model nodes.
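The geometric and hydraulic parameters in Table 2 reduce to a few lines of arithmetic once d_{1}, d_{2}, the reservoir and centroid coordinates and the node elevations have been measured. The following is a minimal Python sketch, assuming lengths in metres and coordinates in a projected (metric) coordinate system; the function names are hypothetical, and the section does not state which software the authors used for these measurements.

```python
import math


def ellipse_area_ha(d1_m: float, d2_m: float) -> float:
    """Area (ha) of an ellipse with major axis d1_m and minor axis d2_m (metres)."""
    # A = pi * (d1/2) * (d2/2) = pi * d1 * d2 / 4; 1 ha = 10,000 m^2
    return math.pi * d1_m * d2_m / 4.0 / 10_000.0


def shape_factor(d1_m: float, d2_m: float) -> float:
    """Area shape factor d1/d2; equals 1 for a circle and grows with elongation."""
    return d1_m / d2_m


def reservoir_distance_m(reservoir_xy, centroid_xy) -> float:
    """Straight-line distance between reservoir and ellipse centroid.

    Assumes both points are in the same projected (metric) coordinate system.
    """
    return math.dist(reservoir_xy, centroid_xy)


def average_static_pressure_m(full_supply_level_m: float, node_elevations_m) -> float:
    """Reservoir full supply level minus the mean elevation of all model nodes."""
    return full_supply_level_m - sum(node_elevations_m) / len(node_elevations_m)
```

For example, a zone with d_{1} = 2,000 m and d_{2} = 1,000 m gives `ellipse_area_ha(2000, 1000)` of about 157 ha and a shape factor of 2.0.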
Terrain models were constructed for all 141 WDSs. The range between the highest and lowest nodal elevations alone would be insufficient to describe the terrain, because it does not account for the number of smaller hills and terrain fluctuations inside each WDS. The range and standard deviation of the node elevations were therefore used together to describe the terrain; including the standard deviation accounts for the fluctuations of the nodal elevations. The elevation index tables given in Tables 3 and 4 were used to categorise each WDS. The range and standard deviation index values were the same for the majority of the WDS models. Where they differed, their average was used for terrain classification.
Range index | Elevation range between highest and lowest network nodes (m)
--- | ---
1 | 10–40
2 | 41–70
3 | 71–100
4 | 101–130
5 | 131–160
Standard deviation index | Standard deviation of all nodal elevations (m)
--- | ---
1 | 0–6.0
2 | 6.1–12.0
3 | 12.1–18.0
4 | 18.1–24.0
5 | 24.1–30.0
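The terrain classification from Tables 3 and 4 can be sketched as a lookup followed by an average. This Python sketch makes several assumptions the section does not state: values falling in the gaps between tabulated classes (e.g. a range of 40.5 m) go to the next class up, ranges below 10 m go to class 1, values above the last class raise an error, and the population standard deviation of the node elevations is used. The names `classify` and `terrain_index` are hypothetical.

```python
import statistics

# Class bounds transcribed from Tables 3 and 4: (index, lower, upper), metres.
RANGE_CLASSES = [(1, 10, 40), (2, 41, 70), (3, 71, 100), (4, 101, 130), (5, 131, 160)]
SD_CLASSES = [(1, 0.0, 6.0), (2, 6.1, 12.0), (3, 12.1, 18.0), (4, 18.1, 24.0), (5, 24.1, 30.0)]


def classify(value, classes):
    """Return the index of the first class whose upper bound is not exceeded."""
    for index, _lower, upper in classes:
        if value <= upper:
            return index
    raise ValueError(f"{value} lies outside the tabulated classes")


def terrain_index(node_elevations):
    """Average of the range index and the standard deviation index."""
    elevation_range = max(node_elevations) - min(node_elevations)
    sd = statistics.pstdev(node_elevations)  # population SD (assumption)
    range_idx = classify(elevation_range, RANGE_CLASSES)
    sd_idx = classify(sd, SD_CLASSES)
    return (range_idx + sd_idx) / 2
```

A zone whose two indices agree gets an integer terrain index; when they differ, the averaging produces half-values such as 3.5, which is why Table 7 defines the terrain categories with inequalities.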
Final dataset
The final dataset comprised 90 ‘General Residential’ and 51 ‘Low-Income Residential’ datasets, for which total pipeline length, peak flow, land area, area shape factor, terrain index, reservoir distance from centroid and reservoir elevation above mean terrain elevation (average static system pressure) were known.
Regression analysis
Selection of regression model
The most common method for determining the intercept and regression coefficients is ordinary least squares (OLS) regression. An OLS model must satisfy five assumptions, namely, (a) lack of multicollinearity, meaning the independent variables should not be strongly correlated with each other, (b) normality, meaning the errors or residuals should be normally distributed, (c) linearity, meaning the true relationship between the dependent and independent variables should be linear, (d) homoscedasticity, meaning the variance of the residuals should not depend on the values of the dependent or independent variables and (e) independence, meaning the residuals should be unrelated to their order of observation (De Veaux et al. 2011). When building an OLS model, a preliminary model is built using all of the candidate variables, i.e. the independent variables identified as potentially significant. The p-value of each candidate variable in the model is then considered, where the p-value is the probability of obtaining a coefficient estimate at least as extreme as the one observed if the variable in fact had no effect (Montgomery & Runger 2014). Any variable with a p-value exceeding the selected significance level (0.05 in this study, corresponding to 95% confidence) is removed, and the model is regenerated using the remaining significant variables to obtain the final model.
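The selection procedure above (fit all candidates, drop any variable with p > 0.05, refit once) can be sketched in Python. This is a minimal illustration using NumPy and SciPy, not the software the authors used (which this section does not name); `ols_fit` and `select_significant` are hypothetical names. The p-values are the usual two-sided t-test p-values for each coefficient.

```python
import numpy as np
from scipy import stats


def ols_fit(X, y):
    """OLS with an intercept; returns coefficients and two-sided p-values."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    n, k = A.shape
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    sigma2 = residuals @ residuals / (n - k)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)  # coefficient covariance matrix
    t = beta / np.sqrt(np.diag(cov))
    p = 2 * stats.t.sf(np.abs(t), df=n - k)  # two-sided p-values
    return beta, p


def select_significant(X, y, names, alpha=0.05):
    """Fit all candidates, drop those with p > alpha, then refit once."""
    _, p = ols_fit(X, y)
    # p[0] belongs to the intercept, so candidate j sits at p[j + 1]
    keep = [j for j in range(len(names)) if p[j + 1] <= alpha]
    beta, p_final = ols_fit(X[:, keep], y)
    return [names[j] for j in keep], beta, p_final
```

The single elimination pass mirrors the description above; a stepwise variant that removes one variable at a time would be a different (and sometimes more stable) procedure.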
Selection of variables
A multiple linear regression model was developed with the total pipeline length as the dependent variable y and the remaining parameters of interest from Table 2 as the candidate x-variables. The procedure was repeated for each land use. Before the preliminary model could be built, it had to be verified that multicollinearity did not exist between the independent variables. Table 5 presents a correlation matrix for the candidate variables and shows that peak flow and area are highly correlated, with a correlation coefficient of 0.79. Multicollinearity was addressed by retaining only the variable with the higher individual correlation to the total pipeline length, namely peak flow (0.91, compared with 0.90 for area), which reduced the number of candidate independent variables to five. A preliminary regression model was then built and refined to arrive at the final model.

Before interpreting the performance results of any model, it was verified that the OLS assumptions were met. Linearity was indicated by the absence of curvature in partial regression plots (De Veaux et al. 2011) and in scatter plots of the dependent versus each independent variable; plots of the residuals versus each model variable also needed to display a random distribution. Independence was indicated by a random distribution when plotting the residuals versus the order of observation. Normality was indicated by a normal distribution in a histogram of the residuals, as well as a reasonably straight line on a normal probability plot. Homoscedasticity is generally indicated by the absence of any widening or narrowing in plots of the residuals versus each model variable. This final check revealed that heteroscedasticity was present in the model, since the size of the residuals increased for data points with a higher total pipeline length.
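The multicollinearity screen described above amounts to a simple rule: for any pair of candidates whose absolute correlation meets a threshold, keep the one more strongly correlated with total pipeline length. The Python sketch below assumes a threshold of 0.7 (the section calls 0.79 "highly correlated" but states no cut-off) and takes the data as a mapping from hypothetical variable names to equal-length value sequences.

```python
import numpy as np


def screen_collinear(data, target, candidates, threshold=0.7):
    """Drop, from each highly correlated candidate pair, the variable less correlated with target.

    data: mapping of variable name -> sequence of values (equal lengths).
    """
    def corr(a, b):
        return np.corrcoef(data[a], data[b])[0, 1]

    kept = list(candidates)
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            # Only compare pairs that are both still in play
            if a in kept and b in kept and abs(corr(a, b)) >= threshold:
                weaker = a if abs(corr(a, target)) < abs(corr(b, target)) else b
                kept.remove(weaker)
    return kept
```

Applied to the Table 5 correlations, this rule would keep peak flow (0.91 with total length) and drop area (0.90), matching the choice described above.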
The heteroscedasticity was addressed by introducing weighted least squares (WLS) regression, a variation of OLS in which observations with larger residual variance are down-weighted to reduce their disproportionate impact on the regression coefficients.
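This section does not specify the three weighting schemes that were used. As one hypothetical example, a common choice when residual spread grows with the magnitude of the response is w_i = 1/ŷ_i², with ŷ_i taken from a preliminary OLS fit. The Python sketch below solves WLS by rescaling each row by √w_i and applying ordinary least squares, which minimises Σ w_i r_i²; the function names are illustrative only.

```python
import numpy as np


def wls_fit(X, y, weights):
    """Weighted least squares with an intercept: minimise sum(w_i * r_i**2)."""
    A = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta


def inverse_square_fitted_weights(X, y):
    """Hypothetical scheme: w_i = 1 / yhat_i**2 from a preliminary OLS fit.

    Assumes all fitted values are positive (true for pipe lengths in practice).
    """
    beta_ols = wls_fit(X, y, np.ones(len(y)))  # unit weights = plain OLS
    fitted = np.column_stack([np.ones(len(y)), X]) @ beta_ols
    return 1.0 / fitted**2
```

With these weights, large networks (large ŷ) contribute relatively less to the fit, which counteracts the residual widening noted above.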
Variable | Total pipeline length | Peak flow | Area | Reservoir distance from centroid | Shape ratio | Average static system pressure | Terrain
--- | --- | --- | --- | --- | --- | --- | ---
Total pipeline length | 1.00 | | | | | |
Peak flow | 0.91 | 1.00 | | | | |
Area | 0.90 | 0.79 | 1.00 | | | |
Reservoir distance from centroid | 0.37 | 0.33 | 0.36 | 1.00 | | |
Shape ratio | −0.07 | −0.07 | −0.12 | 0.00 | 1.00 | |
Average static system pressure | 0.14 | 0.11 | 0.19 | 0.58 | −0.12 | 1.00 |
Terrain | 0.27 | 0.22 | 0.39 | 0.16 | −0.06 | 0.43 | 1.00
Model development
For each land use category, the following process was used to develop the final models. Scatter plots of the dependent versus each independent variable were inspected to identify and remove extreme-value points as outliers. Partial regression plots, which illustrate the relationship between the dependent and each independent variable after the effects of the other independent variables have been accounted for (De Veaux et al. 2011), were also inspected. This inspection served to identify and remove any overly influential points as outliers, and to assess visually the significance of each candidate independent variable for the dependent variable. One outlier was removed from the ‘General Residential’ land use and six from the ‘Low-Income Residential’ land use. Of the remaining points, 20% were randomly reserved for validity testing, leaving 80% to form the training set for model development. For each land use, a preliminary OLS model was then built, and the significant variables with p < 0.05 were identified for use in the final models. The final models were built using OLS and using WLS with three different weighting schemes, resulting in four final models for each land use. Provided the five OLS assumptions were satisfied, these models were then compared in terms of the log-likelihood, AIC (Akaike's information criterion) and BIC (Bayesian information criterion) to determine the best-performing model for each land use. These likelihood indicators, and how the results are to be interpreted, are presented in Table 6.
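The 80/20 hold-out step can be sketched as follows. This is a minimal Python illustration using the standard library's `random.Random.sample`; the fixed seed and the rounding of the test-set size are assumptions, since the section does not state how the random selection was made reproducible or how fractional counts were handled. `train_test_split` here is a hypothetical helper, not a library function.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Randomly reserve a fraction of records for validation; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_test = round(len(records) * test_fraction)  # rounding rule is an assumption
    test_idx = set(rng.sample(range(len(records)), n_test))
    train = [r for i, r in enumerate(records) if i not in test_idx]
    test = [r for i, r in enumerate(records) if i in test_idx]
    return train, test
```

Under this rounding rule, the 89 ‘General Residential’ zones left after outlier removal would split into 71 training and 18 test zones, and the 45 remaining ‘Low-Income Residential’ zones into 36 and 9.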
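Indicator | Interpretation
--- | ---
Log-likelihood | Higher values indicate a better fit to the data
Akaike's Information Criterion (AIC) | Lower values indicate a better model, balancing fit against the number of parameters
Bayesian Information Criterion (BIC) | Lower values indicate a better model, balancing fit against the number of parameters and sample size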
It is noted that these likelihood-based indicators have no absolute meaning and say nothing about how good a single model is. Instead, they can only be interpreted as relative values between models. Furthermore, the indicators can only be compared between models developed using the same sample points and the same dependent variable. The selected best-performing model and performance evaluation for each land use are presented in the Results section.
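The three indicators in Table 6 follow from the residuals of a fitted model. The Python sketch below assumes independent normal errors with variance σ²/w_i (so OLS is the special case w_i = 1) and counts k as the number of regression coefficients including the intercept. Some packages also count σ² as a parameter; because that adds the same constant to every model fitted to the same data, it does not change which model ranks best. The function names are hypothetical.

```python
import math


def gaussian_log_likelihood(residuals, weights=None):
    """Maximised log-likelihood of a linear model with independent normal errors.

    With weights, the error variance of observation i is sigma^2 / w_i (WLS);
    without weights, all w_i = 1 (OLS).
    """
    n = len(residuals)
    w = [1.0] * n if weights is None else list(weights)
    ssr_w = sum(wi * r * r for wi, r in zip(w, residuals))
    # sigma^2 is replaced by its maximum-likelihood estimate ssr_w / n
    return (-0.5 * n * (math.log(2 * math.pi * ssr_w / n) + 1)
            + 0.5 * sum(math.log(wi) for wi in w))


def aic(log_likelihood, k):
    """Akaike's information criterion; k = number of regression coefficients."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for n observations."""
    return k * math.log(n) - 2 * log_likelihood
```

Because ln(n) exceeds 2 once n ≥ 8, BIC penalises extra parameters more heavily than AIC for training sets of the size used here.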
Diameter distribution analysis
Analyses of the pipeline diameters and associated lengths from the 141 WDSs were used to determine typical pipeline diameter distributions. Firstly, the WDSs were classified into area size and topography categories, as defined in Table 7. In addition, the WDSs were classified as General Residential or Low-Income Residential. The categories overlap: for example, an area could be classified as both small and hilly, and a flat area could be classified as either General Residential or Low-Income Residential. The diameter distributions were obtained by determining the average diameter distribution, by overall pipeline length, within categories of WDSs with similar characteristics. The final distributions are presented in the Results section.
Development area category | Definition
--- | ---
Small areas | Area < 2,000 ha
Medium areas | 2,000 ha < Area < 4,000 ha
Large areas | Area > 4,000 ha
Flat areas | Terrain index ≤ 2
Partially hilly areas | 2 < Terrain index < 4
Hilly areas | Terrain index ≥ 4
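The categorisation in Table 7 and the averaging of diameter distributions within a category can be sketched in Python. Two assumptions are made here and flagged in the code: Table 7 leaves areas of exactly 2,000 ha and 4,000 ha unassigned, so the sketch places them in the medium category; and "average distribution by overall pipeline length" is read as pooling the lengths per diameter across all zones in a category before dividing by the pooled total (a length-weighted average). Averaging each zone's percentage distribution instead would give a different result for unequal zone sizes.

```python
from collections import defaultdict


def area_category(area_ha):
    """Area size category per Table 7."""
    if area_ha < 2000:
        return "small"
    if area_ha > 4000:
        return "large"
    return "medium"  # exact boundary values 2,000 and 4,000 ha land here (assumption)


def terrain_category(terrain_index):
    """Topography category per Table 7."""
    if terrain_index <= 2:
        return "flat"
    if terrain_index >= 4:
        return "hilly"
    return "partially hilly"


def pooled_diameter_distribution(systems):
    """Share of total pipe length per nominal diameter, pooled over a group of WDSs.

    systems: iterable of {diameter_mm: length_km} mappings, one per WDS.
    """
    lengths = defaultdict(float)
    for lengths_by_diameter in systems:
        for diameter_mm, length_km in lengths_by_diameter.items():
            lengths[diameter_mm] += length_km
    total = sum(lengths.values())
    return {d: lengths[d] / total for d in sorted(lengths)}
```

Each WDS would first be assigned to its area, terrain and land use categories, after which `pooled_diameter_distribution` is applied to the zones in each category.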
RESULTS AND DISCUSSION
The results were generated from 141 WDSs, each at least 80 hectares in area with a peak flow rate of less than 450 L/s; the results can therefore only be considered applicable to WDSs within this range.
Total pipeline length models
Symbol | Variable | Unit | Definition
--- | --- | --- | ---
 | Total pipeline length | km | Table 2
 | Peak flow rate | L/s | Table 2
In terms of physical interpretation, the models show the expected outcome that total pipeline length increases with increasing peak flow rate. The rate of increase is similar for both land use categories, in the order of 0.3 km of additional pipeline per 1 L/s increase in peak flow rate. The ‘Low-Income Residential’ model has a higher intercept, indicating a denser pipeline network.
Land use category | R^{2} (training data) | R^{2} (test data) | MAPE (%) (training data) | MAPE (%) (test data)
--- | --- | --- | --- | ---
General Residential | 0.85 | 0.82 | 31.9 | 32.6
Low-Income Residential | 0.74 | 0.98 | 40.7 | 35.3
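The two performance measures in the table above can be computed from observed and predicted total pipeline lengths as shown in this minimal Python sketch. It uses the standard unweighted definitions; if the reported training R^{2} values came directly from a WLS fit, the software may have reported a weighted R^{2}, which this sketch does not reproduce. The function names are illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (unweighted)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)
```

MAPE weights each zone's error relative to its own size, so a 10 km miss on a 20 km network counts far more than the same miss on a 200 km network. This is one reason MAPE and R^{2} can rank the two land use models differently.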
Pipeline diameter distribution
Application potential
The model is particularly useful during the early stages of a greenfield development when a cadastral layout may not be available yet. The average water demand of the proposed greenfield development could be estimated as soon as the population served, the number of housing units to be developed or the development footprint area is approximately known. The average water demand could be determined using typical daily demand per person, per housing unit or per land area unit. The average water demand thus determined could be multiplied by a peak hour factor associated with this type of development in order to estimate the peak hour flow rate.
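The peak hour flow estimate described above is a unit conversion followed by a multiplication. In this Python sketch the unit demand and peak hour factor are inputs the planner must supply from local design guidelines; the example values in the usage note are hypothetical and are not taken from this study or from any South African guideline.

```python
SECONDS_PER_DAY = 86_400


def peak_hour_flow_lps(units, demand_per_unit_kl_per_day, peak_hour_factor):
    """Peak hour flow (L/s) from a unit demand and a peak hour factor.

    units: persons, housing units or hectares, matching the unit demand used.
    """
    # Average annual daily demand, converted from kL/day to L/s
    aadd_lps = units * demand_per_unit_kl_per_day * 1000.0 / SECONDS_PER_DAY
    return aadd_lps * peak_hour_factor
```

For example, a hypothetical 2,000 housing units at 0.6 kL/unit/day with a peak hour factor of 4 gives about 13.9 L/s average demand and roughly 55.6 L/s at peak hour.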
The model enables the prediction of the total length of WDS pipes required to service a future development area as a function of the peak flow rate. Once the total pipeline length is determined, the diameter distribution charts can be used to disaggregate it into estimated lengths per diameter. With the approximate total length of pipework and the associated length per diameter known, the utility can prepare a budget estimate for the future construction of planned assets.
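Applying the model therefore takes two steps: evaluate a straight line in peak flow, then split the result by diameter shares. The fitted intercepts and slopes are not reproduced in this section, so the Python sketch below takes them as arguments. The example values in the test (intercept 10 km, slope 0.3 km per L/s, 50/50 diameter split) are hypothetical placeholders; only the slope's order of magnitude echoes the Results discussion. The range check reflects the stated applicability limit on peak flow. The ≥80 ha area limit cannot be checked from peak flow alone and is left to the user.

```python
def total_length_km(peak_flow_lps, intercept_km, slope_km_per_lps):
    """Linear model L = a + b * Q, with a and b taken from the fitted model for the land use."""
    if not 0 < peak_flow_lps < 450:
        raise ValueError("peak flow outside the calibrated range (< 450 L/s)")
    return intercept_km + slope_km_per_lps * peak_flow_lps


def length_per_diameter(total_km, distribution):
    """Split a total length by diameter shares (shares are expected to sum to 1)."""
    return {diameter: total_km * share for diameter, share in distribution.items()}
```

The per-diameter lengths can then be multiplied by unit rates to give a budget figure. This sketch does not supply those rates.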
Loubser et al. (2021) highlighted the widespread occurrence of intermittent water supply (IWS) in South Africa. One of the causes of IWS is insufficient WDS pipe infrastructure relative to the population served by the particular WDS (Maake & Holtzhausen 2015). The model developed during this research can potentially be used to analyse systems subjected to IWS. The model could be employed, for example, to compare the expected WDS pipe length (as a function of the estimated total peak hour flow rate) with the actual WDS pipe length of the system subjected to IWS. Systems subjected to IWS could be expected to have insufficient pipe length compared to the model results, or to display diameter distributions that differ drastically from the model outcomes (for example, a relatively higher percentage of smaller diameter pipelines). The model would therefore allow a user to ascertain whether an existing water supply network has potentially been stretched beyond its design capacity. A similar result can be achieved through detailed hydraulic modelling of the WDS, yet this tool gives an indication of capacity problems with only a few inputs, making the task feasible with limited resources and under stringent budget and time constraints.
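The IWS screening described above, and the inverse use of the model mentioned in the abstract, can be sketched by rearranging the same linear relationship. The intercept and slope must come from the fitted model for the relevant land use; they are arguments here, and the test values are hypothetical. The section defines no threshold for when a length ratio signals a problem, so interpreting the ratio is left to the user.

```python
def max_peak_flow_lps(total_length_km, intercept_km, slope_km_per_lps):
    """Invert L = a + b * Q to estimate the peak flow a network of known length could serve."""
    return (total_length_km - intercept_km) / slope_km_per_lps


def length_ratio(actual_km, peak_flow_lps, intercept_km, slope_km_per_lps):
    """Actual / expected pipe length; values well below 1 may flag an under-piped system."""
    expected_km = intercept_km + slope_km_per_lps * peak_flow_lps
    return actual_km / expected_km
```

A ratio near 1 suggests a network length consistent with the South African residential zones in the dataset. A markedly lower ratio would prompt closer inspection, for instance of the diameter distribution or through detailed hydraulic modelling.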
CONCLUSIONS
A model enabling the prediction of the water supply pipeline infrastructure required for future development areas was successfully developed. It was found that peak hour flow rate dominated the result in terms of total WDS pipe length. With the peak hour demand of an existing or future residential development area estimated or known, the model can be applied to crudely estimate the total WDS pipe length required. Subsequently, the diameter distribution charts can be used to disaggregate the total pipeline length into estimated length per diameter.
The model could find application in estimating the water supply pipeline infrastructure required for greenfield developments at an early stage. Moreover, the model can be used to estimate the maximum capacity of existing water networks, in order to ascertain whether a WDS can sustain a given maximum peak hour demand. Where it cannot, infrastructure upgrades would be required; otherwise low pressures and ultimately IWS could result.
Multiple linear regression was employed to develop a model based on South African residential water supply network data. Thus, care should be exercised when using the model outside the South African context and on non-residential networks. The model is applicable to predominantly residential service zones with a footprint ≥80 hectares and a peak hour flow rate of ≤450 L/s.
Future research should focus on modelling smaller areas and water distribution zones, which may require unique models associated with area size categories <80 hectares. This would enable wider practical application of the model in smaller developments. In addition, given the good correlation between development area size and total pipeline length, a second model selecting area size as the dominant independent parameter could also be developed. In order to generalise the model for a wider geographic application, the model could be extended to include large datasets of water distribution systems from other parts of the world.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.