
A multiple linear regression model was developed with total pipeline length as the dependent variable (y) and the remaining parameters of interest from Table 2 as the candidate independent variables (x). The procedure was repeated for each land use. Before the preliminary model could be built, the independent variables had to be checked for multi-collinearity. Table 5 presents the correlation matrix for the candidate variables and shows that peak flow and area are highly correlated, with a correlation coefficient of 0.79. Multi-collinearity was addressed by retaining only the variable of this pair with the higher individual correlation to total pipeline length, namely peak flow (0.91, versus 0.90 for area), which reduced the number of candidate independent variables to five. A preliminary regression model was then built and subsequently refined to arrive at the final model.
Before the performance results of any model were interpreted, it was verified that the ordinary least squares (OLS) assumptions were met. Linearity was indicated by the absence of curvature in partial regression plots (De Veaux et al. 2011) and in scatter plots of the dependent variable against each independent variable; plots of the residuals versus each model variable also needed to display a random distribution. Independence was indicated by a random distribution when plotting the residuals versus the order of observation. Normality was indicated by an approximately normal distribution in a histogram of the residuals, as well as a reasonably straight line on a normal probability plot. Homoscedasticity was assessed by checking for any widening or narrowing in plots of the residuals versus each model variable. This final check revealed that heteroscedasticity was present in the model, since the size of the residuals increased for data points with higher total pipeline length.
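The homoscedasticity check described above was made visually from residual plots. As a rough numeric counterpart, and not the procedure used in the study, the sketch below fits a one-variable OLS line and compares the mean absolute residual in the upper half of the fitted values with that in the lower half. The function names `ols_line_fit` and `spread_ratio` are hypothetical, and the data are synthetic, constructed so that the scatter grows with x; they are not the study's data.

```python
from statistics import fmean


def ols_line_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xm, ym = fmean(x), fmean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


def spread_ratio(fitted, residuals):
    """Mean |residual| in the upper half of fitted values divided by the
    mean |residual| in the lower half. Values well above 1 suggest that
    the residuals widen as the fitted value grows."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    half = len(order) // 2
    low = fmean(abs(residuals[i]) for i in order[:half])
    high = fmean(abs(residuals[i]) for i in order[-half:])
    return high / low


# Synthetic illustration only: deviations alternate in sign and grow with x.
x = list(range(1, 21))
y = [2 + 3 * xi + 0.2 * xi * (-1) ** xi for xi in x]

intercept, slope = ols_line_fit(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
ratio = spread_ratio(fitted, residuals)
print(f"upper/lower residual spread ratio: {ratio:.2f}")
```

A ratio close to 1 would be consistent with constant residual spread; a single summary number like this does not replace inspecting the residual plots themselves.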
The heteroscedasticity was addressed by using weighted least squares (WLS) regression, a variation of OLS in which observations with larger residuals are down-weighted to reduce their disproportionate influence on the regression coefficients.
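The effect of the weighting can be illustrated with a minimal one-variable sketch. The text does not state which weights were used, so the choice of weights equal to 1/x² (residual spread assumed proportional to x) is an assumption made here purely for illustration, as are the function name `weighted_line_fit` and the synthetic data.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x.

    Minimises sum(w_i * (y_i - intercept - slope * x_i) ** 2).
    Setting every weight to 1 reproduces ordinary least squares.
    """
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


# Synthetic illustration only: points on y = 2 + 3x, with the point at the
# largest x shifted upward to mimic a large residual at the high end.
x = list(range(1, 11))
y = [2 + 3 * xi for xi in x]
y[-1] += 5

ones = [1.0] * len(x)                   # OLS: equal weights
inv_var = [1.0 / xi ** 2 for xi in x]   # assumed weights (hypothetical choice)

ols_intercept, ols_slope = weighted_line_fit(x, y, ones)
wls_intercept, wls_slope = weighted_line_fit(x, y, inv_var)
print(f"OLS slope: {ols_slope:.3f}, WLS slope: {wls_slope:.3f}")
```

In this example the shifted point pulls the OLS slope further from the underlying value of 3 than it pulls the WLS slope, because its weight is small. In practice the weights would be chosen from the observed residual pattern rather than assumed.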

Table 5

Correlation matrix for candidate variables

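| | Total pipeline length | Peak flow | Area | Reservoir distance from centroid | Shape ratio | Reservoir height above mean | Terrain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total pipeline length | 1.00 | | | | | | |
| Peak flow | 0.91 | 1.00 | | | | | |
| Area | 0.90 | 0.79 | 1.00 | | | | |
| Reservoir distance from centroid | 0.37 | 0.33 | 0.36 | 1.00 | | | |
| Shape ratio | −0.07 | −0.07 | −0.12 | 0.00 | 1.00 | | |
| Average static system pressure | 0.14 | 0.11 | 0.19 | 0.58 | −0.12 | 1.00 | |
| Terrain | 0.27 | 0.22 | 0.39 | 0.16 | −0.06 | 0.43 | 1.00 |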
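The multi-collinearity screening applied to Table 5 can be sketched as follows. The correlation values are copied from the table, but the cut-off of 0.7 for "highly correlated" is an assumption, since the text only identifies the 0.79 pair without stating a threshold. The helper names `corr` and `screen` are hypothetical, and the row labels of Table 5 are used as variable names.

```python
# Variable order and lower-triangle correlations as given in Table 5.
NAMES = [
    "Total pipeline length",
    "Peak flow",
    "Area",
    "Reservoir distance from centroid",
    "Shape ratio",
    "Average static system pressure",
    "Terrain",
]
LOWER = [
    [1.00],
    [0.91, 1.00],
    [0.90, 0.79, 1.00],
    [0.37, 0.33, 0.36, 1.00],
    [-0.07, -0.07, -0.12, 0.00, 1.00],
    [0.14, 0.11, 0.19, 0.58, -0.12, 1.00],
    [0.27, 0.22, 0.39, 0.16, -0.06, 0.43, 1.00],
]
TARGET = NAMES[0]


def corr(a, b):
    """Look up the correlation between two variables in either order."""
    i, j = NAMES.index(a), NAMES.index(b)
    return LOWER[max(i, j)][min(i, j)]


def screen(candidates, threshold):
    """For each highly correlated pair of candidates, keep the one that is
    more strongly correlated with the target and drop the other."""
    pairs = sorted(
        ((abs(corr(a, b)), a, b)
         for i, a in enumerate(candidates)
         for b in candidates[i + 1:]),
        reverse=True,
    )
    retained, dropped = list(candidates), []
    for r, a, b in pairs:
        if r < threshold:
            break  # remaining pairs are below the cut-off
        if a in retained and b in retained:
            weaker = min((a, b), key=lambda v: abs(corr(v, TARGET)))
            retained.remove(weaker)
            dropped.append(weaker)
    return retained, dropped


# The threshold of 0.7 is an assumed cut-off, not a value from the text.
retained, dropped = screen(NAMES[1:], threshold=0.7)
print("dropped:", dropped)
print("retained:", retained)
```

With these inputs only the peak flow and area pair exceeds the assumed cut-off, so area is dropped and five candidate variables remain, matching the outcome described in the text.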
