## Abstract

With a water demand of 370 million liters per day (MLD), Kathmandu Valley currently faces a water shortage of 260 MLD. The Melamchi Water Supply Project (MWSP) is an inter-basin project that aims to divert 510 MLD of water in three phases of 170 MLD each. Phase I of the project was expected to be completed by 2018. Water demand forecasting is the first and most important step in managing water supply. An artificial neural network (ANN) was used to predict the water demand of Kathmandu Valley up to 2040, using the socio-economic factors of number of connections, water tariff and ratio of population to number of university students, and the climatic factor of annual rainfall. The analysis suggests that, even after the completion of Phase I of MWSP, the water shortage in the valley will be 160 MLD in 2020. Therefore, Phase II of MWSP should be completed by 2025 and Phase III by 2040. The results of this study can help Kathmandu Upatyaka Khanepani Limited (KUKL) manage the water supply system better. In addition, this research can support the decision on when to construct the second and third phases of MWSP, whose construction dates have not yet been decided.

## HIGHLIGHTS

First study to develop an ANN model to forecast the water demand of Kathmandu Valley.

Water demand is predicted using socio-economic and climate change factors.

Climate projections for 2016–2040 suggest a 15% decrease in annual average rainfall in Kathmandu Valley.

Water scarcity in the valley will be 160 MLD in 2020 even after MWSP-I.

Phase II of MWSP should be completed by 2025 and Phase III by 2040.

## INTRODUCTION

Environmental changes, including climate change, and present water use behavior continually stress both the quantity and quality of water available now and in the future. Studies by Gato *et al.* (2007), Tian *et al.* (2016), Brentan *et al.* (2017) and Bata *et al.* (2020) have concluded that water demand is strongly influenced by climatic variables. Zubaidi *et al.* (2018) pointed out that existing water infrastructure will come under considerable stress because of extreme changes in climate. Water resources therefore need careful planning and management to ensure that the demand of the current and future population is met at a satisfactory level. The first and most vital component of water supply planning is an accurate prediction of water demand, which informs the operation, expansion and management of the water supply system, policy formulation and inter-basin transfer (Babel & Shinde 2011). Conventional methods of water demand calculation, by contrast, can lead to mismanagement of water resources (Urich & Rauch 2014).

With a population growth rate of 4.63% and a total population of 2.51 million (CBS 2011), the water demand of Kathmandu Valley is 350 million liters per day (MLD) (KUKL 2011). Kathmandu Upatyaka Khanepani Limited (KUKL), the main authority supplying water in the valley, provides only 110 MLD, with an estimated loss of 40% during supply. To meet the increasing demand, the Melamchi Water Supply Project (MWSP) was initiated in 1998 and was scheduled to supply water by 2008. However, owing to technical and political problems, the first phase of the project is still under construction. With migration from various parts of the country, especially from rural areas during the civil war, the population and water demand of the valley have increased rapidly over the last two decades (Shrestha *et al.* 2015). Careful management of the limited water supply is therefore essential to ensure its appropriate distribution.

KUKL forecasts demand in the traditional way, multiplying a per-capita consumption rate by a population projected with an arithmetic model. Traditional techniques such as regression and time series analysis are widely used, but in most cases they overestimate demand, so that financial resources are wasted on infrastructure larger than required (Babel & Shinde 2011; Pacchin *et al.* 2019). Accurate forecasts of municipal water demand help to minimize the risks involved in decision-making and can improve the performance of water distribution systems in achieving urban water sustainability (Marlow *et al.* 2013; Walker *et al.* 2015; Zhang *et al.* 2019). The artificial neural network (ANN) is a popular tool that is widely used to predict various outputs, including water demand. Numerous studies have concluded that ANN is superior to traditional techniques (Bougadis *et al.* 2005; Adamowski 2008; Marlow *et al.* 2013; Sebri 2013; Behboudian *et al.* 2014; Guo *et al.* 2018). Other successful applications of ANN include those of Liu, Savenije & Xu (2003), Zhang *et al.* (2008), Iliadis & Maris (2007), Firat *et al.* (2010), Babel & Shinde (2011), Mouatadid & Adamowski (2016) and Zubaidi *et al.* (2018).
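The traditional approach described above, a per-capita consumption rate multiplied by a population projected with the arithmetic increase method (P_n = P_0 + n·x̄, where x̄ is the mean annual population increase), can be sketched as follows. This is a minimal illustration, not KUKL's actual procedure: the census points and the 120 L/person/day rate are hypothetical placeholders.

```python
def arithmetic_population(census_years, census_pops, target_year):
    """Arithmetic increase method: P_n = P_last + n * (mean annual increase)."""
    if len(census_years) < 2:
        raise ValueError("need at least two census points")
    span = census_years[-1] - census_years[0]
    mean_annual_increase = (census_pops[-1] - census_pops[0]) / span
    n = target_year - census_years[-1]  # years beyond the last census
    return census_pops[-1] + n * mean_annual_increase


def demand_mld(population, litres_per_capita_per_day):
    """Convert population x per-capita rate (L/person/day) to MLD."""
    return population * litres_per_capita_per_day / 1e6


# Hypothetical census points (not actual CBS figures)
years = [2001, 2011]
pops = [1_600_000, 2_500_000]
p2021 = arithmetic_population(years, pops, 2021)
print(p2021)                   # 3400000.0
print(demand_mld(p2021, 120))  # 408.0
```

Because the projected population grows by a fixed amount each year, demand computed this way rises linearly and ignores every factor other than population.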

To construct a reliable ANN model for water demand forecasting, it is important to select the explanatory variables (or factors) used as model inputs (Babel & Shinde 2011). Careful input selection can reduce computation time, minimize loss of information and prevent the inclusion of redundant inputs, which may complicate the training process (Zhang *et al.* 2006). Adamowski (2008), Herrera (2010), Babel & Shinde (2011), Bata *et al.* (2020) and Zubaidi *et al.* (2020) used a number of factors to forecast the water demand of a city. These factors can be broadly categorized into (a) socio-economic factors, such as GDP, population, number of households, number of pipeline connections and tariff rates, and (b) climatic factors, such as rainfall, temperature and humidity.

There is no definitive method for selecting the input factors of an ANN model; however, linear cross-correlation has been widely used (Babel *et al.* 2007; Babel & Shinde 2011) to eliminate redundant factors and so avoid multi-collinearity. Multi-collinearity distorts the standard errors of estimates and can lead to incorrect conclusions about which independent variables are statistically significant (Babel *et al.* 2007). This study is the first to develop a multivariate ANN model that incorporates the impact of climate change to forecast the domestic water demand of Kathmandu Valley. To achieve this goal, first, the explanatory variables were selected on the basis of correlation analysis and data availability. Second, six climate projections (3 regional climate models (RCMs) × 2 Representative Concentration Pathways (RCPs)) were selected and downscaled using a bias correction method to forecast the future rainfall of the valley. Finally, the ANN model was used to predict the water demand of the valley from the extrapolated socio-economic variables and the bias-corrected climate change data. The results of this study identify important factors influencing water demand in the valley and can help KUKL manage the water supply system better.

## STUDY AREA AND METHOD

### Study area

Kathmandu Valley includes three districts: Kathmandu, Lalitpur and Bhaktapur (Figure 1). The valley covers a total area of 899 km^{2} (Kathmandu: 395 km^{2}; Lalitpur: 385 km^{2}; Bhaktapur: 119 km^{2}). Temperatures generally range from a maximum of 29 °C in May to a minimum of 2 °C in January (Shrestha *et al.* 2015). Annual precipitation is around 1600 mm, with heavy rainfall from June to August brought by the southeast monsoon winds. The average humidity of the valley is about 75%.

KUKL currently supplies water from 35 surface sources and 59 deep tube wells. It operates 20 treatment plants and 43 reservoirs with 1,300 main valves. The total supply to Kathmandu Valley is 130 MLD in the wet season and 90 MLD in the dry season, whereas the demand of the valley is 350 MLD and is expected to reach 510 MLD in 2018 (KUKL 2011).

The MWSP aims to bring water through a tunnel from the Melamchi, Yangri and Larke rivers, about 50 km northeast of Kathmandu Valley, in three phases (170 MLD from each river). Phase I of MWSP is expected to be completed by 2020, whereas the construction dates of Phases II and III have not yet been finalized. The water will be supplied through a bulk distribution system from a water treatment plant at Mahankal village, Sundarijal, to 15 reservoirs within Kathmandu Valley. The project was designed in 1998 to supply water for 30 years.

### Selection of factors

The multiple-coefficient method using ANN can use several factors to predict the water demand of a specific area; water demand is treated as a function of two or more variables associated with water use. Babel *et al.* (2007) developed a multivariate regression model to forecast the domestic water demand of Kathmandu Valley and identified nine factors as explanatory (independent) variables for water demand in the valley. A correlation matrix was constructed to determine the degree of association between these factors.
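For reference, each entry of such a correlation matrix is a Pearson correlation coefficient between two factor series. A minimal pure-Python sketch, assuming each factor is an equal-length annual series; the factor labels reuse the paper's names, but the series themselves are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(factors):
    """Return {(name_i, name_j): r} for every ordered pair of factors."""
    names = list(factors)
    return {(a, b): pearson(factors[a], factors[b]) for a in names for b in names}


# Hypothetical annual series for three factors
factors = {
    "X1": [10, 12, 14, 16, 18],
    "X7": [5.0, 4.8, 4.6, 4.4, 4.2],  # falls exactly as X1 rises
    "X9": [1500, 1700, 1400, 1650, 1550],
}
r = correlation_matrix(factors)
print(round(r[("X1", "X7")], 2))  # -1.0
```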

The analysis showed high correlations between X1, X3, X4, X6 and X7, with absolute values of 0.99 or more (Table 1). To avoid the effects of multi-collinearity, only one of these factors was retained as an input variable for demand forecasting; X1 was chosen because of the reliability of its historical data. Multiple regression analysis was then carried out with the remaining factors X1, X2, X5, X8 and X9. In all three models developed (linear, semi-log and log-log), factor X8 had the smallest absolute t-value and failed to reject the null hypothesis of the individual t-test, so it was removed from further analysis (see Babel *et al.* 2007 for details). In this study, the factors selected by Babel *et al.* (2007), that is, X1, X2, X5 and X9, are used as input variables to construct the ANN model for predicting the water demand of Kathmandu Valley.

Factor | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 |
---|---|---|---|---|---|---|---|---|---|
X1: no. of connections | 1 |  |  |  |  |  |  |  |  |
X2: water tariff rate after minimum allowance of water supply (NRs/m^{3}) | 0.94 | 1 |  |  |  |  |  |  |  |
X3: population | 0.99 | 0.95 | 1 |  |  |  |  |  |  |
X4: per capita GDP at current prices (NRs) | 0.99 | 0.94 | 0.99 | 1 |  |  |  |  |  |
X5: ratio of total population to university students | 0.01 | 0.02 | 0.04 | 0.01 | 1 |  |  |  |  |
X6: number of households | 0.99 | 0.95 | 0.99 | 0.99 | 0.05 | 1 |  |  |  |
X7: average household size | −0.99 | −0.95 | −0.99 | −0.99 | −0.06 | −0.99 | 1 |  |  |
X8: average annual temperature (°C) | 0.79 | 0.75 | 0.80 | 0.81 | −0.17 | 0.81 | −0.77 | 1 |  |
X9: annual rainfall (mm) | 0.56 | 0.51 | 0.58 | 0.55 | 0.61 | 0.59 | −0.57 | 0.39 | 1 |
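The screening rule described above can be applied mechanically to the X1 column of Table 1. In this sketch, X1 is kept as the reference factor and any factor whose absolute correlation with X1 is at least 0.99 is dropped; the threshold mirrors the text, and the function name is illustrative only.

```python
# Correlation of each factor with X1 (no. of connections), copied from Table 1
R_WITH_X1 = {
    "X2": 0.94, "X3": 0.99, "X4": 0.99, "X5": 0.01,
    "X6": 0.99, "X7": -0.99, "X8": 0.79, "X9": 0.56,
}


def screen_collinear(reference, r_with_ref, threshold=0.99):
    """Keep the reference factor; drop factors with |r| >= threshold against it."""
    dropped = sorted(f for f, r in r_with_ref.items() if abs(r) >= threshold)
    kept = [reference] + sorted(f for f in r_with_ref if f not in dropped)
    return kept, dropped


kept, dropped = screen_collinear("X1", R_WITH_X1)
print(kept)     # ['X1', 'X2', 'X5', 'X8', 'X9']
print(dropped)  # ['X3', 'X4', 'X6', 'X7']
```

The retained set matches the five factors carried into the regression step. X8 was removed afterwards on the basis of its t-value, which this correlation rule does not model.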

Time series data for the selected factors from 1988 to 2015 were collected from the literature and from government and non-government organizations (Table 2). Missing values were filled in by interpolation. The annual rainfall of Kathmandu Valley was calculated with the Thiessen polygon method using 14 rainfall stations distributed across the valley.
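The two preprocessing steps above, filling missing years and Thiessen-polygon averaging of station rainfall, can be sketched as follows. Linear interpolation is an assumption (the text does not name the interpolation scheme), and in practice the polygon areas would come from a GIS construction of the Thiessen polygons around the 14 stations; all values below are hypothetical.

```python
import numpy as np


def fill_gaps(years, values):
    """Linearly interpolate missing (NaN) values in an annual series."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)
    return np.interp(years, years[known], values[known])


def thiessen_mean(station_rainfall, polygon_areas):
    """Area-weighted (Thiessen polygon) mean rainfall over the basin."""
    p = np.asarray(station_rainfall, dtype=float)
    a = np.asarray(polygon_areas, dtype=float)
    return float(np.sum(p * a) / np.sum(a))


# Hypothetical values for illustration only
print(fill_gaps([2000, 2001, 2002], [10.0, np.nan, 14.0]))               # [10. 12. 14.]
print(thiessen_mean([1400.0, 1600.0, 1800.0], [100.0, 200.0, 100.0]))  # 1600.0
```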

Data | Source |
---|---|
Water demand | Literature review and KUKL |
Number of pipeline connections | Literature review and KUKL |
Rainfall data | Department of Hydrology and Meteorology (DHM), Nepal |
Population | Central Bureau of Statistics, Nepal |
Number of university students | Literature review and University Grants Commission, Nepal |
Water tariff | Literature review and KUKL |
RCM data | South Asia CORDEX |

### Climate change analysis

*et al.* 2015, 2016, 2017). In this study, 3 RCMs under 2 Representative Concentration Pathway (RCP) scenarios were used to predict the future rainfall of Kathmandu Valley. The RCMs used in the study are ACCESS, CNRM-CM5-CSIRO-CCAM and MPI-ESM-LR under the RCP 4.5 and RCP 8.5 scenarios. These data were downloaded from South Asia CORDEX (http://cccr.tropmet.res.in/home/index.jsp) and then bias corrected with the linear scaling method (Shrestha 2015). The bias correction equations are:

$$P^{*}_{his,d} = P_{his,d} \times \frac{\mu_m\left(P_{obs,d}\right)}{\mu_m\left(P_{his,d}\right)}$$

$$P^{*}_{sim,d} = P_{sim,d} \times \frac{\mu_m\left(P_{obs,d}\right)}{\mu_m\left(P_{his,d}\right)}$$

where P = precipitation, d = daily, μ_m = long-term monthly mean, * = bias corrected, his = RCM simulated 1976–2005, sim = RCM simulated 2016–2040 and obs = observed 1976–2005.
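The linear scaling correction multiplies each daily RCM value by the ratio of the observed to the simulated long-term monthly mean over the reference period, and applies the same monthly factors to the future simulation. A minimal numpy sketch, assuming daily series tagged with calendar month numbers; the function names and toy values are hypothetical, and months whose simulated mean is zero are simply left uncorrected.

```python
import numpy as np


def monthly_factors(obs, obs_months, his, his_months):
    """Linear-scaling factors mu_m(obs) / mu_m(his) for calendar months 1-12."""
    obs, his = np.asarray(obs, dtype=float), np.asarray(his, dtype=float)
    obs_months, his_months = np.asarray(obs_months), np.asarray(his_months)
    factors = {}
    for m in range(1, 13):
        o, h = obs[obs_months == m], his[his_months == m]
        if o.size and h.size and h.mean() > 0:  # skip months without data or rain
            factors[m] = o.mean() / h.mean()
    return factors


def apply_scaling(series, months, factors):
    """Multiply each daily RCM value by the factor for its calendar month."""
    out = np.array(series, dtype=float)  # copy, so the input is not modified
    months = np.asarray(months)
    for m, f in factors.items():
        out[months == m] *= f
    return out


# Toy daily values (hypothetical): January and February only
obs = [4.0, 6.0, 10.0, 10.0]
his = [2.0, 3.0, 5.0, 15.0]
months = [1, 1, 2, 2]
factors = monthly_factors(obs, months, his, months)
his_corrected = apply_scaling(his, months, factors)          # P*_his
sim_corrected = apply_scaling([1.0, 3.0], [1, 2], factors)   # P*_sim
print(sim_corrected)  # [2. 3.]
```

After correction, the monthly means of the historical simulation equal the observed monthly means, which is the property the method is designed to enforce.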

### Artificial neural network

ANN is a massively parallel, distributed information processing system that resembles the biological neural networks of the human brain. Just as humans apply knowledge gained from experience to solve new problems, an ANN uses previously solved examples to build a system of neurons that can make new decisions. An ANN looks for patterns in a training dataset, learns those patterns and develops the ability to classify or forecast new ones.

For this study, a multilayer perceptron (MLP) ANN was used to develop the water demand model. MLPs are widely used in water resource management, from predicting urban water demand (Bata *et al.* 2020; Zubaidi *et al.* 2020) and agricultural water demand (Perea *et al.* 2019) to river flow prediction (Ghorban *et al.* 2016; Pradhan *et al.* 2020). The basic structure of an ANN is shown in Figure 2. Each circle represents a neuron, also called a node, and a group of nodes arranged in series is called a layer. The layer that receives data from outside is the input layer, the layer that presents the final predicted value is the output layer, and the layers between them are the hidden layers. Each node in one layer is linked to the nodes of the next layer, and each link has a numeric weight.

ANN models were developed using all four factors as inputs to simulate the water demand of the valley. The number of hidden layers and the number of nodes in them strongly affect model performance. With too few hidden nodes, the model is likely to underfit the data, giving both high training error and high generalization error. Conversely, too many nodes lead to overfitting: the training error can be low, but the generalization error will be high. The optimum number also depends on the dataset. Models were therefore constructed in MATLAB with 1, 2 and 3 hidden layers, using an equal number of nodes in each hidden layer.

A hyperbolic tangent sigmoid transfer function was used in the hidden layers and a linear transfer function in the output layer. Backpropagation has been the most popular algorithm for training ANNs. The Levenberg–Marquardt algorithm (LMA) achieves faster training and is considered efficient for small to medium-sized datasets (Karul *et al.* 2000). LMA combines the Gauss–Newton and gradient descent approaches to converge to an optimal solution. It is widely used because it offers rapid convergence near the error minimum, as in the Gauss–Newton algorithm, and a stable learning path, as in the gradient descent method.
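The transfer functions described above define the forward pass of the network. A minimal numpy sketch of an untrained MLP with tanh hidden layers and a linear output; the layer sizes follow a single hidden layer of six nodes with four inputs and one output (the configuration later selected as H1-N6), while the random weights and the assumption of pre-scaled inputs are illustrative.

```python
import numpy as np


def mlp_forward(x, weights, biases):
    """Forward pass of an MLP with tanh hidden layers and a linear output layer.

    weights[k] has shape (n_out, n_in) and biases[k] has shape (n_out,).
    """
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(W @ a + b)  # hyperbolic tangent sigmoid (MATLAB 'tansig')
    return weights[-1] @ a + biases[-1]  # linear output (MATLAB 'purelin')


# Hypothetical H1-N6 layout: 4 inputs -> 6 hidden nodes -> 1 output
sizes = [4, 6, 1]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in sizes[1:]]

# Inputs are assumed to be already scaled (e.g. to [-1, 1]); weights are untrained.
y = mlp_forward([0.2, -0.5, 0.1, 0.7], weights, biases)
print(y.shape)  # (1,)
```

Setting `sizes = [4, 3, 3, 1]` gives the two-hidden-layer H2-N3-N3 layout with the same function.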

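In LMA, the vector of weights and biases *w* is updated at each iteration *k* as

$$w_{k+1} = w_k - \left(J^{T}J + \mu I\right)^{-1} J^{T} e$$

where *J* is the Jacobian matrix of first-order partial derivatives of the network errors with respect to the weights and biases, *e* is the vector of network errors, *I* is the identity matrix and *μ* is the damping factor. *μ* is adjusted at each iteration: it is increased until a step reduces the sum of squared errors and decreased after a successful step. The value of *μ* therefore controls the transition between the gradient descent and Gauss–Newton methods: a large *μ* gives a short step along the negative gradient, whereas as *μ* approaches zero the update becomes the Gauss–Newton step.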
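The LMA update, Δw = −(JᵀJ + μI)⁻¹Jᵀe, can be illustrated on a small curve-fitting problem. The sketch below is a generic Levenberg–Marquardt loop, not MATLAB's trainlm implementation; the starting μ of 10⁻³ and the decrease/increase factors of 0.1 and 10 are assumptions chosen to resemble trainlm's documented defaults, and the saturating-exponential model and its noise-free data are hypothetical.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, w0, mu=1e-3, mu_dec=0.1, mu_inc=10.0,
                        max_iter=200, grad_tol=1e-10):
    """Minimise sum(e**2) using the step dw = -(J^T J + mu I)^-1 J^T e.

    residual(w) returns the error vector e; jacobian(w) returns de/dw.
    """
    w = np.asarray(w0, dtype=float)
    e = residual(w)
    sse = float(e @ e)
    for _ in range(max_iter):
        J = jacobian(w)
        g = J.T @ e  # half the gradient of the sum of squared errors
        if np.linalg.norm(g) < grad_tol:
            break
        dw = -np.linalg.solve(J.T @ J + mu * np.eye(w.size), g)
        e_new = residual(w + dw)
        sse_new = float(e_new @ e_new)
        if sse_new < sse:
            # Step accepted: lower mu so the next step is closer to Gauss-Newton
            w, e, sse = w + dw, e_new, sse_new
            mu *= mu_dec
        else:
            # Step rejected: raise mu for a shorter, gradient-descent-like step
            mu *= mu_inc
    return w, sse


# Hypothetical noise-free data from y = a * (1 - exp(-b x)) with a = 2, b = 0.5
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * (1.0 - np.exp(-0.5 * x))


def residual(w):
    a, b = w
    return a * (1.0 - np.exp(-b * x)) - y


def jacobian(w):
    a, b = w
    return np.column_stack([1.0 - np.exp(-b * x), a * x * np.exp(-b * x)])


w_fit, sse = levenberg_marquardt(residual, jacobian, [1.5, 0.8])
print(w_fit, sse)
```

In an ANN the same update is applied with *w* holding all weights and biases and *J* obtained by backpropagation.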

In this research, the weight and bias values of the network were updated using the Neural Network library in MATLAB. For more information on LMA, see Suratgar *et al.* (2007), Reynaldi *et al.* (2012), Duc-Hung *et al.* (2012) and MathWorks (2017).

## RESULTS AND DISCUSSION

### Climate change impact on rainfall

The RCM data were extracted for each station, bias corrected and then used to calculate the annual rainfall of the valley with the Thiessen polygon method. The ensemble of the three RCMs under the RCP 4.5 and RCP 8.5 scenarios was used to forecast the annual average rainfall of the valley from 2016 to 2040. The results suggest a 15% decrease in the annual average rainfall of Kathmandu Valley, with little to no difference between the two RCP scenarios. The annual average rainfall of the valley is forecast to be around 1245 mm (Figure 3). This decrease could have a large impact not only on water demand but also on water supply in the valley.
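The ensemble averaging and percentage change described above amount to simple arithmetic. A sketch with placeholder model names and hypothetical valley-average annual totals, assuming each model's bias-corrected series has already been aggregated with the Thiessen weights; it does not reproduce the study's figures.

```python
def ensemble_mean(series_by_model):
    """Year-by-year mean of equal-length annual series from several models."""
    lengths = {len(s) for s in series_by_model.values()}
    if len(lengths) != 1:
        raise ValueError("all model series must have the same length")
    k = len(series_by_model)
    return [sum(year_vals) / k for year_vals in zip(*series_by_model.values())]


def percent_change(future_mean, baseline_mean):
    """Relative change of the future mean with respect to the baseline, in %."""
    return 100.0 * (future_mean - baseline_mean) / baseline_mean


# Hypothetical valley-average annual rainfall (mm) from three placeholder models
ens = ensemble_mean({
    "model_a": [1200.0, 1300.0],
    "model_b": [1250.0, 1250.0],
    "model_c": [1300.0, 1200.0],
})
print(ens)                                          # [1250.0, 1250.0]
print(percent_change(sum(ens) / len(ens), 1562.5))  # -20.0
```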

Under both scenarios, the annual average rainfall of the valley is predicted to decrease; the trend lines for the two scenarios, shown in Figure 4, are comparable.

### ANN model

Annual data were available for 28 years, of which 16 years were used for training the model, 6 years for validation and 6 years for testing. The training, validation and testing samples were all selected randomly. The simulated water demand of each model was compared with the observed water demand, and statistical parameters such as R^{2} and root mean square error (RMSE) were used to check model accuracy. Among all the models developed, H1-N6 (1 hidden layer with 6 nodes) and H2-N3-N3 (2 hidden layers with 3 nodes each) showed the best results, with overall R^{2} values of 0.999 and 1.000 and RMSEs of 2.81 and 2.66 MLD, respectively. Model H1-N6 was selected to forecast the water demand of Kathmandu Valley because of its lower complexity. The H1-N6 model converged after 17 epochs. The variation of the gradient and μ for the model is shown in Figure 5, and the model's performance is shown in Figure 6.
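The random 16/6/6 split and the two accuracy measures can be sketched as follows. R^{2} is computed here as the coefficient of determination, 1 − SS_res/SS_tot; this is an assumption, since the text does not say whether its R^{2} is this quantity or the squared correlation coefficient shown in MATLAB's regression plots. The seed and test values are arbitrary.

```python
import numpy as np


def random_split(n, n_train, n_val, seed=0):
    """Randomly partition sample indices 0..n-1 into training/validation/testing."""
    idx = np.random.default_rng(seed).permutation(n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def rmse(obs, sim):
    """Root mean square error, in the units of the data (e.g. MLD)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def r_squared(obs, sim):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


train, val, test = random_split(28, 16, 6)  # 28 annual records, 1988-2015
print(len(train), len(val), len(test))       # 16 6 6
print(rmse([100.0, 200.0], [103.0, 197.0]))  # 3.0
```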

### Forecasting socio-economic factors

The socio-economic factors (number of connections, water tariff and ratio of population to number of university students) were forecast from their historical data. Several trend lines were fitted to each time series, and a linear trend was found to fit best for all factors. The equation of each trend line was used to forecast the corresponding factor up to 2040.
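Linear trend extrapolation of this kind can be sketched with numpy's `polyfit`. The series below is hypothetical and perfectly linear; the study's actual fits (for example, the tariff trend with R^{2} = 0.818) came from its own historical data.

```python
import numpy as np


def linear_trend_forecast(years, values, future_years):
    """Fit values = slope * year + intercept and extrapolate to future_years.

    Returns the forecasts and the R^2 of the fit on the historical data.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(years, values, deg=1)
    fitted = slope * years + intercept
    r2 = 1.0 - np.sum((values - fitted) ** 2) / np.sum((values - values.mean()) ** 2)
    forecast = slope * np.asarray(future_years, dtype=float) + intercept
    return forecast, float(r2)


# Hypothetical, perfectly linear series (not KUKL data)
hist_years = np.arange(2010, 2016)
hist_values = 1000.0 + 50.0 * (hist_years - 2010)
forecast, r2 = linear_trend_forecast(hist_years, hist_values, [2020, 2040])
print(np.round(forecast, 6), round(r2, 6))
```

A straight-line fit extrapolated over 25 years assumes the historical rate of change persists, which is why the lower-R^{2} trends carry more uncertainty in the 2040 forecasts.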

The number of pipeline connections in Kathmandu Valley shows an increasing trend. The trend line fits the data with an R^{2} value of 0.998, indicating a linear increase in pipeline connections (Figure 7). This is reasonable, since KUKL can be expected to increase the number of connections as the population grows and settlements expand.

The water tariff rate after the minimum allowance of water supply also shows an increasing trend. The tariff increased gradually from 1988 to 2012, by an average of NRs 3.00 roughly every 5 years; however, it nearly doubled from NRs 17.50 in 2012 to NRs 32.00 in 2013. The water tariff is set by policy and depends on economic conditions, and it is likely to increase in the future to recover the cost of MWSP. The trend line with an R^{2} value of 0.818 fits the historical data best and indicates that the tariff will reach NRs 49.00 in 2040.

In contrast to the other two socio-economic factors, the ratio of population to number of university students shows a decreasing trend: as the number of colleges increased, so did the share of the population attending university. In recent years, however, the ratio appears to be increasing, possibly because students are migrating abroad for higher studies. Nevertheless, a trend line with an R^{2} of 0.6621 fits the data best, and the future ratio is forecast from this trend line.

### Water demand forecasting

Model H1-N6 was used to forecast the water demand of Kathmandu Valley, with the forecast time series of the socio-economic factors and the climatic factor under the RCP 4.5 and RCP 8.5 scenarios as inputs. Water demand is predicted to increase at nearly the same rate under both scenarios, reaching 465 MLD in 2025 and then beginning to level off. This saturation may arise because two of the input factors (number of connections and water tariff) have increasing trends while the other two (the population-to-student ratio and rainfall) have decreasing trends. With the present water supply of 110 MLD and the additional 170 MLD from MWSP Phase I, Kathmandu Valley will still have a water shortage of 160 MLD in 2020, and this shortage will grow as demand increases. Therefore, Phase II of MWSP should be completed by 2025 and Phase III by 2040.
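The 160 MLD shortage quoted above follows from simple supply–demand arithmetic. A sketch, assuming a demand of about 440 MLD around 2020 (the figure given in the Conclusion), the present supply of 110 MLD and 170 MLD per completed MWSP phase; the function name is illustrative.

```python
def shortage_mld(demand, current_supply=110.0, phases_online=1, phase_capacity=170.0):
    """Shortfall (MLD) = demand - (current supply + completed MWSP phases x capacity).

    Distribution losses (about 40%) are not subtracted here, which is consistent
    with how the 160 MLD figure can be reproduced.
    """
    return demand - (current_supply + phases_online * phase_capacity)


# Demand of about 440 MLD around 2020, with only Phase I completed
print(shortage_mld(440.0))  # 160.0
```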

Compared with the ANN model, the traditional arithmetic model underpredicts water demand from 2016 to 2025 and overpredicts it from 2025 onwards (Figure 8), by more than 10% in this later period. This suggests that the ANN model can provide a more realistic forecast and help water supply authorities manage resources efficiently.

## CONCLUSION

With increasing population and a changing climate, the water supply of a city needs to be managed carefully, and water demand forecasting is the first step in doing so. KUKL currently uses a traditional method of demand calculation that depends only on population. This study therefore developed a multivariate ANN model to forecast the domestic water demand of Kathmandu Valley, using the socio-economic factors of number of connections, water tariff and ratio of population to number of university students, and the climatic factor of annual rainfall, as inputs. The model with 1 hidden layer and 6 nodes produced the best results. The socio-economic factors were forecast using trend extrapolation, whereas annual average rainfall was forecast using the ensemble of 3 RCMs under the RCP 4.5 and RCP 8.5 scenarios. The water demand of the valley is predicted to increase to 440 MLD in the 2020s and to reach 470 MLD by 2040. Even with MWSP Phase I, the water shortage will be 160 MLD in 2020, and the shortage will grow as water demand increases. Therefore, Phase II of MWSP should be completed by 2025 and Phase III by 2040.

The results of this study identify key factors influencing water demand in the valley. They can also help KUKL manage the water system better, for example in deciding how much water to release from each storage tank. In addition, this research can support the decision on when to construct the second and third phases of MWSP, whose construction dates have not yet been decided. A major constraint of the study was the lack of systematically collected data; a longer dataset could have improved the model. It is therefore recommended that KUKL also prioritize the collection and management of data, which would support future analysis and forecasting of water demand.

## ACKNOWLEDGEMENTS

The authors would like to thank the Center of Research for Environment, Energy and Water for funding the research, and Dr. Rabin Malla and Deepa Neupane for helping to collect the data needed for the study.