Trinidad has undergone rapid urbanization over the past few decades, and this urbanization has been accompanied by an increase in the country's demand for water. Forecasting water demand can give a better understanding of water consumption behaviour across all sectors of the economy and therefore aid effective water demand management. This study compares seasonal ARIMA (SARIMA), exponential smoothing state space (ETS) and artificial neural network (ANN) models, together with hybrid combinations of them, for forecasting each category of water consumption in Trinidad. The best forecasting model was selected using three accuracy criteria: Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Forecasts were produced up to the end of December 2021. The results show that hybrid model combinations are adequate for forecasting four of the five categories, while a single model, SARIMA, was found suitable for the domestic category. Forecast plots indicate an increase in water demand until the end of 2021. The study also demonstrates the suitability of hybrid models for forecasting water demand on the island of Trinidad.

  • Development of monthly water demand forecasts.

  • Development of water demand models for each category of water demand for an island in the Caribbean.

  • Development of the first hybrid combination forecasting model for water consumption in Trinidad.

Graphical Abstract


Water is an essential part of life. Globally, water consumption increased at twice the rate of population growth over the last century, which has made it increasingly difficult to satisfy the demand for water (UN-Water 2021). According to the United Nations, 1.42 billion people worldwide live in areas of high water vulnerability (UNICEF 2021). Approximately two thirds of the world's population face severe water shortages for at least one month per year (Mekonnen & Hoekstra 2016). In the Caribbean, the World Resources Institute has identified the following islands as having ‘extremely high’ levels of water stress: Dominica, St. Vincent and the Grenadines, Antigua and Barbuda, St. Kitts and Nevis, Barbados, and Trinidad and Tobago. Of these, Antigua and Barbuda, Barbados, and St. Kitts and Nevis are categorized as water-scarce islands, with less than 1000 cubic meters of freshwater resources per capita (Chow 2019). Although Trinidad has abundant water resources, the country faces a water crisis because it has been unable to meet demand in recent years. In 2015, demand was estimated at 393 MCM/year against a supply of 382 MCM/year, a deficit of 11 MCM/year. This deficit intensifies during the dry season, when precipitation and reservoir levels are low. To address rising demand and falling supply, various water demand management strategies need to be considered; one important tool is a model for forecasting consumption.

There are many approaches to modelling water consumption. Traditional approaches worldwide have used methods such as exponential smoothing, linear regression and ARIMA models. However, regression analysis assumes constant variance and normally distributed errors (Sen et al. 2003), and most time series methods assume that future values depend linearly on historical data. ARIMA models, developed by Box & Jenkins (1976), are particularly effective at modelling the linear behaviour of time series. Exponential smoothing models are easy to apply and model seasonal patterns in time series data well (Gjika et al. 2019). More recently, Artificial Neural Networks (ANNs) have been used in forecasting because of advantages such as their capability to model non-linear data (Hornik et al. 1989). Several studies have compared traditional forecasting methods with neural network approaches and found that ANNs forecast water consumption better than the traditional techniques. Bougadis et al. (2005) analyzed three types of model for water consumption: time series, ANN and regression. The study was carried out for the city of Ottawa, Canada, and included data only for the summer months of 1993–2002; the ANN model outperformed the conventional techniques. White & Safi (2016) reported that the ANN model performed considerably better than the ARIMA model when the linking function was non-linear, but that the ARIMA model was more effective when the function was linear.

However, although ANNs model non-linear data effectively, ANNs alone may, in some situations, be unable to model data that exhibit both linear and non-linear characteristics (Zhang 2003). Hybrid models, formed by combining different methods, allow data comprising both linear and non-linear patterns to be modelled accurately. Ginzburg & Horn (1993) developed a hybrid model that combined several feedforward neural networks for time series forecasting and found that it improved forecasting accuracy. Luxhoj et al. (1996) developed a hybrid econometric and ANN model for sales forecasting, which was also found to be very efficient.

The main objective of this study is to determine forecasting models that can adequately forecast water consumption for Trinidad. A study by Ekwue (2009) showed that residential consumption accounted for approximately 40% of total consumption in Trinidad, with the industrial and agricultural sectors accounting for approximately 21% and 3% of total demand, respectively. In this study, ETS, ARIMA and ANN models, together with hybrid combinations of all three, were used to develop monthly water demand forecasting models for each category (domestic, industrial, commercial, agricultural and ‘other’) based on data from Trinidad for the period 2003–2018. According to Leon et al. (2020), water analysis can help answer questions related to water consumption problems and water scarcity. A model for forecasting the demand for water therefore provides a technique for maintaining a balance between supply and demand.

Data

Historical monthly consumption data for Trinidad from October 2003 to August 2018 were obtained from the Water Resources Agency (WRA), a division of the Water and Sewerage Authority (WASA). The data were sorted into the following categories: domestic, industrial, commercial, agricultural and ‘other’. Figure 1 shows the spread of each category of water consumption across the island. The domestic category comprises all premises used entirely as living quarters, or used solely or partly for business, that are not registered for Value Added Tax. The industrial category refers to property used for production activities. The commercial category refers to premises that are registered for Value Added Tax and used for commercial and business activities, such as malls and shopping centers. The agricultural category refers to premises used for agricultural operations such as crop and livestock farming, forestry and horticulture. The ‘other’ category comprises cottage and charitable organizations, and includes non-domestic property used partly for business and partly as domestic dwellings, as well as domestic premises used for charitable purposes. Data from October 2003 to July 2017 were used as the training sample for model estimation, and the remaining data, from August 2017 to August 2018, were used as the test sample. The training set is used solely for model development, whereas the test set is used primarily for model evaluation.
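The chronological split described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper `month_range` is hypothetical, months are represented as (year, month) tuples, and the forecast horizon is assumed to start in September 2018, the month after the last observation.

```python
def month_range(start, end):
    """Return every (year, month) pair from start to end, inclusive."""
    year, month = start
    months = []
    while (year, month) <= end:
        months.append((year, month))
        # advance one calendar month, rolling over into January of the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Full record: October 2003 to August 2018
months = month_range((2003, 10), (2018, 8))

# Chronological split (no shuffling): the test sample starts in August 2017
split = months.index((2017, 8))
train_months, test_months = months[:split], months[split:]

# Assumed forecast horizon: September 2018 to December 2021
horizon = month_range((2018, 9), (2021, 12))

print(len(months), len(train_months), len(test_months), len(horizon))
```

Under these assumptions the record holds 179 months, split into 166 training and 13 test months, and the forecast horizon spans 40 months.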

Figure 1

Map of Trinidad showing the breakdown of water consumption for each category.


Data cleaning

Data cleaning comprises the detection and removal of outliers, which helps to obtain improved results from data analysis (Xiong et al. 2006). In this study, the non-seasonal part of the data was smoothed using Friedman's Super Smoother, a non-parametric trend estimator based on localized least squares regression with adaptive bandwidths. The technique first computes m different smooths over the entire data set, each using a different bandwidth (span). The performance of the kth smooth is then measured pointwise as:

$$\hat{p}_k(x_i) = \tilde{s}\left(\left|Y_i - \hat{s}_k^{(-i)}(x_i)\right|\right) \tag{1}$$

where $\tilde{s}$ is some fixed-span smoother, $\hat{s}_k^{(-i)}(x_i)$ is the cross-validated value of the kth smooth at $x_i$ (computed with the ith observation left out), and $\hat{p}_k(x_i)$ is the performance measure of the kth smooth at the point $x_i$, for $i = 1, \ldots, n$ (Givens & Hoeting 2013).

The performance measures are computed from the cross-validated residuals of each smooth. At each point, the bandwidth with the lowest performance measure is selected, and these pointwise optimal bandwidths are themselves smoothed. The two smooths whose bandwidths are closest to the smoothed optimal bandwidth are then chosen, and the final estimate is obtained by linear interpolation between them.
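The steps above can be sketched as a simplified super smoother in Python with NumPy. This is an illustrative reconstruction, not the implementation used in the study: it uses a running-line (local linear) smoother with index-based windows, assumes an equally spaced series, and assumes spans of 5%, 20% and 50% of the sample, with the 20% span serving as the fixed smoother in Equation (1), following the defaults commonly associated with Friedman's proposal. Refinements of the original algorithm, such as the bass (extra smoothness) control, are omitted.

```python
import numpy as np


def running_line(x, y, span, leave_one_out=False):
    """Local linear (running-line) smoother over a window of about span * n points.

    With leave_one_out=True, point i is excluded from its own fit, giving the
    cross-validated value s_k^(-i)(x_i) used to judge each span.
    """
    n = len(x)
    half = max(2, int(round(span * n)) // 2)  # half-window width in index units
    fitted = np.empty(n)
    for i in range(n):
        idx = np.arange(max(0, i - half), min(n, i + half + 1))
        if leave_one_out:
            idx = idx[idx != i]
        slope, intercept = np.polyfit(x[idx], y[idx], 1)  # least-squares line in the window
        fitted[i] = intercept + slope * x[i]
    return fitted


def super_smoother(y, spans=(0.05, 0.2, 0.5), midrange=0.2):
    """Simplified Friedman super smoother for an equally spaced series."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)      # time index as the predictor
    spans = np.asarray(spans, dtype=float)  # must be increasing for np.interp
    smooths, perf = [], []
    for h in spans:
        smooths.append(running_line(x, y, h))
        cv_resid = np.abs(y - running_line(x, y, h, leave_one_out=True))
        # Equation (1): smooth the absolute cross-validated residuals with a fixed span
        perf.append(running_line(x, cv_resid, midrange))
    smooths, perf = np.vstack(smooths), np.vstack(perf)
    # Pointwise best span, then smooth the selected spans themselves
    best = spans[np.argmin(perf, axis=0)]
    best = np.clip(running_line(x, best, midrange), spans[0], spans[-1])
    # Linear interpolation between the two smooths whose spans bracket the chosen span
    return np.array([np.interp(best[i], spans, smooths[:, i]) for i in range(len(y))])
```

Because each local fit is a straight line, an exactly linear series passes through unchanged, while a noisy seasonal signal is pulled towards its underlying shape.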

Data normalization

Data normalization is used to minimize the effects of noise on a data set and to bring all attributes of the data set onto the same scale so that the best-performing models can be developed. In this study, all data points were rescaled to the range [0, 1] using min-max normalization:

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

where $x$ is an observed value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the series.
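Equation (2) and its inverse, which is needed to convert forecasts made on the normalized scale back into cubic meters, can be sketched as follows. Computing the bounds on the training sample only is an assumption made here to avoid leaking information from the test period; the text does not state which sample the bounds were taken from. The volumes in the example are hypothetical.

```python
import numpy as np


def minmax_scale(x, bounds=None):
    """Equation (2): map x onto [0, 1] using (x_min, x_max).

    If bounds is None they are taken from x itself; pass the training-sample
    bounds to scale test data or forecasts consistently.
    """
    x = np.asarray(x, dtype=float)
    x_min, x_max = (x.min(), x.max()) if bounds is None else bounds
    return (x - x_min) / (x_max - x_min), (x_min, x_max)


def minmax_inverse(z, bounds):
    """Undo Equation (2): convert normalized values back to the original units."""
    x_min, x_max = bounds
    return np.asarray(z, dtype=float) * (x_max - x_min) + x_min


train = np.array([120.0, 150.0, 180.0, 210.0])  # hypothetical monthly volumes
z_train, bounds = minmax_scale(train)            # 0, 1/3, 2/3, 1
z_new, _ = minmax_scale([240.0], bounds)         # 4/3: values outside the training range leave [0, 1]
restored = minmax_inverse(z_train, bounds)       # back to the original volumes
```

Note that the series minimum maps to exactly 0, so a percentage error such as MAPE cannot be computed at that point on the normalized scale.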

Seasonal autoregressive integrated moving average (SARIMA) model

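The Autoregressive Integrated Moving Average (ARIMA) model, proposed by Box and Jenkins, is one of the most widely used models in time series forecasting. The ARIMA model assumes that the future value of a variable is a linear function of past values and random errors. The model takes the following form:

$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \tag{3}$$

where $y_t$ and $\varepsilon_t$ are the actual value and random error at time t, respectively, $\phi_i$ ($i = 1, \ldots, p$) and $\theta_j$ ($j = 0, 1, \ldots, q$) are model parameters, and p and q are the orders of the autoregressive and moving average parts (Wang & Meng 2012).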
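Equation (3) can be read as a recursion: given the parameters, each error $\varepsilon_t$ is what remains of $y_t$ after removing the constant $\theta_0$ (written `c` below), the autoregressive terms and the moving average terms. The sketch uses that recursion to simulate a series, to recover its errors, and to compute the conditional sum of squares that parameter estimation seeks to minimize. It assumes pre-sample values and errors equal to zero and applies to an already differenced series; the function names are hypothetical. Equation (3) subtracts the MA terms, whereas software such as R's `arima` adds them, so the sign of reported MA coefficients depends on the convention.

```python
import numpy as np


def _lag_terms(t, series, coefs):
    """Sum of coefs[i] * series[t-1-i], treating pre-sample values as zero."""
    return sum(c * series[t - 1 - i] for i, c in enumerate(coefs) if t - 1 - i >= 0)


def simulate_arma(eps, c, phi, theta):
    """Generate y_t = c + sum(phi_i * y_{t-i}) + eps_t - sum(theta_j * eps_{t-j})."""
    eps = np.asarray(eps, dtype=float)
    y = np.zeros_like(eps)
    for t in range(len(eps)):
        y[t] = c + _lag_terms(t, y, phi) + eps[t] - _lag_terms(t, eps, theta)
    return y


def arma_residuals(y, c, phi, theta):
    """Invert Equation (3): eps_t = y_t - c - sum(phi_i * y_{t-i}) + sum(theta_j * eps_{t-j})."""
    y = np.asarray(y, dtype=float)
    eps = np.zeros_like(y)
    for t in range(len(y)):
        eps[t] = y[t] - c - _lag_terms(t, y, phi) + _lag_terms(t, eps, theta)
    return eps


def conditional_sum_of_squares(y, c, phi, theta):
    """Objective minimized over (c, phi, theta) in conditional least squares estimation."""
    return float(np.sum(arma_residuals(y, c, phi, theta) ** 2))


rng = np.random.default_rng(1)
eps = rng.normal(size=300)
y = simulate_arma(eps, c=0.1, phi=[0.8], theta=[-0.5])
```

With the true parameters the recursion recovers the simulated errors exactly, so the sum of squares is smallest there; an optimizer would search the parameter space for that minimum.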

A seasonal autoregressive integrated moving average (SARIMA) model is written as ARIMA$(p,d,q)(P,D,Q)[m]$, where $(p,d,q)$ is the non-seasonal component, $(P,D,Q)$ is the seasonal component and m is the number of periods in each season. The values p, d and q are, respectively, the order of the non-seasonal autoregressive term, the number of non-seasonal differences required to obtain stationarity and the order of the non-seasonal moving average term, whereas P, D and Q are the corresponding values for the seasonal component. The model parameters are estimated so that the errors are minimized. Stationarity is required to build a SARIMA model; a series is stationary when its mean, variance and autocorrelations do not change over time. Once the model has been estimated, diagnostic checks are carried out to confirm that the error assumptions are satisfied.
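The roles of d, D and m can be illustrated with a short NumPy sketch of the differencing operators. The series below is synthetic (a linear trend plus a fixed 12-month pattern), chosen so that one seasonal difference leaves a constant and an additional ordinary difference leaves zeros; real consumption data would only be approximately stationary after differencing. The helper name is hypothetical.

```python
import numpy as np


def difference(y, d=0, D=0, m=12):
    """Apply D seasonal differences (lag m) and d ordinary differences (lag 1).

    These are the operators implied by the d and D orders of ARIMA(p,d,q)(P,D,Q)[m];
    the order in which they are applied does not change the result.
    """
    z = np.asarray(y, dtype=float)
    for _ in range(D):
        z = z[m:] - z[:-m]  # y_t - y_{t-m}
    for _ in range(d):
        z = np.diff(z)      # y_t - y_{t-1}
    return z


t = np.arange(48)                                 # four years of monthly data
pattern = np.sin(2 * np.pi * np.arange(12) / 12)  # repeating seasonal shape
y = 5.0 + 0.5 * t + np.tile(pattern, 4)           # trend plus seasonality

seasonal_only = difference(y, D=1)             # pattern removed; trend leaves 0.5 * 12 = 6.0
fully_differenced = difference(y, d=1, D=1)    # constant removed as well: all zeros
```

For example, the domestic model reported later, ARIMA(1,0,3)(0,1,1)[12] with drift, applies one seasonal difference (D = 1) and no ordinary difference (d = 0).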

Exponential state space models

Forecasts from exponential smoothing methods are weighted averages of past observations, with more weight given to more recent observations. Pegels (1969) classified exponential smoothing methods according to their trend and seasonal components, and this classification was later extended by Gardner (1985), Hyndman et al. (2002) and Taylor (2003). In 2008, Hyndman extended the framework to account for additive and multiplicative errors alongside the trend and seasonal components of the Holt-Winters method. The resulting ETS model is named after its three components: Error, Trend and Seasonality. This model was selected because it accounts for all three components of the time series. The general model is built on a state vector, and the exponential smoothing models are written in state space form as follows:

$$\mathbf{x}_t = \left(\ell_t, b_t, s_t, s_{t-1}, \ldots, s_{t-m+1}\right)' \tag{4}$$

$$y_t = w(\mathbf{x}_{t-1}) + r(\mathbf{x}_{t-1})\,\varepsilon_t \tag{5}$$

$$\mathbf{x}_t = f(\mathbf{x}_{t-1}) + g(\mathbf{x}_{t-1})\,\varepsilon_t \tag{6}$$

where $\ell_t$ represents the series level at time t, $b_t$ represents the slope at time t, $s_t$ represents the seasonal component of the time series at time t, and m represents the number of seasons in a year. The functions $w(\cdot)$, $r(\cdot)$, $f(\cdot)$ and $g(\cdot)$ vary according to the error, trend and seasonal components of the model, and $\varepsilon_t$ is a white noise error term (Gjika et al. 2019).
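As a concrete instance of the state space form, the sketch below filters a series through the ETS(A, Ad, A) model (additive errors, additive damped trend, additive seasonality) that is selected for several categories in this study, written in the error-correction form used in the state space literature (e.g. Hyndman & Athanasopoulos 2021). It is a sketch under stated assumptions: the smoothing parameters α, β, γ, the damping parameter φ and the initial states are supplied by the user rather than estimated by maximum likelihood, and the function name is hypothetical.

```python
import numpy as np


def ets_aada(y, alpha, beta, gamma, phi, level0, trend0, season0, h=12):
    """Filter y through ETS(A,Ad,A) and forecast h steps ahead.

    season0 holds the m initial seasonal states, oldest first (s_{1-m}, ..., s_0).
    Returns (one-step-ahead fitted values, h-step forecasts).
    """
    level, trend = float(level0), float(trend0)
    season = [float(s) for s in season0]  # season[0] is always s_{t-m}
    m = len(season)
    fitted = []
    for obs in y:
        y_hat = level + phi * trend + season[0]        # one-step-ahead forecast
        e = obs - y_hat                                # innovation (error) at time t
        fitted.append(y_hat)
        new_level = level + phi * trend + alpha * e    # uses the previous trend
        trend = phi * trend + beta * e
        season = season[1:] + [season[0] + gamma * e]  # s_t = s_{t-m} + gamma * e
        level = new_level
    # Damped trend multiplier phi + phi^2 + ... + phi^k for horizons k = 1..h
    damping = np.cumsum(phi ** np.arange(1, h + 1))
    forecasts = np.array([level + damping[k] * trend + season[k % m] for k in range(h)])
    return np.array(fitted), forecasts


# Hypothetical example: a noiseless series whose states are known exactly
pattern = np.sin(2 * np.pi * np.arange(12) / 12)
y = 10.0 + np.tile(pattern, 3)
fitted, forecasts = ets_aada(y, alpha=0.3, beta=0.1, gamma=0.1, phi=0.9,
                             level0=10.0, trend0=0.0, season0=pattern, h=12)
```

Because the initial states match the data, every innovation is zero and the forecasts simply repeat the seasonal pattern around the level; with a non-zero trend, the damping factor φ makes the trend contribution level off as the horizon grows.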

The artificial neural network model

Artificial neural networks (ANNs) imitate the functionality of the human brain. An ANN can identify patterns and trends in data and can learn from existing data. ANNs model non-linear and non-parametric relationships better than traditional linear models, which have more restrictive assumptions. Single hidden layer feedforward neural networks are the most widely used form of neural network. This model consists of three layers: the input, hidden and output layers. Signals are transmitted in one direction only, forward from the input layer through the hidden layer to the output layer (Firat et al. 2010). The relationship between the output ($y_t$) and the inputs ($y_{t-1}, \ldots, y_{t-p}$) takes the following form (Zhang 2003):

$$y_t = \alpha_0 + \sum_{j=1}^{q} \alpha_j \, g\!\left(\beta_{0j} + \sum_{i=1}^{p} \beta_{ij} \, y_{t-i}\right) + \varepsilon_t \tag{7}$$

where $\alpha_j$ ($j = 0, 1, \ldots, q$) and $\beta_{ij}$ ($i = 0, 1, \ldots, p$; $j = 1, \ldots, q$) are the model parameters, called the connection weights, p is the number of input nodes and q is the number of hidden nodes. Signals pass from the input layer to the output layer through the logistic (sigmoid) function (Zhang 2003):

$$g(x) = \frac{1}{1 + \exp(-x)} \tag{8}$$

This function is used as the hidden layer transfer function.

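The network therefore performs a non-linear mapping from past observations to the future value:

$$y_t = f\left(y_{t-1}, \ldots, y_{t-p}, \mathbf{w}\right) + \varepsilon_t \tag{9}$$

where $f(\cdot)$ is a function determined by the network structure and connection weights, and $\mathbf{w}$ is a vector of all parameters.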
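A single forward pass through the single-hidden-layer network of Equations (7) to (9) can be sketched in NumPy. The weights passed to `forward` would in practice be trained; here they are placeholders, and the helper names are hypothetical. The sketch also assumes the NNAR(p,P,k)[m] convention of R's `forecast::nnetar`, in which the inputs are p non-seasonal lags plus P seasonal lags (m, 2m, ...) and k is the number of hidden nodes. Under that assumption, the parameter count reproduces the 51, 61 and 71 weights reported for the 3-10-1, 4-10-1 and 5-10-1 networks in the results.

```python
import numpy as np


def sigmoid(x):
    """Equation (8): logistic transfer function of the hidden layer."""
    return 1.0 / (1.0 + np.exp(-x))


def nnar_inputs(y, t, p, P, m):
    """Lagged inputs for y_t: p non-seasonal lags, then P seasonal lags (m, 2m, ...)."""
    # skip seasonal lags already covered by the non-seasonal lags
    lags = list(range(1, p + 1)) + [m * k for k in range(1, P + 1) if m * k > p]
    return np.array([y[t - lag] for lag in lags], dtype=float)


def forward(inputs, alpha0, alpha, beta0, beta):
    """Equation (7) without the error term.

    beta : (n_inputs, q) input-to-hidden weights, beta0 : (q,) hidden biases
    alpha: (q,) hidden-to-output weights,        alpha0: output bias
    """
    hidden = sigmoid(beta0 + inputs @ beta)
    return alpha0 + hidden @ alpha


def n_weights(n_inputs, n_hidden):
    """Weights and biases of an n_inputs-n_hidden-1 network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


# NNAR(2,1,10)[12]: lags 1, 2 and 12 feed a 3-10-1 network
series = np.arange(30, dtype=float)
x = nnar_inputs(series, t=24, p=2, P=1, m=12)
rng = np.random.default_rng(0)
q = 10
y_hat = forward(x, 0.0, rng.normal(size=q), rng.normal(size=q), rng.normal(size=(x.size, q)))
```

NNAR(4,1,10)[12] and NNAR(3,1,10)[12] give five and four inputs respectively, which is where the 5-10-1 and 4-10-1 structures come from.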

Hybrid model

It is often difficult to determine whether water consumption data are linear or non-linear, and hence to select the most appropriate statistical method for a specific problem. Real-world time series are seldom purely linear or purely non-linear, so complex data structures can be modelled more precisely by combining different statistical methods. Hybrid models therefore allow different underlying patterns in the data, such as trend and seasonality, to be modelled. All two- and three-model combinations of the SARIMA, ETS and ANN models were evaluated to establish whether any hybrid combination would improve forecasting performance.
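At forecast time, a hybrid model of this kind combines its component forecasts as a weighted average, which the NumPy sketch below illustrates. The component forecasts are hypothetical numbers, and `inverse_error_weights` is only one hypothetical way of deriving weights; it is not claimed to be the scheme used in this study, whose selected weights are reported in Table 6.

```python
import numpy as np


def combine_forecasts(forecasts, weights):
    """Weighted average of component forecasts.

    forecasts: (n_models, horizon) array, one row per component model
    weights  : (n_models,) non-negative weights, rescaled here to sum to one
    """
    f = np.asarray(forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ f


def inverse_error_weights(errors):
    """Hypothetical scheme: weight each model in proportion to 1 / its error."""
    inv = 1.0 / np.asarray(errors, dtype=float)
    return inv / inv.sum()


# Hypothetical normalized forecasts from SARIMA, ETS and NNAR, three months ahead
components = [[0.60, 0.62, 0.65],
              [0.58, 0.61, 0.63],
              [0.64, 0.66, 0.70]]
hybrid = combine_forecasts(components, [0.342, 0.325, 0.333])  # industrial weights, Table 6
```

Near-equal weights, as in Table 6, make the hybrid close to a simple average of its components, which tends to cancel errors that the individual models make in opposite directions.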

The proposed model is developed for forecasting water consumption across Trinidad using historical time series consumption data. The steps in model development are shown in Figure 2.

Figure 2

Flow chart of hybrid model development.


Model performance

Forecast accuracy should be assessed on unseen data that have not been used during training. Various accuracy measures have been developed, and many authors have used them to assess model performance.

In order to assess the performance of the models, root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) have been utilized:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{10}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{11}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{12}$$

where $y_i$ and $\hat{y}_i$ are the ith actual and forecast values, respectively, and n is the total number of predictions (Wang & Meng 2012). The RMSE is widely used but is more sensitive to outliers than the MAE. The RMSE and MAE can only be compared across forecasts measured on the same scale and on the same data set. The MAPE is expressed as a percentage, so scale does not pose a problem, and it can be used to compare time series on the same or different scales, provided the data do not contain zeros or values close to zero (Hyndman & Koehler 2006). The models with the lowest error values are selected as the most accurate. Another way to evaluate a model is to calculate the difference between the training and test values of the RMSE (ΔRMSE) and of the MAE (ΔMAE). The closer ΔRMSE and ΔMAE are to zero, the more accurate the proposed model (Bamisile et al. 2020):

$$\Delta\mathrm{RMSE} = \mathrm{RMSE}_{\text{train}} - \mathrm{RMSE}_{\text{test}} \tag{13}$$

$$\Delta\mathrm{MAE} = \mathrm{MAE}_{\text{train}} - \mathrm{MAE}_{\text{test}} \tag{14}$$

where $\mathrm{RMSE}_{\text{train}}$ and $\mathrm{MAE}_{\text{train}}$ are the training performance metrics and $\mathrm{RMSE}_{\text{test}}$ and $\mathrm{MAE}_{\text{test}}$ are the test performance metrics.
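The error measures translate directly into code. The NumPy sketch below is illustrative: the MAPE function raises an error for zero actual values, reflecting the caveat above, and the last line reproduces the domestic ΔRMSE in Table 7 from its training and test RMSE.

```python
import numpy as np


def rmse(actual, forecast):
    """Equation (10): root mean square error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def mae(actual, forecast):
    """Equation (11): mean absolute error."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(a - f)))


def mape(actual, forecast):
    """Equation (12): mean absolute percentage error, in percent."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    if np.any(a == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((a - f) / a)))


def error_difference(train_metric, test_metric):
    """Equations (13) and (14): training metric minus test metric."""
    return train_metric - test_metric


delta_rmse_domestic = error_difference(0.0727, 0.2543)  # about -0.1816, as in Table 7
```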

Index of agreement

The index of agreement (d) was developed by Willmott (1981) as a standardized measure for comparing predicted values with actual data. Its value ranges from zero to one, with values closer to one indicating better agreement between actual and predicted values (Legates & McCabe 1999). The index of agreement is given by:

$$d = 1 - \frac{\sum_{i=1}^{n}\left(P_i - O_i\right)^2}{\sum_{i=1}^{n}\left(\left|P_i'\right| + \left|O_i'\right|\right)^2} \tag{15}$$

where n is the number of observations, $P_i$ is the ith predicted value, $O_i$ is the ith observed value, $O_i' = O_i - \bar{O}$ is the difference between the ith observed value and the mean observed value, and $P_i' = P_i - \bar{O}$ is the difference between the ith predicted value and the mean observed value (Khedkar et al. 2015).
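A minimal NumPy sketch of the index of agreement follows, assuming Willmott's form in which both deviations are taken from the observed mean $\bar{O}$; with that choice the triangle inequality keeps d between 0 and 1. The observations are hypothetical.

```python
import numpy as np


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d (Equation 15)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    o_bar = o.mean()
    numerator = np.sum((p - o) ** 2)
    # |P_i - O_bar| + |O_i - O_bar| >= |P_i - O_i|, so numerator <= denominator
    denominator = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return float(1.0 - numerator / denominator)


obs = [0.2, 0.4, 0.5, 0.9]
perfect = index_of_agreement(obs, obs)                     # identical series
flat = index_of_agreement(obs, [np.mean(obs)] * len(obs))  # mean-only prediction
```

A perfect prediction gives d = 1, while predicting the observed mean everywhere gives d = 0, which is why d rewards models that track the variation in the data rather than just its average.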

Trend detection

The Mann-Kendall (MK) test was used to assess whether there was a trend in water consumption over time. The MK test has been used extensively to assess trends in hydrological time series (Hamed 2008). The assumptions of the Mann-Kendall test are as follows (Tosunoglu & Kisi 2017):

  1. In the absence of a trend, the measurements observed over time are independent and identically distributed.

  2. The observations are representative of the real states at the time of measurement.

  3. The sampling methods, data handling and measurement methods are unbiased.

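The MK test statistic (S) is calculated as follows:

$$S = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\operatorname{sgn}\left(x_j - x_i\right) \tag{16}$$

$$\operatorname{sgn}\left(x_j - x_i\right) = \begin{cases} +1 & \text{if } x_j - x_i > 0 \\ 0 & \text{if } x_j - x_i = 0 \\ -1 & \text{if } x_j - x_i < 0 \end{cases} \tag{17}$$

where N is the length of the data set and $x_i$ and $x_j$ are the data points at times i and j. Negative values of S indicate a decreasing trend and positive values indicate an increasing trend in the time series.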

The hypotheses of the MK test are:

  • H0: no trend is present

  • H1: there is an upward trend

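For large samples, the statistic S is approximately normally distributed with a mean of zero and variance given by:

$$\operatorname{Var}(S) = \frac{N(N-1)(2N+5) - \sum_{i=1}^{P} t_i\left(t_i - 1\right)\left(2t_i + 5\right)}{18} \tag{18}$$

where P is the number of tied groups and $t_i$ represents the number of observations in the ith group.
The standardized test statistic Z is computed as follows:

$$Z = \begin{cases} \dfrac{S - 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sqrt{\operatorname{Var}(S)}} & \text{if } S < 0 \end{cases} \tag{19}$$

The null hypothesis is rejected if the calculated Z value is greater than the critical value $Z_{1-\alpha}$ of the standard normal distribution at significance level α (Tosunoglu & Kisi 2017).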
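The MK statistic, its tie-corrected variance and the one-sided decision rule can be sketched in plain Python using only the standard library (`statistics.NormalDist` supplies the normal quantile). This is an illustrative implementation with a hypothetical function name; it counts ties by exact equality and does not correct for serial correlation.

```python
import math
from collections import Counter
from statistics import NormalDist


def mann_kendall_upward(x, alpha=0.05):
    """One-sided Mann-Kendall test for an upward trend.

    Returns (S, Var(S), Z, reject_H0).
    """
    n = len(x)
    # Equations (16)-(17): sum of signs over all pairs i < j
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Equation (18): variance with a correction for tied groups
    tie_sizes = [t for t in Counter(x).values() if t > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    # Equation (19): continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return s, var_s, z, z > z_crit


result = mann_kendall_upward(list(range(10)))  # strictly increasing series
```

For a strictly increasing series of 10 values, every pair contributes +1, so S = 45, Var(S) = 125 and Z = 44/√125 ≈ 3.94, well above the one-sided 5% critical value.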

Trinidad is the most southerly island in the Caribbean with a population of approximately 1.3 million people. The island has two distinct seasons: a wet season which runs from June to November and a dry season which spans December to May. During the dry season, temperatures can reach as high as 36 °C during the day and about 28 °C at night. The wet season usually has slightly lower temperatures. The seasonal variation between the daytime and night time temperatures is approximately 0.9 °C.

Figure 3 shows the water consumption time series after normalization. Figure 3(a), 3(b) and 3(d) show the domestic, commercial and agricultural categories, respectively, all of which display very clear seasonal patterns. For the domestic category, consumption is highest during the wet season and lowest during the dry months. Agricultural consumption increases during the dry months and decreases during the wet season, when farmers need less water for their crops. These seasonal fluctuations can be amplified under changing climatic conditions: climate change has the potential to alter meteorological conditions and hence to have a significant impact on water resources (Nazari-Sharabian et al. 2018), as demonstrated in a case study of the Mahabad Dam watershed in Iran (Nazari-Sharabian et al. 2019). Water consumption for the ‘other’ category, shown in Figure 3(e), displays an obvious linear upward trend, whereas the industrial category, shown in Figure 3(c), exhibits an upward trend from 2003 to 2015 followed by a sharp downward trend over the next two years.

Figure 3

Normalized time series plots of water consumption (a) domestic; (b) commercial; (c) industrial; (d) agricultural; (e) other.


Tables 1–5 summarize the main forecast accuracy measures, and Table 6 gives the structure of the optimal model selected for each category. Models were selected by minimizing RMSE, MAE and MAPE. For the domestic category, the highest consumption (1,431,712 m3) was observed in June 2018 and the lowest (164,362.4 m3) in September 2017. The seasonal ARIMA model was the best model on all the accuracy measures (Table 1). This model was an ARIMA(1,0,3)(0,1,1)[12] with drift, with parameters ar1 = 0.7892, ma1 = −0.902, ma2 = −0.0822, ma3 = 0.4151, sma1 = −0.5001 and drift = 0.0016. Residual diagnostics show that the residuals satisfied the white noise criterion according to the Box-Pierce test. For the commercial category, the highest consumption (6,722,828 m3) was observed in July 2017 and the lowest (709,516.4 m3) in February 2005. For this category, the ARIMA-ETS-NNAR model outperformed all single models and two-model hybrid combinations (Table 2). For the SARIMA component, the best-fitting model was ARIMA(0,1,2)(2,0,0)[12], with parameters ma1 = −0.7543, ma2 = 0.2730, sar1 = 0.2729 and sar2 = 0.2901. The exponential smoothing component was an ETS(A, Ad, A) model, which is an additive damped trend method with additive errors; its smoothing parameters, initial level, initial trend and initial state vector were also estimated. The neural network component was an NNAR(2,1,10)[12] model, the average of 250 networks, each a 3-10-1 network (three inputs, ten hidden nodes and one output) with 51 weights. Residual diagnostics for the hybrid model indicate that the residuals satisfy the white noise criterion according to the Box-Pierce test (0.3615). For the industrial category, there was a sharp decrease in consumption in 2015–2017, followed by an increase thereafter. 
The lowest industrial consumption (201,730 m3) was observed in October 2017 and the highest (4,750,722 m3) in November 2017. The best-fitting model for this category was a hybrid ARIMA-ETS-NNAR model (Table 3), with weights of 0.342, 0.325 and 0.333 for the SARIMA, ETS and NNAR components, respectively (Table 6). The SARIMA component was ARIMA(0,1,1)(1,0,0)[12], with parameters ma1 = −0.6449 and sar1 = 0.2942. The smoothing parameters, initial level and initial trend of the exponential smoothing component were also estimated. The neural network component was NNAR(2,1,10)[12], the average of 250 networks, each a 3-10-1 network with 51 weights. Residual diagnostics indicate that the residuals satisfy the white noise criterion according to the Box-Pierce test. The agricultural category displays increased consumption during most of the dry season, from January to May, and lower consumption from June to December, as expected. The lowest consumption was in February 2005 (16,424.62 m3) and the highest in June 2017 (388,933.5 m3). The most efficient model for the agricultural category was a hybrid ARIMA-ETS-NNAR model (Table 4), with weights of 0.325, 0.349 and 0.326 for the SARIMA, ETS and NNAR components, respectively (Table 6). The SARIMA component was ARIMA(0,1,1)(2,0,0)[12], with parameters ma1 = −0.7107, sar1 = 0.1402 and sar2 = 0.2357. The NNAR component was NNAR(4,1,10)[12], the average of 250 networks, each a 5-10-1 network with 71 weights. The ETS component was an ETS(A, Ad, A) model, whose smoothing parameters, initial level, initial trend and initial state vector were also estimated. Model diagnostics indicate that the residuals satisfy the white noise criterion according to the Box-Pierce test. The ‘other’ category displays a linear increasing trend over the years. The lowest consumption was in March 2004 (22,480.06 m3) and the highest in July 2017 (142,441 m3). 
From Table 5, although the ETS model had a lower RMSE, the ETS-NNAR model was the best fit based on MAE and MAPE. The fitted model was an ETS-NNAR model with equal weights. The exponential smoothing component was an ETS(A, Ad, A) model, which represents an additive damped trend method with additive errors; its smoothing parameters, initial level and initial slope were also estimated. The NNAR component was NNAR(3,1,10)[12], the average of 250 networks, each a 4-10-1 network with 61 weights. The Box-Pierce test indicates that the residuals are white noise. The Anderson-Darling test was used to evaluate whether the residuals of all selected models were normally distributed. The normality tests yielded p-values of <2.2 × 10−16, 0.8038, 0.2353, 0.0039 and 0.0708 for the domestic, commercial, industrial, agricultural and ‘other’ categories, respectively. At the 5% significance level, these results indicate that the normality assumption was satisfied for the commercial, industrial and ‘other’ categories but not for the domestic and agricultural categories. However, normality is a useful but not necessary condition for residuals (Hyndman & Athanasopoulos 2021), and models can still yield satisfactory results. Table 7 summarizes the training and test errors for all five categories. Although all models had negative values of ΔRMSE and ΔMAE, which indicates some overfitting during the training phase, the models for the industrial, commercial, agricultural and ‘other’ categories proved adequate, with values of ΔRMSE and ΔMAE very close to zero. 
The SARIMA model, although the most suitable model for the domestic category, displayed a larger difference between training and test errors than any other model, and hence generalized less effectively to unseen data for this category. Overall, the hybrid models were the best models for forecasting water consumption, based on the accuracy measures, for four of the five categories. Forecasts were produced for all categories up to December 2021 (a 40-month horizon). The observed values, fitted values, point forecasts and the 80% and 95% prediction intervals are shown in Figure 4. The plots show that the fitted values are close to the observed values, which supports the use of all the models. This is confirmed by the index of agreement (d) between the observed and fitted values of each model (Table 8). All values were close to one, indicating generally good agreement between the actual and fitted values. The models for the industrial, commercial and ‘other’ categories showed even better agreement between fitted values and observations (d > 0.900) than the models for the agricultural (d = 0.8652) and domestic (d = 0.8524) categories.
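The criterion-by-criterion comparison behind the choice for the ‘other’ category can be checked directly from Table 5. The Python sketch below copies those values, finds the best model under each criterion and applies a simple majority vote; the voting rule is an illustrative assumption, not a selection procedure stated in the text.

```python
# (RMSE, MAE, MAPE %) on the test sample for the 'other' category, from Table 5
table5 = {
    "SARIMA": (0.0711, 0.0635, 21.1879),
    "ETS": (0.0698, 0.0606, 20.0727),
    "NNAR": (0.0816, 0.0641, 19.4683),
    "ARIMA-ETS": (0.0700, 0.0619, 20.5888),
    "ARIMA-NNAR": (0.0719, 0.0598, 18.8920),
    "ETS-NNAR": (0.0707, 0.0583, 18.3016),
    "ARIMA-ETS-NNAR": (0.0700, 0.0600, 19.2637),
}
criteria = ("RMSE", "MAE", "MAPE")

# Best (lowest) model under each criterion
best = {name: min(table5, key=lambda model: table5[model][k])
        for k, name in enumerate(criteria)}

# Illustrative majority vote across the three criteria
votes = {}
for model in best.values():
    votes[model] = votes.get(model, 0) + 1
winner = max(votes, key=votes.get)
print(best, winner)
```

ETS wins on RMSE alone, while ETS-NNAR wins on both MAE and MAPE, which matches the choice described for this category.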

Table 1

Forecast accuracy measures of the single and hybrid models for the domestic category

Model RMSE MAE MAPE (%)
SARIMA 0.2543 0.2102 –
ETS 0.3081 0.2521 –
NNAR 0.2849 0.2376 –
ARIMA-ETS 0.2739 0.2228 –
ARIMA-NNAR 0.2575 0.2110 –
ETS-NNAR 0.2765 0.2279 –
ARIMA-ETS-NNAR 0.2652 0.2203 –
Table 2

Forecast accuracy measures of the single and hybrid models for the commercial category

Model RMSE MAE MAPE (%)
SARIMA 0.0338 0.0247 12.2500
ETS 0.0538 0.0499 23.4234
NNAR 0.1917 0.1817 92.2916
ARIMA-ETS 0.0422 0.0359 16.3573
ARIMA-NNAR 0.0538 0.0489 24.5694
ETS-NNAR 0.0428 0.0377 18.8122
ARIMA-ETS-NNAR 0.0315 0.0247 12.2500
Table 3

Forecast accuracy measures of the single and hybrid models for the industrial category

Model RMSE MAE MAPE (%)
SARIMA 0.0427 0.0325 6.3835
ETS 0.0453 0.0374 6.4284
NNAR 0.0716 0.0618 11.8761
ARIMA-ETS 0.0400 0.0330 6.5185
ARIMA-NNAR 0.0489 0.0427 8.3303
ETS-NNAR 0.0469 0.0418 8.1298
ARIMA-ETS-NNAR 0.0405 0.0330 6.5044
Table 4

Forecast accuracy measures of the single and hybrid models for the agricultural category

Model RMSE MAE MAPE (%)
SARIMA 0.0398 0.0339 33.2996
ETS 0.0328 0.0299 32.5427
NNAR 0.0770 0.0688 93.6701
ARIMA-ETS 0.0361 0.0317 32.6562
ARIMA-NNAR 0.0342 0.0262 40.0241
ETS-NNAR 0.0363 0.0313 32.5900
ARIMA-ETS-NNAR 0.0275 0.0203 30.3825
Table 5

Forecast accuracy measures of the single and hybrid models for the ‘other’ category

Model RMSE MAE MAPE (%)
SARIMA 0.0711 0.0635 21.1879
ETS 0.0698 0.0606 20.0727
NNAR 0.0816 0.0641 19.4683
ARIMA-ETS 0.0700 0.0619 20.5888
ARIMA-NNAR 0.0719 0.0598 18.8920
ETS-NNAR 0.0707 0.0583 18.3016
ARIMA-ETS-NNAR 0.0700 0.0600 19.2637
Table 6

Structure of optimal models for each category

Category Model Structure Weight
Domestic ARIMA ARIMA(1,0,3)(0,1,1)[12] with drift –
Industrial ARIMA-ETS-NNAR ARIMA(0,1,1)(1,0,0)[12] 0.342
    ETS(A, Ad, A) 0.325
    NNAR(2,1,10)[12] 0.333
Commercial ARIMA-ETS-NNAR ARIMA(0,1,2)(2,0,0)[12] 0.333
    ETS(A, Ad, A) 0.333
    NNAR(2,1,10)[12] 0.333
Agricultural ARIMA-ETS-NNAR ARIMA(0,1,1)(2,0,0)[12] 0.325
    ETS(A, Ad, A) 0.349
    NNAR(4,1,10)[12] 0.326
Other ETS-NNAR ETS(A, Ad, A) 0.5
    NNAR(3,1,10)[12] 0.5
Table 7

Training and Testing errors for all models

Category RMSE (training) MAE (training) RMSE (testing) MAE (testing) ΔRMSE ΔMAE
Domestic 0.0727 0.0408 0.2543 0.2102 −0.1816 −0.1694
Industrial 0.0404 0.0304 0.0405 0.0330 −0.0001 −0.0026
Commercial 0.0134 0.0105 0.0315 0.0347 −0.0181 −0.0242
Agricultural 0.0098 0.0076 0.0275 0.0203 −0.0177 −0.0127
Other 0.0366 0.0290 0.0698 0.0606 −0.0332 −0.0316
Table 8

Index of agreement for all models

CategoryModel structureIndex of agreement (d)
Domestic ARIMA(1,0,3)(0,1,1)[12] with drift 0.8524 
Industrial ARIMA(0,1,1)(1,0,0)[12] 0.9335 
ETS(A, Ad, A) 
NNAR(2,1,10)[12] 
Commercial ARIMA(0,1,2)(2,0,0)[12] 0.9677 
ETS(A, Ad, A) 
NNAR(2,1,10)[12] 
Agricultural ARIMA(0,1,1)(2,0,0)[12] 0.8652 
ETS(A, Ad, A) 
NNAR(4,1,10)[12] 
Other ETS(A, Ad, A) 0.9726 
NNAR(3,1,10)[12] 
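The index of agreement in the table above is Willmott's (1981) d, defined as d = 1 − Σ(Pᵢ − Oᵢ)² / Σ(|Pᵢ − Ō| + |Oᵢ − Ō|)². Here Oᵢ are the observed values, Pᵢ the fitted values and Ō the observed mean. The statistic ranges from 0 (no agreement) to 1 (perfect agreement). The sketch below implements the formula directly. The function name is illustrative.

```python
from typing import Sequence


def index_of_agreement(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Willmott's (1981) index of agreement d: 0 = no agreement, 1 = perfect agreement."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    o_bar = sum(observed) / len(observed)
    # Sum of squared errors between fitted and observed values.
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": squared sum of both deviations from the observed mean.
    denominator = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                      for o, p in zip(observed, predicted))
    if denominator == 0:
        raise ValueError("d is undefined when all values equal the observed mean")
    return 1.0 - numerator / denominator


print(index_of_agreement([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Because the denominator measures deviations about the observed mean, d penalizes both bias and differences in variability. It is therefore stricter than a simple correlation between actual and fitted values.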
Figure 4

Forecast graphs for water consumption: (a) domestic category for SARIMA model; (b) commercial category for hybrid ARIMA-ETS-NNAR model; (c) industrial category for hybrid ARIMA-ETS-NNAR model; (d) agricultural category for hybrid ARIMA-ETS-NNAR model; (e) ‘other’ category for ETS-NNAR model.


All five forecast graphs display an increasing trend in consumption until the end of December 2021. This trend was confirmed for all categories using the Mann-Kendall (M-K) test. The M-K test yielded S values of 1.969 × 10⁻³, 2.335 × 10⁻³, 2.087 × 10⁻³, 1.791 × 10⁻³ and 1.743 × 10⁻⁰ for the domestic, commercial, industrial, agricultural and ‘other’ categories, respectively. S > 0 indicates an increasing trend in the forecast time series until the end of 2021. The corresponding z-values were 4.3022, 5.1023, 4.5602, 3.9131 and 3.8082. All exceed the one-tailed critical value of 1.645, so the null hypothesis that no trend is present is rejected at the 5% significance level. These findings therefore indicate that projected water consumption increases in both the wet and dry seasons.
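The standard M-K test can be sketched as follows. S is the sum of the signs of all pairwise differences y(j) − y(i) for i < j. Without ties, its variance is Var(S) = n(n−1)(2n+5)/18. The continuity-corrected statistic is Z = (S − 1)/√Var(S) for S > 0, (S + 1)/√Var(S) for S < 0, and 0 otherwise. An increasing trend is accepted at the 5% level (one-tailed) when Z > 1.645. The sketch omits the tie correction to Var(S) and is not necessarily the implementation used in this study. The function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(y: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, Var(S), Z) for the Mann-Kendall trend test, assuming no tied values."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18  # no correction for ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


s, var_s, z = mann_kendall([float(v) for v in range(10)])
print(s, var_s, round(z, 4), "increasing trend" if z > 1.645 else "no significant increase")
```

For monthly series with many repeated values, the tie-corrected variance should be used instead. The decision rule is otherwise unchanged.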

The results of this study suggest that the proposed approach could be applied to forecast monthly water demand in other locations, provided that sufficient data are available for the analysis. Past studies have used single models or two-model hybrid combinations of the ARIMA, ETS and NNAR models to forecast water consumption. For example, Ristow et al. (2021) used only the ETS and ARIMA models to forecast water consumption for the city of Joinville in Brazil. Mukhairez & El-Halees (2018) used a hybrid ARIMA and neural network model for the Khan Younis municipality in Palestine. The present study evaluated the single ARIMA, ETS and NNAR models, their two-model hybrid combinations and a three-model hybrid combination. Hybrid models gave the best performance for four of the five consumption categories. This suggests that the approach may generalize to forecasting water demand under other scenarios.

In this study, water consumption data for Trinidad, Trinidad and Tobago, were analysed for the period October 2003–August 2018. Forecasting water consumption can help to balance water demand and supply. This research used single models (ARIMA, ETS and NNAR) and hybrid combinations of all three to forecast water consumption until the end of December 2021 for all categories of demand across Trinidad. The three-model ARIMA-ETS-NNAR hybrid was found to be adequate for modelling water consumption in the commercial, industrial and agricultural categories. The two-model ETS-NNAR hybrid was suitable for the ‘other’ category, and the SARIMA model was suitable for the domestic category. Hybrid models were thus optimal for four of the five categories, and all models showed good agreement between actual and fitted values. The forecasts also project an increase in water demand through December 2021, which underscores the value of forecasting models for water demand management.

Short-term forecasts of this kind can support operational planning and management and help to establish benchmarks for water demand. The study could be improved by using a larger data set, which would also reduce the risk of overfitting, and by exploring other potential forecasting models. Future work could apply long-term forecasting methods to extend the forecast horizon. It could also incorporate additional variables such as population, climatic variables and spatial attributes, contingent on the availability of water demand and supply data for the region.

No potential conflict of interest was reported by the authors and contributors.

Data cannot be made publicly available; readers should contact the corresponding author for details.

Bamisile, O., Oluwasanmi, A., Obiora, S., Osei-Mensah, E., Asoronye, G. & Huang, Q. 2020 Application of deep learning for solar irradiance and solar photovoltaic multi-parameter forecast. Energy Sources, Part A: Recovery, Utilization and Environmental Effects, 1–21. doi:10.1080/15567036.2020.1801903.

Bougadis, J., Adamowski, K. & Diduch, R. 2005 Short-term municipal water demand forecasting. Hydrological Processes 19 (1), 137–148.

Box, G. E. P. & Jenkins, G. M. 1976 Time Series Analysis: Forecasting and Control, 3rd edn. Prentice-Hall, Englewood Cliffs, NJ.

Chow, D. 2019 In Search of a Solution for Water Scarcity in the Caribbean.

Ekwue, E. 2009 Management of water demand in the Caribbean region: current practices and future needs. The West Indian Journal of Engineering 32 (1–2), 28–35.

Gardner, E. S., Jr. 1985 Exponential smoothing: the state of the art. Journal of Forecasting 4 (1), 1–28.

Ginzburg, I. & Horn, D. 1993 Combined neural networks for time series analysis. In: Proc. of the 6th Int. Conf. on Neural Information Processing Systems (NIPS), 29 November–02 December, Denver. NIPS, Denver, Colorado.

Givens, G. H. & Hoeting, J. A. 2013 Computational Statistics. John Wiley & Sons, Inc., Hoboken, NJ.

Gjika, E., Ferrja, A. & Kamberi, A. 2019 A study on the efficiency of hybrid models in forecasting precipitations and water inflow Albania case study. Advances in Science, Technology and Engineering Systems Journal 4 (1), 302–310.
Hornik, K., Stinchcombe, M. & White, H. 1989 Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366.
Hyndman, R. J. & Athanasopoulos, G. 2021 Forecasting: Principles and Practice, 3rd edn. OTexts, Melbourne. Available from: https://otexts.com/fpp3/ (accessed 20 December 2021).

Hyndman, R. J. & Koehler, A. B. 2006 Another look at measures of forecast accuracy. International Journal of Forecasting 22, 679–688.

Hyndman, R. J., Koehler, A. B., Snyder, R. D. & Grose, S. 2002 A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting 18 (3), 439–454.

Khedkar, D., Singh, P. K., Bhakar, S. R., Kothari, M., Jain, H. K. & Mudgal, V. D. 2015 Selection of proper method for evapotranspiration under limited meteorological data. International Journal of Advanced Research in Biological Sciences 2 (12), 34–44.

Leon, P., Chaplot, B. & Solomon, A. 2020 Water consumption forecasting using soft computing-a case study of Trinidad and Tobago. Water Supply. doi:10.2166/ws.2020.273 (accessed 20 December 2021).

Luxhoj, J. T., Riis, J. O. & Stensballe, B. 1996 A hybrid econometric-neural network modelling approach for sales forecasting. International Journal of Production Economics 43 (2–3), 175–192.

Mekonnen, M. & Hoekstra, A. 2016 Four billion people facing severe water scarcity. Science Advances 2 (2), 1–6.

Mukhairez, H. & El-Halees, A. 2018 Medium-term forecasting for city water demand and revenue. International Journal of Intelligent Computing Research 9 (1), 921–927.

Nazari-Sharabian, M., Ahmad, S. & Karakouzian, M. 2018 Climate change and eutrophication: a short review. Engineering, Technology and Applied Science Research 8 (6), 3668–3672.

Pegels, C. C. 1969 Exponential forecasting: some new variations. Management Science 15 (5), 311–315.

Ristow, D., Henning, E., Kalbusch, A. & Peterson, C. 2021 Models for forecasting water demand using time series analysis: a case study in southern Brazil. Journal of Water, Sanitation and Hygiene for Development 11 (2), 231–240.

Sen, Z. I., Altunkaynak, A. & Ozger, M. 2003 Autorun persistence of hydrologic design. Journal of Hydrologic Engineering 8 (6), 329–338.

Taylor, J. W. 2003 Exponential smoothing with a damped multiplicative trend. International Journal of Forecasting 19 (4), 715–725.

United Nations Children's Fund (UNICEF) 2021 Water Security for all. Available from: https://www.unicef.org/media/95241/file/water-security-for-all (accessed 20 December 2021).

United Nations – Water 2021 Water Scarcity. Available from: https://www.unwater.org/water-facts/scarcity/ (accessed 20 December 2021).

Wang, X. & Meng, M. 2012 A hybrid neural network and ARIMA model for energy consumption forecasting. Journal of Computers 7 (5), 1184–1190.

White, A. & Safi, S. 2016 The efficiency of artificial neural networks for forecasting in the presence of autocorrelated disturbances. International Journal of Statistics and Probability 5 (2), 51–58.

Willmott, C. J. 1981 On the validation of models. Physical Geography 2 (2), 184–194.

Xiong, H., Pandey, G., Steinbach, M. & Kumar, V. 2006 Enhancing data analysis with noise removal. IEEE Transactions on Knowledge and Data Engineering 18 (3), 304–319.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).