Abstract
Water demand prediction is crucial for the effective planning and management of water supply systems facing water scarcity. To account for the uncertainty and imprecision inherent in water demand forecasting, this paper introduces an uncertain time series prediction method. An uncertain time series is a sequence of imprecisely observed values characterized by uncertain variables, and the uncertain autoregressive (UAR) model is employed to describe it and to predict future values. The main contributions of this paper are as follows. Firstly, by defining the auto-similarity of uncertain time series, an algorithm for identifying the order of the UAR model is proposed. Secondly, a new parameter estimation method based on uncertain programming is developed. Thirdly, the imprecisely observed values are modelled as linear uncertain variables, and a ratio-based method is presented for constructing the uncertain time series. Finally, the proposed methodologies are applied to model and forecast Beijing's water demand under different confidence levels and are compared with a traditional time series method, namely ARIMA. The experimental results, evaluated on the basis of performance criteria, show that the proposed method outperforms the ARIMA method for water demand prediction.
HIGHLIGHTS
Considering the uncertainty of water demand, the uncertain time series method for demand estimation of water resources is presented.
The auto-similarity of uncertain time series is defined, and the identification algorithm of uncertain autoregressive model order is proposed.
An uncertain programming approach for estimating the parameters of the model is proposed.
The construction of linear uncertain time series is investigated, motivated by the interval-valued data frequently encountered in real life.
INTRODUCTION
Water is an indispensable natural resource on our planet, which plays an important role in man's life and activity. Apart from drinking and personal hygiene, water is still a necessary resource for agricultural and industrial production, economic and ecological development (Deng et al. 2015; Liu et al. 2015). However, due to climate change, socio-economic development and population growth, water consumption (especially for freshwater) is growing rapidly, and water supply is facing many challenges (Choksi et al. 2015), especially the problem of water scarcity (Frederick 1997; Pahl-Wostl 2007; Arnell & Lloyd-Hughes 2014; Wang et al. 2015). This has led to the need for effective planning, managing and operating of finite water resources (Oduro-Kwarteng et al. 2009; Wang et al. 2018). Therefore, water demand forecasting is a fundamental phase for optimal allocation of water resources and aims to provide the simulated view of future demand, which can assist decision makers in devising appropriate management schemes to relieve the conflict between growing demand and limited supply of water resources. To this end, many researchers have proposed different methods to model and forecast water demand.
Time series analysis is one of the commonly used methods for water demand prediction. It was proposed by Box and Jenkins who considered the dependence among the data. The model, namely, ARIMA, is regarded as a classical forecasting technique, describing a predicted value as a linear function of previous data and random errors and including a cyclical or seasonal component. For example, Maidment et al. (1985) applied the time series model of daily municipal water use as a function of rainfall and air temperature for short-term forecasting of daily water use in Austin, Texas. Smith (1988) developed an autoregressive process with randomly varying mean to forecast the daily municipal water use, which captured the seasonality and day-week effects in the model through the unit demand function. Aly & Wanakule (2004) utilized a deterministic smoothing algorithm that considered level, trend and seasonality components of time series to estimate monthly water use. Zhai et al. (2012) employed the time series forecasting method for predicting the future needs of water in Beijing by analyzing the driving mechanism of changes of water consumption and water consumed structure.
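For reference, the ARMA(p, q) core of an ARIMA model expresses the current value as a linear function of previous values and random errors, as described above. The symbols below ($c$, $\varphi_i$, $\theta_j$, $\varepsilon_t$) are standard textbook notation assumed here, not notation taken from this paper; ARIMA applies the same form to the series after it has been differenced $d$ times.

```latex
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```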
In a time series modelling application, determining the model order is the fundamental step towards describing any dynamic process and has long been of considerable interest. Originally, the order of a time series model was determined from the properties of the sample autocorrelation and partial autocorrelation coefficients. Subsequently, several order selection approaches based on information-theoretic criteria were developed, such as Akaike's information criterion (AIC) (Akaike 1974), Akaike's final prediction error (FPE) (Akaike 1970) and minimum description length (MDL) (Rissanen 1978; Liang et al. 1993). Another common method, the linear algebraic method based on determinant and rank testing algorithms, was proposed by Fuchs (1987) and Sadabadi et al. (2007). Apart from these, many other methods, such as the Bayesian information criterion (BIC) (Schwarz 1978), an edge detection-based approach (Al-Smadi & Al-Zaben 2005) and the optimal instrumental variable (IV) algorithm (Sadabadi et al. 2009), have been investigated for estimating the order of a time series model. A second estimation problem that has been studied extensively is the determination of the model coefficients. The most commonly used methods for estimating unknown coefficients are the least-squares (LS) estimator and the maximum likelihood estimator (MLE).
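To make the classical route concrete, the following is a minimal sketch of AIC-based order selection for an autoregressive model whose coefficients are estimated by least squares. It illustrates the conventional approach surveyed above, not the auto-similarity algorithm proposed in this paper. The function names, the synthetic AR(2) series and the Gaussian least-squares form of the AIC, $n \ln(\mathrm{RSS}/n) + 2k$, are assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, p, start):
    """Least-squares fit of x_t = a0 + a1*x_{t-1} + ... + ap*x_{t-p} for t >= start."""
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(start, len(series))]
    targets = series[start:]
    k = p + 1
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    coeffs = solve_linear(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coeffs, r))) ** 2
              for r, y in zip(rows, targets))
    return coeffs, rss


def aic(rss, n_obs, n_params):
    """Gaussian least-squares AIC, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def select_order(series, max_order):
    """Return the order in 1..max_order with the smallest AIC, plus all scores."""
    n_obs = len(series) - max_order  # same sample for every candidate order
    scores = {}
    for p in range(1, max_order + 1):
        _, rss = fit_ar(series, p, start=max_order)
        scores[p] = aic(rss, n_obs, p + 1)
    return min(scores, key=scores.get), scores


# Synthetic AR(2) series (hypothetical data, not the Beijing series).
rng = random.Random(0)
series = [0.0, 0.0]
for _ in range(300):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0.0, 1.0))

best, scores = select_order(series, max_order=4)
print("selected order:", best)
```

All candidate orders are fitted on the same observations (those from index `max_order` onward) so that their AIC values are comparable.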
Using these methods of model order identification and parameter estimation, time series models can be formulated to forecast water demand. It is worth noting that the aforementioned models provide a single-valued forecast of water demand and disregard the uncertainty that arises when the factors influencing water demand are themselves uncertain. This limits the usefulness of such deterministic models. One classical way to handle the uncertainty is to use a probabilistic model (Almutaz et al. 2012; Haque et al. 2014) based on Monte Carlo simulations (MCS) to obtain the distribution of water demand and to estimate the overall uncertainty in the predictions arising from the uncertainty of the influential factors.
Unfortunately, the distribution function obtained in most practical problems is not close enough to the actual frequency, especially in emergencies or when historical data are lacking. In addition, water demand data are subject to measurement inaccuracies and often have to be assessed by experts. This motivates us to apply a different mathematical tool to deal with the range of uncertainties inherent in water demand data. Uncertainty theory, proposed by Liu in 2007 (see Liu 2009), offers an effective way to handle imprecisely observed values. Based on uncertainty theory, researchers have studied the determination of uncertainty distributions (Wang et al. 2012a; Wang & Peng 2014), hypothesis testing (Guo et al. 2017; Ye & Liu 2021) and uncertain regression analysis (Wang et al. 2012b; Lio & Liu 2018; Yao & Liu 2018; Ye & Liu 2020). Furthermore, the concept of uncertain time series was first proposed by Yang & Liu (2019) on the basis of uncertainty theory. As in traditional time series analysis, there may be more than one way to model an uncertain time series; in their study, the UAR model was employed to predict future values from previously imprecisely observed values characterized by uncertain variables. Yang & Liu (2019) also presented a least-squares method for estimating the coefficients of the UAR model and applied it to the prediction of carbon emissions.
However, several important issues remain open. Firstly, the order of the UAR model must be identified, because this is the first step in estimating the model parameters. In the work of Yang & Liu (2019), a second-order UAR model was employed directly to forecast future values. This choice is subjective and lacks a theoretical foundation, which may reduce the prediction accuracy of the model. In this paper, by defining the auto-similarity of uncertain time series, we design an algorithm for determining the optimal order of the autoregressive model. Secondly, a novel parameter estimation approach is developed based on uncertain programming, in which the original problem involving the uncertain measure is transformed into an equivalent crisp mathematical programming problem. Thirdly, most information in daily life is uncertain in nature. For example, water demand at each time t naturally ranges between a minimum and a maximum value and is therefore an imprecisely observed value, so linear uncertain variables are selected to describe it. Hence, it is critical to determine the lower and upper bounds of the range to which the actual data belong, that is, to construct an uncertain time series from observed historical point data. Referring to the work of Huarng (2006), we introduce a novel ratio-based approach to construct an effective uncertain time series. Furthermore, the proposed uncertain time series forecasting approach is used to predict urban water demand. Beijing is the second-largest city of China; its rapid development has attracted many immigrants in recent years and water consumption is growing rapidly, which has sharpened the conflict between the demand for and the supply of water resources. This situation has become an important constraint on the sustainable development of Beijing.
Therefore, predicting Beijing's water demand is a fundamental stage in water resources planning and utilization, and contributes to harmonious development between the socio-economy and the resource environment in Beijing. To further verify the accuracy of the proposed methodologies, a traditional time series method (ARIMA) is selected as a benchmark. The results are judged on the basis of the presented criteria, i.e. prediction reliability and accuracy relative to the ARIMA method.
The paper is organized as follows. Section 2 briefly presents some fundamental concepts, properties and theorems of uncertainty theory. Section 3 introduces the forecasting procedure of uncertain time series analysis. Section 4 provides an experimental analysis to validate the effectiveness of the proposed method and to assess its performance by comparison with the conventional time series (ARIMA) method. Finally, some conclusions are drawn.
PRELIMINARIES
In this section, we will present some fundamental definitions and theorems on uncertainty theory.
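Definition 1. (Liu 2007) Let $\Gamma$ be a nonempty set, and let $\mathcal{L}$ be a σ-algebra over $\Gamma$. Each element $\Lambda \in \mathcal{L}$ is called an event. A number $\mathcal{M}\{\Lambda\}$ indicates the belief degree that $\Lambda$ will occur. Then $\mathcal{M}$ is called an uncertain measure if it satisfies the following axioms:
Axiom 1: (Normality Axiom) $\mathcal{M}\{\Gamma\} = 1$ for the nonempty set $\Gamma$.
Axiom 2: (Duality Axiom) $\mathcal{M}\{\Lambda\} + \mathcal{M}\{\Lambda^{c}\} = 1$ for any event $\Lambda$.
Axiom 3: (Subadditivity Axiom) $\mathcal{M}\left\{\bigcup_{i=1}^{\infty} \Lambda_i\right\} \le \sum_{i=1}^{\infty} \mathcal{M}\{\Lambda_i\}$ for every countable sequence of events $\Lambda_1, \Lambda_2, \ldots$
In this case, the triplet $(\Gamma, \mathcal{L}, \mathcal{M})$ is called an uncertainty space.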
Then the product uncertain measure on the product σ-algebra was defined by Liu (2009), producing the fourth axiom of uncertain measure.
The concept of uncertain variable was introduced by Liu as a measurable function from an uncertainty space to the set of real numbers.
Definition 7. (Liu 2010a) Let $\xi$ be an uncertain variable with regular uncertainty distribution $\Phi(x)$. Then the inverse function $\Phi^{-1}(\alpha)$ is called the inverse uncertainty distribution of $\xi$.
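Because the paper later models each observation as a linear uncertain variable, it may help to see Definition 7 in that special case. The sketch below uses the standard closed forms for a linear uncertain variable $\mathcal{L}(a, b)$ with $a < b$: the distribution rises linearly from 0 at $a$ to 1 at $b$, and the inverse distribution is $(1-\alpha)a + \alpha b$. The function names are illustrative, and the interval [406, 442] is the 1988 entry reported later in Table 1.

```python
def linear_distribution(x, a, b):
    """Uncertainty distribution Phi(x) of a linear uncertain variable L(a, b), a < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def linear_inverse_distribution(alpha, a, b):
    """Inverse uncertainty distribution of L(a, b): (1 - alpha) * a + alpha * b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * a + alpha * b


# 1988 interval from Table 1, evaluated at belief degree 0.75.
lower, upper = 406.0, 442.0
value = linear_inverse_distribution(0.75, lower, upper)
print(value, linear_distribution(value, lower, upper))
```

Applying `linear_distribution` to the returned value recovers the belief degree, which is the defining property of the inverse uncertainty distribution.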
UNCERTAIN TIME SERIES FORECASTING METHOD
Uncertain time series was proposed by Yang & Liu (2019) in 2019 so as to predict the future values based on previously imprecisely observed values that are described by uncertain variables. The basic definition of uncertain time series is as follows.
The determination of the order of UAR model
Basic definitions
Based on the above distance measure, the average distance between two uncertain variables in the uncertain time series is defined as below.
Definition 9.Average distance
To further provide ease of use, the study applies the following definition to make some adjustments.
Definition 10.Auto-similarity of uncertain time series
It is clear that . Definition 10 shows that the greater is, the higher the similarity between the uncertain variables and is.
Model order selection algorithm
According to the above definitions, an algorithm for determining the appropriate order of UAR model is presented as follows.
Step 1. We set as an alternative order.
Step 2. Determine the confidence level .
Step 3. If , then select the order to add the set of alternative order numbers. That is, enter into Step 4.
If and , then choose the order as the optimal order.
If and , then into Step 4.
Step 4. Set and return to Step 2.
Step 5. Optimal order obtained from Step 1 to Step 4 is regarded as the order of UAR model.
Step 6. If we cannot find the effective order until the predetermined experimental order m is reached, then let .
The proposed algorithm is also illustrated in Figure 2.
A new parameter estimation method based on uncertain programming
Proof: It follows from Theorem 4 immediately.
The construction of uncertain time series
Uncertain time series were proposed to deal with forecasting problems in which the historical data are not crisp numbers but imprecisely observed values. In practice, most traditional point data also carry uncertainty because of measurement errors. For instance, water demand variables that take values between a lower and an upper bound can be regarded as interval-valued variables, which reflect the inaccuracies in measurement. Accordingly, each interval-valued variable is modelled as a linear uncertain variable. However, for the same uncertain time series model, intervals constructed in different ways can lead to different forecasting performance, so choosing an effective interval length is critical to improving the forecasting performance. A key point is that the interval should be neither too wide nor too narrow. When the interval is too wide, the prediction results become meaningless. When it is too narrow, the uncertain time series becomes very close to the traditional time series, which defeats the purpose. In addition, many traditional time series tend to fluctuate over a certain period of time. Therefore, when constructing an uncertain time series, we should consider the trend information of the data itself, so that the resulting interval series is more reasonable and reflects the variation tendency of the data. Following these two requirements, in this subsection we propose a new ratio-based approach for determining the interval length so as to obtain high forecasting accuracy. The steps of the algorithm are as follows:
Step 1. Take the first-order differences between consecutive observations, $d_t = x_t - x_{t-1}$, for $t = 2, 3, \ldots, n$.
Step 2. Calculate the relative differences $r_t = d_t / x_{t-1}$ for all $t = 2, 3, \ldots, n$.
Step 3. Determine the lower and upper bounds of the initial value.
Step 4. Determine the lower and upper bounds on the interval series.
be the lower and upper bounds for observation at time t, respectively.
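The first two steps can be sketched as follows. Dividing each difference by the previous observation is an inference from the values reported in Table 1 (for example, 22/424 ≈ 0.0519), and the function names are illustrative. The rule for the lower and upper bounds in Steps 3 and 4 is not reproduced here, because its formula is not recoverable from this section.

```python
def first_differences(values):
    """Difference between each observation and the one before it."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def relative_differences(values):
    """First-order difference divided by the previous observation."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


# Total water demand for 1988-1992, taken from Table 1.
demand = [424, 446, 411, 423, 464]

diffs = first_differences(demand)
ratios = [round(r, 4) for r in relative_differences(demand)]
print(diffs)
print(ratios)
```

The printed values can be checked against the fourth and fifth columns of Table 1 for observations 2 to 5.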
CASE STUDY
This section presents the application of the proposed methods for water demand forecast in Beijing. In Section 4.1, the location and dataset used in model development are given. Section 4.2 provides the implementations of the proposed uncertain time series model. For the purpose of comparison, the ARIMA model is selected to contrast the forecasting performance, and the classical measure methods are adopted to evaluate the forecast accuracy of the models in Section 4.3.
Location and dataset
The study area is Beijing. As the capital of China, Beijing is the country's political, cultural and international communication centre, located where the North China Plain meets the Mongolian Plateau. In Beijing, drinking water is mainly supplied by the Yongdinghe and Chaobaihe rivers. As a result of China's rapid development and Beijing's dense population, the city's water consumption is increasing rapidly and it is experiencing a shortage of water resources. According to the Beijing Water Authority (BWA), the annual water resources per capita are less than 300 m³, which is only 12.5% of the national average and far below the internationally recognized minimum standard of 1,000 m³ per year. Increasing water demand combined with limited water supply has become a vital factor restricting the socio-economic development and environmental health of Beijing, and will remain so for a long time. Therefore, it is particularly important to apply the proposed method to forecast the water demand in Beijing, which contributes to water resources planning and management in the near future.
The dataset used to build the uncertain time series was obtained from the Beijing Water Resources Bulletin for the period from 1988 to 2016. A total of 29 data points were collected and are shown in Table 1. To illustrate the effectiveness of the proposed uncertain time series method, the data from 1988 to 2013 are used as an estimation sample to determine the coefficients of the model, while the remaining data are reserved as a hold-out sample, used to test the model and assess the prediction performance.
Observation | Time | Total water demand | First-order difference | Relative difference | Lower bound | Upper bound
---|---|---|---|---|---|---
1 | 1988 | 424 | | | 406 | 442
2 | 1989 | 446 | 22 | 0.05189 | 427 | 465
3 | 1990 | 411 | −35 | −0.0785 | 394 | 428
4 | 1991 | 423 | 12 | 0.0292 | 405 | 441
5 | 1992 | 464 | 41 | 0.0969 | 444 | 484
6 | 1993 | 452 | −12 | −0.0259 | 433 | 471
7 | 1994 | 459 | 7 | 0.0155 | 440 | 478
8 | 1995 | 449 | −10 | −0.0218 | 430 | 468
9 | 1996 | 400 | −49 | −0.1091 | 383 | 417
10 | 1997 | 403 | 3 | 0.0075 | 386 | 420
11 | 1998 | 404 | 1 | 0.0025 | 387 | 421
12 | 1999 | 417 | 13 | 0.0322 | 399 | 435
13 | 2000 | 400 | −17 | −0.0408 | 383 | 417
14 | 2001 | 389 | −11 | −0.0275 | 372 | 406
15 | 2002 | 346 | −43 | −0.1105 | 331 | 361
16 | 2003 | 358 | 12 | 0.0347 | 343 | 373
17 | 2004 | 346 | −12 | −0.0335 | 331 | 361
18 | 2005 | 345 | −1 | −0.0029 | 330 | 360
19 | 2006 | 343 | −2 | −0.0058 | 328 | 358
20 | 2007 | 348 | 5 | 0.0146 | 333 | 363
21 | 2008 | 351 | 3 | 0.0086 | 336 | 366
22 | 2009 | 355 | 4 | 0.0114 | 340 | 370
23 | 2010 | 352 | −3 | −0.0085 | 337 | 367
24 | 2011 | 360 | 8 | 0.0227 | 345 | 375
25 | 2012 | 359 | −1 | −0.0028 | 344 | 374
26 | 2013 | 364 | 5 | 0.01393 | 349 | 379
Methodologies implementations
Construction of uncertain time series
As mentioned previously, water demand data are not crisp numbers but imprecisely observed values, so the data pre-processing is essential. For the purpose of implementation, we utilize the linear uncertain variables to describe the water demand observations and construct the uncertain time series by following the algorithm in the previous subsection:
- (1) Take the first-order differences between consecutive observations, which are listed in the fourth column of Table 1.
- (2) Calculate the relative differences between consecutive observations. All the relative differences are listed in the fifth column of Table 1.
- (3) Determine the lower and upper bounds of the interval-valued series. Firstly, the initial interval is calculated as follows.
The determination of the model order
In the model order selection phase, different experimental orders are examined based on the definition of auto-similarity of uncertain time series and the best one among them is selected. Generally, we set the maximum lagging order . The details of the calculation can be found in the following paragraphs.
By using the algorithm in Section 3.1, we can find the appropriate order.
Let . Because and . At the same time , thus the order of UAR model is .
The parameter estimation for the UAR model
According to different confidence levels (i.e. 0.75, 0.85, 0.90), different regression models can be established to compare with the existing traditional time series model.
Comparison with the existing method
Time | Actual values | ARIMA model | UAR model (confidence level 0.75) | UAR model (confidence level 0.85) | UAR model (confidence level 0.90)
---|---|---|---|---|---
2014 | 375 | 361.59 | 375.64 | 383.74 | 385.74
2015 | 382 | 359.17 | 385.57 | 400.56 | 404.32
2016 | 388 | 356.75 | 398.53 | 416.54 | 422.04
The prediction performance is shown in Table 3 and Figure 3. From the experimental results, it can be concluded that the proposed uncertain time series forecasting method has better forecasting performance than the ARIMA method under the considered confidence levels.
Time | Absolute error: ARIMA model | Absolute error: UAR model (0.75) | Absolute error: UAR model (0.85) | Absolute error: UAR model (0.90) | Relative error: ARIMA model | Relative error: UAR model (0.75) | Relative error: UAR model (0.85) | Relative error: UAR model (0.90)
---|---|---|---|---|---|---|---|---
2014 | 13.41 | 0.64 | 8.74 | 10.74 | 0.0358 | 0.0017 | 0.0233 | 0.0286
2015 | 22.83 | 3.57 | 18.56 | 22.32 | 0.0598 | 0.0093 | 0.0486 | 0.0584
2016 | 31.25 | 10.53 | 28.54 | 34.04 | 0.0805 | 0.0271 | 0.0736 | 0.0877
Total predicted error | 67.49 | 14.74 | 55.84 | 67.09 | | | |
Maximum relative error | | | | | 0.0805 | 0.0271 | 0.0736 | 0.0877
Average relative error | | | | | 0.0587 | 0.0127 | 0.0485 | 0.0583
Comparing the proposed method under the 0.75 and 0.85 confidence levels with the ARIMA method, the former outperforms the latter in all cases. The total prediction error is reduced by 78.15% and 17.26%, respectively, and the average relative error by 78.36% and 17.38%, respectively. In addition, for each hold-out observation, the prediction error of the proposed method under the 0.75 and 0.85 levels is smaller than that of the ARIMA method. The improvement in forecasting performance is most pronounced at the 0.75 confidence level.
Comparing the proposed method under the 0.90 confidence level with the ARIMA method, the maximum prediction error of the proposed method is slightly higher than that of the ARIMA method. Overall, however, the proposed method remains marginally better: the total prediction error is reduced by 0.59% and the average relative error by 0.68%. Such reductions are valuable in the planning and management of water supply systems.
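The error criteria and percentage reductions discussed above can be recomputed from the forecasts in Table 2 with a short script. This is a minimal sketch with illustrative function and key names; because it works from unrounded intermediate values, its results may differ from the tabulated figures in the last digit.

```python
# Actual values and forecasts for the hold-out years, taken from Table 2.
actual = {2014: 375.0, 2015: 382.0, 2016: 388.0}
forecasts = {
    "ARIMA": {2014: 361.59, 2015: 359.17, 2016: 356.75},
    "UAR 0.75": {2014: 375.64, 2015: 385.57, 2016: 398.53},
    "UAR 0.85": {2014: 383.74, 2015: 400.56, 2016: 416.54},
    "UAR 0.90": {2014: 385.74, 2015: 404.32, 2016: 422.04},
}


def error_summary(actual, predicted):
    """Total absolute error, maximum relative error and average relative error."""
    abs_err = [abs(predicted[year] - actual[year]) for year in actual]
    rel_err = [err / actual[year] for err, year in zip(abs_err, actual)]
    return {
        "total_abs": sum(abs_err),
        "max_rel": max(rel_err),
        "avg_rel": sum(rel_err) / len(rel_err),
    }


def reduction_pct(baseline, candidate):
    """Percentage by which candidate is smaller than baseline."""
    return 100.0 * (baseline - candidate) / baseline


summaries = {name: error_summary(actual, pred) for name, pred in forecasts.items()}
base = summaries["ARIMA"]
for name, summary in summaries.items():
    if name == "ARIMA":
        continue
    total_cut = reduction_pct(base["total_abs"], summary["total_abs"])
    avg_cut = reduction_pct(base["avg_rel"], summary["avg_rel"])
    print(f"{name}: total error -{total_cut:.2f}%, average relative error -{avg_cut:.2f}%")
```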
Apart from the statistical criteria discussed above, we also use the forecast trend to evaluate the performance of the methods. Beijing's actual total water demand increased from 2014 to 2016, whereas the ARIMA forecasts decline over the same period. This is not in accordance with the observed behaviour of Beijing's total water demand between 2014 and 2016, so the ARIMA predictions may not provide adequate support for water resource management in the near future. It is worth noting that the predictions of the proposed method do reflect the realistic water demand trend in the short term and can help decision makers to devise reasonable management schemes.
CONCLUSION
In this study, we presented a modified time series method for estimating the demand for water resources in Beijing. Considering the uncertainty of water demand in real life, we combined uncertainty theory with a time series model, yielding an uncertain time series, to handle the above problems. In the presented method, we employed the UAR model to describe the uncertain time series and predict future values. First, the auto-similarity of uncertain time series was defined as a principle of justifiable recognition, and an algorithm for identifying the optimal model order was proposed, which supports the correct estimation of the model parameters. Second, we proposed an uncertain programming approach for estimating the parameters of the model. Third, the imprecisely observed values were modelled as linear uncertain variables, and a ratio-based method was presented for constructing the uncertain time series. Finally, we compared the performance of the proposed model with that of the traditional time series model (ARIMA) on the basis of statistical criteria. The results demonstrated that the proposed model provided better accuracy than the traditional model for water demand prediction. A possible reason is that the traditional model cannot effectively handle imprecisely observed values, so effective information may be lost, which reduces prediction accuracy.
Although the proposed UAR model improves on the traditional time series method for water demand, some limitations remain, such as the determination of the model order and the construction of the uncertain time series. In future work, we will further study the optimization of the model order algorithm and provide better solutions for UAR model applications in order to improve forecast accuracy. In addition, we investigated only the construction of linear uncertain time series, motivated by the interval-valued data that are frequently encountered in practice. Further study may attempt to construct normal uncertain time series to expand the application field of the model.
ACKNOWLEDGEMENTS
This work was supported by the National Natural Science Foundation of China (No. 61873084) and the Foundation of Hebei Education Department (No. ZD2017016).
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.