Abstract

Reliability and validity of model prediction play a decisive role in water resource simulation and prediction. Among many prediction models, the combined model (CM) is widely used because it can combine the prediction results of multiple single models and make full use of the information provided by various methods. CM is an effective method to improve the predictive veracity but the weight of single model estimation is the key to the CM. Previous studies take errors as the objective function to calculate the weight, and the uncertainty of the weight of the individual model cannot be considered comprehensively. In order to consider the uncertainty of the weight and to improve universal applicability of the CM, in this paper, the authors intend the Markov chain Monte Carlo based on adaptive Metropolis algorithm (AM-MCMC) to solve the weight of a single model in the CM, and obtain the probability distribution of the weight and the joint probability density of all the weight. Finally, the optimal weight combination is obtained. In order to test the validity of the established model, the author put it into the prediction of monthly groundwater level. The two single models in the CM are time series analysis model (TSAM) and grey model (GM (1,1)), respectively. The case study showed that the uncertainty characteristic of the weight in the CM can be obtained by AM-MCMC. According to the study results, CM has obtained a least average root mean square error (RMSE) of 0.85, a mean absolute percentage error (MAPE) of 8.61, and a coefficient of determination (R2) value of 0.97 for the studied forecast period.

INTRODUCTION

Model construction is not only a key technology of water resources simulation and prediction, but also an important link with water resources management and sustainable utilization. The traditional approach to forecasting involves choosing the single forecasting method judged most appropriate of the available methods (Chambers et al. 1971) and applying it in some specific situations. The main methods of water resource simulation and prediction are time series analysis (Tingsanchali & Gautam 2015; Dastorani et al. 2016) and system analysis (Nayak et al. 2004; Dejamkhooy et al. 2017). In general, the choice of a method depends upon the characteristics of the series and the type of application (Makridakis et al. 1983). Therefore, using different prediction methods will achieve different prediction results. If the prediction error is simply used as the evaluation standard, some useful information will be lost (Guo et al. 2009). For example, some models like grey model (GM (1,1)) are not suitable for solving the periodic time series (Tseng et al. 2001), and other models like time series analysis model (TSAM) are poor to deal with the grey problem with insufficient data, small sample, complete information and lack of experiences (Chang & Tsai 2008). Hence the full expression of the series of features cannot be achieved by solely relying on a single forecasting method. It is the reason why the combined model (CM) was put forward. The CM that combines the different forecasting methods appropriately was first proposed by Bates & Granger (1969), which eliminates the problem that individuals have no choice but to use a single method and rely exclusively on its forecasts. Whereas there is a key problem in the CM that it is difficult to estimate the weight coefficient of the individual model (Liu et al. 2013). CM based on entropy weight (EW-CM) and set pair analysis (SPA-CM) was a common method to solve the weight of single model in the CM (Yuan et al. 2014). The core of common method is to construct the objective function by means of absolute error or relative error, and the objective function is minimized under certain criteria. However, the simple arithmetic average method cannot accurately express the uncertainty of the weight of a single model. Moreover, the prediction results of each model, whether a single model or synthetic model, can provide only the deterministic prediction value of the hydrological process, and the uncertainty of the prediction results cannot be quantitatively described; thus, the accurate prediction cannot be obtained (Liang et al. 2013).

Solution of the weights used in combining forecasts can be based on posterior model probabilities within a Bayesian framework. This procedure, which is typically referred to as Bayesian model averaging (BMA), is a standard approach to model uncertainty within the Bayesian paradigm, where it is natural to reflect uncertainty through probability (Steel 2011). Therefore, by applying the BMA method to obtain the weight of the CM, the uncertainty of the weight of the individual model can be considered (Sloughter et al. 2007). There are two methods in BMA (Tian et al. 2012): expected to maximize (EM) and the Markov chain Monte Carlo algorithm (MCMC). The MCMC method can overcome the defect of the EM method (Gibbons et al. 2008), especially for two-parameter expressions, and the global optimal solution can be obtained visually. The convergence rate of the MCMC based on the adaptive Metropolis algorithm (AM-MCMC) was the fastest with the same accuracy (Marshall et al. 2004) compared with other sampling algorithms.

Given the uncertainty of the weight of a single model in the CM, a TSAM (Dastorani et al. 2016) and grey model (GM (1, 1)) (Dejamkhooy et al. 2017) are selected as the single models in the study problems in this work, which vary from simple to complex. In addition, the input and output are the same, but the basic theories are different for the TSAM and GM (1, 1) (see the Supplementary file, available with the online version of this paper). Under the theoretical framework of BMA, the AM-MCMC was applied to obtain the weight of the CM to construct the AM-MCMC-CM. Not only the optimal weight combination can be obtained, but the uncertainty of the weight of the individual model is taken into consideration. To verify the validity of the AM-MCMC-CM, it is applied to predict the groundwater level of Daxing Farm on the Sanjiang Plain in Heilongjiang Province, Northeast China. The results are compared with the statistics from the SPA-CM and another two single models.

DATA SET AND STUDY AREA

The Sanjiang Plain, an alluvial flat formed by the Heilongjiang River, Songhua River and Wusulijiang River, is located in northeastern Heilongjiang Province, China. The area of the Sanjiang Plain is 108,900 km2. The mean annual volume of groundwater resources is 85.56 × 108 m3. Recently, paddy planting has developed rapidly, and groundwater utilization for irrigation has increased annually. The groundwater level of the Sanjiang Plain has considerably changed. Therefore, it is important to analyze and forecast groundwater levels, which can provide references for decision making and groundwater utilization in the Sanjiang Plain Irrigation Area. Daxing Farm (132°18′ E–133°18′ E, 46°49′ N–47°13′ N), one of 14 large irrigation areas that encompasses 800 km2, was selected as a typical study area in the irrigation region of the Sanjiang Plain. The geographical location of the study area is shown in Figure 1(a). The average monthly groundwater level at this farm over 17 years (January 1997 to December 2013) was obtained based on data collected at three groundwater monitoring wells (A (133°02′14.68″ E, 46°59′14.09″ N), B (133°10′44.7″ E, 47°07′44.75″ N), and C (132°54′41.39″ E, 47°04′26.48″ N)) by the Water Affairs Bureau of Heilongjiang Province. As shown in Figure 1(b), the time series variation of the average monthly groundwater level was given.

Figure 1

The location of Daxing Farm and time series of the groundwater level. (a) The location of Daxing Farm. (b) Time series of the groundwater level.

Figure 1

The location of Daxing Farm and time series of the groundwater level. (a) The location of Daxing Farm. (b) Time series of the groundwater level.

RESEARCH METHODS

Establishment of a new CM

In this paper, the AM-MCMC-CM steps are as follows (Haario 2001):

  • (1)

    First, initialize t = 0.

  • (2)

    Randomly produce an initial state (w1, w2) within [0,1] and use it as follows:

    • (a)
      Calculate the covariance Ct according to Equation (1); 
      formula
      (1)
      where C0 is a diagonal matrix and is equal to 10% of the length of the search range; sd is a scaling parameter that depends on only dimension d, ensuring an acceptance probability in a reasonable range; is the sample that has been sampled; ɛ is a constant; and Id denotes a d-dimensional identity matrix. Gelman et al. (1996) suggested that sd equals 2.42/d.
    • (b)

      Produce a candidate state w1# and w2#.

    • (c)
      Calculate an acceptance probability α according to Equation (2); 
      formula
      (2)
      where σ is the variance of the residual of the combined prediction value and the observed value.
    • (d)

      Generate a uniform random number u from (0, 1).

    • (e)

      If u < α, then accept w1# and w2#; otherwise, reject w1# and w2#.

  • (3)

    For t = t + 1, repeat (a)–(e) until the number of samples is sufficiently large.

  • (4)
    According to the criterion of adaptive Metropolis convergence (Gelman & Rubin 1992), when AM-MCMC converges, the maximum value of the joint density fmax(w1#, w2#) of w1 and w2 can be determined at the optimal point (w1*, w2*) or the mean values of w1# and w2#. The CM can be constructed as follows: 
    formula
    (3)
    where is the output of the CM based on AM-MCMC, is the output of an individual model (TSAM), and is the output of the other individual model (GM (1, 1)).

The basic block diagram of the proposed prediction approach is shown in Figure 2.

Figure 2

Basic building blocks of the AM-MCMC-CM for groundwater level forecasts.

Figure 2

Basic building blocks of the AM-MCMC-CM for groundwater level forecasts.

Model performance evaluation

Three statistical indicators, the mean relative error (MAPE), root mean square error (RMSE), and coefficient of determination (R2), were used to test the performance of the CM (Ramana et al. 2013; Soleymani et al. 2016).

RESULTS AND ANALYSIS

Stochastic analysis of weights

The weight set (w1, w2) of the two individual models above was sampled using the AM-MCMC algorithm. The iteration was set to 6,000 in the initial period, and parallel sampling was performed five times. Twenty thousand samples were produced at one time; therefore, (20,000–6,000) × 5 = 70,000(w1,w2) sets were generated and used to study the weighting uncertainty of the CM. As shown in Figure 3(a), the value of R1/2 decreases to less than 1.02 and steadily declines to 1 after the 6,000th iteration. This result suggests that the multiple sampling series converged, i.e. the statistical characteristics of the samples have converged the ensemble. As shown in Figure 3(b) and 3(c), the posterior marginal densities of the CM weights are w1N (0.6147, 0.212) and w2N (0.3853, 0.172). Meanwhile, the posterior joint probability densities of the two weight parameters are shown in Figure 3(d), and the weights reached the global maximum joint density at the intersection of the mean values (w1*,w2*) = (0.6147 0.3853); thus, f(w1*,w2*)max = 7.8.

Figure 3

Iteration trace of R1/2, and the posterior marginal densities and posterior joint probability densities of the CM weights. (a) Iteration trace of R1/2. (b) Histogram of the posterior density of w1. (c) Histogram of the posterior density of w2. (d) Posterior joint density of w1 and w2.

Figure 3

Iteration trace of R1/2, and the posterior marginal densities and posterior joint probability densities of the CM weights. (a) Iteration trace of R1/2. (b) Histogram of the posterior density of w1. (c) Histogram of the posterior density of w2. (d) Posterior joint density of w1 and w2.

Calculation results based on AM-MCMC-CM

Through the above calculation, the model expression of the TSAM, GM (1, 1), AM-MCMC-CM, SPA-CM are given in Table 1. In Equations (4)–(7), t represents the time series. Finally, the corresponding prediction value is obtained according to the model expression.

Table 1

The equations used in models of groundwater level prediction

TypeModel expression
TSAM (Dastorani et al. 2016 
formula
 
(4) 
GM (1, 1) (Tseng et al. 2001 
formula
 
(5) 
AM-MCMC-CM  
formula
 
(6) 
SPA-CM (Guo et al. 2009 
formula
 
(7) 
TypeModel expression
TSAM (Dastorani et al. 2016 
formula
 
(4) 
GM (1, 1) (Tseng et al. 2001 
formula
 
(5) 
AM-MCMC-CM  
formula
 
(6) 
SPA-CM (Guo et al. 2009 
formula
 
(7) 

The scatter plots of observed values and the predictions of the four models in the forecast period (2011–2013) is shown in Figure 4. As shown in Figure 4, the calculated values of the four types of prediction models exhibit good correlation with the measured groundwater levels. Among the models, the prediction of TSAM and GM (1, 1) are quite similar, but the effects of the two single models are very different in low and high groundwater level prediction, meaning that the choice of the two single models is successful. Additionally, this also explains the necessity of using a CM. It can be seen that the predictive effect of the CM is clearly better than that of the single model from the scatter graph.

Figure 4

Scatter plots of observed and predicted groundwater levels at Daxing Farm from 2011–2013: (a) TSAM, (b) GM (1, 1), (c) SPA-CM, and (d) AM-MCMC-CM.

Figure 4

Scatter plots of observed and predicted groundwater levels at Daxing Farm from 2011–2013: (a) TSAM, (b) GM (1, 1), (c) SPA-CM, and (d) AM-MCMC-CM.

Model evaluation and discussion

The model evaluation indicators MAPE, RMSE and R2 introduced in section ‘model performance evaluation’ were used to further compare the accuracies of the AM-MCMC-CM, SPA-CM, TSAM and GM (1, 1) models. The evaluation indicator values of the four models in the calibration period (1997–2010) and forecast period (2011–2013) are listed in Table 2, and the predicted (dashed lines) and observed (solid lines) levels in the forecast period (2011–2013) are plotted in Figure 4.

Table 2

The evaluation indicator values of different models

PeriodCalibration period (1997–2010)
Forecast period (2011–2013)
typeAM-MCMC-CMSPA-CMTSAMGM (1, 1)AM-MCMC-CMSPA-CMTSAMGM (1, 1)
MAPE 8.24% 9.00% 9.52% 10.98% 8.61% 9.31% 10.36% 13.34% 
RMSE 0.76 0.79 0.84 0.97 0.85 0.89 0.99 0.99 
R2 0.98 0.96 0.91 0.88 0.97 0.95 0.89 0.87 
PeriodCalibration period (1997–2010)
Forecast period (2011–2013)
typeAM-MCMC-CMSPA-CMTSAMGM (1, 1)AM-MCMC-CMSPA-CMTSAMGM (1, 1)
MAPE 8.24% 9.00% 9.52% 10.98% 8.61% 9.31% 10.36% 13.34% 
RMSE 0.76 0.79 0.84 0.97 0.85 0.89 0.99 0.99 
R2 0.98 0.96 0.91 0.88 0.97 0.95 0.89 0.87 

As shown in Table 2, in the calibration period (1997–2010), the MAPE, RMSE and R2 were 9.52%, 0.84, and 0.91, respectively, for TSAM, and 10.98%, 0.97, and 0.88, respectively, for GM (1, 1). Additionally, as shown in Figure 4, TSAM yielded a better fit with observations than did GM (1, 1), which implies that TSAM should be assigned a larger weight than GM (1, 1). This result is in accordance with the CM results based on Equations (6) and (7). Likewise, the results of the analysis of correlation and the dispersion degree in the previous section ‘Calculation results based on AM-MCMC-CM’ suggested that the three evaluation indicators of the CM are better than those of each individual model from 1997 to 2013, as shown in Table 2. Furthermore, the results indicate that the CM is more accurate than individual models in predicting the groundwater level.

The AM-MCMC-CM results for the calibration period yielded MAPE, RMSE and R2 values of 8.24%, 0.76 and 0.98, respectively. Thus, AM-MCMC-CM produced better results for almost all the evaluation indicators considered in this study compared to those of the other models. Both AM-MCMC-CM and SPA-CM produced linear correlation coefficient (CC) values greater than 0.85 based on the associated scatter plots (Figure 4). In addition, a strong correlation exists between the observed and simulated groundwater levels of both models. However, a higher degree of correlation was obtained by AM-MCMC-CM than by SPA-CM (not considering the uncertainty of the weights of individual models).

Through statistical and graphical analysis and comparison of evaluation indicators, the prediction accuracy of the CM was better than that of individual models. AM-MCMC-CM performed better than SPA-CM for groundwater level prediction; notably, the RMSE was 4.0% lower in the calibration period (1997–2010) and 5% lower in the forecast period (2011–2013). In addition, the MAPE decreased by 0.76% and 0.70%, and R2 increased by 2.0% and 2.1%, respectively. AM-MCMC-CM considers the weighting uncertainty of individual models to reduce the random interference of the weights of individual models caused by observed data limitations. Notably, the weight of TSAM, which was the most accurate individual model, was increased from 0.5629 in SPA-CM to 0.6147 in AM-MCMC-CM. Overall, AM-MCMC-CM exhibited higher reliability and accuracy than SPA-CM in groundwater level forecasting.

CONCLUSIONS

The AM-MCMC-CM is based on the AM-MCMC algorithm to consider the uncertainty of the weight of a single model in the CM and applied to predict the groundwater level to verify the effectiveness of the new model. The conclusions are as follows:

  • (1)

    The AM-MCMC-CM can account for the uncertainty and equifinality of the different weights of individual models. Furthermore, it can obtain the joint probability distribution and statistical characteristics of weights of individual models used in the CM.

  • (2)

    AM-MCMC-CM can effectively reduce the influence of the prediction error of individual models on the prediction results of the CM compared with that of the SPA-CM. Moreover, the proposed model weakens the dependence on a limited number of samples in the determination of weights.

  • (3)

    AM-MCMC-CM is more applicable and provides better prediction accuracy than that of the TSAM, GM (1, 1) and SPA-CM. Thus, AM-MCMC-CM can improve the accuracy of groundwater level prediction.

ACKNOWLEDGEMENTS

This research was supported by the National Key R&D Program of China (No. 2017YFC0406004), the National Natural Science Foundation of China (No. 51109036), the Natural Science Foundation of Heilongjiang Province of China (No. E2015024), the Research Fund for the Doctoral Program of Higher Education of China (No. 20112325120009), the Projects for Science and Technology Development of Water Conservancy Bureau in Heilongjiang Province of China (No. 201402, No. 201404, No. 201501), and the Academic Backbones Foundation of Northeast Agricultural University (No. 16XG11).

REFERENCES

REFERENCES
Bates
J. M.
&
Granger
C. W. J.
1969
The combination of forecasts
.
Journal of the Operational Research Society
20
(
4
),
451
468
.
Chambers
J. C.
,
Mullick
S. K.
&
Smith
D. D.
1971
How to choose the right forecasting technique
.
Harvard Business Review
49
(
4
),
45
74
.
Dastorani
M.
,
Mirzavand
M.
,
Dastorani
M. T.
&
Sadatinejad
S. J.
2016
Comparative study among different time series models applied to monthly rainfall forecasting in semi-arid climate condition
.
Natural Hazards
81
(
3
),
1811
1827
.
Dejamkhooy
A.
,
Dastfan
A.
&
Ahmadyfard
A.
2017
Modeling and forecasting nonstationary voltage fluctuation based on grey system theory
.
IEEE Transactions on Power Delivery
32
(
3
),
1212
1219
.
Gelman
A.
&
Rubin
D. B.
1992
Inference from iterative simulation using multiple sequences
.
Statistical Science
7
(
4
),
457
472
.
Gelman
A.
,
Roberts
G. O.
&
Gilks
W. R.
1996
Efficient Metropolis Jumping Rules. In: Bayesian Statistics
.
Gibbons
J. M.
,
Cox
G. M.
,
Wood
A. T. A.
,
Craigon
J.
,
Ramsden
S. J.
,
Tarsitano
D.
&
Crout
N. M. J.
2008
Applying Bayesian Model Averaging to mechanistic models: an example and comparison of methods
.
Environmental Modelling & Software
23
(
8
),
973
985
.
Guo
Y.
,
Jin
J.
&
Liang
Z.
2009
Combined prediction model for regional water demand based on set pair analysis
.
Advances in Science and Technology of Water Resources
29
(
5
),
42
45
.
Haario
H.
2001
An adaptive Metropolis algorithm
.
Bernoulli
7
(
2
),
223
242
.
Liang
Z.
,
Wang
D.
,
Guo
Y.
&
Zhang
Y.
2013
Application of Bayesian model averaging approach to multimodel ensemble hydrologic forecasting
.
Journal of Hydrologic Engineering
18
(
11
),
1426
1436
.
Liu
D.
,
Wei
Y.
,
Yang
S.
&
Guan
Z.
2013
Electricity price forecast using combined models with adaptive weights selected and errors calibrated by hidden Markov model
.
Mathematical Problems in Engineering
2013
(
3
),
1
8
.
Makridakis
S. G.
,
Wheelwright
S. C.
&
McGee
V. E.
1983
Forecasting: Methods and Applications[M]//Forecasting, Methods and Applications/
.
Wiley
, pp.
147
156
.
Marshall
L.
,
Nott
D.
&
Sharma
A.
2004
A comparative study of Markov chain Monte Carlo methods for conceptual rainfall-runoff modeling
.
Water Resources Research
40
(
2
),
183
188
.
Nayak
P. C.
,
Sudheer
K. P.
,
Rangan
D. M.
&
Ramasastri
K. S.
2004
A neuro-fuzzy computing technique for modeling hydrological time series
.
Journal of Hydrology
291
(
1
),
52
66
.
Ramana
R. V.
,
Krishna
B.
,
Kumar
S. R.
&
Pandey
N. G.
2013
Monthly rainfall prediction using wavelet neural network analysis
.
Water Resources Management
27
(
10
),
3697
3711
.
Sloughter
J. M.
,
Raftery
A. E.
,
Gneiting
T.
&
Fraley
C.
2007
Probabilistic quantitative precipitation forecasting using Bayesian model averaging
.
Monthly Weather Review
135
(
9
),
3209
3220
.
Soleymani
S. A.
,
Goudarzi
S.
,
Anisi
M. H.
,
Hassan
W. H.
,
Idris
M. Y. I.
,
Shamshirband
S.
,
Noor
N. M.
&
Ahmedy
I.
2016
A novel method to water level prediction using RBF and FFA
.
Water Resources Management
30
(
9
),
1
19
.
Steel
M. F.
2011
Bayesian Model Averaging and Forecasting
.
Bulletin of EU and US Inflation and Macroeconomic Analysis
.
Tian
X. J.
,
Xie
Z. H.
,
Wang
A. H.
&
Yang
X. C.
2012
A new approach for Bayesian model averaging
.
Chinese Science: Earth Science
55
(
8
),
1336
1344
.
Tingsanchali
T.
&
Gautam
M. R.
2015
Application of tank, NAM, ARMA and neural network models to flood forecasting
.
Hydrological Processes
14
(
14
),
2473
2487
.
Tseng
F. M.
,
Yu
H. C.
&
Tzeng
G. H.
2001
Applied hybrid grey model to forecast seasonal time series
.
Technological Forecasting & Social Change
67
(
2–3
),
291
302
.
Yuan
Z.
,
Yan
D. H.
,
Yang
Z. Y.
&
Yin
J.
2014
Ensemble model and its application in runoff simulation and forecast
.
Journal of Hydraulic Engineering
80
(
24
),
2675
2685
.

Supplementary data