Considering the complexity of reservoir systems, a model fusion approach is proposed in this paper. According to different inflow information represented, the historical monthly data can be constructed as two time series, namely, yearly-scale series and monthly-scale series. Even grey model (EGM) and adaptive neuro-fuzzy inference system (ANFIS) are adopted for the forecasts at the two scales, respectively. Grey relational analysis (GRA) is subsequently used as a scale-normalized model fusion tool to integrate the two scales' results. The proposed method is evaluated using the data of the Three Gorges reservoir ranging from January 2000 to December 2012. The forecast performances of the individual-scale models are improved substantially by the suggested method. For comparison, two peer models, back-propagation neural network and autoregressive integrated moving average model, are also involved. The results show that, having combined together the small-sample forecast ability of the EGM in the yearly-scale, the nonlinearity of the ANFIS in the monthly-scale, and the grey fusion capability of the GRA, the present approach is more accurate for holistic evaluation than those models in terms of mean absolute percentage error, normalized root-mean-square error, and correlation coefficient criteria, and also for peak inflow forecasting in accordance with peak percent threshold statistics.

## INTRODUCTION

Reservoirs are significant components of water resource systems, providing effective irrigation, water supply, hydropower, flood and drought control, and other functions. Hence the effective operation and scheduling of reservoirs are important tasks in water resource management. An accurate and reliable inflow forecast is a vital reference for making decisions in reservoir management. During the past few decades, a great deal of research has been devoted to the formulations and model developments in reservoir inflow forecasting.

Autoregressive (AR), autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models (Box & Jenkins 1976; Pankratz 1983; Brockwell & Davis 1991; Commandeur & Koopman 2007) are all general time series models used in hydrological forecasting (Valipour 2012, 2015; Valipour *et al.* 2012b, 2013; Wang *et al.* 2015). These approaches are useful for forecasting under the assumption that the data have an internal structure over time (such as autocorrelation, trend or seasonal variation). However, real reservoir inflow series often feature dynamic, nonlinear and non-stationary characteristics, which limit the applicable scope of these general models.

With the developments in computer technologies and new theories, researchers have proposed various advanced methods, e.g., artificial neural network (Asaad 2010; Othman & Naseri 2011; Kale *et al.* 2012; Sattari *et al.* 2012; Valipour *et al.* 2012a; Prakash *et al.* 2014), support vector machine (SVM) (Lin *et al.* 2009a; Wang *et al.* 2010), Bayesian regression (Ticlavilca & Mckee 2011), decision support system (Regonda *et al.* 2014), fuzzy inference systems (FIS) (Luna *et al.* 2011; Lohani *et al.* 2011, 2014; Jaiswal *et al.* 2015), model tree (Jothiprakash & Kote 2011; Maestre *et al.* 2013) and grey model (GM) (Lin *et al.* 2012). In addition to the direct time series forecast using the inflow data themselves, external factors such as weather conditions were also taken into account to improve the inflow forecasting accuracy (Zhang *et al.* 2009; Lin *et al.* 2009b, 2010; Wang *et al.* 2012; Tsai *et al.* 2014).

The application of a sole forecast model is often ineffective in forecasting different reservoir inflow series. Hence, some hybrid methods consisting of two or more models or criteria (Li *et al.* 2015) have been reported recently. Chen (2011) proposed a nonlinear model which incorporated improved real-coded grammatical evolution with a genetic algorithm to predict the 10-day inflow of the De-Chi Reservoir in central Taiwan. Budu (2013) studied the capability of two pre-processing techniques, i.e., wavelet transform and moving average, in combination with feed forward neural networks, radial basis and multiple linear regression models for the prediction of the daily inflow values of Malaprabha reservoir, Belgaum, India. Okkan & Ali Serbes (2013) applied the wavelet transform technique to remove nonlinear dynamic noise of raw data, and established three hybrid wavelet-based box models, i.e., multiple linear regression, feed forward neural networks and least square SVM, for monthly inflow forecasts. Toro *et al.* (2013) presented a hybrid hydrologic estimation model incorporating both statistical and artificial intelligence techniques for calculating the mean daily inflow of the Salvajina reservoir, Colombia. Li *et al.* (2014) used a coupled method, combining SVM and the ensemble Kalman filter, to forecast flooding in the Luo River Basin, China. Awan & Bae (2014) investigated the ability of the adaptive neuro-fuzzy inference system (ANFIS) method to improve the accuracy of reservoir inflow forecasting. Bai *et al.* (2014) proposed an additive model for forecasting monthly reservoir inflow, combining AR for the trend term, SVM for the periodic term and ANFIS for the stochastic term. Wang *et al.* (2015) studied the ARIMA model coupled with the ensemble empirical mode decomposition for forecasting annual runoff time series, improving the accuracy based on a deep insight into the data characteristics. These hybrid models have shown better forecasting ability than the mono-models and are capable of sufficiently reflecting the characteristics of the real time series. Techniques such as nonlinear and linear model integration, multiple model combination and data pre-processing were used to take advantage of the various methods.

Data fusion combines data from multiple sources to improve the potential value and interpretability of the source data, and is a general and popular multidisciplinary approach for producing a high-quality representation of the data (Zhang 2010). Considering the complexity of hydrologic systems, data fusion approaches have been used in hydrological engineering (Azmi *et al.* 2010; Ababaei *et al.* 2013). Inspired by the idea of data fusion (Liggins *et al.* 2008), we propose in this paper a model fusion approach for monthly reservoir inflow forecasting. Because different forecasting time scales have different features, the time scales of the real time series should be determined first. At each scale, an appropriate model selected according to the scale's time characteristics is trained and used to forecast the scale's information. The results forecasted at all scales are then combined into the final forecasting results through scale-normalized fusion. In this paper, according to the different inflow information represented, the historical data of the reservoir are divided into two time series, namely, a yearly-scale series and a monthly-scale series. To accommodate the different characteristics of the two scales, the even grey model (EGM) and the ANFIS are adopted for the forecasts at the yearly and monthly scales, respectively. The grey relational analysis (GRA) is subsequently used as a scale-normalized model fusion tool to integrate the yearly-scale model and the monthly-scale model to obtain the final forecasting results. In this way, a model fusion approach is constructed to improve the forecasting accuracy of the monthly reservoir inflow. The proposed approach is applied to the Three Gorges reservoir, China. The forecasting results are also compared to those of some published peer approaches.

## METHODOLOGIES

The *t*th forecasted value of the reservoir inflow can be associated with the historical data, i.e.,

$$y(t) = F\big(x(t-\tau),\ x(t-2\tau),\ \ldots,\ x(t-d\tau)\big) \qquad (1)$$

where *τ* denotes the delay time, *d* is the embedding dimension, *F*(.) stands for the correlation pattern between the future and the historical values, and *x*(*t*) and *y*(*t*) represent the real observation and the forecasted values at time *t*, respectively. Generally, the reservoir inflow has special characteristics, i.e., periodicity, seasonality and tendency, over the entire hydrologic record. Moreover, for real hydrologic systems, random disturbances and the weather conditions vary over time. Hence, the reservoir inflow has special patterns in different time scales. This means that a mono model cannot exhibit enough tracking ability for all time scale patterns of the reservoir inflow. To better accommodate the complex nature of the reservoir inflow forecast, we draw lessons from the data fusion approach. From Equation (1), we can find that there exist two relational patterns between the records and the forecasted values: (1) monthly-scale forecasting (the recent monthly data are used to forecast the current monthly value with *τ*_{1} ≠ 12); and (2) yearly-scale forecasting (the current monthly data in the historical years are applied to forecast the current monthly value with *τ*_{2} = 12). Different forecasting patterns are capable of representing different time scales' characteristics. The ANFIS and the EGM are used for the monthly-scale and the yearly-scale forecasts of the monthly inflow series, respectively. The GRA is then used as a fusion tool to calculate the correlation degree between the forecasted series and the target series. Through the distribution of the grey relational degree, we can get the final forecasting result of the monthly reservoir inflow. Using the model fusion, therefore, Equation (1) can be recast as:

$$y(t) = F_0\big(y_1(t),\ y_2(t)\big), \quad y_1(t) = F_1\big(x(t-\tau_1), \ldots, x(t-d_1\tau_1)\big), \quad y_2(t) = F_2\big(x(t-\tau_2), \ldots, x(t-d_2\tau_2)\big) \qquad (2)$$

where subscripts ‘1’ and ‘2’ denote the monthly-scale model and the yearly-scale model, respectively, and *F*_{0}(.) represents the scale-normalized fusion function for the two constituting models. In this section, the involved methodologies are introduced, respectively.
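The two relational patterns differ only in the delay and the number of lags used to pick historical values. A minimal sketch of that selection is given below; the helper name `lagged_inputs`, the toy series and the lag settings (*τ* = 1 with three lags, *τ* = 12 with two lags) are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def lagged_inputs(x, t, tau, d):
    """Return (x(t - tau), x(t - 2*tau), ..., x(t - d*tau)) for a 0-based index t."""
    lags = [t - k * tau for k in range(1, d + 1)]
    if min(lags) < 0:
        raise ValueError("not enough history for the requested lags")
    return np.asarray([x[i] for i in lags], dtype=float)

# Toy series: 36 "months" whose value equals the month index (hypothetical data).
series = np.arange(36, dtype=float)

# Monthly-scale pattern: recent months feed the forecast (tau_1 != 12).
monthly_inputs = lagged_inputs(series, t=30, tau=1, d=3)

# Yearly-scale pattern: the same calendar month in earlier years (tau_2 = 12).
yearly_inputs = lagged_inputs(series, t=30, tau=12, d=2)
```

With the toy series, the monthly-scale inputs are the three preceding months and the yearly-scale inputs are the same month one and two years earlier.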

### ANFIS for the monthly-scale modeling

As mentioned above, the monthly-scale pattern describes the nonlinear change information of adjacent months. This calls for a nonlinear model to express the nonlinear relationship *F*_{1}(.) between the input and the output variables. The ANFIS, integrating both neural networks and fuzzy logic principles, is capable of handling nonlinear, complex and discontinuous problems (Lohani *et al.* 2012). It also has some other advantages, such as fast convergence speed, good repeatability and high forecasting precision. Moreover, the results in our previous research (Bai *et al.* 2014) have shown that the ANFIS performs well when applied alone to monthly inflow forecasting. Hence, the ANFIS is chosen as the monthly-scale regression model.

*et al.* 1997). The ANFIS is a multilayer feed-forward network that uses the neural network learning algorithms and fuzzy logic to map an input space to an output space (Chang & Chang 2006). For simplicity, assume the fuzzy inference system under consideration has two inputs (*x*_{1} = *x*(*t*−*τ*_{1}), *x*_{2} = *x*(*t*−2*τ*_{1})) and one output (*y*_{1} = *y*_{1}(*t*)). Its inference system corresponds to a set of fuzzy IF–THEN rules (Takagi & Sugeno 1985; Lohani *et al.* 2006) that have learning capability to approximate nonlinear functions. The two rules are:

$$\text{Rule } m:\ \text{IF } x_1 \text{ is } A_m \text{ AND } x_2 \text{ is } B_m,\ \text{THEN } f_m = p_m x_1 + q_m x_2 + r_m, \quad m = 1, 2$$

where *A*_{m}, *B*_{m} (*m* = 1, 2) are the linguistic labels, *p*_{m}, *q*_{m} and *r*_{m} are the linear parameters in the consequent part of the first order Sugeno fuzzy model, and *f*(.) is a first order Sugeno fuzzy model. The architecture of the ANFIS shown in Figure 1 consists of five layers, which can be briefly introduced as follows (Lohani *et al.* 2007).

*Layer 1*: *O*_{1,m} represents the membership function (MF) of *A*_{m} and *B*_{m}, that is:

$$O_{1,m} = \mu_{A_m}(x_1) \quad \text{or} \quad O_{1,m} = \mu_{B_m}(x_2), \quad m = 1, 2$$

The Gaussian membership function is used in this study, so *μ*_{Am} and *μ*_{Bm} are Gaussian functions of *x*_{1} and *x*_{2}, where {*δ*_{m}, *σ*_{m}} is the parameter set of the MF in the premise part of the fuzzy IF–THEN rules. These parameters change the shape of the MF in this layer and are referred to as the premise parameters.

*Layer 2*: Firing strength *O*_{2,m} is the output of this layer and results from multiplying the two MFs obtained in the previous layer using an AND operator:

$$O_{2,m} = w_m = \mu_{A_m}(x_1)\,\mu_{B_m}(x_2)$$

*Layer 3*: The main objective of the normalized firing strength layer is to calculate the ratio of each *m*th rule's firing strength to the sum of all firing strengths:

$$O_{3,m} = \bar{w}_m = \frac{w_m}{w_1 + w_2}$$

*Layer 4*: The square nodes in this layer correspond to the consequent nodes that compute the contribution of each *m*th rule to the model output:

$$O_{4,m} = \bar{w}_m f_m = \bar{w}_m\,(p_m x_1 + q_m x_2 + r_m)$$

*Layer 5*: The single node computes the overall output by summing all the incoming signals. Accordingly, the defuzzification process transforms each rule's fuzzy results into a crisp output in this layer:

$$O_5 = y_1 = \sum_{m=1}^{2} \bar{w}_m f_m$$

From the introduction above, one can find that the ANFIS has two kinds of parameter sets, i.e., the premise parameters {*δ*_{m}, *σ*_{m}} identifying the shape of the MF, and the consequent parameters {*p*_{m}, *q*_{m}, *r*_{m}} determining the overall output of the system. In this paper, we use the hybrid-learning algorithm (Jang 1993) to identify these parameters. More descriptions of this algorithm can be found in Jang *et al.* (1997).
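The five layers can be traced with a short forward pass for the two-input, two-rule case. This is only a sketch: the Gaussian expression in `gaussian_mf` is one common parameterization and is an assumption here, and the function names and all parameter values are hypothetical rather than fitted values from the study.

```python
import numpy as np

def gaussian_mf(x, delta, sigma):
    # Assumed Gaussian form with center delta and width sigma.
    return float(np.exp(-((x - delta) ** 2) / (2.0 * sigma ** 2)))

def anfis_forward(x1, x2, premise_a, premise_b, consequent):
    """Forward pass of a two-rule first order Sugeno system.

    premise_a, premise_b: one (delta, sigma) pair per rule for A_m and B_m.
    consequent: one (p, q, r) triple per rule.
    """
    # Layer 1: membership grades of the two inputs.
    mu_a = np.array([gaussian_mf(x1, d, s) for d, s in premise_a])
    mu_b = np.array([gaussian_mf(x2, d, s) for d, s in premise_b])
    # Layer 2: firing strength of each rule (product as the AND operator).
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths.
    w_bar = w / w.sum()
    # Layer 4: rule outputs f_m = p_m*x1 + q_m*x2 + r_m, weighted in layer 5.
    f = np.array([p * x1 + q * x2 + r for p, q, r in consequent])
    # Layer 5: crisp output as the weighted sum.
    return float(np.sum(w_bar * f))

# Hypothetical parameters: the input sits midway between the two rule centers,
# so both rules fire equally and the output is the mean of f_1 and f_2.
y1 = anfis_forward(
    1.0, 1.0,
    premise_a=[(0.0, 1.0), (2.0, 1.0)],
    premise_b=[(0.0, 1.0), (2.0, 1.0)],
    consequent=[(1.0, 0.0, 0.0), (0.0, 0.0, 5.0)],
)
```

In practice the premise and consequent parameters would come from training (the hybrid-learning algorithm in this paper) rather than being set by hand.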

As shown in Equation (2), for the monthly-scale forecasting pattern (ANFIS), there are two parameters (*τ*_{1} and *d*_{1}) to be specified first. As pointed out in the literature (Sivakumar & Berndtsson 2010), hydrological sequence has generally chaotic characteristics. Thus, chaos phase space reconstruction technique is chosen to determine the ANFIS inputs. In this paper, mutual information (MI) method and false nearest neighbor (FNN) method are applied to determine *τ*_{1} and *d*_{1}, respectively.

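Consider the time series **X**_{1} = {*x*(*t*)} and its delay time series **X**_{1,τ1} = {*x*(*t*−*τ*_{1})}. The MI variable *I*(*τ*_{1}) with different *τ*_{1} is described as:

$$I(\tau_1) = H(\mathbf{X}_1) + H(\mathbf{X}_{1,\tau_1}) - H(\mathbf{X}_1, \mathbf{X}_{1,\tau_1})$$

where *H*(**X**_{1}) and *H*(**X**_{1,τ1}) are the marginal entropies, and *H*(**X**_{1}, **X**_{1,τ1}) the joint entropy of **X**_{1} and **X**_{1,τ1}. The three variables are, respectively, given by

$$H(\mathbf{X}_1) = -\sum P\big(x(t)\big) \log P\big(x(t)\big)$$

$$H(\mathbf{X}_{1,\tau_1}) = -\sum P\big(x(t-\tau_1)\big) \log P\big(x(t-\tau_1)\big)$$

$$H(\mathbf{X}_1, \mathbf{X}_{1,\tau_1}) = -\sum\sum P\big(x(t), x(t-\tau_1)\big) \log P\big(x(t), x(t-\tau_1)\big)$$

where *P*(*x*(*t*)) and *P*(*x*(*t*−*τ*_{1})) represent the marginal probability distribution functions of **X**_{1} and **X**_{1,τ1}, respectively, and *P*(*x*(*t*), *x*(*t*−*τ*_{1})) is their joint probability distribution function.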
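The MI between a series and its delayed copy can be estimated from a two-dimensional histogram, and the delay is then read off at the first local minimum. The sketch below assumes a simple equal-width binning (8 bins) and natural logarithms; the function names are hypothetical and the binning choice is not taken from the study.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, tau, bins=8):
    """I(tau) = H(X) + H(X_tau) - H(X, X_tau) for an integer delay tau >= 1."""
    x = np.asarray(x, dtype=float)
    current, delayed = x[tau:], x[:-tau]          # x(t) and x(t - tau)
    joint, _, _ = np.histogram2d(current, delayed, bins=bins)
    h_current = entropy(joint.sum(axis=1))        # marginal of x(t)
    h_delayed = entropy(joint.sum(axis=0))        # marginal of x(t - tau)
    h_joint = entropy(joint.ravel())
    return h_current + h_delayed - h_joint

def first_minimum_delay(x, max_tau, bins=8):
    """Return the first delay at which I(tau) has a local minimum."""
    mi = [mutual_information(x, tau, bins) for tau in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1                          # delays start at 1
    return int(np.argmin(mi)) + 1                 # fallback: global minimum
```

The estimate depends on the number of bins and on the series length, so the selected delay should be checked for stability under different binnings.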

After *τ*_{1} is determined, the FNN (Kennel *et al.* 1992) is applied to choose *d*_{1}. In a *d*_{1}-dimensional space, each phase vector **X**_{1}(*t*) has a nearest neighbor **X**_{1,η}(*t*), whose distance *R*_{d1} is expressed as:

$$R_{d_1}(t) = \lVert \mathbf{X}_1(t) - \mathbf{X}_{1,\eta}(t) \rVert$$

If *R* > *R*_{tol} = 15 (Abarbanel & Gollub 1996), and the loneliness tolerance threshold *A*_{tol} satisfies the condition of Kennel *et al.* (1992), this point is marked as having a FNN.
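A rough sketch of the FNN idea is given below. It implements only the distance-ratio test with threshold *R*_{tol} = 15 (a neighbor is counted as false when adding one more delayed coordinate stretches the neighbor distance by more than the threshold); the loneliness test with *A*_{tol} is omitted. The function names and the brute-force neighbor search are assumptions made for illustration.

```python
import numpy as np

def embed(x, tau, d):
    """Rows are the phase vectors (x(t), x(t - tau), ..., x(t - (d - 1) * tau))."""
    x = np.asarray(x, dtype=float)
    start = (d - 1) * tau
    return np.column_stack(
        [x[start - k * tau: len(x) - k * tau] for k in range(d)]
    )

def fnn_fraction(x, tau, d, r_tol=15.0):
    """Fraction of points whose nearest neighbor in dimension d is false."""
    x = np.asarray(x, dtype=float)
    low = embed(x, tau, d)[tau:]            # keep only t with x(t - d*tau) available
    extra = x[: len(x) - d * tau]           # the added coordinate x(t - d*tau)
    dist = np.linalg.norm(low[:, None, :] - low[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    nn = dist.argmin(axis=1)
    r_d = dist[np.arange(len(low)), nn]
    valid = r_d > 0                         # skip exact duplicates
    if not valid.any():
        return 0.0
    ratio = np.abs(extra[valid] - extra[nn[valid]]) / r_d[valid]
    return float(np.mean(ratio > r_tol))
```

Scanning `d` upward and picking the smallest dimension at which the fraction becomes negligible gives an estimate of the embedding dimension; for long series a tree-based neighbor search would be preferable to the pairwise distance matrix used here.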

### EGM for the yearly-scale modeling

The yearly-scale pattern reflects the similarity of the same month's inflows in adjacent years. Generally, the weather and the hydrologic conditions are roughly the same in a given month for different years. Hence, the observations of the history series at the yearly scale have a quasi-linear and quasi-stationary nature. Moreover, the modeling samples of the history series at the yearly scale are usually few (only one sample per year). Therefore, we adopt the EGM to model the yearly-scale pattern (*F*_{2}(.)) shown in Equation (2).

The EGM, first proposed by Deng (2005), has been widely used in forecasting fields. The EGM can be utilized to forecast the time series with only four or more observations. It overcomes the disadvantage that the forecasting accuracy of the statistical model depends on the data amount. In this paper, we employ the EGM (1, 1) algorithm that is described as follows (Liu *et al.* 2011).

Assume that the yearly-scale series **X**_{2} = (*x*(*t*−*τ*_{2}), *x*(*t*−2*τ*_{2}), …, *x*(*t*−*d*_{2}*τ*_{2})) is taken as the modeling series **X**^{(0)} = (*x*^{(0)}(1), *x*^{(0)}(2), …, *x*^{(0)}(*n*)) (*n* = 1,2,…,*d*_{2}), and **X**^{(1)} is the 1-AGO sequence of **X**^{(0)}, **X**^{(1)} = (*x*^{(1)}(1), *x*^{(1)}(2), …, *x*^{(1)}(*n*)), where $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, *k* = 1, 2, …, *n*.

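Define **Z**^{(1)} as the sequence obtained by applying the mean operation to **X**^{(1)}, **Z**^{(1)} = (*z*^{(1)}(2), *z*^{(1)}(3), …, *z*^{(1)}(*n*)), where *z*^{(1)}(*k*) = (*x*^{(1)}(*k*) + *x*^{(1)}(*k*−1))/2, *k* = 2,3,…,*d*_{2}, then:

$$x^{(0)}(k) + a\,z^{(1)}(k) = b$$

is called an even GM (1,1) model, where *a* and *b* denote the development coefficient and the grey input, respectively. The parameter vector can be solved by the least square method:

$$\hat{\mathbf{a}} = (a, b)^{\mathrm{T}} = (\mathbf{B}^{\mathrm{T}}\mathbf{B})^{-1}\mathbf{B}^{\mathrm{T}}\mathbf{Y}$$

where

$$\mathbf{Y} = \begin{pmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{pmatrix}, \qquad \mathbf{B} = \begin{pmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{pmatrix}$$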
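The EGM (1,1) estimation (1-AGO, mean sequence, least squares) can be sketched in a few lines. The forecasting step below uses the usual GM (1,1) time response function, x̂^{(1)}(k+1) = (x^{(0)}(1) − b/a)·e^{−ak} + b/a followed by differencing, which is assumed here because the corresponding expression is not reproduced in the text above; the function names are hypothetical.

```python
import numpy as np

def egm11_fit(x0):
    """Estimate the development coefficient a and grey input b by least squares."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                         # 1-AGO sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])              # mean sequence z(2), ..., z(n)
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    (a, b), *_ = np.linalg.lstsq(B, Y, rcond=None)
    return float(a), float(b)

def egm11_forecast(x0, steps=1):
    """Forecast the next `steps` values (assumed standard time response function)."""
    x0 = np.asarray(x0, dtype=float)
    a, b = egm11_fit(x0)
    if abs(a) < 1e-12:                         # degenerate case: constant series
        return np.full(steps, b)
    n = len(x0)
    k = np.arange(n + steps)                   # k = 0 gives x1_hat(1) = x0(1)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.diff(x1_hat, prepend=0.0)      # inverse AGO
    return x0_hat[n:]

# Hypothetical five-point series growing by 10% per step.
history = [2.0 * 1.1 ** k for k in range(5)]
a_hat, b_hat = egm11_fit(history)
next_value = egm11_forecast(history, steps=1)[0]
```

For a yearly-scale application, one such model would be fitted per calendar month, each on the sequence of that month's values from the training years.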

### GRA for scale-normalized model fusion

Data fusion integrates multiple data and knowledge representing the same real-world object into a consistent, accurate and useful representation. Its operating principle is to process data reflecting different characteristics of the time series with correspondingly suitable methods. In this paper, we employ the GRA (Liu *et al.* 2011) as a fusion tool to normalize the monthly-scale and yearly-scale models for the reservoir inflow forecast. The GRA method compensates for the shortcomings of system analysis based on mathematical statistics, and avoids inconsistencies between the qualitative analysis and the quantitative results. In addition, it is applicable to problems with few samples, and it is general and computationally light. The basic principle of the GRA is the comparison of the geometrical relations between multiple data sequences in the system: the more similar the geometric shapes, the greater the correlation degree. In other words, the smaller the forecasting error is, the higher the grey correlation degree will be. Therefore, model fusion based on the GRA can avoid amplifying the errors of the different forecasting sequences, thus improving the final forecasting precision.

According to the theory of data fusion, the reservoir inflow **X** = (*x*(1), *x*(2), …, *x*(*N*)) (*N* represents the length of time) can be divided into two relevant series, named monthly-scale series **X**_{1} and yearly-scale series **X**_{2}.

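Assume that **X**_{0} = (*x*(1), *x*(2), …, *x*(*M*)) is the target series (*M* ≤ *N*), and **Y**_{1} = (*y*_{1}(1), *y*_{1}(2), …, *y*_{1}(*M*)), **Y**_{2} = (*y*_{2}(1), *y*_{2}(2), …, *y*_{2}(*M*)) are the forecasting results of the relevant series (monthly-scale series **X**_{1} and yearly-scale series **X**_{2}) using corresponding models, respectively; then the absolute relational degree between **Y**_{1}, **Y**_{2} and **X**_{0} can be expressed as follows:

$$\varepsilon_{0i} = \frac{1 + |s_0| + |s_i|}{1 + |s_0| + |s_i| + |s_i - s_0|}$$

where

$$|s_0| = \left| \sum_{l=2}^{M-1} \big(x(l) - x(1)\big) + \tfrac{1}{2}\big(x(M) - x(1)\big) \right|$$

$$|s_i| = \left| \sum_{l=2}^{M-1} \big(y_i(l) - y_i(1)\big) + \tfrac{1}{2}\big(y_i(M) - y_i(1)\big) \right|$$

$$|s_i - s_0| = \left| \sum_{l=2}^{M-1} \Big[\big(y_i(l) - y_i(1)\big) - \big(x(l) - x(1)\big)\Big] + \tfrac{1}{2}\Big[\big(y_i(M) - y_i(1)\big) - \big(x(M) - x(1)\big)\Big] \right|$$

and *y*_{i}(*l*) represents the *l*th forecasted value of the *i*th forecasted series, *i* = 1,2.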
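The absolute relational degree and a degree-weighted fusion can be sketched as follows. The degree follows the standard absolute degree of grey incidence computed from zero-starting-point images; the fusion rule, which weights each model in proportion to its degree, is an assumption about how the degrees are distributed into weights, and the function names are hypothetical.

```python
import numpy as np

def _area_term(series):
    """s = sum of the zero-starting-point image over l = 2..M-1 plus half the last term."""
    z = np.asarray(series, dtype=float)
    z = z - z[0]                               # zero-starting-point image
    return float(np.sum(z[1:-1]) + 0.5 * z[-1])

def absolute_relational_degree(target, forecast):
    s0, si = _area_term(target), _area_term(forecast)
    return (1 + abs(s0) + abs(si)) / (1 + abs(s0) + abs(si) + abs(si - s0))

def fuse(y1, y2, eps1, eps2):
    """Assumed fusion rule: weights proportional to the relational degrees."""
    w1 = eps1 / (eps1 + eps2)
    return w1 * np.asarray(y1, dtype=float) + (1 - w1) * np.asarray(y2, dtype=float)

# Hypothetical target and forecast series.
target = [3.0, 5.0, 4.0, 6.0]
degree = absolute_relational_degree(target, [3.0, 6.0, 6.0, 9.0])
```

Because the degree is computed from zero-starting-point images, it measures similarity of shape and is unchanged when a forecast is shifted by a constant, which is worth keeping in mind when interpreting it as a proxy for forecasting error.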

### Overview of the proposed model fusion approach

*Step 1*: Analyze the original data series **X** and construct new series containing different characteristics (in this paper, we construct two relevant series, monthly-scale series **X**_{1} and yearly-scale series **X**_{2}).

*Step 2*: Forecast the reservoir inflow of the different series using corresponding methods (in this paper, **Y**_{1} is obtained by the ANFIS, and **Y**_{2} by the EGM (1,1)).

*Step 3*: Perform the scale-normalized model fusion through the GRA between **Y**_{1}, **Y**_{2} and **X**_{0}, respectively.

*Step 4*: Output the forecasting results according to Equation (18). End.

## DATA AND PERFORMANCE CRITERIA

The monthly inflow (m^{3}/month) is generated by summing up every day's data (m^{3}/d), which is averaged by the instantaneous record (m^{3}/s) at 8 a.m.
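The unit conversion behind this aggregation can be sketched as below, assuming that each 8 a.m. instantaneous record (m^{3}/s) is taken as representative of the whole day, so that a daily volume is the record times 86 400 s. The function name and the sample flows are hypothetical.

```python
SECONDS_PER_DAY = 86_400

def monthly_volume(flows_8am_m3s):
    """Sum daily volumes (m3/d), each derived from one 8 a.m. flow record (m3/s)."""
    return sum(q * SECONDS_PER_DAY for q in flows_8am_m3s)

# Hypothetical month of 30 days with a constant 10 000 m3/s record.
volume = monthly_volume([10_000] * 30)
```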

From Figure 4, it can be seen that the reservoir inflow has obvious periodicity in monthly-scale with the wet season from June to September, the dry season from December to March, and normal season in the rest of the months. In addition, the reservoir inflow also features the seasonality in yearly-scale, that the same monthly inflow in each year is maintained at a certain amount because of the same hydrologic condition and weather changes. One can also observe singular inflow patterns in year 2006 and year 2011 (when drought events occurred in the Yangtze River watershed).

*x*(*t*) and *y*(*t*) represent the recorded and forecasted values of the *t*th monthly reservoir inflow, respectively, and $\bar{x}$ and $\bar{y}$ are their respective mean values.
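The three holistic criteria can be computed as in the sketch below. MAPE and the correlation coefficient R follow their usual definitions; the NRMSE normalization (by the spread of the recorded data around its mean) is an assumption, since several conventions exist. Function names are hypothetical.

```python
import numpy as np

def mape(x, y):
    """Mean absolute percentage error, in percent."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(np.abs((x - y) / x)) * 100.0)

def nrmse(x, y):
    """Assumed normalization: squared error relative to the variance of the records."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)))

def corr(x, y):
    """Pearson correlation coefficient R between records and forecasts."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical records and forecasts with a uniform 10% error.
recorded = np.array([100.0, 200.0, 300.0, 400.0])
forecasted = np.array([110.0, 180.0, 330.0, 360.0])
scores = (mape(recorded, forecasted), nrmse(recorded, forecasted), corr(recorded, forecasted))
```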

*et al.* 2014), is adopted in this paper. In its equation, *l* and *u* represent the lower and higher limits in percentage, respectively, and *ξ*_{t} is the relative error of the *t*th data. Before computing Equation (22), the records are arranged in descending order. Note that when *up* = 100%, the PPTS (*lo*, *up*) can be regarded as PPTS (*lo*), which shows the PPTS of the top *lo*% data. In this paper, the PPTS (5) and PPTS (10) are both taken into consideration.
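A sketch of PPTS (*lo*) is given below: the records are sorted in descending order and the absolute relative errors (in percent) of the top *lo*% are averaged. How the top fraction is rounded to a count of records is an assumption (ceiling, with at least one record), and the function name is hypothetical.

```python
import numpy as np

def ppts(recorded, forecasted, lo):
    """Mean absolute relative error (%) over the top lo% of the recorded values."""
    x = np.asarray(recorded, dtype=float)
    y = np.asarray(forecasted, dtype=float)
    order = np.argsort(-x)                               # descending by record
    k = max(1, int(np.ceil(lo * len(x) / 100.0)))        # assumed rounding rule
    top = order[:k]
    rel_err = np.abs((x[top] - y[top]) / x[top]) * 100.0
    return float(rel_err.mean())

# Hypothetical data: the two largest records are both off by 10%.
peak_score = ppts([100.0, 50.0, 200.0, 10.0], [90.0, 50.0, 220.0, 10.0], lo=50)
```

A lower value means the largest inflows are forecast more accurately, which is why the criterion is used for peak inflow assessment.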

## RESULTS AND DISCUSSION

In this section, the proposed model fusion approach is evaluated using the reservoir inflow data of the Three Gorges, as shown in Figure 4. The forecasting performances are also compared with the sole application of the ANFIS, the EGM (1,1) and other peer methods to demonstrate the superiority.

### Inflow forecasting using the proposed approach

In this subsection, the proposed model fusion will be employed to forecast the monthly inflow of the Three Gorges reservoir. First, the two scales' constituting models (the ANFIS and the EGM (1,1)) are established for the monthly- and yearly-scales, respectively. Then, the scale-normalized model fusion based on the GRA is subsequently used to integrate the yearly-scale and the monthly-scale results.

The values of *I*(*τ*_{1}) over *τ*_{1} are shown in Figure 5. As the optimal delay time *τ*_{1opt} is the time of the first minimum of *I*(*τ*_{1}), one can see that *τ*_{1opt} = 4 according to Figure 5.

The FNN method is then used to determine *d*_{1opt}. Figure 6 shows the FNN results over different embedding dimensions for the data set as shown in Figure 4. According to the two thresholds above, the optimal embedding dimension *d*_{1opt} = 3 is obtained.

The ANFIS can be trained with the optimal input variables (three inputs and one output) of the model. After training, an ANFIS model with forecasting function will be obtained for output forecasting. In this paper, the monthly-scale series are divided into two sets, the training set and the testing set. The first 132 records, as shown in Figure 4, are employed for training, and the rest for testing. Subtractive clustering is used for training the number of MFs and rules with different value of influence range, squash factor, accept ratio, and reject ratio. A total of 25 models, designed by the orthogonal design, are trained as shown in Table 1. Note that the initial values of the four parameters are 0.5, 1.25, 0.5, 0.15, respectively. In addition, ANFIS training needs at least two rules. Therefore, we chose the range of four parameters [0.2, 0.3, 0.5, 0.8, 1], [0.5, 0.75, 1, 1.25, 1.5], [0.2, 0.4, 0.6, 0.8, 1], [0.1, 0.15, 0.3, 0.4, 0.5], respectively.

| Model | Parameters (range of influence, squash factor, accept ratio, reject ratio) | Model | Parameters (range of influence, squash factor, accept ratio, reject ratio) |
|---|---|---|---|
| M1 | [0.2, 0.5, 0.2, 0.1] | M14 | [0.5, 1.25, 0.2, 0.3] |
| M2 | [0.2, 0.75, 0.4, 0.15] | M15 | [0.5, 1.5, 0.4, 0.4] |
| M3 | [0.2, 1, 0.6, 0.3] | M16 | [0.8, 0.5, 0.8, 0.15] |
| M4 | [0.2, 1.25, 0.8, 0.4] | M17 | [0.8, 0.75, 1, 0.3] |
| M5 | [0.2, 1.5, 1, 0.5] | M18 | [0.8, 1, 0.2, 0.4] |
| M6 | [0.3, 0.5, 0.4, 0.3] | M19 | [0.8, 1.25, 0.4, 0.5] |
| M7 | [0.3, 0.75, 0.6, 0.4] | M20 | [0.8, 1.5, 0.6, 0.1] |
| M8 | [0.3, 1, 0.8, 0.5] | M21 | [1, 0.5, 1, 0.4] |
| M9 | [0.3, 1.25, 1, 0.1] | M22 | [1, 0.75, 0.2, 0.5] |
| M10 | [0.3, 1.5, 0.2, 0.15] | M23 | [1, 1, 0.4, 0.1] |
| M11 | [0.5, 0.5, 0.6, 0.5] | M24 | [1, 1.25, 0.6, 0.15] |
| M12 | [0.5, 0.75, 0.8, 0.1] | M25 | [1, 1.5, 0.8, 0.3] |
| M13 | [0.5, 1, 1, 0.15] | | |
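The 25 parameter combinations of Table 1 follow a regular pattern that can be reproduced from the four five-level ranges. The index rule below (accept ratio index = (*i* + *j*) mod 5, reject ratio index = (2*i* + *j*) mod 5) is inferred from the rows of the table, not quoted from the authors, and the function name is hypothetical.

```python
INFLUENCE = [0.2, 0.3, 0.5, 0.8, 1]
SQUASH = [0.5, 0.75, 1, 1.25, 1.5]
ACCEPT = [0.2, 0.4, 0.6, 0.8, 1]
REJECT = [0.1, 0.15, 0.3, 0.4, 0.5]

def orthogonal_design():
    """Return the 25 (influence, squash, accept, reject) settings M1..M25."""
    rows = []
    for i in range(5):                  # level index of the range of influence
        for j in range(5):              # level index of the squash factor
            rows.append((
                INFLUENCE[i],
                SQUASH[j],
                ACCEPT[(i + j) % 5],        # inferred rule for the accept ratio
                REJECT[(2 * i + j) % 5],    # inferred rule for the reject ratio
            ))
    return rows

designs = orthogonal_design()
m22 = designs[21]                       # 0-based index of model M22
```

With this rule, every pair of factors sees each combination of levels exactly once across the 25 runs, which is what allows 25 models to stand in for the 625 of a full factorial search.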

The different FIS are trained by hybrid method and all performance indices are computed. To determine the effective networks, the testing data are used for validation. Therefore, only nine models shown in Table 2 are suitable according to the criteria. According to the performances of both training and testing data as shown in Table 2, one can determine the optimal ANFIS networks (M22) with three inputs and one output in this paper. Each input has seven Gaussian membership functions with seven rules.

| Model | No. of MFs | MAPE (training) | NRMSE (training) | R (training) | MAPE (testing) | NRMSE (testing) | R (testing) |
|---|---|---|---|---|---|---|---|
| M5 | 8 | 15.8013 | 0.2254 | 0.9438 | 20.3637 | 0.4150 | 0.8306 |
| M11 | 16 | 9.6811 | 0.1408 | 0.9785 | 30.0328 | 0.5963 | 0.6541 |
| M15 | 3 | 18.3139 | 0.2498 | 0.9304 | 19.2551 | 0.4771 | 0.7601 |
| M18 | 6 | 14.7392 | 0.2131 | 0.9499 | 17.0150 | 0.4093 | 0.8273 |
| M19 | 3 | 13.4378 | 0.1884 | 0.9614 | 18.6444 | 0.5002 | 0.7414 |
| M20 | 3 | 13.4378 | 0.1884 | 0.9614 | 18.6444 | 0.5002 | 0.7414 |
| M22 | 7 | 9.2670 | 0.1409 | 0.9787 | 12.1322 | 0.2990 | 0.9115 |
| M23 | 4 | 11.1684 | 0.1596 | 0.9727 | 17.6465 | 0.4271 | 0.8095 |
| M25 | 2 | 12.3738 | 0.1791 | 0.9653 | 14.4375 | 0.4519 | 0.7877 |

The second constituting model is the EGM (1,1). For a given sequence, the feasibility test for the EGM modeling is a key process that supplies information about whether an accurate EGM model can be built. Usually, if the ratio $\sigma(k) \in \big(e^{-2/(n+1)},\ e^{2/(n+1)}\big)$, then the data of the reservoir inflow can be used to establish the EGM model, where $\sigma(k) = x^{(0)}(k-1)/x^{(0)}(k)$, *k* = 2, 3, …, *n*.

In this study, the yearly-scale series are divided into two sets: the records from year 2000 to year 2010, as shown in Figure 4, are employed for training and the rest for testing (that is to say *d*_{2} = 11). In practice, we transform the data of the training set into logarithmic form before modeling. The ratio *σ*(*k*) ∈ [0.964, 1.076], so the training set conforms to the requirements of the EGM (1,1) modeling. By modeling the 12 monthly groups of series separately, we obtain the yearly-scale results for the period from January 2011 to December 2012 (model parameters shown in Table 3).
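The feasibility test can be sketched as a check on the class ratios σ(k) = x^{(0)}(k−1)/x^{(0)}(k). The admissible interval used below, (e^{−2/(n+1)}, e^{2/(n+1)}), is the criterion commonly quoted for GM (1,1) modeling and is assumed here; the function names and the sample series are hypothetical.

```python
import math

def class_ratios(x0):
    """sigma(k) = x0(k-1) / x0(k) for k = 2..n."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]

def admissible_interval(n):
    """Assumed feasibility bounds for a modeling series of length n."""
    return math.exp(-2.0 / (n + 1)), math.exp(2.0 / (n + 1))

def egm_feasible(x0):
    low, high = admissible_interval(len(x0))
    return all(low < r < high for r in class_ratios(x0))

# Bounds for a training series of 11 yearly samples (d_2 = 11).
bounds_11 = admissible_interval(11)
```

For 11 samples the interval is roughly (0.846, 1.181), so the reported range of [0.964, 1.076] for the log-transformed training data falls inside it.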

| Month | *a* | *b* |
|---|---|---|
| Jan. | −0.0011 | 5.1299 |
| Feb. | −0.0003 | 5.0759 |
| Mar. | −0.0005 | 5.1626 |
| Apr. | −0.0005 | 5.2755 |
| May | 0.0008 | 5.5474 |
| Jun. | 0.0022 | 5.7944 |
| Jul. | −0.002 | 5.8141 |
| Aug. | 0.0003 | 5.8658 |
| Sept. | 0.0005 | 5.8341 |
| Oct. | 0.0009 | 5.686 |
| Nov. | 0.0003 | 5.4327 |
| Dec. | 0.0001 | 5.2469 |

The grey relational degrees are *ε*_{01} = 0.9658 and *ε*_{02} = 0.9309. Equation (13) is then used to compute the weight distribution for integrating the constituting results of both training and testing data. Figure 7(a) displays the comparison between the forecasted and the recorded data using the proposed model fusion method, and the residual error (RE) distribution is shown in Figure 7(b) and 7(c). To study the correlation between the recorded and forecasted data, we generate the scatter plots shown in Figure 7(d) and 7(e): the higher the agreement between the two data sets, the more the points tend to concentrate in the vicinity of the identity line, marked as ‘ideal fit’ in this context.

From Figure 7(a), it can be seen that the modeling results follow the trend of the training data successfully except for a period in 2006, and the forecasting results of the testing data are roughly good except for the inflow peak. As shown in Figure 7(b) and 7(c), the REs fluctuate only slightly in both the training and the testing processes, except for several values. There exist only six and three forecasted outliers in the training and the testing processes, respectively. Since the interval around these REs does not contain zero, the nine REs lie beyond the 95% confidence interval and are caused by poor forecasts. In other words, the effective forecasted values account for 95% of the training data and 87.5% of the testing data, respectively. Hence the training and the testing processes are successful overall, and the results are acceptable. Furthermore, the scatter plots shown in Figure 7(d) and 7(e) demonstrate that the correlations between the recorded and the forecasted data of the peak inflow are relatively poor, while the other data are concentrated around the ideal fit. Besides the intuitive observation, Table 4 shows the quantitative evaluations using the MAPE, NRMSE, R, and PPTS of the training and testing data, respectively.

| Grey relation degree | Performance criteria | Training data | Testing data |
|---|---|---|---|
| ANFIS: 0.9658; EGM (1,1): 0.9309 | MAPE | 10.4973 | 11.4588 |
| | NRMSE | 0.1653 | 0.2528 |
| | R | 0.9641 | 0.9273 |
| | PPTS(5) | 11.0813 | 12.1885 |
| | PPTS(10) | 10.9624 | 11.4839 |

### Comparisons with the ANFIS and the EGM (1,1)

For quantitative evaluation, Table 5 shows the forecasting results using the ANFIS and the EGM (1,1), respectively. Compared with Table 4, it illustrates that, on the testing data, the proposed model fusion method has the best forecasting performance in the MAPE, NRMSE, and R. For the PPTS, the lower the value is, the better the capability for forecasting peak inflow will be (Lohani *et al.* 2014). Compared with Table 4, the testing PPTS values (PPTS (5) of the ANFIS, and PPTS (5) and PPTS (10) of the EGM (1,1)) confirm that the proposed model is capable of forecasting the peak inflow more accurately.

| Model | Performance criteria | Training data | Testing data |
|---|---|---|---|
| ANFIS | MAPE | 9.2670 | 12.1322 |
| | NRMSE | 0.1409 | 0.2990 |
| | R | 0.9787 | 0.9115 |
| | PPTS(5) | 9.6801 | 12.8642 |
| | PPTS(10) | 9.6136 | 10.6497 |
| EGM(1,1) | MAPE | 15.0627 | 14.7769 |
| | NRMSE | 0.2666 | 0.3114 |
| | R | 0.9221 | 0.9037 |
| | PPTS(5) | 15.0951 | 14.8831 |
| | PPTS(10) | 15.2256 | 13.2658 |

### Comparisons with other peer methods

In addition to the comparisons with the ANFIS and the EGM (1,1), the proposed approach is also compared with two more widely studied and applied methods, i.e., the back-propagation neural network (BPNN) and the ARIMA, and with the authors' peer method (Bai *et al.* 2014), using the same data set as shown in Figure 4.

For the ARIMA, the parameters are autoregressive order *p* = 8, moving average order *q* = 11, and difference time *dt* = 1. As shown in Figure 11(a), the forecasted trends are generally consistent with those of the recorded data. However, there are significant errors between the forecasted and recorded values throughout the studied period. The REs, shown in Figure 11(b), fluctuate more than those in Figure 7(c), although there is one fewer forecast outlier than with the presented method. The scatter plot in Figure 11(c) shows that the data points are widely dispersed.

Table 6 shows the performance criteria of the BPNN and the ARIMA methods. Compared with Table 4, the BPNN and the ARIMA give a MAPE above 20% (10.3243 and 11.3975 percentage points higher than the testing value of the proposed method, respectively), a larger NRMSE of 0.4952 and 0.3939 (0.2424 and 0.1411 higher, respectively), and a lower R (0.1603 and 0.0851 lower, respectively). Moreover, the PPTS values also demonstrate that the present model forecasts the peak inflow more accurately.

| Model | Performance criteria | Training data | Testing data |
|---|---|---|---|
| BPNN | MAPE | 12.4655 | 21.7831 |
| | NRMSE | 0.1648 | 0.4952 |
| | R | 0.9703 | 0.7670 |
| | PPTS(5) | 16.3414 | 19.1710 |
| | PPTS(10) | 16.0903 | 17.2658 |
| ARIMA | MAPE | 22.8563 | |
| | NRMSE | 0.3939 | |
| | R | 0.8422 | |
| | PPTS(5) | 18.5966 | |
| | PPTS(10) | 18.3779 | |


As above, the criteria in Tables 4–6 illustrate that, for the testing data, the proposed method has the best forecasting performance among all the peer methods in terms of MAPE, NRMSE, and R.
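
The comparison across Tables 4–6 can be re-derived with a few lines of Python. The sketch below copies the testing-data MAPE, NRMSE, and R from the tables (for the ARIMA, the single value reported in Table 6 is used), picks the best model per criterion, and computes the gap between each peer model and the proposed method. The dictionary layout and function names are illustrative assumptions; PPTS is left out for brevity.

```python
# Testing-data criteria copied from Tables 4-6
# (ARIMA: the single value reported per criterion in Table 6)
testing = {
    "Proposed": {"MAPE": 11.4588, "NRMSE": 0.2528, "R": 0.9273},
    "ANFIS":    {"MAPE": 12.1322, "NRMSE": 0.2990, "R": 0.9115},
    "EGM(1,1)": {"MAPE": 14.7769, "NRMSE": 0.3114, "R": 0.9037},
    "BPNN":     {"MAPE": 21.7831, "NRMSE": 0.4952, "R": 0.7670},
    "ARIMA":    {"MAPE": 22.8563, "NRMSE": 0.3939, "R": 0.8422},
}

# MAPE and NRMSE are errors (lower is better); R is a correlation (higher is better)
LOWER_IS_BETTER = {"MAPE": True, "NRMSE": True, "R": False}


def best_model(criterion):
    """Name of the model with the best value for the given criterion."""
    pick = min if LOWER_IS_BETTER[criterion] else max
    return pick(testing, key=lambda name: testing[name][criterion])


def gap_to_proposed(model, criterion):
    """Absolute difference between a peer model and the proposed method."""
    diff = testing[model][criterion] - testing["Proposed"][criterion]
    return round(abs(diff), 4)


for criterion in LOWER_IS_BETTER:
    gaps = {m: gap_to_proposed(m, criterion) for m in testing if m != "Proposed"}
    print(criterion, "best:", best_model(criterion), "gaps:", gaps)
```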

In addition, compared with the additive model proposed by the authors (Bai *et al.* 2014), the differences are evident: (a) the combination structures (a sum-up strategy in the peer paper; GRA in this paper); (b) the feature extraction methods (ensemble empirical mode decomposition in the peer paper; scale feature analysis and classification in this paper); (c) the temporal scales (three sub-terms, all at the monthly scale, in the peer paper; yearly scale and monthly scale in this paper); (d) the computational complexities (the peer paper involves four forecasting models and two time-frequency analysis methods, with numerous model parameters; this paper uses only two models and one fusion method, and the structures of the GRA and the EGM (1,1) in particular are simple).

Although the evaluation criteria differ only slightly (the MAPE, NRMSE, and R in the peer paper are better than those of the fusion model in this paper by 0.0985%, 0.1023%, and 0.0526%, respectively), the computational burden of the present method is clearly lower.

## CONCLUSIONS

In this paper, a model fusion method has been constructed for forecasting monthly reservoir inflow. Considering the characteristics of the reservoir inflow at different time scales (nonlinearity at the monthly scale and quasi-stability at the yearly scale), the ANFIS and the EGM (1,1) are applied to follow the monthly-scale and the yearly-scale inflow patterns, respectively. Using the GRA, the patterns at the two scales are normalized and integrated to generate the final forecasting results. The present method is applied to forecast the Three Gorges reservoir inflow. The proposed method is also compared with four peer methods: the ANFIS, the EGM (1,1), the ARIMA, and the BPNN. The results show that, having integrated both monthly-scale and yearly-scale patterns in a grey fashion, the proposed approach exhibits the best forecasting performance in terms of the criteria. Owing to the hydrological similarities among reservoirs, the proposed model fusion method can also be applied to inflow forecasts for other reservoirs.

## ACKNOWLEDGEMENTS

This work is supported in part by the Natural Science Foundation of China (51375517, 71271226), and the Project of Key Discipline Construction of Anhui Science and Technology University (AKZDXK2015B01). The authors would also like to thank the editors/reviewers for their valuable suggestions and comments that have helped improve this paper.