Rainfall is a key part of the hydrological cycle, and accurate rainfall forecasting is vital for the planning and management of water resources. A generalized regression neural network (GRNN) and support vector regression (SVR) were both applied to forecast monthly rainfall, and a conventional autoregressive (AR) model was built for comparison. The Akaike information criterion (AIC) was used to identify the proper inputs for the rainfall forecasting models. Monthly rainfall data sets for a 53-year period from 1957 to 2010 in western Jilin Province, China, were used. The results indicated that proper inputs effectively improve the prediction accuracy, and that both the SVR and the GRNN models performed better than the AR model in forecasting monthly rainfall. The SVR models outperformed all other models during the testing period in terms of the mean absolute error, root-mean-square error, coefficient of efficiency and *R*^{2}. Therefore, SVR models were applied to forecast monthly rainfall for six cities: Baicheng, Qianguo, Fuyu, Qian'an, Changling and Tongyu.

## INTRODUCTION

Rainfall is a key part of the hydrological cycle, and rainfall forecasting has been widely used in many areas, such as agriculture, flood warning and ecological environment assessment (Kumarasiri & Sonnadara 2008; Moustris *et al*. 2011). Therefore, it is of great importance to correctly estimate rainfall for water resources planning and management. During the past few years, both the physical approach and the pattern recognition approach have been developed for rainfall forecasting. However, because the physical approach requires an accurate description of the physical mechanism of rainfall within hydrologic processes, its applicability has been limited (Hong & Pai 2007). A number of studies have shown that the pattern recognition approach is an effective tool for rainfall forecasting (Wong *et al*. 2007; Akrami *et al*. 2013; Mekanik *et al*. 2013). Classic models such as the autoregressive integrated moving average model have been used for hydrological time series forecasting, including rainfall forecasting (Salas *et al*. 1980). In addition, the seasonal autoregressive moving average model, the deseasonalized model and the periodic autoregressive model have been applied (Ahn 2000).

For the past few years, the artificial neural network (ANN) has been one of the most powerful techniques in pattern recognition and forecasting. As a black-box model, it does not rely on a well-defined physical relationship and is able to approximate nonlinear functions (Chen & Chang 2009; Hung *et al*. 2009; Wang *et al*. 2009; Javan *et al*. 2013). Backpropagation neural networks were employed for rainfall forecasting and provided more accurate forecasts in space and time than two other short-term forecasting methods, persistence and nowcasting (French *et al*. 1992). ANNs were adopted to forecast short-term rainfall for an urban catchment in the western suburbs of Sydney, Australia, and provided the most accurate prediction when an optimum number of spatial inputs was included in the network (Luk *et al*. 2000). ANNs were introduced to construct a nonlinear mapping between output data and surface rainfall data for the region of Sao Paulo State, Brazil, and proved superior to the linear regression model in forecasting rainfall (Valverde Ramírez *et al*. 2005). A hybrid neural network model combining the self-organizing map with multilayer perceptron networks was proposed to forecast typhoon rainfall in the Tanshui River Basin; the results showed that the proposed model forecast more precisely than the conventional approach (Lin & Wu 2009). ANNs have been developed in a variety of major types, such as the back propagation neural network (BPANN), the Hopfield neural network and the radial basis function neural network (RBFNN); the generalized regression neural network (GRNN) is a kind of RBFNN. Both BPANN and RBFNN can be used for forecasting. Compared with BPANN, the RBFNN has the advantages of faster learning and convergence to the global optimum, thus avoiding the problem of local minima (Chang & Chen 2003; Bedia *et al*. 2012; Vicente *et al*. 2012).

In recent years, support vector regression (SVR) models have drawn much attention and have been widely used to solve forecasting problems in many fields. SVR has been applied to travel-time prediction, and its results were compared with other baseline travel-time prediction methods using real highway traffic data. The results showed that the SVR models can significantly reduce both the relative mean errors and the root-mean-squared errors of predicted travel times (Chen & Wang 2007). An SVR model was proposed for hybrid load forecasting, and it outperformed back propagation ANN and regression forecasting models in annual load forecasting (Wang *et al*. 2012). In addition, SVR has been successfully applied in hydrological forecasting: it was used to establish a real-time stage forecasting model for the Lan-Yang River, Taiwan, and the results revealed that the proposed model can effectively forecast the flood stage 1–6 hours ahead (Yu *et al*. 2006). SVR modelling has also been used to forecast volumes of rainfall during typhoon seasons. The simulation results showed that SVR algorithms were a promising alternative for forecasting amounts of rainfall and had better generalization ability than conventional ANNs (Hong & Pai 2007).

In this paper, GRNN and SVR models are constructed for monthly rainfall forecasting in western Jilin Province, China. The aims of the present study are as follows: (1) to introduce the Akaike information criterion (AIC) to identify the proper inputs for the GRNN, SVR and AR models, thereby preventing over-fitting problems caused by the inputs and improving the prediction accuracy; (2) to apply the SVR, GRNN and AR models to forecasting monthly rainfall in Baicheng county; and (3) to compare the accuracy of the SVR models with that of the GRNN and AR models in terms of the mean absolute error (MAE), root-mean-square error (RMSE), coefficient of efficiency (CE) and *R*^{2}, and to apply the more accurate model to forecasting monthly rainfall for the other five counties in an arid and semi-arid region of China.

## METHODOLOGY

### Generalized regression neural network

The GRNN is a typical feed-forward neural network with a number of important features, such as avoiding convergence to local minima and nonlinear mapping characteristics (Cigizoglou 2005). The GRNN is based on the normalized radial basis function and kernel regression. It is composed of an input layer, a pattern layer, a summation layer and an output layer, as shown in Figure 1 (KIŞI 2006).

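In the pattern layer, each neuron applies a Gaussian function to the distance between the input vector and one stored training vector:

$$p_i = \exp\left[-\frac{(x - x_i)^{T}(x - x_i)}{2\sigma^{2}}\right]$$

where $x$ is the input vector, $\sigma$ is the smoothing parameter, and $x_i$ is the input portion of the $i$th training vector represented by the $i$th neuron in the pattern layer.

The summation layer forms an unweighted sum and weighted sums of the pattern-layer outputs, and the output layer divides one by the other:

$$S_s = \sum_{i} p_i, \qquad S_{w_j} = \sum_{i} w_{ij}\, p_i, \qquad y_j = \frac{S_{w_j}}{S_s}, \quad j = 1, 2, \ldots, k$$

where $y_j$ is the $j$th output node on the output layer, $w_{ij}$ is the weight connection between the $i$th pattern neuron and the $j$th unit in the summation layer, and $k$ is the number of output neurons (Firat & Gungor 2009).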

The spread parameter is the main parameter in designing the network, and it alters the performance of the GRNN method. A large spread makes the function approximation smooth, while a small spread can degrade the network's generalization and may even prevent it from forecasting accurately (Theodosiou 2011). Hence, a proper spread parameter should be found.
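The layer structure described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: the function name `grnn_predict`, the toy data and the Gaussian form exp(−d²/(2σ²)) are assumptions, and toolboxes may scale the spread differently.

```python
import math

def grnn_predict(x, train_inputs, train_targets, spread):
    """Predict one scalar output for input vector x with a GRNN.

    Pattern layer: one Gaussian neuron per stored training vector.
    Summation layer: an unweighted sum and a target-weighted sum.
    Output layer: the ratio of the two sums.
    """
    weights = []
    for x_i in train_inputs:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
        weights.append(math.exp(-sq_dist / (2.0 * spread ** 2)))
    denominator = sum(weights)
    if denominator == 0.0:
        # Every pattern neuron underflowed: the spread is far too small.
        raise ValueError("spread is too small for this input")
    numerator = sum(w * y for w, y in zip(weights, train_targets))
    return numerator / denominator

# Toy data (hypothetical): two-dimensional inputs with scalar targets.
X_train = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
y_train = [0.0, 10.0, 20.0]

# A small spread reproduces the nearest training target;
# a large spread smooths the prediction towards the mean target.
narrow = grnn_predict([0.0, 0.0], X_train, y_train, spread=0.1)
wide = grnn_predict([0.0, 0.0], X_train, y_train, spread=100.0)
print(narrow, wide)
```

With the small spread the prediction at a training point is essentially that point's target, while with the large spread it approaches the mean of all targets, which is the trade-off the spread parameter controls.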

### Support vector regression

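*et al*. 2004). The linear regression estimating function can be expressed as

$$f(x) = w \cdot \phi(x) + b$$

where $\phi(x)$ is a non-linear mapping from the input space to a high-dimensional feature space, $w$ is a weight vector and $b$ is a threshold value. These can be estimated by minimizing the regularized risk function

$$R(f) = \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} L_{\varepsilon}\bigl(y_i, f(x_i)\bigr) \tag{6}$$

where $\frac{1}{2}\lVert w \rVert^{2}$ measures the flatness of the function, and the function $L_{\varepsilon}$ is called the $\varepsilon$-insensitive loss function:

$$L_{\varepsilon}\bigl(y, f(x)\bigr) = \begin{cases} 0, & |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon, & \text{otherwise} \end{cases}$$

Two slack variables were introduced to represent the distance from actual values to the corresponding boundary values of the $\varepsilon$-tube. Then, Equation (6) is transformed into

$$\min \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^{*}\right)$$

subject to

$$y_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0$$

where $\xi_i$ and $\xi_i^{*}$ are two positive slack variables, $N$ is the number of training samples, $\phi(x_i)$ is the feature of inputs, and $C$ and $\varepsilon$ are prescribed parameters. $C$ specifies the trade-off between the empirical risk and the model flatness, and $\varepsilon$ defines the zone of Vapnik's linear loss function used to measure the empirical error.

$C$ controls the empirical risk degree of SVR, $\varepsilon$ controls the width of the tube in the loss function and $\sigma$ controls the Gaussian function width of the kernel function. $C$, $\varepsilon$ and $\sigma$ have been found to be important parameters in SVR models (Forrester & Keane 2009; Lu *et al*. 2009; Yun *et al*. 2009).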
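The roles of *C*, the tube width and the kernel width can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the feature map is taken as the identity (a linear model) for simplicity, and the risk is written as the flatness term plus *C* times the summed loss, which is one common convention and may differ in scaling from the formulation used in the study.

```python
import math

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the eps-tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def gaussian_kernel(u, v, sigma):
    """Gaussian (RBF) kernel; sigma sets the width of the Gaussian."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def regularized_risk(w, b, X, y, C, eps):
    """Flatness term 0.5*||w||^2 plus C times the summed eps-insensitive loss.

    Assumes a linear model f(x) = w.x + b (identity feature map).
    """
    flatness = 0.5 * sum(w_k ** 2 for w_k in w)
    empirical = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        empirical += eps_insensitive_loss(y_i, prediction, eps)
    return flatness + C * empirical

# Toy example (hypothetical values): the first sample lies inside the tube
# and costs nothing; the second misses by 2.0 and is charged 2.0 - eps.
risk = regularized_risk([1.0], 0.0, [[1.0], [2.0]], [1.0, 4.0], C=10.0, eps=0.5)
print(risk)  # 15.5
```

A larger *C* makes the same miss more expensive relative to the flatness term, while a wider tube forgives larger misses entirely.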

## STUDY AREA AND DATA

The study area is located between latitudes 44°57′–45°46′ N and longitudes 123°09′–124°22′ E, in the arid and semi-arid part of western Jilin Province, China. It covers approximately 47,011 square kilometers, with elevations ranging from 150 to 200 m above mean sea level, and has a typical temperate continental climate. The annual average temperature in the study area is 4.6 °C. Evaporation is strong: the annual average evaporation is 1,500–1,900 mm, compared with an annual average rainfall of 400–500 mm. The majority of rainfall occurs in summer, with 70–80% falling during the wet season from June to September. In this study, monthly rainfall data for a 53-year period from 1957 to 2010 in six cities were used. Baicheng city, in the northwestern corner of western Jilin Province (Figure 2), was taken as an example. Some statistical properties of the monthly rainfall data for the different cities are listed in Table 1. Being able to forecast rainfall is of practical significance for the allocation and utilization of water resources in arid areas and for improving social, economic and ecological benefits. The locations of the stations are shown in Figure 2.

| Statistic | Changling | Tongyu | Qian'an | Fuyu | Baicheng | Qianguo |
|---|---|---|---|---|---|---|
| Mean (mm) | 36.60 | 32.06 | 33.93 | 42.16 | 32.50 | 35.33 |
| Standard deviation (mm) | 50.94 | 47.25 | 49.59 | 56.29 | 51.30 | 48.30 |
| Maximum (mm) | 343.21 | 276.85 | 310.34 | 379.42 | 325.32 | 254.31 |
| Minimum (mm) | 0.01 | 0.03 | 0.00 | 0.01 | 0.01 | 0.05 |


### Input determination

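The AR model of order $p$ can be expressed as

$$x_t = \varphi_0 + \sum_{i=1}^{p} \varphi_i\, x_{t-i}$$

where $p$ is the order of the model, $\varphi_i$ ($i$ = 0, 1, …, $p$) are the regression coefficients, $x_t$ is the current monthly rainfall value, and $x_{t-p}$ is the monthly rainfall value $p$ months ago (Karimi 2007; Chattopadhyay *et al*. 2011).

The order of the model is identified with the AIC, which is computed from the differences between measured and predicted values. Here *n* is the total number of sequence data, and the measured and predicted values are those of the *i*th year.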
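The order-selection step can be sketched in Python as follows. This is a minimal illustration rather than the MATLAB procedure used in the study: it assumes an ordinary least-squares fit and the common least-squares form AIC = n ln(RSS/n) + 2(p + 1), which may differ from the expression used here, and the helper names and the synthetic series are hypothetical.

```python
import math
import random

def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def fit_ar(series, p, start):
    """Least-squares AR(p) fit to the targets series[start:].

    Returns (coefficients [phi_0..phi_p], residual sum of squares, n targets).
    """
    rows = [[1.0] + [series[t - i] for i in range(1, p + 1)]
            for t in range(start, len(series))]
    targets = series[start:]
    k = p + 1
    # Normal equations: (R^T R) phi = R^T y
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    rhs = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(k)]
    phi = solve_linear_system(gram, rhs)
    rss = sum((y - sum(c * v for c, v in zip(phi, r))) ** 2
              for r, y in zip(rows, targets))
    return phi, rss, len(targets)

def aic(rss, n, p):
    """Assumed least-squares form of the AIC: n*ln(RSS/n) + 2*(p + 1)."""
    return n * math.log(rss / n) + 2 * (p + 1)

def select_order(series, max_order):
    """Return the order in 1..max_order with the smallest AIC, and all scores."""
    scores = {}
    for p in range(1, max_order + 1):
        # Every order is fitted to the same targets so the AIC values are comparable.
        _, rss, n = fit_ar(series, p, start=max_order)
        scores[p] = aic(rss, n, p)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic AR(2) series (hypothetical data, not rainfall records).
random.seed(0)
series = [7.0, 7.0]
for _ in range(300):
    series.append(5.0 + 0.6 * series[-1] - 0.3 * series[-2] + random.gauss(0.0, 1.0))

best_order, aic_by_order = select_order(series, max_order=6)
print(best_order, round(aic_by_order[best_order], 2))
```

Fitting every candidate order to the same set of target values is a deliberate choice: otherwise lower orders would be scored on more data than higher ones and the AIC values would not be directly comparable.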

### Performance indices

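Four standard statistical measures, the MAE, RMSE, CE and *R*^{2}, are employed to evaluate the performances of all the forecasting models. The RMSE evaluates the residual between observed and forecasted values, and the MAE measures the mean absolute error between observed and forecasted values; the lower the RMSE and MAE, the more accurate the forecasting. CE evaluates the capability of the model in forecasting rainfall away from the mean, and *R*^{2} evaluates the linear correlation between the observed and forecasted rainfall, so the nearer the CE and *R*^{2} values are to 1, the more accurate the forecasting. These indices are defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - Y_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - Y_i \right)^{2}}$$

$$\mathrm{CE} = 1 - \frac{\sum_{i=1}^{n} \left( y_i - Y_i \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2}}$$

$$R^{2} = \frac{\left[\sum_{i=1}^{n} \left( y_i - \bar{y} \right)\left( Y_i - \bar{Y} \right)\right]^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^{2} \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^{2}}$$

where $y_i$ and $Y_i$ are the observed and predicted data respectively, $\bar{y}$ is the mean observed value, $\bar{Y}$ is the mean predicted value and $n$ is the number of data (Yang *et al*. 2009).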
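The four indices can be computed with a few lines of Python. The sketch below uses the standard definitions (with *R*^{2} taken as the squared linear correlation, which is an assumption about the exact form used here); the function names and the toy data are hypothetical.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(y - Y) for y, Y in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((y - Y) ** 2 for y, Y in zip(obs, pred)) / len(obs))

def ce(obs, pred):
    """Coefficient of efficiency: 1 - squared error / variance about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sq_err = sum((y - Y) ** 2 for y, Y in zip(obs, pred))
    sq_dev = sum((y - mean_obs) ** 2 for y in obs)
    return 1.0 - sq_err / sq_dev

def r_squared(obs, pred):
    """Squared linear correlation between observed and predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((y - mean_obs) * (Y - mean_pred) for y, Y in zip(obs, pred))
    var_obs = sum((y - mean_obs) ** 2 for y in obs)
    var_pred = sum((Y - mean_pred) ** 2 for Y in pred)
    return cov ** 2 / (var_obs * var_pred)

# Toy data (hypothetical): every prediction is biased by exactly +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(mae(observed, predicted), rmse(observed, predicted))
print(ce(observed, predicted), r_squared(observed, predicted))
```

In this toy case the predictions are perfectly correlated with the observations, so *R*^{2} is 1, yet the constant bias pulls CE down to about 0.2. This is why a model can show a high *R*^{2} together with a much lower CE, and why the two indices are reported side by side.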

## RESULTS AND DISCUSSION

In this paper, the AIC value reached its minimum when the order was 11, as shown in Figure 3. The order of the autoregressive model selected by the AIC is therefore 11; that is, the current monthly rainfall value is closely related to the rainfall values of the previous 11 months.

MATLAB was used to calculate the parameters of the AR model developed in the present study. The coefficients for the AR model are shown in Table 2.

| | φ_{0} | φ_{1} | φ_{2} | φ_{3} | φ_{4} | φ_{5} | φ_{6} | φ_{7} | φ_{8} | φ_{9} | φ_{10} | φ_{11} |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 41.389 | 0.243 | −0.076 | −0.101 | −0.104 | −0.096 | −0.087 | −0.094 | −0.116 | −0.101 | −0.101 | 0.351 |
| AIC | 368.93 | | | | | | | | | | | |
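As a quick plausibility check on the coefficients, the following Python sketch applies them in a one-step forecast and computes the long-run mean they imply. It assumes that the twelve tabulated values map to φ_{0} through φ_{11} in order, that the model has the form x_t = φ_0 + Σ φ_i x_{t−i}, and that the process is stationary; the function names are hypothetical.

```python
# Coefficients read from Table 2, assumed to be phi_0 .. phi_11 in order.
PHI = [41.389, 0.243, -0.076, -0.101, -0.104, -0.096,
       -0.087, -0.094, -0.116, -0.101, -0.101, 0.351]

def ar_forecast(last_values, phi=PHI):
    """One-step AR forecast; last_values[0] is the most recent month (x_{t-1})."""
    if len(last_values) != len(phi) - 1:
        raise ValueError("expected one past value per lag coefficient")
    return phi[0] + sum(c * x for c, x in zip(phi[1:], last_values))

def implied_mean(phi=PHI):
    """Long-run mean of a stationary AR process: phi_0 / (1 - sum of lag coefficients)."""
    return phi[0] / (1.0 - sum(phi[1:]))

print(round(implied_mean(), 2))  # 32.28
```

The implied mean of about 32.28 mm is close to the mean monthly rainfall of 32.50 mm reported for Baicheng in Table 1, which supports reading the first tabulated value as the intercept φ_{0}.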


For the GRNN model, it is very important to select the proper input variables. If the order is too low, relevant information is easily missed, which harms the prediction accuracy; if the order is too high, the model structure becomes complicated, requires more computation and may over-fit because of the inputs. The number of input variables was therefore set to the order of the autoregressive model, which was 11 as determined by the AIC. The rainfall values of the previous 11 months were used as inputs (i.e., X_{1}, X_{2}, …, X_{10}, X_{11}), and the rainfall value of the following (12th) month was used as the output (i.e., X_{12}); one input–output pair formed one sample. In this way, 637 samples were generated from the data for the 53-year period from 1957 to 2010; 601 samples were used for training and the remaining 36 as test samples.
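The sample construction can be sketched in Python as below. The function name and the placeholder series are hypothetical, and two points are assumptions rather than statements from the study: the series length of 648 monthly values (the length that yields exactly 637 samples with 11 lags) and the chronological split that keeps the last 36 samples for testing.

```python
def make_samples(series, n_lags):
    """Build (inputs, outputs): n_lags consecutive values predict the next value."""
    inputs, outputs = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # X_1 .. X_11, oldest first
        outputs.append(series[t])            # X_12
    return inputs, outputs

# Placeholder values standing in for monthly rainfall (not real data).
n_months = 648  # assumption: the length that gives 637 samples with 11 lags
series = [float(m) for m in range(n_months)]

X, y = make_samples(series, n_lags=11)

# Assumed chronological split: the last 36 samples form the test set.
n_test = 36
X_train, y_train = X[:-n_test], y[:-n_test]
X_test, y_test = X[-n_test:], y[-n_test:]

print(len(X), len(X_train), len(X_test))  # 637 601 36
```

Each sample loses nothing but the first 11 months of the record, which have no complete set of lagged inputs.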

MATLAB was used to implement the GRNN model. A four-layer GRNN model was built with 11 neurons in the input layer, 601 neurons in the pattern layer, two neurons in the summation layer and one neuron in the output layer, giving an 11-601-2-1 structure. The optimal spread in the pattern layer was selected by trial and error, varying the spread from 0.1 to 100 and calculating the RMSE in the testing period. The effect of the spread on the RMSE is shown in Figure 4. The RMSE reached its lowest value when the spread was 40, so a spread of 40 was adopted.

The SVR model was also developed to forecast monthly rainfall, and its parameters were determined by trial and error. Figure 5 shows the variation of the RMSE for values of the regularization parameter *C* from 5 to 15. The RMSE reached its minimum, that is, the model performed best, when *C* was 10; the other parameters were selected in the same way. As a result, the regularization parameter *C* (10), the regression tube width *ɛ* (0.0001) and the spread *σ* (0.1) were selected as the optimal parameters.

Figure 6 compares the forecasted and observed rainfall during training by the AR, GRNN and SVR models for Baicheng. It shows that the SVR and GRNN models performed better than the AR model during the training period, and that the AR model generally underestimated the monthly rainfall compared to the observed values.

Figure 7 compares the observed and predicted rainfall for the testing period by the AR, GRNN and SVR models. The SVR model outperformed the other models during the testing period in terms of all the standard statistical measures, and both the SVR and GRNN models were able to forecast rainfall: for SVR and GRNN, respectively, the RMSE values are 14.07 and 15.06, the MAE values are 9.82 and 9.94, the CE values are 0.83 and 0.81, and the *R*^{2} values are 0.93 and 0.92. The AR model performed worst, with MAE, RMSE and CE values of 13.08, 20.34 and 0.43, respectively.

Selecting the proper input variables effectively improves the prediction accuracy, and the choice of inputs influenced each of the GRNN, SVR and AR models. Taking 10, 11 and 12 input variables as an example, the performance of the models with different input variables is presented in Table 3. For the SVR models with 10, 11 and 12 input variables, respectively, the RMSE values are 14.98, 14.07 and 15.86, the MAE values are 9.96, 9.82 and 10.72, the CE values are 0.76, 0.83 and 0.80, and the *R*^{2} values are 0.91, 0.93 and 0.90. The SVR model with 11 input variables performed best for all the criteria, and the SVR model with 10 input variables performed better than that with 12 for MAE, RMSE and *R*^{2}, possibly because the additional input caused over-fitting. Likewise, the GRNN and AR models with 11 input variables performed best for all the criteria, and those with 10 input variables performed better than those with 12. Proper inputs therefore help to improve the prediction accuracy.

| Index | AR(10) | AR(11) | AR(12) | GRNN(10) | GRNN(11) | GRNN(12) | SVR(10) | SVR(11) | SVR(12) |
|---|---|---|---|---|---|---|---|---|---|
| MAE | 13.92 | 13.08 | 15.52 | 10.33 | 9.94 | 11.35 | 9.96 | 9.82 | 10.72 |
| RMSE | 20.60 | 20.34 | 22.58 | 15.90 | 15.06 | 17.52 | 14.98 | 14.07 | 15.86 |
| CE | 0.33 | 0.43 | 0.10 | 0.76 | 0.81 | 0.73 | 0.76 | 0.83 | 0.80 |
| *R*^{2} | 0.85 | 0.86 | 0.83 | 0.91 | 0.92 | 0.89 | 0.91 | 0.93 | 0.90 |


According to Table 3 and Figure 7, the SVR models performed best among all the models. Therefore, SVR models were applied to forecast monthly rainfall for six counties: Baicheng, Qianguo, Fuyu, Qian'an, Changling and Tongyu. The results are shown in Table 4 and Figure 8. The SVR models were thus applicable across the different regions, compensating for the lack of physical models and providing an effective method for hydrological forecasting.

| Index | Baicheng | Qianguo | Changling | Tongyu | Qian'an | Fuyu |
|---|---|---|---|---|---|---|
| MAE | 9.821 | 9.968 | 8.750 | 10.223 | 6.789 | 9.974 |
| RMSE | 14.072 | 12.043 | 11.701 | 13.402 | 9.343 | 12.137 |
| CE | 0.833 | 0.870 | 0.887 | 0.854 | 0.929 | 0.882 |
| *R*^{2} | 0.931 | 0.934 | 0.930 | 0.943 | 0.970 | 0.954 |


## CONCLUSION

In this paper, GRNN and SVR models were both applied to forecast monthly rainfall, and the forecasting results were compared with those of an AR model. In addition, the AIC was used to identify the proper inputs for the GRNN, SVR and AR models. Using the monthly rainfall of Baicheng as an example, GRNN, SVR and AR models were constructed, the modeling process was discussed, and the effectiveness and accuracy of the models were assessed.

Based on the AIC, the rainfall values of the previous 11 months were used as inputs for the forecasting models in this paper. The results indicated that appropriate inputs improve the performance of all the forecasting models, including the GRNN, SVR and AR models, and prevent over-fitting problems caused by the inputs. Furthermore, the results showed that both the GRNN and SVR models performed better than the AR model in forecasting monthly rainfall, and that the SVR models outperformed all other models during the testing period in terms of all the standard statistical measures. Therefore, SVR models were applied to forecast monthly rainfall for six counties. Four standard statistical measures, MAE, RMSE, CE and *R*^{2}, indicated that SVR models can be effective in forecasting monthly rainfall for different study areas. SVR models can compensate for the lack of physical models and are applicable across different regions, while providing an effective way to forecast monthly rainfall.

## ACKNOWLEDGEMENTS

This work is supported by ‘The National Natural Science Funds’ (41372237) and Key Laboratory of Groundwater Resources and Environment, Ministry of Education, Jilin University, Changchun, China. Project 2014025 supported by Graduate Innovation Fund of Jilin University.