## Abstract

This study takes large-scale heavy rainfall as the forecast object and, based on numerical forecast products of the European Centre for Medium-Range Weather Forecasts (ECMWF), uses a nonlinear fuzzy neural network (FNN) intelligent calculation method to establish a short-term rainstorm forecast model. The information gain method is introduced into the predictor processing of the forecast model: the characteristics of many rainstorm predictors are calculated and screened on the basis of feature weight, the information is condensed, some uncorrelated forecast information variables are extracted, and the network structure of the forecast model is optimized. The modeling samples are determined and reconstructed by setting thresholds, and modular forecast models for heavy rainfall and weak rainfall are established. The 24 h experimental forecasts for independent samples of large-scale rainstorms in Guangxi in 2012–2016 show that the information gain-based modular FNN rainstorm forecasting model has higher prediction accuracy and more stable forecasting performance. The scores for 24 h rainstorm (≥50 mm) forecasts at 89 weather stations in Guangxi from 2012 to 2016 are as follows: threat score (*TS*) 0.368, equitable threat score (ETS, *E*) 0.141, hit rate (*POD*) 0.296, empty report rate (*FAR*) 0.559, forecast bias (*B*) 0.671, and HSS skill score (*H*) 0.247. Comparison with the forecasts of the ECMWF model itself indicates that the new model, which performs nonlinear intelligent-calculation interpretation modeling on ECMWF numerical forecast products, improves forecasting accuracy to a certain extent relative to the original model. The technique therefore shows positive forecasting skill, improves on the rainfall forecasting ability of ECMWF to a certain extent, and provides a better reference for operational forecasters.

## HIGHLIGHTS

A short-term rainstorm forecast model is established from European Centre for Medium-Range Weather Forecasts (ECMWF) numerical forecast products using a nonlinear fuzzy neural network intelligent calculation method.

The information gain method is introduced to the predictor processing of the forecast model.

Actual forecast results showed that the new model has higher prediction accuracy and a more stable forecasting effect.

## INTRODUCTION

China is located in the East Asian monsoon region. During the onset and duration of the summer monsoon, heavy rainstorms occur frequently, often causing heavy casualties and economic losses. With a complicated process and numerous influencing factors, the summer monsoon has been the focus of weather forecasting and is a difficult problem in global weather forecasting (Tao *et al.* 2003; Zhao & Sun 2013).

With the continuous enhancement of the spatiotemporal resolution of numerical forecasting models and the continuous improvement of physical modeling, the interpretation and application of numerical forecasting products have become a more effective method of forecasting rainstorms; strengthening this work and emphasizing the role of forecasters can improve the accuracy of disaster prediction (Reynolds 2003; Tian *et al.* 2008; Yan *et al.* 2008). To improve rainstorm forecasting, researchers in China and abroad have developed various rainstorm forecasting methods based on numerical models and the interpretation of numerical model products. Among them, the batching method applied to numerical prediction products is common, and it has also improved the forecasting of heavy rainfall during rainstorms (Zhang *et al.* 2010; Liu *et al.* 2015; Huang *et al.* 2016). Linear statistical forecast modeling also plays an important role in the interpretation of numerical model products for rainstorms. It mainly uses statistical methods to establish the relationship between the predictors output by a numerical forecasting product and the forecast quantity in order to produce or improve objective forecasts (Liu & Ma 1996; Zhang *et al.* 2006; Jing *et al.* 2008; Li & Zhao 2009; Zhao *et al.* 2009; Zhong *et al.* 2009). Many studies show that the linear regression method has been greatly improved and is mainly used in the prediction of natural disasters, for example in forecasting precipitation, rainfall intensity, maximum wind speed, and hurricane central pressure, which are associated with the damage caused by floods, rainstorms, droughts, and hurricanes (Murnane & Elsner 2012; Prahl *et al.* 2012; Zhai & Jiang 2014; Lee *et al.* 2016; Choi *et al.* 2017; Choo *et al.* 2017; Kim *et al.* 2017; Choi *et al.* 2018; Mosavi *et al.* 2018; Fung *et al.* 2019). However, the relationships between forecasting factors and rainstorms are mostly nonlinear; therefore, artificial neural network methods, which handle nonlinear information well, are also widely used in rainstorm forecasting (Liang *et al.* 2009; Wu *et al.* 2012).

The objective quantitative rainstorm precipitation forecasting method based on the interpretation of numerical forecasting products is an important direction in the development of short-term objective quantitative forecasting. Changes in heavy rainfall are complicated, are caused by many influencing factors, and are characterized by dynamism, nonlinearity, and random uncertainty. In prediction, the algorithm should adapt as far as possible to changes in the external environment by continuously adjusting its parameters. In contrast to traditional mathematical models, fuzzy systems and neural networks can each handle imprecise and uncertain information in their own way. They also complement each other: the fuzzy system compensates for the abstractness of the neural network's output expression, and the neural network compensates for the poor adaptive ability and robustness of the fuzzy system (Tian *et al.* 2002; Zhan *et al.* 2009; Zhang *et al.* 2009). On this basis, we propose a numerical model rainstorm interpretation prediction method based on information gain feature optimization and a modular fuzzy neural network (FNN). Heavy rainfall at 89 stations in Guangxi is used as the forecast object. Rainstorm predictors are first selected on the basis of an analysis of the physical mechanisms affecting heavy rain, and the information gain method is applied to obtain the optimal combination of factors. The modeling samples are then reconstructed on the basis of the European Centre for Medium-Range Weather Forecasts (ECMWF) forecast values to establish modular FNN forecasting models for heavy rainfall and light precipitation and to explore a new method for heavy rainfall forecasting.

## PRINCIPLE AND METHOD OF INFORMATION GAIN-BASED MODULAR FNN PREDICTION MODEL

The construction method of the forecasting model is essential for a good forecasting effect. In this study, the FNN method combined with the information gain predictor selection technique is used to establish a nonlinear intelligent calculation method for rainstorm forecast modeling based on numerical forecast products.

### Information gain feature selection method

*et al.* 2017). The measurement standard of information gain is the amount of information a feature brings to the forecasting model: the more information it brings, the more important the feature is. For a given feature, the amount of information in the forecasting model differs depending on whether the feature is present or absent, and this difference is the amount of information that the feature contributes to the forecasting model. In information theory, the amount of information is measured by 'entropy'. When the feature weight is calculated using information gain, the weight of a feature is determined by this amount of information, and the important features are screened by this standard. The formula of information gain is expressed as follows:

$$IG(w) = -\sum_{i=1}^{c} P(c_i)\log P(c_i) + P(w)\sum_{i=1}^{c} P(c_i \mid w)\log P(c_i \mid w) + P(\bar{w})\sum_{i=1}^{c} P(c_i \mid \bar{w})\log P(c_i \mid \bar{w}) \quad (1)$$

where *c* is the total number of rainstorm predictor categories; $P(c_i)$ is the probability that a rainstorm predictor of category $c_i$ appears in the total rainstorm predictor set; $P(w)$ is the probability of feature item *w* in the rainstorm predictors; $P(c_i \mid w)$ is the conditional probability of a rainstorm predictor belonging to category $c_i$ when it includes the characteristic *w*; $P(\bar{w})$ is the probability that a rainstorm predictor does not contain the characteristic *w*; and $P(c_i \mid \bar{w})$ is the conditional probability of a rainstorm predictor belonging to category $c_i$ when it does not include *w*.
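A minimal Python sketch of how the information gain of a single feature could be computed is given here for illustration. It assumes that each predictor has been reduced to a binary present/absent feature *w* and that each sample carries a discrete category label (for example, rainstorm or no rainstorm). This discretization, the function names, and the toy data are illustrative assumptions and are not taken from the study.

```python
import math


def plogp(p):
    """Return p * log2(p), using the convention 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def information_gain(has_feature, labels):
    """Information gain of a binary feature w with respect to the class labels.

    has_feature: sequence of 0/1 flags (feature w present in the sample or not).
    labels: sequence of category labels, one per sample.
    """
    n = len(labels)
    n_w = sum(1 for f in has_feature if f)   # samples that contain w
    n_not = n - n_w                          # samples that do not contain w
    p_w = n_w / n
    gain = 0.0
    for c in set(labels):
        # Entropy of the categories before the feature is known.
        p_c = sum(1 for y in labels if y == c) / n
        gain -= plogp(p_c)
        # Conditional terms, weighted by P(w) and P(not w).
        if n_w:
            p_c_w = sum(1 for f, y in zip(has_feature, labels) if f and y == c) / n_w
            gain += p_w * plogp(p_c_w)
        if n_not:
            p_c_not = sum(1 for f, y in zip(has_feature, labels) if not f and y == c) / n_not
            gain += (1.0 - p_w) * plogp(p_c_not)
    return gain


# Toy data (hypothetical): a feature that separates the two categories perfectly
# and a feature that carries no information about them.
labels = [1, 1, 0, 0]
informative = information_gain([1, 1, 0, 0], labels)
uninformative = information_gain([1, 0, 1, 0], labels)
```

In this toy case the perfectly separating feature recovers the full category entropy, whereas the second feature contributes nothing, which is the ranking behavior that the screening relies on.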

### Fuzzy neural network structure and learning algorithm

An FNN combines the nonlinear processing ability of fuzzy sets with the self-organization and powerful self-learning ability of artificial neural networks; it thus integrates fuzzy computing, fast learning, and the ability to solve difficult nonlinear problems, which greatly improves its capability. Many methods, such as membership function synthesis, fuzzy clustering, fuzzy discriminant models, and the fuzzy priority ratio, are commonly used in fuzzy mathematics (Xi 1987; Chen 1997). This study uses the FNN method to conduct short-term forecast modeling experiments on large-scale heavy rainfall in Guangxi and to add a new forecasting tool for daily operational work.

#### FNN structure

The FNN structure of this study is a numerical four-layer feedforward network: input, membership generation, inference, and anti-fuzzy output layers (Wang 1998) (Figure 1).

*m* is the fuzzy segmentation number, and $c_{ij}$ and $\sigma_{ij}$ determine the center and width of the membership function, respectively.
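The following Python sketch illustrates one possible forward pass through a four-layer network of this kind. The Gaussian form of the membership function, the product rule in the inference layer, the weighted-average defuzzification, and all function and parameter names are assumptions made for illustration; the study's own membership function (its Formula (2)) and layer equations are not reproduced here.

```python
import math


def membership(x, center, width):
    """Assumed Gaussian membership degree of input x for one fuzzy set."""
    return math.exp(-((x - center) ** 2) / (width ** 2))


def fnn_forward(x, centers, widths, weights):
    """Forward pass: input -> membership -> inference -> defuzzified output.

    x: list of n inputs.
    centers, widths: n x m nested lists (m = fuzzy segmentation number).
    weights: list of m connection weights.
    """
    m = len(weights)
    strengths = []
    for j in range(m):
        # Inference layer: combine the memberships of all inputs for segment j.
        s = 1.0
        for i, xi in enumerate(x):
            s *= membership(xi, centers[i][j], widths[i][j])
        strengths.append(s)
    total = sum(strengths)
    # Defuzzification: weighted average of the connection weights.
    return sum(w * s for w, s in zip(weights, strengths)) / total


# Hypothetical one-input network with two fuzzy segments.
y_near_first = fnn_forward([0.0], [[0.0, 10.0]], [[1.0, 1.0]], [2.0, 5.0])
y_midway = fnn_forward([5.0], [[0.0, 10.0]], [[1.0, 1.0]], [2.0, 5.0])
```

An input at the center of the first fuzzy set produces an output close to the first weight, and an input midway between the two centers produces the average of the two weights.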

#### FNN learning algorithm

When the FNN method is used to establish the rainfall forecasting model in Guangxi, the network is essentially a multi-layer feedforward network; hence, the BP (back propagation) algorithm for feedforward networks is used to train and adjust the parameters. The main parameters to be learned are the connection weights of the network, $w_j$, and the center values, $c_{ij}$, and width values, $\sigma_{ij}$, of the membership functions in the membership generation layer ($i = 1, \ldots, n$; $j = 1, \ldots, m$).

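The error function is defined as

$$E = \frac{1}{2}(y - Y)^2$$

where *y* is the actual output value, *Y* is the expected output value, and $E$ is the squared error function. In the process of learning and training, the gradient descent method is used to obtain the learning rules for the center value $c_{ij}$, the width $\sigma_{ij}$, and the weight $w_j$ of the membership function, which are as follows:

$$c_{ij}(t+1) = c_{ij}(t) - \eta \frac{\partial E}{\partial c_{ij}} \quad (5)$$

$$\sigma_{ij}(t+1) = \sigma_{ij}(t) - \eta \frac{\partial E}{\partial \sigma_{ij}} \quad (6)$$

$$w_{j}(t+1) = w_{j}(t) - \eta \frac{\partial E}{\partial w_{j}} \quad (7)$$

where $\eta$ is the learning rate. $E$ is substituted into Formulas (5), (6), and (7), and the chain rule for composite functions is used to obtain the update formulas for the membership function parameters and weights.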

In the process of network training, the membership function is set to Formula (2). The constructed predictor learning matrix *X* is first used to train the network. The main calculation steps are as follows:

- (1) The initial value of the iteration counter is set to $t = 1$, the learning rate is 0.9, and the network training error is $\varepsilon$.

- (2) At the initial moment, the connection weights of the network and the center and width values of the membership function are initialized with random numbers.

- (3) The BP algorithm is used to train and adjust the parameters and weights.

- (4) The error $E$ between the actual output of the network and the expected output is calculated.

- (5) When $E > \varepsilon$, step (3) is repeated. Otherwise, network training ends, and the network parameters and connection weights obtained by the training are used for prediction.
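The training steps above can be sketched in Python as follows. The sketch assumes a Gaussian membership function, a product inference layer, and a weighted-average output, none of which is confirmed by the text; the analytic gradients follow from those assumptions only. The defaults for the learning rate (0.9), the error tolerance (0.0001), and the maximum number of iterations (500) are the settings reported for the forecast experiment, and all function names are hypothetical.

```python
import math


def forward(x, centers, widths, weights):
    """Return (output y, segment strengths s) for one input vector x."""
    s = [math.prod(math.exp(-((xi - centers[i][j]) ** 2) / widths[i][j] ** 2)
                   for i, xi in enumerate(x))
         for j in range(len(weights))]
    y = sum(w * sj for w, sj in zip(weights, s)) / sum(s)
    return y, s


def train_fnn(samples, centers, widths, weights, lr=0.9, eps=1e-4, max_iter=500):
    """Gradient descent on E = 0.5 * (y - Y)^2; parameters are updated in place.

    samples: list of (input vector, expected output) pairs.
    Returns the total squared error after the last pass over the samples.
    """
    error = float("inf")
    for _ in range(max_iter):
        for x, target in samples:
            y, s = forward(x, centers, widths, weights)
            total = sum(s)
            delta = y - target                        # dE/dy
            for j in range(len(weights)):
                dy_ds = (weights[j] - y) / total      # dy/ds_j
                for i, xi in enumerate(x):
                    c, sig = centers[i][j], widths[i][j]
                    common = delta * dy_ds * s[j] * 2.0 * (xi - c)
                    centers[i][j] -= lr * common / sig ** 2            # dE/dc_ij
                    widths[i][j] -= lr * common * (xi - c) / sig ** 3  # dE/dsigma_ij
                weights[j] -= lr * delta * s[j] / total                # dE/dw_j
        # Error of the actual output against the expected output.
        error = sum(0.5 * (forward(x, centers, widths, weights)[0] - t) ** 2
                    for x, t in samples)
        if error <= eps:   # otherwise the adjustment step is repeated
            break
    return error


# Hypothetical toy problem: one input, two fuzzy segments, one training sample.
centers, widths, weights = [[0.0, 1.0]], [[1.0, 1.0]], [0.0, 0.0]
final_error = train_fnn([([0.5], 1.0)], centers, widths, weights)
fitted = forward([0.5], centers, widths, weights)[0]
```

The sketch takes fixed initial parameters so that the toy run is repeatable; random initialization, as described in step (2), would replace the hard-coded starting values. No safeguard against a width shrinking toward zero is included.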

### Combined design of information gain and FNN forecasting model

The FNN interpretation forecast model for large-scale rainstorms in Guangxi is designed by combining information gain with the FNN. Information gain is introduced mainly to control the factor input matrix of the FNN. In general, when an FNN model is trained, the number of input-layer parameters should be appropriate: with too many inputs, the network structure becomes complicated and the training and learning time is extended, whereas with too few inputs, the model cannot reach the required prediction accuracy because of a lack of information. Appropriate input parameters should therefore be selected for the FNN. To this end, the information gain method is introduced to select and reconstruct the original rainstorm predictors into an optimal combination. On the basis of the information gain calculated for each factor, we determine the role of each attribute, select the feature factors that carry a large amount of information and contribute strongly to prediction accuracy, and eliminate the factors with little influence. The information gain method is used to optimize the network structure of the FNN and conduct model prediction in accordance with the following steps (Figure 2):

- (1) The predictors for numerical forecast products are pre-selected.

- (2) The feature extraction of information gain is conducted for each original predictor field.

- (3) The extracted information gain characteristic value is correlated with the forecast quantity, and the correlation significance test is performed to obtain several characteristic values condensed with all vector features as the input nodes of the neural network.

- (4) The threshold is set on the basis of the forecast result of the European central model product, and the independent forecast samples are modularized on the basis of the rainfall amount to construct modular forecast models of rainstorm and no rainstorm.

- (5) The modular FNN forecasting model is used to predict the heavy rainfall data of the 89 stations in Guangxi, and the prediction results are obtained.

## TEST OF INFORMATION GAIN-BASED MODULAR FNN TO FORECAST RAINSTORM INTERPRETATION

### Data of forecast test

Guangxi is located in southern China and borders the South China Sea. It is affected by both mid- and high-latitude circulation and tropical ocean circulation. Hence, precipitation is abundant, and heavy rains occur often, causing floods and other disasters that have a major impact on people's lives and economic assets. In this study, the rainstorm weather processes at 89 observation stations in Guangxi are used as the forecast object. For modeling and testing the nonlinear intelligent calculation method for large-scale heavy rainfall in Guangxi, the basic precipitation data are the daily precipitation records (20:00 to 20:00 Beijing time; the same below) for the 35 years from 1982 to 2016. Because this paper is mainly concerned with rainstorm forecasting, days with daily precipitation of ≥50 mm are selected from the 35-year daily precipitation data, and the days on which this value was reached at 10 or more stations are taken as samples. A total of 690 rainstorm samples are extracted. The data from 1982–2011 (579 samples) are used as the basic samples for modular modeling, and the 111 samples from the following 5 years (2012–2016) are used as independent samples for prediction experiments consistent with actual operational forecasting.

The physical quantity data for the numerical forecast products are mainly derived from the ERA-Interim global reanalysis (https://www.ecmwf.int/) of the European Centre for Medium-Range Weather Forecasts (ECMWF). The spatial resolution is a 0.75° × 0.75° latitude–longitude grid, and the time resolution is 6 h, at 00:00, 06:00, 12:00, and 18:00 UTC. The data comprise temperature, wind, geopotential height, and sea level pressure fields, together with physical indices related to various types of precipitation, at four levels (200, 500, 700, and 850 hPa) over a total of 35 years (1982–2016). The selected physical quantity field covers 9.75° N–40.5° N and 79.5° E–120° E, a total of 2,310 grid points (Figure 3).
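As a quick consistency check, the stated number of grid points follows directly from the domain bounds and the 0.75° grid spacing. The short Python sketch below reproduces the arithmetic; the helper name `n_points` is hypothetical.

```python
def n_points(start, stop, step):
    """Number of grid points from start to stop inclusive at the given spacing."""
    return round((stop - start) / step) + 1


n_lat = n_points(9.75, 40.5, 0.75)    # latitude rows of the domain
n_lon = n_points(79.5, 120.0, 0.75)   # longitude columns of the domain
n_grid = n_lat * n_lon                # total number of grid points
```

With these bounds the domain has 42 latitude rows and 55 longitude columns, which gives the 2,310 grid points quoted in the text.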

### Preselection of rainstorm predictors

The FNN is essentially a statistical forecasting method. Hence, the selection of historical samples and the construction of predictors are essential for model construction. In this study of large-scale heavy rainfall forecasting in Guangxi, the physical quantity fields of the ECMWF ERA-Interim global reanalysis (hereinafter referred to as ECMWF) are used as the primary rainstorm predictors. Regarding the spatial range of the predictors, the atmosphere is in continuous motion, and the occurrence of a weather phenomenon is in general closely related to both local atmospheric circulation conditions and changes in the large-scale environmental field. Guangxi is located in the southern part of South China, and its area is confined to 19.5° N–26.5° N, 104.2° E–112.2° E, which is relatively small compared with large-scale atmospheric movements (Figure 3). If only the circulation factors within this range were selected as predictors, the changes in atmospheric motion could not be accurately described. We therefore expand the range for the primary selection of factors to 9.75° N–40.5° N and 79.5° E–120° E, giving physical field data at a total of 2,310 grid points as candidate factors.

The most important principle for selecting predictors in the rainstorm forecast experiment is their correlation with the forecast object (rainstorm). Calculation of the correlation coefficients of the rainstorm forecast factors shows that the most relevant factors are concentrated in the 24 h before the weather phenomenon occurs, followed by 48 h and then 72 h. Further back in time, the correlation becomes progressively lower, and the number of usable predictors becomes very small. Adding forecast factors from other times would increase the computation time and slow down forecasting without significantly improving forecast accuracy. Therefore, in terms of timeliness, the data from 24, 48, and 72 h before the weather phenomenon occurs are mainly selected.

The occurrence and development of rainstorms are inseparable from water vapor, energy, and thermal conditions. To fully explore the various factors causing rainstorms, we extracted 24 types of physical quantity forecast factors for heavy rainfall from the ECMWF numerical forecasting model for the same period. Water vapor factors, at middle and high levels, include specific humidity, relative humidity, water vapor flux, water vapor flux divergence, and water vapor advection. Dynamic factors include divergence, vertical velocity, vorticity, and the U and V components of the wind field. Thermal factors mainly include the total temperature field, total temperature advection, temperature advection, temperature, and pseudo-equivalent potential temperature. Several physical indices related to heavy rainfall are also selected: the K index, ky index, SI index, and lifted index. These 24 field factors are used as the basic predictor groups, which are further combined and calculated to obtain more physical factors related to rainstorms.

### Factor processing and extraction

Our survey shows that a large number of primary rainstorm predictors can be obtained. If all of these predictors were used for modeling and forecasting, the input structure of the forecasting model would be too large and the model would be prone to overlearning. The information gain method is therefore used to reduce the dimensionality of the predictor group, reducing the number of input nodes of the forecasting model while retaining as much of the forecast information of all predictors as possible. The specific practices are as follows:

- (1) In the extraction of information gain feature factors, the information gain is calculated for all grid points (2,310 grid points) of each physical quantity field, and the first few characteristic components with a high information gain rate are selected to obtain the set of feature factors of that physical quantity.

- (2) The feature factors obtained by filtering all the physical quantity fields by information gain are combined to form a new predictor set. The predictors in this new set are numerous and would cause overlearning problems if used directly in the forecasting model. Moreover, a high information gain weight does not by itself indicate a high correlation with the forecast quantity. Therefore, the correlation with the forecast quantity is further calculated and analyzed, and highly correlated predictors are screened out to form a new forecast factor set. The selection criterion is determined mainly by multiple control experiments on the rainstorm forecast model. When the correlation coefficient criterion is lowered, the number of predictors increases greatly. When it is raised, the amount of forecast information contained in the forecast factor set gradually weakens, resulting in a decline in forecast accuracy. After many experiments, the selection criterion is finally set to an absolute value of the correlation coefficient of ≥0.3.

- (3) The newly formed group of highly correlated predictors is further tested for significance to control the multicollinearity between the factors, so that the physical significance of the information gain rainstorm precipitation predictors can be decomposed. Noise factors affecting the forecast are removed to control the number of predictors entering the final forecast model. In general, an appropriate predictor matrix structure yields higher forecasting accuracy, whereas a model input that is too large can easily cause overfitting (Jin *et al.* 2004).
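The correlation screening in step (2) can be illustrated with a short Python sketch. It keeps the candidate predictors whose Pearson correlation with the forecast quantity satisfies the |r| ≥ 0.3 criterion stated above. The function names, the dictionary layout, and the toy series are assumptions for illustration, and the significance and collinearity tests of step (3) are not included.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0   # a constant series carries no linear information
    return sxy / math.sqrt(sxx * syy)


def screen_predictors(candidates, target, threshold=0.3):
    """Keep the predictors whose |r| with the target reaches the threshold.

    candidates: dict mapping a predictor name to its series of values.
    Returns a dict mapping each retained name to its correlation coefficient.
    """
    kept = {}
    for name, series in candidates.items():
        r = pearson(series, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Hypothetical toy series: two strongly related predictors and one unrelated one.
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "factor_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "factor_b": [1.0, -1.0, 0.0, -1.0, 1.0],
    "factor_c": [5.0, 4.0, 3.0, 2.0, 1.0],
}
selected = screen_predictors(candidates, rainfall)
```

Using the absolute value retains negatively correlated predictors such as `factor_c`, which is consistent with a criterion defined on the absolute value of the correlation coefficient.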

### Modular FNN rainstorm forecast

In the large-scale heavy rainfall forecast experiment in Guangxi, the FNN modeling method is used to establish a corresponding FNN rainstorm forecasting model for each forecasting station. The parameters of the FNN algorithm are set uniformly as follows: the number of input nodes of the network equals the number of predictors selected for each site, the number of output nodes is 1, the number of inference layer sections is 3, the number of network training iterations is 500, the learning factor is 0.9, and the overall error is 0.0001. The FNN forecasting model so configured is used to test 111 independent forecast samples of rainstorm precipitation at 89 meteorological stations in Guangxi in a manner consistent with actual operational forecasting. The first independent sample (the 580th sample overall) is forecast with a model built from the preceding 579 modeling samples. That sample is then added to the modeling set, and the resulting 580 samples are used to build the model that forecasts the second independent sample, and so on. Finally, 689 samples are used as modeling samples to forecast the last (690th) sample. The forecasts of the independent samples are thus made in the same way as actual operational forecasts.
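The expanding-sample procedure described above, in which each independent sample is forecast by a model rebuilt from all earlier samples, can be sketched in Python as follows. The `fit` and `predict` callables stand in for FNN training and prediction; they, the function name `rolling_forecast`, and the toy mean-value model are hypothetical placeholders.

```python
def rolling_forecast(samples, n_initial, fit, predict):
    """Forecast every sample after the first n_initial with an expanding training set.

    samples: chronologically ordered list of (predictors, observed value) pairs.
    fit: callable that builds a model from a list of samples.
    predict: callable that maps (model, predictors) to a forecast value.
    """
    forecasts = []
    for k in range(n_initial, len(samples)):
        model = fit(samples[:k])        # only samples that precede sample k
        predictors_k, _ = samples[k]
        forecasts.append(predict(model, predictors_k))
    return forecasts


# Hypothetical stand-in model: forecast the mean of the observed training values.
def fit_mean(train):
    return sum(value for _, value in train) / len(train)


def predict_mean(model, predictors):
    return model


toy = [(None, 1.0), (None, 3.0), (None, 5.0), (None, 7.0)]
toy_forecasts = rolling_forecast(toy, 2, fit_mean, predict_mean)
```

With 690 chronologically ordered samples and `n_initial=579`, this loop produces 111 forecasts from training sets that grow from 579 to 689 samples, matching the counts given in the text.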

In the rainstorm forecast modeling for the 89 sites in Guangxi, modeling and forecasting are conducted independently for each station; that is, 89 different forecasting models are established for the 89 forecasting stations. In the sample selection, days with daily precipitation of ≥50 mm at 10 or more stations were used as samples. However, this does not mean that all 89 sites in Guangxi experienced rainstorm weather on those days. A threshold is therefore set to reclassify each station's modeling sample set so that each site can be modeled and forecast more specifically. Specifically, in the actual forecast, modeling starts with the first of the 111 independent samples. At present, the ECMWF numerical forecasting model has stable rainfall forecasting performance and high forecasting accuracy. To control the empty report and false report rates of rainstorms and further improve forecasting accuracy, the single-station rainstorm forecasting model based on ECMWF rainfall forecast products uses two threshold parameters, A and B. A is a threshold on the ECMWF rainfall forecast value: when the forecast reaches A, a rainstorm is highly probable, whereas below A almost no rainstorm is observed. B is the corresponding threshold on the actual precipitation. When the ECMWF forecast value interpolated to the site reaches A or above, a rainstorm may occur in the actual precipitation. The threshold B is generally smaller than A because an error exists between the model forecast and the actual precipitation. With the thresholds set, a modular classification of the basic modeling samples is adopted according to the size of the ECMWF forecast value at the station. When the ECMWF forecast interpolated to the station is A or above, the samples whose historical observed precipitation reaches B or higher are selected from the basic modeling samples to reorganize the sample set, and a rainstorm forecast model for the station is established. When the value is less than threshold A, the samples whose historical observed precipitation is less than B are selected from the basic modeling samples to reconstruct the sample set, and a non-rainstorm forecast model is constructed. A number of tests showed that the prediction effect is best when threshold A is 20 mm and B is 15 mm. The specific modeling steps are as follows:

- (1) The ECMWF precipitation forecast field is used as the reference standard for identifying the 24 h rainfall at each site.

- (2) Using the polynomial interpolation method, the ECMWF 24 h rainfall forecast field is interpolated to the 89 meteorological stations in Guangxi so that each forecast object (weather station) has a rainfall value, recorded as $R_n$.

- (3) For the *k*th (*k* = 1, …, 89) forecast object Y (weather station), if its $R_n$ value is greater than the threshold A mm, the set of sample numbers whose precipitation is greater than B mm is selected from the historical sample sequence of forecast object Y.

- (4) For this modeling sample number set of forecast object Y, the corresponding factor matrix S, composed of the factors selected by the information gain method, and the corresponding forecast amount sequence are obtained.

- (5) The FNN algorithm is applied to the selected factor matrix and forecast amount sequence to establish the ensemble prediction model and obtain the rainfall forecast value for the station for the next 24 h.
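The threshold-based sample selection in the steps above can be sketched in Python as follows. The thresholds A = 20 mm and B = 15 mm are the values reported in the text, and the "reaches A or above" reading of the comparison follows the description preceding the steps. The function name, the tuple layout of the history, and the module labels are assumptions for illustration.

```python
A_MM = 20.0   # threshold on the ECMWF forecast interpolated to the station
B_MM = 15.0   # threshold on the historical observed precipitation


def select_training_samples(ecmwf_at_station, history, a_mm=A_MM, b_mm=B_MM):
    """Choose the module and the historical samples used to train it.

    ecmwf_at_station: ECMWF 24 h rainfall forecast interpolated to the station (mm).
    history: list of (predictors, observed precipitation in mm) pairs.
    Returns (module name, list of selected samples).
    """
    if ecmwf_at_station >= a_mm:
        # Rainstorm module: train on samples with observed precipitation of B or more.
        return "rainstorm", [s for s in history if s[1] >= b_mm]
    # Non-rainstorm module: train on samples with observed precipitation below B.
    return "no_rainstorm", [s for s in history if s[1] < b_mm]


# Hypothetical station history with three past samples.
history = [("day1", 5.0), ("day2", 15.0), ("day3", 60.0)]
module_wet, wet_samples = select_training_samples(25.0, history)
module_dry, dry_samples = select_training_samples(10.0, history)
```

Each station would then train its FNN on the returned subset only, so that the model applied on a given day is matched to the rainfall regime indicated by the ECMWF forecast.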

## FORECASTING MODEL PREDICTION TEST AND RESULT ANALYSIS

In this study, the data from 1982 to 2011 were used as the basic modeling samples, and a total of 111 samples from rainstorm processes in Guangxi during the 5 years from 2012 to 2016 were used as independent forecast samples for a 24 h lead-time forecast test consistent with actual operational forecasting. The model is built from the modeling samples of each site identified above, and the input factors of the FNN forecasting model are the final selected predictors (see sections on ‘Fuzzy neural network structure and learning algorithm’ and ‘Combined design of information gain and FNN forecasting model’).

The forecasts are evaluated with the threat score (*TS*), equitable threat score (ETS, *E*), empty report rate (*FAR*), hit rate (*POD*), forecast bias (*B*), HSS skill score (*H*), and other scoring indicators. The forecast bias (*B*) is the ratio of the number of times the event was forecast to the number of times it was observed. The equitable threat score is an improvement on the threat score that penalizes empty and missing reports, making it fairer than the threat score. The HSS skill score also penalizes empty and missing reports, and its expected value for random and constant forecasts is 0. Its mathematical properties are linear and asymptotically equitable, and it is likewise one of the equitable precipitation verification scores (Table 1) (Wang & Yan 2007; Wu *et al.* 2017). The specific scoring formulas are as follows:

$$TS = \frac{N_A}{N_A + N_B + N_C}$$

$$E = \frac{N_A - R(a)}{N_A + N_B + N_C - R(a)}$$

$$FAR = \frac{N_B}{N_A + N_B}$$

$$POD = \frac{N_A}{N_A + N_C}$$

$$B = \frac{N_A + N_B}{N_A + N_C}$$

$$H = \frac{N_A + N_D - R(a) - R(d)}{N_A + N_B + N_C + N_D - R(a) - R(d)}$$

where

$$R(a) = \frac{(N_A + N_B)(N_A + N_C)}{N_A + N_B + N_C + N_D}, \qquad R(d) = \frac{(N_D + N_B)(N_D + N_C)}{N_A + N_B + N_C + N_D}$$

where $N_A$ is the number of correct stations (times), $N_B$ is the number of empty report stations (times), $N_C$ is the number of missing report stations (times), and $N_D$ is the correct number of stations (times) for which neither the forecast nor the observation reached the threshold (Table 1). According to the definition, *TS* is the threat score for a rainstorm forecast. The equitable threat score ranges from −1/3 to 1; the ideal score is 1, and 0 indicates no skill. However, because it penalizes both empty and missing reports, the source of the forecast error cannot be distinguished. In general, it is lower than the threat score, and in regions with more precipitation it is significantly lower. $R(a)$ and $R(d)$ are the mathematical expectations of the numbers of correct rainstorm and no-rainstorm forecasts, respectively, for a random forecast.

Actual observation | Forecast: Yes | Forecast: No
---|---|---
Rainstorm | $N_A$ (correct) | $N_C$ (missing report)
No rainstorm | $N_B$ (empty report) | $N_D$ (correct, no rainstorm)
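For illustration, the scores can be computed from the four cells of the contingency table with a short Python function. The sketch uses the standard definitions of these scores for dichotomous forecasts; the argument names (`n_a` correct, `n_b` empty report, `n_c` missing report, `n_d` correct with no rainstorm) and the example counts are assumptions and are not data from the study.

```python
def verification_scores(n_a, n_b, n_c, n_d):
    """Dichotomous forecast scores from the four contingency-table counts."""
    n = n_a + n_b + n_c + n_d
    # Numbers of correct forecasts expected from a random forecast.
    r_a = (n_a + n_b) * (n_a + n_c) / n
    r_d = (n_d + n_b) * (n_d + n_c) / n
    return {
        "TS": n_a / (n_a + n_b + n_c),
        "ETS": (n_a - r_a) / (n_a + n_b + n_c - r_a),
        "FAR": n_b / (n_a + n_b),
        "POD": n_a / (n_a + n_c),
        "B": (n_a + n_b) / (n_a + n_c),
        "HSS": (n_a + n_d - r_a - r_d) / (n - r_a - r_d),
    }


# Hypothetical counts chosen so that the arithmetic is easy to follow.
scores = verification_scores(n_a=30, n_b=20, n_c=10, n_d=40)
```

With these definitions the forecast bias equals POD / (1 − FAR), which offers a simple cross-check on tabulated values. The sketch does not guard against empty denominators, for example when no event is forecast or observed.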

The above scoring formulas are used to obtain the scores for 24 h rainstorm (≥50 mm) forecasts at 89 weather stations in Guangxi from 2012 to 2016. The new scheme is essentially a method for the interpretation and forecasting of numerical forecast products, so its advantages and disadvantages for heavy rainfall forecasting need to be verified. To assess the interpretation and prediction ability of the new method, we calculated the scores of the rainstorm forecasts for the 89 meteorological stations in Guangxi over the five years and compared them with those of ECMWF for the same period and region. The ECMWF forecast grid data are first interpolated to the stations using bilinear interpolation (the same below). Table 2 and Figure 4 present the yearly and 5-year average rainstorm scores of the new method for 2012–2016 at the 89 meteorological stations in Guangxi, compared with the scores of the interpolated ECMWF rainfall forecasts for the corresponding period.
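A minimal Python sketch of bilinear interpolation from a regular latitude–longitude grid to a station location is given below. The grid layout (rows ordered by increasing latitude, columns by increasing longitude, both starting at the given origin), the function name, and the toy values are assumptions for illustration; the station is assumed to lie inside the grid.

```python
def bilinear(lat, lon, lat0, lon0, step, grid):
    """Bilinearly interpolate grid values to the point (lat, lon).

    grid[i][j] holds the value at latitude lat0 + i * step and
    longitude lon0 + j * step.
    """
    fi = (lat - lat0) / step
    fj = (lon - lon0) / step
    # Index of the lower-left corner of the enclosing cell, clamped so that
    # points on the upper or right edge still use a valid cell.
    i = min(int(fi), len(grid) - 2)
    j = min(int(fj), len(grid[0]) - 2)
    ti, tj = fi - i, fj - j   # fractional position inside the cell
    return ((1 - ti) * (1 - tj) * grid[i][j]
            + (1 - ti) * tj * grid[i][j + 1]
            + ti * (1 - tj) * grid[i + 1][j]
            + ti * tj * grid[i + 1][j + 1])


# Hypothetical 2 x 2 patch of a 0.75 degree rainfall grid (values in mm).
patch = [[0.0, 10.0],
         [20.0, 30.0]]
station_value = bilinear(10.125, 79.875, 9.75, 79.5, 0.75, patch)
```

A station at the center of a grid cell receives the mean of the four surrounding values, and a station located exactly on a grid point receives that point's value.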

Scoring index (×100) | Model | 2012 | 2013 | 2014 | 2015 | 2016 | Average
---|---|---|---|---|---|---|---
TS | FNN | 42.35 | 36.62 | 40.78 | 32.71 | 32.02 | 36.84
TS | ECMWF | 23.64 | 16.38 | 19.7 | 12.95 | 12.91 | 16.73
ETS | FNN | 18.58 | 16.76 | 13.83 | 12.74 | 9.93 | 14.08
ETS | ECMWF | 17.44 | 8.82 | 11.27 | 7.61 | 7.80 | 10.18
FAR | FNN | 53.75 | 57.81 | 52.58 | 55.05 | 61.07 | 55.88
FAR | ECMWF | 49.17 | 68.85 | 55.37 | 58.78 | 60.91 | 59.22
POD | FNN | 37.19 | 36.49 | 30.36 | 26.18 | 21.80 | 29.62
POD | ECMWF | 30.65 | 25.68 | 26.07 | 15.88 | 16.17 | 22.11
B | FNN | 80.40 | 86.49 | 64.03 | 58.24 | 56.02 | 67.14
B | ECMWF | 60.30 | 82.43 | 58.42 | 38.53 | 41.35 | 54.21
HSS | FNN | 31.34 | 28.7 | 24.31 | 22.6 | 18.07 | 24.69
HSS | ECMWF | 29.7 | 16.21 | 20.26 | 14.14 | 14.47 | 18.48

For the 24 h rainstorm (≥50 mm) forecasts at 89 weather stations in Guangxi from 2012 to 2016, the information gain-based modular FNN model has an average *TS* of 0.37, whereas ECMWF has a *TS* of 0.17. The new method scores 0.20 higher, an increase of 120.20%, so its gain in rainstorm forecast accuracy is clear. The yearly average threat scores of the 89 stations show that the new method exceeds the ECMWF rainstorm forecasts in each of the 5 years, with improvements of 79.15%, 123.56%, 107.01%, 152.59%, and 148.02% in 2012–2016. Except in 2012, the increase exceeds 100%, meaning that the threat score more than doubled (Figure 4(a)).

The equitable threat scores (ETS) were markedly lower than the threat scores. Overall, the ETS of the new method exceeded that of the ECMWF forecasts, increasing by 6.50%, 90.02%, 22.72%, 67.41%, and 27.31% in 2012–2016. The improvement was smallest in 2012 and largest in 2013 and 2015 (Figure 4(b)).

The false alarm ratio (*FAR*, the empty report rate) of ECMWF was generally the higher of the two (Figure 4(c)). In 2012, the *FAR* of the new method was 0.54 and that of ECMWF was 0.49, so the new method had a higher false alarm ratio than ECMWF in that year. In 2013, 2014, and 2015, the ECMWF *FAR* was higher than that of the new method. In 2016, the *FAR* values of the two methods were nearly equal.

For the rainstorm hit rate (*POD*), the new method scored higher than the ECMWF forecast in every year, with increases of 21.33%, 42.10%, 16.46%, 64.86%, and 34.82% in 2012–2016, which shows that the interpretation of the numerical forecast products has a positive effect (Figure 4(d)).

According to the forecast bias (*B*) (Figure 4(e)), the ratio of the number of rainstorms forecast by the new method to the number observed is higher than that of ECMWF in every year and closer to 1. Both methods under-forecast the number of rainstorms (*B* < 1), but the new method is closer to the observations.

The HSS forecasting skill score of the new method exceeds that of ECMWF in every year, with increases of 5.52%, 77.05%, 19.99%, 59.83%, and 24.88% in 2012–2016. The skill improved substantially in four of the five years (all except 2012), and the 5-year average HSS is 33.60% higher (Figure 4(f)).

The rainstorm verification statistics show that the new model has positive skill relative to ECMWF, with stable forecast performance and a good interpretation effect, and that it can improve the rainstorm forecasting ability of the ECMWF model to some extent. The improvement is closely related to the information gain-based processing of the predictors, the reconstruction of the modeling samples, and the application of the FNN model.
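The percentage improvements quoted in this section are relative differences between the FNN and ECMWF scores in Table 2. The short sketch below recomputes them from the tabulated values. The dictionary layout and the function names are illustrative choices, and because the table is rounded to two decimals, the results can differ from the quoted figures in the last digit.

```python
# Scores copied from Table 2 (scores multiplied by 100).
# Column order: 2012, 2013, 2014, 2015, 2016, 5-year average.
COLUMNS = ["2012", "2013", "2014", "2015", "2016", "Average"]

TABLE_2 = {
    "TS": {
        "FNN": [42.35, 36.62, 40.78, 32.71, 32.02, 36.84],
        "ECMWF": [23.64, 16.38, 19.70, 12.95, 12.91, 16.73],
    },
    "ETS": {
        "FNN": [18.58, 16.76, 13.83, 12.74, 9.93, 14.08],
        "ECMWF": [17.44, 8.82, 11.27, 7.61, 7.80, 10.18],
    },
    "POD": {
        "FNN": [37.19, 36.49, 30.36, 26.18, 21.80, 29.62],
        "ECMWF": [30.65, 25.68, 26.07, 15.88, 16.17, 22.11],
    },
    "HSS": {
        "FNN": [31.34, 28.70, 24.31, 22.60, 18.07, 24.69],
        "ECMWF": [29.70, 16.21, 20.26, 14.14, 14.47, 18.48],
    },
}


def relative_improvement(new_score, reference_score):
    """Percentage by which new_score exceeds reference_score."""
    return (new_score - reference_score) / reference_score * 100.0


def improvement_table(table):
    """Relative improvement of FNN over ECMWF for every index and column."""
    result = {}
    for index_name, rows in table.items():
        result[index_name] = {
            column: relative_improvement(fnn, ecmwf)
            for column, fnn, ecmwf in zip(COLUMNS, rows["FNN"], rows["ECMWF"])
        }
    return result


improvements = improvement_table(TABLE_2)
for index_name, by_column in improvements.items():
    row = ", ".join(f"{col}: {val:.2f}%" for col, val in by_column.items())
    print(f"{index_name}: {row}")
```

Only scores for which a larger value is better are included; *FAR* and *B* are omitted because a percentage increase does not indicate an improvement for them.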

## CONCLUSION AND DISCUSSION

This study takes large-scale heavy rainfall as the forecast object and is based on ECMWF numerical forecast model products. The information gain method is introduced into the predictor processing of the forecast model, and a short-term rainstorm forecast model is then established with the FNN intelligent calculation method. The model was used to forecast large-scale rainstorms in Guangxi from 2012 to 2016, and its results were compared with those of the ECMWF model. The following conclusions are drawn:

- (1) The information gain method was introduced to evaluate the feature information of the many initially selected rainstorm predictors. New characteristic variables were obtained, and correlation calculation and significance testing were then used to derive comprehensive features that describe the original predictors to the greatest possible extent. Using these features as the input of the FNN model reduces dimensionality and removes redundant information, which effectively decreases the correlation between inputs, optimizes the network structure, and enhances performance.

- (2) When the rainstorm forecast models were built for the 89 forecasting stations in Guangxi, thresholds based on ECMWF forecast products were used to reclassify the modeling samples into rainstorm and non-rainstorm samples, so that the separate forecast modules could be modeled more effectively.

- (3) Rainstorm forecasting still faces problems such as the inability to accurately evaluate the influence of individual factors. Because rainstorms are affected by the interaction and coupling of multiple factors, many scholars currently use artificial neural network methods for prediction and analysis. However, in a conventional neural network the weights and thresholds are selected in a relatively random way, so the probability of reaching the global optimum is low. A fuzzy neural network differs from a traditional artificial neural network in that it combines fuzzy theory with a neural network, integrating the nonlinear processing ability of the neural network with the logical reasoning ability of fuzzy theory; it is a hybrid intelligent optimization method. Unlike other feedback neural networks, a fuzzy neural network fuzzifies the weights so that they carry a logical-inference meaning. Its network parameters are defined by fuzzy theory, so it can handle fuzzy information and has strong fault tolerance. This way of processing fuzzy information is not restricted to a local search domain, supports distributed collection, and helps locate the optimal value quickly. In our experiment, the fuzzy logic reasoning ability of the FNN algorithm and the nonlinear processing ability of the neural network were used to model the main influencing factors of rainstorms and to optimize the network itself, in order to describe and predict the rainstorm process accurately.

  The comparison of forecast and observed rainstorms in the experiment shows that the forecast accuracy was higher than that of the numerical forecast model, which indicates that the fuzzy neural network model is reliable and accurate and performs well for operational weather forecasting. The method could also be applied to forecasts of other hazardous weather elements, such as wind and temperature.

- (4) The 24 h forecast tests of large-scale heavy rainfall in Guangxi in 2012–2016 showed that the new method gives better forecast results and has a stable forecasting effect.

- (5) The heavy rainfall forecast model proposed in this paper is based on numerical forecast products. The skill of the numerical model for precipitation at or above the rainstorm level therefore directly affects the skill of the proposed model. Nevertheless, the rainstorm forecasts for 89 stations in Guangxi showed that the proposed method is more accurate than direct interpolation of the numerical model output and provides a better reference for operational forecasters.
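As an illustration of the information gain screening mentioned in conclusion (1), the following sketch computes the information gain of a discretized predictor with respect to a binary rainstorm label. It uses the textbook definition, IG(Y; X) = H(Y) − H(Y | X). The binning of predictors, the sample values, and the function names are assumptions made for the example and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(feature_bins, labels):
    """Information gain of a discretized feature about the class labels.

    feature_bins[k] is the bin of the predictor for sample k, and
    labels[k] is its class (for example 1 = rainstorm, 0 = no rainstorm).
    """
    n = len(labels)
    groups = {}
    for bin_id, label in zip(feature_bins, labels):
        groups.setdefault(bin_id, []).append(label)
    # Conditional entropy H(Y | X): entropy within each bin, weighted by size.
    conditional = sum(len(group) / n * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical samples: 1 = rainstorm day, 0 = non-rainstorm day.
rain = [1, 1, 0, 0]
informative_predictor = ["high", "high", "low", "low"]    # separates classes
uninformative_predictor = ["high", "low", "high", "low"]  # tells nothing

gain_informative = information_gain(informative_predictor, rain)
gain_uninformative = information_gain(uninformative_predictor, rain)
```

Ranking candidate predictors by this quantity and keeping those with the largest values is one simple way to screen features by weight; the thresholds used for such a selection would have to be chosen for the data at hand.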

## ACKNOWLEDGEMENTS

This study was supported by the Guangxi Key National Natural Science Foundation (Grant No. 2017GXNSFDA198030), the National Natural Science Foundation of China (Grant No. 41765002), and the Guangxi General National Natural Science Foundation (Grant Nos. 2018GXNSFAA281281, 2018GXNSFAA294128, 2018GXNSFAA281229).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.