Drought prediction models driven by meteorological and remote sensing data in Guanzhong Area, China

Drought is an important factor that limits economic and social development due to its frequent occurrence and profound influence. Therefore, it is of great significance to make accurate predictions of drought for early warning and disaster alleviation. In this paper, SPEI-1 was confirmed as a suitable index for classifying drought grades in the Guanzhong Area, and autoregressive integrated moving average (ARIMA), random forest (RF) and support vector machine (SVM) models were established. Meteorological data and remote sensing data were used to derive the prediction models. The results showed that (1) the SVM model performed the best when the models were developed using meteorological data, remote sensing data and a combination of meteorological and remote sensing data, but the corresponding best kernel functions differ: linear, polynomial and Gaussian radial basis kernel functions, respectively. (2) The RF model driven by the remote sensing data and the SVM model driven by the combined meteorological and remote sensing data were found to perform better than the same models driven by the other data in the Guanzhong Area. It is difficult to accurately measure drought with meteorological data alone; only by considering combined factors can drought be monitored and predicted more accurately. This study can provide an important scientific basis for regional drought warnings and predictions.


INTRODUCTION
As a severe natural disaster, drought not only affects economic development but also influences water resources, agriculture, ecology and the environment (Mazdiyasni & Aghakouchak ; Malik et al. ; Zhang et al. a, b; Guo et al. ). Because it is uncertain when droughts begin and end, drought is very difficult to predict.
Therefore, drought has become one of the critical factors limiting the sustainable development of the economy and society in many areas (Yuan & Zhou ; Zhang et al. a, b; Dai et al. ). As its severity increases, drought has a profound impact on biogeochemical processes in terrestrial ecosystems (Fang et al. ). Drought also affects water uptake by vegetation by altering soil water content, and it increases the sensitivity of vegetation to water and energy conditions, which plays an important role in the terrestrial water, energy and carbon cycles (Fang et al. ; Yinglan et al. a, b). A warming climate is expected to perturb the hydrological cycle, resulting in changes in both the frequency and duration of drought (Han et al. ). To alleviate drought effects, it is of great significance to strengthen the study of regional drought characteristics and to make more accurate drought predictions for early warnings and disaster mitigation (Miao ).  (Dong & Xie ). The PDSI considers the balance of water resources, including precipitation and evaporation processes, additional runoff values, soil moisture content and other conditions (Zargar et al. ), and the PDSI has been extensively used to monitor long-term drought (Liu et al. ). This index also considers water shortages in crops and has shown better performance in capturing spatial and temporal characteristics (Yuan ). Based on SPI and PDSI concepts, Vicente-Serrano Both models predicted progressively increasing aridity in the region throughout the 21st century. Li et al. () investigated the spatiotemporal characteristics of drought in the Weihe River Basin by employing the SPEI index. Vega et al. () investigated hydrological patterns in the Brazilian rainforest through a 9-month SPEI series and determined the Hurst exponents from detrended time series of days with precipitation and accumulated monthly rainfall.
The researchers found that the Hurst exponent correlated positively with the monthly mean rainfall. The SPEI considers not only temperature and precipitation but also evapotranspiration (Lu ), and it has been confirmed to be applicable in the Guanzhong Area and to perform better there than other drought indices (Xu ). Therefore, the SPEI was selected as the drought index to evaluate the drought events in the Guanzhong Area in this paper.
Traditional drought indices are usually based on hydrometeorological data measured at stations, and their spatial resolution does not necessarily meet the requirements of drought monitoring over large regions.
Meteorological satellites, which are widely used in drought remote sensing monitoring (Di et  to 2012 in the Huaihe River Basin. The researchers found that the overall average prediction accuracy was higher than the prediction accuracy of the weather forecasting system, suggesting acceptable prediction results. ARIMA, RF and SVM have been used in drought prediction, but the performance of these three methods needs to be explored further to improve the accuracy of drought prediction in the Guanzhong Area. The main aims of this study are (1) to analyze the temporal distribution characteristics of drought and the main driving factors in the Guanzhong Area, China; (2) to build ARIMA, RF and SVM models and predict drought grades using gauge-based and remote sensing monitoring data, respectively; and (3) to select the best prediction method and the corresponding predictors.

Study area
The Guanzhong Area is located in the middle of the The Guanzhong Area is located in the transitional zone between arid and humid regions (Li et al. ) and has a continental monsoon climate with cold winters, hot summers and distinct seasons. In the period when heat and rain coincide, drought is more likely to occur in the Guanzhong Area. The average annual precipitation is 500-700 mm, concentrated mostly in summer and autumn, with little precipitation in winter (Qiao ).
The special topography results in climatic conditions of high temperature and low rainfall. The area is mostly a plain formed by loess deposits and river alluvium, with soft soil and poor water retention capacity; therefore, the area is prone to drought. In terms of agriculture, approximately 70% of the land on the Guanzhong Plain is covered by cropland, which mainly includes grain crops, orchards and vegetables. More than 50% of the croplands in this area are rainfed. There are many irrigated croplands in the western and middle parts, while most of the croplands in the east are rainfed (Zhou et al. ). Owing to the dense population and developed agriculture, increasing industrial and agricultural water use has led to a more severe drought situation. Thus, the Guanzhong Area is known by the saying, '9 drought years out of 10 years' (Yu ).  1959-1961, 1966-1967, 1971-1972, 1976-1980 and 1994-1997. Droughts in the Guanzhong Area occurred mostly in spring and summer, and continuous drought events frequently occurred, lasting from spring to summer and from summer to autumn (Wu ).

Data
The data used for drought prediction in the Guanzhong Area include gauge-based meteorological data and remote sensing data.
The gauge-based meteorological data include the daily wind speed, precipitation, air temperature, air pressure, sun-

Academy of Sciences Computer Network Information Center (http://www.gscloud.cn). The digital elevation model (DEM) was also obtained from this website. Detailed information on the remote sensing data used in this study is shown in Table 1.

Calculation of the SPEI-1 series
The SPEI index is calculated as follows: (1) Calculate the monthly water surplus or deficit $D_i = P_i - \mathrm{PET}_i$, where $P_i$ is the monthly precipitation (mm) and $\mathrm{PET}_i$ is the monthly potential evapotranspiration (mm), which is calculated using the Morton-type Penman formula (Mei ).
(2) Calculate the probability distribution of the monthly water surplus or deficit $D_i$ by using the log-logistic probability distribution function.
(2) Determine p and q (p represents the lag order of the autoregressive process, and q represents the lag order of the moving average process (Ghashghaie & Nozari )). The autocorrelation function (ACF) graph and the partial ACF (PACF) graph are drawn, and the orders of the ARMA(p,q) model are identified from the tailing-off and cutting-off behavior of the ACF and PACF graphs (Wu ).
(3) Fit the model and perform the normality test, autocorrelation test and white noise test on the residuals.
(4) Use the ARIMA model to inspect and predict the monitoring series. For the seasonal non-stationary time series, , and the parameter set ($\theta_i$) is an independent and identically distributed random vector.
Given the independent variable X, each decision tree casts one vote, and the forest selects the optimal classification result by majority voting (Wang ). The modeling steps for the RF model are as follows: (1) Randomly draw M samples from the fitting data set T, and then fit the drawn samples.
(2) For each sample, grow a decision tree; at each node of the tree:
a. Randomly select a subset of m variables from all p available features; b. Choose the best variable and the best split among the m variables; and c. Repeat until the tree is fully grown (Wang ).
(3) Make RF predictions by combining the votes of all trees.
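The RF steps above can be illustrated with a small plain-Python sketch. It is not the authors' implementation: it assumes that step (1) draws bootstrap samples with replacement (as in Breiman's random forest), that Gini impurity is the split criterion in step (2b), and that a node stops splitting when it is pure or when none of its m sampled variables gives a valid split. The function names and the parameters `n_trees`, `m` and `seed` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Step (2b): best (score, feature, threshold) among the candidate features."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def grow_tree(X, y, m, rng):
    """Step (2): grow one tree; a leaf is a label, an inner node is a tuple."""
    if len(set(y)) == 1:
        return y[0]
    features = rng.sample(range(len(X[0])), m)  # step (2a): m random variables
    split = best_split(X, y, features)
    if split is None:  # no valid split on these variables: majority leaf
        return Counter(y).most_common(1)[0][0]
    _, f, thr = split
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in li], [y[i] for i in li], m, rng),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], m, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf (labels must not themselves be tuples)."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=25, m=1, seed=0):
    """Step (1): one bootstrap sample (drawn with replacement) per tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    """Step (3): every tree casts one vote; the majority grade wins."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

In use, each row would be one month's predictor vector (for example meteorological or remote sensing variables) and each label a drought grade; `fit_forest(X, y)` builds the forest and `predict_forest(forest, row)` returns the voted grade. Because labels are counted with `Counter`, the same code handles more than two drought grades.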

SVM model
The SVM model seeks the best compromise between the complexity of the model (the learning accuracy on the specific training samples) and the learning ability (the ability to identify any sample without error) based on limited sample information, in order to obtain better generalization ability (Wang ).
The SVM was originally proposed for binary pattern classification in the linearly separable case.
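For the linearly separable binary case, the separating hyperplane can be illustrated with a minimal sketch. It swaps in the Pegasos stochastic sub-gradient solver for the regularized hinge-loss objective (not necessarily the solver used in this study), appends a constant feature so that the last weight acts as the bias b, and uses hypothetical function names and parameters (`lam`, `epochs`, `seed`).

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss.

    X: feature vectors; y: labels in {-1, +1}. A constant 1.0 feature is
    appended so that the last weight plays the role of the bias b
    (the bias is then also regularized, a simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 term
            if margin < 1.0:  # hinge loss active: move towards this sample
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision_function(w, x):
    """w . x + b; the separating hyperplane is decision_function(w, x) == 0."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def predict(w, x):
    """Class +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_function(w, x) >= 0 else -1
```

The multi-class drought-grade problem in this study needs an extension of this binary case (for example one-vs-one or one-vs-rest voting) and, for non-linear boundaries, a kernel function in place of the plain dot product.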
The purpose of classification is to find a classification hyperplane that completely separates the two classes. Let
The SPEI-1 series of the Guanzhong Area was calculated from the SPEI-1 series at the 11 stations, as shown in Figure 3. The Guanzhong Area is prone to drought disasters because of the frequent alternation of dry and wet conditions. In 1962, 1963, 1967, 1969, 1976, 1979, 1994-2002 and 2007, the Guanzhong Area suffered severe drought events, as shown in Figure 3, based on the drought classification in Table 2.
The occurrence frequency and percentage of drought at all grades for each of the 12 months at Baoji Station were obtained and are shown in Table 3. 1959-1961, 1966-1967, 1971-1972, 1976-1980, 1994-1997 and 1999-2002. Therefore, the SPEI-1 can reasonably reflect the occurrence of drought in the Guanzhong Area.
The number and proportion of droughts at all grades were calculated from the SPEI-1 series at the 11 stations, as shown in Table 4. The proportion of drought at the 11 stations is nearly 35%, mainly light and moderate droughts, while severe droughts hardly occur. In addition, the number of droughts decreases as drought severity increases.
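The SPEI-1 values analysed above follow the procedure outlined in the 'Calculation of the SPEI-1 series' section. The sketch below is illustrative only and rests on several assumptions: PET is taken as a precomputed input rather than derived from the Morton-type Penman formula; the three-parameter log-logistic distribution is fitted with the probability-weighted-moment estimators commonly used for the SPEI (the fitting method is not stated in this section); the final step maps probabilities to standard normal deviates; and one distribution is fitted to whatever series is passed in, whereas SPEI-1 is often fitted separately for each calendar month. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def water_balance(precip, pet):
    """Step (1): monthly water surplus/deficit D_i = P_i - PET_i (mm)."""
    return [p - e for p, e in zip(precip, pet)]


def fit_log_logistic(d):
    """Step (2): three-parameter log-logistic fit by probability-weighted moments.

    Plotting positions F_i = (i - 0.35) / N on the sorted series and
    w_s = (1/N) * sum((1 - F_i)**s * D_i) for s = 0, 1, 2.
    """
    x = sorted(d)
    n = len(x)
    w = [sum((1.0 - (i + 1 - 0.35) / n) ** s * xi for i, xi in enumerate(x)) / n
         for s in range(3)]
    beta = (2 * w[1] - w[0]) / (6 * w[1] - w[0] - 6 * w[2])
    if beta <= 1:
        raise ValueError("shape parameter beta <= 1; log-logistic fit not valid")
    g = math.gamma(1 + 1 / beta) * math.gamma(1 - 1 / beta)
    alpha = (w[0] - 2 * w[1]) * beta / g
    gamma_ = w[0] - alpha * g  # location parameter
    return alpha, beta, gamma_


def log_logistic_cdf(x, alpha, beta, gamma_):
    """F(x) = 1 / (1 + (alpha / (x - gamma))**beta) for x > gamma, else 0."""
    if x <= gamma_:
        return 0.0
    return 1.0 / (1.0 + (alpha / (x - gamma_)) ** beta)


def spei(d, eps=1e-6):
    """Final step: map each fitted probability to a standard normal deviate."""
    alpha, beta, gamma_ = fit_log_logistic(d)
    nd = NormalDist()  # standard normal distribution (mean 0, sd 1)
    return [nd.inv_cdf(min(max(log_logistic_cdf(v, alpha, beta, gamma_), eps), 1 - eps))
            for v in d]
```

Negative values of the result indicate drier-than-usual months; the clipping by `eps` only guards against probabilities of exactly 0 or 1, for which the normal quantile is undefined.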

Drought prediction by different models
The three models described in the 'Predictive models' section were used to predict the drought grades in the Guanzhong Area over a prediction period of 12 months.
Drought prediction by the ARIMA model
The SPEI-1 was at a higher level before August 2016, that is, the drought degree was lighter.
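Step (2) of the ARIMA procedure identifies p and q from the ACF and PACF graphs. As a minimal sketch (hypothetical helper names; plotting omitted), the sample ACF and the PACF via the Durbin-Levinson recursion can be computed as below, together with the approximate ±1.96/√n significance band usually drawn on such graphs. For a pure AR(p) process the PACF cuts off after lag p while the ACF tails off; for a pure MA(q) process the roles are reversed.

```python
import math


def acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelation at lags 0..nlags (Durbin-Levinson recursion)."""
    r = acf(x, nlags)
    out, phi = [1.0], []  # phi holds the AR(k-1) coefficients
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(values, n):
    """Lags >= 1 whose value lies outside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k > 0 and abs(v) > bound]
```

Applied to a station's SPEI-1 series, the last significant PACF lag suggests a candidate p and the last significant ACF lag a candidate q; the fitted model still has to pass the residual checks of step (3).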
The SPEI values fitted and predicted by the ARIMA model were graded according to Table 2, and the qualified rate of the drought grade was calculated. The qualified fitting and prediction rates of the ARIMA model at each station are shown in Table 6. The drought grades predicted by the ARIMA model represent the actual situation to some extent: the average qualified fitting and prediction rates are 0.5337 and 0.6061, respectively. Except for the qualified prediction rate of Longxian station, which is 0.7500 and significantly higher than the average, the qualified rates of the other stations fluctuate around the average. The qualified fitting rate being lower than the qualified prediction rate may be due to the unequal lengths of the fitting and prediction data sets. The fitting data covered January 1960 to December 2015 (56 years), and the prediction data covered January to December 2016 (1 year), so the variance within the fitting period was much larger than that within the prediction period; that is, the fitting process had larger errors than the prediction process.
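The qualified rate can be read as the share of months whose predicted drought grade matches the grade of the observed SPEI-1 value. A minimal sketch under that reading follows; the grade boundaries in `THRESHOLDS` and the labels in `GRADES` are hypothetical placeholders (a commonly used SPEI classification) and should be replaced with those of Table 2.

```python
import bisect

# Hypothetical grade boundaries (upper bound of each class, ascending);
# replace them with the thresholds of Table 2.
THRESHOLDS = [-2.0, -1.5, -1.0, -0.5]
GRADES = ["extreme", "severe", "moderate", "light", "no drought"]


def drought_grade(spei_value):
    """Map an SPEI-1 value to a grade, e.g. -1.0 < SPEI <= -0.5 -> 'light'."""
    return GRADES[bisect.bisect_left(THRESHOLDS, spei_value)]


def qualified_rate(observed, predicted):
    """Share of months whose predicted grade equals the observed grade."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")
    hits = sum(drought_grade(o) == drought_grade(p) for o, p in zip(observed, predicted))
    return hits / len(observed)
```

Under this reading, with a 12-month prediction period a qualified prediction rate of 0.7500, as at Longxian station, corresponds to 9 of the 12 months of 2016 being graded correctly.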
Drought prediction by the RF model
and then, the qualified fitting and prediction rates were calculated as shown in Table 8.
As seen from Table 6 Table 9. The average qualified fitting and prediction rates are 0.6316 and 0.5388, respectively.
Except for the qualified prediction rate of Longxian station (0.6667), which is significantly higher than the average qualified prediction rate, the qualified rates of the other stations fluctuate around the average.

Drought prediction by the SVM model
SVM model driven by meteorological data and accuracy assessment. Using the same fitting and prediction data as in the 'RF model driven by meteorological data and accuracy assessment' section, four common kernel functions (linear, polynomial, Gaussian radial basis and sigmoid kernel functions) were used to construct SVM models and to fit and predict the drought grades. The average qualified fitting and prediction rates at the 11 stations are shown in Table 10.
As shown in Table 10, the polynomial kernel function and the linear kernel function perform best in fitting and prediction, respectively. The Gaussian radial basis kernel function ranks second for both the qualified fitting and prediction rates. The qualified fitting and prediction rates of the sigmoid kernel function differ by only 0.0364, the smallest difference and hence the most stable of the four kernel functions, but its qualified fitting and prediction rates are the lowest. The qualified fitting rate of the linear kernel function is 0.7532, only 0.0437 lower than the highest qualified fitting rate, and its qualified prediction rate is the highest.
The difference between its qualified fitting and prediction rates is 0.0498, larger only than that of the sigmoid kernel function (which ranks first in stability). Therefore, the linear kernel function is considered more suitable for drought monitoring in the Guanzhong Area.
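The four kernel functions compared in Table 10 can be written as follows. The parameterisation (`gamma`, `coef0`, `degree`) follows the common LIBSVM convention and is an assumption; the kernel parameters used in this study are not reported here, so the defaults below are placeholders.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return dot(x, z)


def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    """K(x, z) = (gamma * x . z + coef0) ** degree"""
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian radial basis: K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    """K(x, z) = tanh(gamma * x . z + coef0)"""
    return math.tanh(gamma * dot(x, z) + coef0)


def gram_matrix(X, kernel, **params):
    """Kernel (Gram) matrix K[i][j] = kernel(X[i], X[j])."""
    return [[kernel(a, b, **params) for b in X] for a in X]
```

In scikit-learn, the same four kernels are selected with `SVC(kernel="linear")`, `"poly"`, `"rbf"` and `"sigmoid"`, and a Gram matrix such as the one above can instead be passed with `kernel="precomputed"`. Note that the sigmoid kernel is not positive semi-definite for all parameter values, which is one possible reason for its weaker and less predictable behaviour.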
SVM model driven by remote sensing data and accuracy assessment. Using the same fitting and prediction data as in the 'RF model driven by remote sensing data and accuracy assessment' section, four kernel functions were used to construct SVM models and to fit and predict the drought grades. The qualified fitting and prediction rates of the four kernel functions are shown in Table 11.
The polynomial kernel function and the linear kernel function perform best in the fitting and prediction process, respectively. The qualified rate of the polynomial kernel function in the prediction ranks second, and it differs from the first-ranked linear kernel function by only 0.0081. In addition, the differences in the qualified fitting and prediction rates of the four kernel functions are less than 0.1, which shows that they are relatively stable. Therefore, the polynomial kernel function is more suitable for drought prediction with remote sensing data driving the SVM model.  Table 12.
The Gaussian radial basis kernel function and the linear kernel function perform best in the fitting and prediction process, respectively. The Gaussian radial basis kernel function ranks second with a qualified prediction rate of 0.7652 and only differs from the linear kernel function (ranks first) by 0.0075. That is, the Gaussian radial basis kernel function performs well in both fitting and prediction. Therefore, the Gaussian radial basis kernel function is more suitable for drought prediction with the SVM model driven by combined meteorological and remote sensing data.

Comparison of the model-driven data
For the RF model, except at Longxian station, the qualified fitting and prediction rates show that the RF model performed better with remote sensing data than with the combined meteorological and remote sensing data; that is, adding meteorological data reduces the qualified fitting and prediction rates of the RF model. However, the qualified fitting and prediction rates obtained with remote sensing data at the 11 stations are only slightly higher than those obtained with the combined meteorological and remote sensing data. Therefore, the RF model driven by the remote sensing data performed well for drought monitoring in the Guanzhong Area.
In the SVM model, except for the polynomial kernel function, the qualified fitting and prediction rates of the other three kernel functions based on the remote sensing data are lower than those based on the combined meteorological and remote sensing data. The differences in the qualified rates of the four kernel function models driven by the different data were calculated and are shown in Table 13.
The differences between the fitting and prediction of the

CONCLUSIONS
Based on the calculation of the SPEI series of 11 meteorological stations in the Guanzhong Area, the applicability of the SPEI to the drought characterization in the Guanzhong Area was analyzed. Three different models were used to predict the drought grades, and the main conclusions are as follows: (1) The identified drought events based on SPEI-1 were in line with the recorded droughts, suggesting that the SPEI-1 data can reasonably reflect the occurrence of drought events in the Guanzhong Area.
(2) The SVM model (linear kernel function) performs the best, while the ARIMA model performs the worst in terms of drought prediction using meteorological data.
The SVM model (polynomial kernel function) is superior if the models are driven by remote sensing data, while the SVM model (Gaussian radial basis kernel function) outperforms the other models if the predictors are a combination of meteorological and remote sensing data.
(3) When the RF model is built with remote sensing data and with combined meteorological and remote sensing data, adding meteorological data slightly reduces the qualified fitting and prediction rates; in the SVM model, the effect of adding meteorological data on the qualified fitting and prediction rates differs among kernel functions. In conclusion, the RF model driven by the remote sensing data and the SVM model driven by the combined meteorological and remote sensing data performed better than the same models driven by the other data in the Guanzhong Area.
The types and lengths of the meteorological and remote sensing data used in this paper may affect the fitting and prediction accuracy of the models, which should be investigated in future work.

ACKNOWLEDGEMENT
This work is supported by National Natural Science Foundation of China (No. 51479130).