Improving BP artificial neural network model to predict the SPI in arid regions: a case study in Northern Shaanxi, China


 Drought prediction plays an important guiding role in drought risk management. The standardized precipitation index (SPI) is a popular meteorological drought indicator to measure the degree of drought. The SPI time series is non-stationary, whereas the conventional artificial neural network (ANN) model has limitations to predict non-stationary time series. To overcome this limitation, it is essential to investigate input data preprocessing to improve the ANN model. In this paper, a hybrid model coupled with singular spectrum analysis (SSA) and backpropagation ANN is proposed (SSA-BP-ANN). The advantage of this model is that the SSA of finite-length SPI sequences does not require the adoption of boundary extensions to suppress boundary effects, while the most predictable components of the SPI can be efficiently extracted and incorporated into the model. The proposed SSA-BP-ANN model is tested in case studies at three meteorological stations in Northern Shannxi Province, China. The results show that the SSA-BP-ANN model can produce more accurate predictions than the BP-ANN model. In addition, the performance improvement of SSA on the BP-ANN model is slightly better than wavelet decomposition and empirical mode decomposition. This new hybrid prediction model has great potential for promoting drought early warning in arid regions.


INTRODUCTION
From a temporal perspective, humans have been struggling with drought for the last thousand years (Dai 2011). From a spatial perspective, drought is one of the most common natural disasters globally, characterized by high frequency, broad impact, long duration and high economic losses (Zhang et al. 2015a(Zhang et al. , 2015b. Therefore, the research on drought has attracted much attention world widely (Haile et al. 2019;Balti et al. 2020;Kaur & Sood 2020).
Drought generally can be divided into four categories: meteorological, agricultural, hydrological and socio-economic droughts (Kimwatu et al. 2021;Kolachian & Saghafian 2021;Wu et al. 2021;Zhang et al. 2021). A chronic lack of precipitation causes an imbalance in the hydrological cycle, the leading cause of the drought. Thus, many scholars have focused their attention on monitoring and preventing meteorological droughts. To quantify the severity of meteorological drought, researchers have developed many meteorological drought indicators (Waseem et al. 2015;Zhang et al. 2015aZhang et al. , 2015bRehana & Naidu 2021), such as precipitation anomaly percentage (PAP) (Tahroudi et al. 2020), standardized precipitation index (SPI) (Mckee et al. 1993), standardized precipitation evapotranspiration index (SPEI) (Beguería et al. 2014) considering evaporation and palmer drought severity index (PDSI) (Palmer 1965). To reduce

STUDY AREA AND DATA
The study area is the entire area under the jurisdiction of Yan'an City and Yulin City in the Loess Plateau region of northern Shaanxi Province (35°02 0 -39°35 0 N, 107°15 0 -110°15 0 E), with a total area of 79,981.9 km 2 . The average altitude in the area is about 1,200 m, the annual average precipitation is 400-500 mm, the annual average temperature is 7-12°C and the annual average evaporation is 1,200 mm. Figure 1 shows the general view of the study area.
The data used in this paper are obtained from the China Meteorological Data Network (available at http://data.cma.cn). The monthly rainfall of Hengshan, Suide and Yulin meteorological stations in Northern Shaanxi from 1955 to 2019 is selected as the data recognized and widely used in meteorological research. Table 1 summarizes the statistical characteristics of monthly rainfall at the three stations from 1955 to 2019.

. Standardized precipitation index
The SPI is widely used in meteorological drought assessment and prediction research because of its several advantages. (1) The SPI is recommended to the public by the World Meteorological Organization (Hayes et al. 2011); (2) the SPI only uses a long series of rainfall data, which makes drought assessment and prediction feasible in some areas where hydrological data such as evaporation are difficult to measure and (3) the SPI can evaluate and predict drought from multiple time scales (Pham et al. 2021). When calculating the SPI, first select an appropriate probability distribution function to fit a long series of observed rainfall data (the most commonly used is the gamma distribution, which has been proven to be a good description of rainfall data) (Thom 1958;Chen et al. 2020). Secondly, calculate the cumulative probability of the observed rainfall data. Finally, the inverse normal Gaussian function with a mean value of 0 and a variance of 1 is applied to the cumulative probability to obtain the SPI (Vidyarthi & Jain 2020). The probability density function defined by the gamma distribution is shown in the following equation: x aÀ1 e Àx=b , for x, a, b . 0 (1) where a and b represent the shape and the scale parameters, they can be obtained by maximum likelihood estimation, as shown in the following equations: The G(a) is the main function in the gamma distribution, as shown in the following equation: The cumulative probability G(x), then, can be obtained as follows: Since the observed precipitation time series may contain zeros, the cumulative probability will be changed to Equations (7) and (8) (Mishra & Desai 2006): q ¼ m n where m and n are the numbers of zero precipitation and sample size, respectively. The value of SPI can be calculated according to the following equations (Lloyd-Hughes & Saunders 2002): where x is the monthly rainfall, r 0 ¼2.515517, r 1 ¼0.802853, r 2 ¼0.010328, s 0 ¼1.432788, s 1 ¼0.189269 and s 2 ¼0.001308 (Kisi et al. 2019). The drought threshold classification of SPI is shown in Table 2.

Backpropagation artificial neural networks
The BP-ANN is a concept proposed by scientists Rumelhart et al. (1986) in the 1980s. It is a multi-layer feedforward neural network trained by an error backpropagation algorithm, which generally includes an input layer, hidden layer and output layer. The neurons in each layer of this neural network only accept the input of the neurons in the previous layer, and the subsequent layers have no signal feedback to the previous layer. Each layer performs a certain transformation on the input data and then uses the output result as the input of the next layer until the final output result (Zhao et al. 2010).
Training the BP-ANN model is mainly divided into three steps: (1) constructing the network model, including determining the input/output data form, network level and transfer function form; (2) cyclic calculation, forward propagation calculation error and backpropagation adjustment parameters and (3) return to the trained neural network model. This paper uses a three-layer BP-ANN model to predict time series, and the network structure is shown in Figure 2.
The applied rainfall data in this study is from 1955 to 2019; thus, the SPI-12 value of 780 months can be calculated to form the entire original SPI-12 time series, which can be recorded as the set X¼{X 1 , X 2 , …, X 799 , X 780 }. In Figure 2, X t (t is an integer and 1 t 780) represents the SPI-12 value of the tth month of the original SPI-12 time series, X tÀ1 represents the SPI-12 value 1 month before the tth month and X tÀn represents the SPI-12 value n months before the tth month. The value of n will be determined by the partial autocorrelation function (PACF) in Section 4.

Singular spectrum analysis
SSA is a time-series analysis method widely used in climatology, surveying, oceanography and other fields. Its advantage lies in extracting trends, periods and noise signals from noisy time series without any prior information and gathering the most predictable components into several time series (Baydaroglu et al. 2017). Therefore, it is particularly suitable for analyzing time-series data with periodic oscillations. The specific process of SSA mainly includes four steps (Yu et al. 2017;Mi et al. 2019). (1) Embedding. For a given one-dimensional time series X ¼ (x 1 , x 2 , Á Á Á , x N ), the appropriate window length L (2⩽L⩽N/2) is chosen to construct the LÂK trajectory matrix as follows: (2) Singular value decomposition (SVD). By calculating XX T and performing SVD, L eigenvalues can be obtained. The eigenvalues are arranged in a descending order as l 1 !l 2 ! Á Á Á !l L !0, and the corresponding feature vector is {U 1 , Á Á Á , U L }. Then, the SVD of the trajectory matrix X can be expressed as follows: The elementary matrix X i in the above formula represents the ith SVD component, and the SVD component's energy contribution to the original signal decreases as the order increases.
(3) Grouping. Dividing the subscript interval {1, 2, Á Á Á , d} of the elementary matrix X i into P unconnected subsets Calculating each composite matrix of I ¼ I 1 , I 2 , Á Á Á , I p , then Equation (16) can be obtained as follows: Among them, the selection process of {I 1 , I 2 , Á Á Á , I p } is the grouping.
(4) Refactor. By converting the matrix obtained by grouping in step (3) into a series of new reconstruction components of length N. Define Y ¼ X Ik , {y 1 , y 2 , Á Á Á , y N } is the sequence obtained by Y through diagonal averaging. Let The diagonal average formula is expressed as follows: Through diagonal averaging, the grouped matrix can be converted into its corresponding time-series data. The time-series data of each group show the hidden characteristics of the original sequence, such as long-term trend, periodicity and noise signal. The value of the original time series is equal to the sum of the characteristic decomposition series, as shown in the following equation: where the sequence f (k) n corresponding to each k in the formula is the diagonal mean value of the matrix X Ik .

SSA-BP-ANN prediction model
The SSA-BP-ANN prediction model is a new hybrid prediction model, and its specific prediction process is shown in Figure 3. The whole process can be roughly divided into four steps: (1) the division of training set and testing set for the calculated SPI-12 time series, where the training set accounts for 70%; (2) the SSA method is used to decompose the training set, the Uncorrected Proof parameter (window length, W L ) of the method is determined in the decomposition process and then the testing set is decomposed according to the same W L ; (3) the decomposed training set is fed into the BP-ANN model for model training and (4) the decomposed testing set is fed into the trained BP-ANN model to predict SPI-12 time series, and the evaluation indices are used to evaluate the performance of the model.

Model performance evaluation
In the stage of evaluating the performance of the model, one cannot rely on the results of a single index. A series of different evaluation indexes should be selected for comprehensive evaluation from multiple angles (Dawson et al. 2007). In this study, the evaluation indices include the mean absolute error (MAE), root-mean-square error (RMSE), coefficient of determination (R 2 ) and Nash-Sutcliffe efficiency coefficient (NSE), as shown in the following equations: where N is the number of samples,ŷ t is the predicted value, y i is the observed value and y t is the mean of the observed value (Fung et al. 2019).
In general, the performance indicators MAE (MAE!0) and RMSE (RMSE!0) are used to reflect the actual situation of the model prediction error. The smaller their values, the better the accuracy of the prediction model in describing the experimental data. The R 2 (0 R 2 1) and NSE (NSE 1) are used to measure the fitting accuracy of the prediction model. The larger their values, the better the performance of the model and the higher the fitting accuracy.
These indices are not only used to evaluate the performance of the SSA-BP-ANN model, but also used to compare the differences in model performance improvement of different data processing methods.

SPI-12 calculation
As the prediction object, the first step is to calculate the time series of SPI-12 based on the observed rainfall data. Figure 4 shows the results of SPI-12 time series for different meteorological stations. This processing follows: (1) applying the χ 2 and Kolgomorov-Smirnov (K-S) tests to detect whether the monthly rainfall series follows the gamma distribution; (2) using Matlab code to calculate the original SPI-12 time series (code available at https://www.csdn.net/) and (3) dividing the total data into two sets: about 70% as the training set (550) and about 30% as the testing set (230).

Singular spectrum decomposition of the training set and testing set
The object of singular spectral decomposition (Bonizzi et al. 2014) is the training set and testing set data. The purpose of this is to reduce the influence of noise, extract more predictable information and do data preprocessing for the input of subsequent models. Two parameters need to be set when using the SSA method, namely W L and the number of groups (m). If the value of  Uncorrected Proof the W L is small (e.g., W L 20), m is meaningless, because the grouping may hide some useful information (Zuo et al. 2020). In this work, W L is set to be 12 because the period of SPI-12 is 12 months. Therefore, the original SPI-12 time series will be decomposed and ordered into 12 subsequences. After determining the appropriate W L , the training set is first decomposed to verify the reliability of W L . If W L is reasonable, the same W L is used to decompose the testing set. This approach can effectively avoid bringing the future information of the testing set into the training process of the model.
For the subsequences after singular spectral decomposition, we would like to name the first subsequence as the trend component, the last subsequence as the noise component and all the intermediate subsequences as the periodic component. From Figures 5-7, the original SPI-12 time series of three stations are all decomposed into 1 trend component, 10 periodic components and 1 noise component. The trend component has the largest amplitude, the lowest frequency and the longest wavelength, which roughly reflects the changing trend of the original SPI-12 time series. The amplitude of the periodic components 1-10 gradually decreases, the frequency gradually increases and the wavelength gradually decreases, reflecting the fluctuation of the original SPI-12 time series. Each subsequence shows the characteristics of the original sequence.

BP-ANN model development
The BP-ANN model is developed in two steps. First, the forecasting strategy is determined for forecasting 1 month ahead SPI using the past SPI data, namely 'decomposition-forecast-reconstruction', and the process is shown in Figure 8. From

Uncorrected Proof
By drawing the PACF diagram of each component of SPI-12, find out the lag factor outside the confidence interval (blue line), which is the predictor. For example, for the trend component in Figure 9, the number of lags outside the confidence interval is 1, 2 and 3. Therefore, these three lags are selected as the input variables of the model, and the same number of lags is selected as the input variables of this component in the testing set. Based on the 'decomposition-forecast-reconstruction' strategy, the predictors of all components selected by the PACF are shown in Table 3. In Table 3, the SPI-12 time series of three stations are all decomposed into 12 components, and each component has the same length as the original time series. Therefore, each component can also be expressed in the form of a set X¼{X 1 , X 2 , …, X 799 , X 780 }. As described in Section 3, X t (t is an integer and 1 t 780) represents the tth value in the set X. Using the trend component of the Suide station in Table 3 as an example, the model structure shown in Figure 2 and the value of n determined in Figure 9 (n¼3) are used. Then the input layer of the BP-ANN model has three input variables, X tÀ1 , X tÀ2 and X tÀ3 , and the output layer is X t .
In the second step, using the training set data to train the BP-ANN model to get the best network structure, and bring in the testing set data to get the final prediction result. The BP-ANN model developed in this research is composed of three layers. At present, the input layer and output layer of each SPI-12 component have been determined. Next, the optimal number of neurons in the hidden layer is determined by using the trial and error method. To this end, the number of neurons in the hidden layer ranges from 1 to 15, and the most commonly used Levenberg-Marquardt (LM) algorithm is used to train the neural network (Hagan & Menhaj 1994).
In the development process of the model, the transfer function of the hidden layer is a tangent sigmoid function (tansig), and the transfer function of the output layer is a linear function (purelin). The model with the minimum performance standards MAE and RMSE is selected as the best network structure. The model parameters are set as follows: training times are 10,000, the learning rate is 0.01, the minimum error of training target is 0.01 and the maximum number of confirmation failures is 6. Therefore, the optimal network structure of all components of the three stations is shown in Table 4.   Table 5 as values and in Figure 12 as a Taylor diagram.   Table 5, the fitting accuracy of the SSA-BP-ANN model is significantly improved than that of the BP-ANN model, and the prediction errors are reduced to varying degrees. At the Suide station, the SSA-BP-ANN model produces R 2 as 0.86, NSE as 0.86, both 18% higher compared to the BP-ANN model, and the MAE and the RMSE are 33 and 39% lower. At the Hengshan station, the SSA-BP-ANN model produces R 2 as 0.88, NSE as 0.87, both 13 and 12% higher compared to the BP-ANN model, and the MAE and the RMSE are 28 and 23% lower. At the Yulin station, the SSA-BP-ANN model produces R 2 as 0.88, NSE as 0.86, both 14 and 18% higher compared to the BP-ANN model, and the MAE and the RMSE are 34 and 27% lower. Figure 12 displays the correlation coefficient, root mean square difference (RMSD) and standard deviation between the predicted and observed values in the testing set of three stations. According to Figure 12, the general performance of SSA-BP-ANN provides more accurate prediction results than the BP-ANN model. For example, the correlation coefficients of the SSA-BP-ANN model are all higher than the BP-ANN model, which are closer to 0.95. On the other hand, the RMSD values from the SSA-BP-ANN model are all lower than the BP-ANN model, which are closer to 0. These results above indicate the proposed approach to data preprocessing could significantly improve the performance of the model, which agrees with the conclusions from other studies (Khan et al. 2018;Mehdizadeh et al. 2020;Hinge & Sharma 2021). Figure 13 shows the observed and predicted time series for the testing set at the three stations. In general, the fitting from the BP-ANN model is significantly worse than the SSA-BP-ANN model, especially when predicting the valley value of the    The SSA data processing method used in this paper is a new attempt in the prediction of drought by the ANN model. However, currently the most widely used data processing methods are WD and empirical mode decomposition (EMD), and we have also conducted experiments to compare these three methods. Their detailed principles can be obtained from Özger et al. (2020). Here, we directly present their performance by the evaluation indices applied in Table 6. From Tables 5 and 6, the fitting accuracy of the BP-ANN model displays significant improvement after data processing, and the R 2 and NSE of all hybrid models are above 0.85. Meanwhile, the three data processing methods show almost identical prediction performance in terms of various statistical indicators. This indicates that the SSA method has great potential in improving model performance.
Meanwhile, the Kruskal-Wallis (KW) test has been done on each forecast time series to evaluate the statistical significance of all models (Basakın et al. 2021), and the result is shown in Table 7.
From Table 7, the H 0 hypothesis was rejected for all three hybrid models in the KW test, indicating that there is no significant difference between the means of predicted and observed values of the model and that the model accuracy was high. On the other hand, there is a significant difference between the means of predicted and observed values of the BP-ANN model at the Hengshan and Yulin stations, and the model accuracy is low. Therefore, the three SSA, EMD and WD methods have improved the reliability and accuracy of model prediction results to a certain extent.
To compare these methods more intuitively, we also produced the Taylor diagram ( Figure 14) and a box plot (Figure 15) to present the evaluation results.
From Figure 14, three hybrid models provide high prediction accuracy, and the differences between the different data processing methods are neglectable. The correlation coefficients of the three hybrid models are all higher than the BP-ANN model and close to 0.95.
From Figure 15, the three hybrid models are overall closer to 0 in terms of the absolute value of prediction error (AVE) than the BP-ANN model, and the variation range is slightly lower than the BP-ANN model. It again confirms the notion that the data processing method can improve the prediction performance of the model. Among them, the average value of the AVE of  Uncorrected Proof the SSA method is the smallest over three stations, indicating that the SSA method is slightly better than the EMD and WD methods in SPI prediction.

CONCLUSION
In this study, a prediction model coupled with SSA and BP-ANN was established and applied to the prediction of SPI-12 1 month ahead at three stations in northern Shaanxi, China. The following conclusions can be drawn from the study: (1) The result validates that by preprocessing the singular spectrum decomposition of the input data of the BP-ANN model, the prediction performance of the model is greatly improved.  Uncorrected Proof (2) The SSA method can effectively extract the valuable information hidden in the SPI-12 time series, gather and reconstruct the most predictable components into several meaningful subsequences, thereby reducing the influence of noise.
(3) The SSA-BP-ANN model is a promising new approach to predict the SPI in arid regions, such as northern Shaanxi, China.
The current research focused on the singular spectrum decomposition preprocessing of the data, which can significantly improve the performance of the BP-ANN prediction model. In future research, other data preprocessing methods will be used to screen out the best hybrid model.