## Abstract

In this study, a wavelet-support vector machine (WSVM) model is proposed for drought forecasting using the Standardized Precipitation Index (SPI). The SPI time series of the Urmia Lake watershed is decomposed into multiple frequency-based sub-series by the wavelet transform, and these sub-series are then fed as input data to a support vector machine (SVM) model to forecast drought. In addition, a cuckoo search (CS)-based approach is proposed for parameter optimization of the SVM, finding the best initial constant parameters of the SVM algorithm. The results indicate that the radial basis function (RBF) kernel of the SVM algorithm is highly efficient in SPI modeling, yielding a determination coefficient (DC) of 0.865 in the verification step. In the WSVM model, the Coif1 mother wavelet with a decomposition level of five performs best, with a DC of 0.954 in the verification step, revealing that the proposed hybrid WSVM model outperforms the single SVM model in forecasting the SPI time series. The DC of the cuckoo search-support vector machine (CS-SVM) model is 0.912 in the verification step, indicating that the proposed CS-SVM model is also more efficient than the single SVM model.

## INTRODUCTION

Unlike other natural hazards, droughts evolve slowly, such that their outcomes take a considerable amount of time to come into effect. As a result, the ability to forecast and model the characteristics of droughts, especially their initiation, frequency, and severity, is important for managing water resources for agricultural and industrial uses. The conventional way to monitor drought conditions is a drought index. Some drought indices, such as the Palmer Drought Severity Index (PDSI) and the Standardized Precipitation Index (SPI), are more commonly used than others. A major advantage of the SPI is that it makes the description of droughts on multiple time scales possible (Cacciamani *et al.* 2007). One difference between the PDSI and the SPI is that the PDSI has a complex structure with a very long memory, whereas the SPI is an easily interpreted, simple moving-average process (Tsakiris & Vangelis 2004). Furthermore, unlike the PDSI, the characteristics of the SPI are constant from site to site, and its calculation requires only precipitation data (Belayneh & Adamowski 2013; Blagojevic *et al.* 2016). As a result, the SPI is used as the drought index in this study.

Forecasting the SPI requires data-driven modeling tools. Today, artificial intelligence (AI) models such as the artificial neural network (ANN) and the adaptive network-based fuzzy inference system (ANFIS) have been used in several studies to forecast hydrological, geological, and meteorological characteristics (Jalalkamali *et al.* 2011; Zanganeh *et al.* 2016; Dariane & Azimi 2017).

Another machine learning tool suggested for time series modeling is the support vector machine (SVM). This data-driven learning model has become increasingly popular in hydrologic forecasting, mainly due to its effectiveness in dealing with the non-linear characteristics of hydrologic data. Research has shown that the SVM approach trains faster than ANN and ANFIS models (Lin *et al.* 2009; Noori *et al.* 2015), and that its results are more accurate than those of the ANN approach (Lin *et al.* 2009). In recent years, the SVM model has been tested for hydrological and climatological applications owing to this superior potential (Wang *et al.* 2008; Behzad *et al.* 2009; Huang *et al.* 2017). Nikbakht Shahbazi *et al.* (2011) used an SVM model to forecast the SPI in four reservoir basins supplying the water demand of Tehran, Iran, and concluded that, compared with an ANN, the SVM model is accurate enough for long-term water resources planning and management. Zahraei & Nasseri (2014) also used SVM to develop models for forecasting the seasonal SPI and concluded that SPI values can be forecasted with high accuracy at two to five months of lead time.

Hybrid approaches have been extensively used in research to improve modeling accuracy (Vojinovic *et al.* 2003; Sannasiraj *et al.* 2004; Yu *et al.* 2004; Sun *et al.* 2010). For example, despite the flexibility of SVM models for modeling hydrologic time series, a shortcoming arises when signal fluctuations are highly non-stationary and a physical-hydrologic process operates over a large range of scales, from one day to several decades. In such a situation, the SVM model may not be able to cope with non-stationary data unless the input and/or output data are pre-processed (Cannas *et al.* 2006). Such pre-processing can be performed by decomposing the time series into its subcomponents through a wavelet transform (WT) analysis. The WT provides a useful decomposition of a time series and may therefore improve the ability of a forecasting model by capturing information at various resolution levels. Hence, a hybrid wavelet-support vector machine (WSVM) model, which uses multi-scale signals as input data rather than a single input pattern, may yield more accurate forecasts.

Furthermore, cuckoo search (CS) has recently been introduced as an optimization algorithm for parameter estimation in some applications, such as forecasting the output energy values of wind parks in Texas and Montana (Barbosa & Vasconcelos 2016) and applying a cuckoo search-support vector machine (CS-SVM) model for predicting dynamic measurement errors of sensors (Jiang *et al.* 2016). However, no attention has been paid to forecasting hydrological time series using the CS algorithm. As a result, CS may help the SVM model to forecast hydrological time series (e.g., SPI time series) efficiently by optimizing and setting appropriate constant parameters of the SVM algorithm.

In this paper, the WSVM model was proposed for drought modeling based on SPI obtained by a corresponding precipitation value from the same month in the previous year for Urmia Lake watershed in Iran. In this model, SPI data are decomposed into sub-signals with various resolution levels and periodicity scales by wavelet. Then, these sub-signals are inserted into the SVM model to reconstruct a multi-scale model for forecasting SPI time series. In addition, another aim of this study is to evaluate and validate the efficiency of CS-SVM model for modeling and forecasting SPI time series.

The other sections of this paper are organized as follows. Below, the SPI methods, WT, SVM, CS models, and the case study are briefly described. Then, the structures of the proposed hybrid model WSVM and CS-SVM are presented. In the section after, the classical SVM, WSVM, and CS-SVM model performances are evaluated and discussed with different structures. Concluding remarks are in the final section of the paper.

## METHODS

### Standardized Precipitation Index

The SPI was developed by McKee *et al.* (1993). It is based exclusively on precipitation, which makes its calculation comparatively straightforward. Standardization of a drought index confirms its independence from geographical position, allowing the index to be calculated with respect to the average precipitation at the same place (Cacciamani *et al.* 2007).

Positive SPI values indicate precipitation greater than the mean (wet conditions), and negative values indicate precipitation below the mean (dry conditions). The deviation from the mean is a probabilistic measure of the severity of wetness or drought that can be employed for risk assessment. Table 1 lists the SPI drought classes.

| SPI values | Class |
|---|---|
| > 2.0 | Extremely wet |
| 1.5 to 1.99 | Very wet |
| 1.0 to 1.49 | Moderately wet |
| −0.99 to 0.99 | Near normal |
| −1.0 to −1.49 | Moderately dry |
| −1.5 to −1.99 | Very dry |
| < −2.0 | Extremely dry |


For more details, please see Abramowitz & Stegun (1965), Cacciamani *et al.* (2007), and Belayneh & Adamowski (2012).
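The SPI computation described above (fit a distribution to aggregated precipitation, then map cumulative probabilities onto a standard normal deviate) can be sketched as follows. This is an illustrative Python re-implementation, not the authors' code; the gamma-distribution fit follows the common McKee *et al.* (1993) formulation, and handling of zero-precipitation months is omitted for brevity.

```python
import numpy as np
from scipy import stats

def spi(precip, window=12):
    """Sketch of the SPI computation: aggregate precipitation over
    `window` months, fit a gamma distribution, and transform the
    cumulative probabilities into standard normal deviates."""
    precip = np.asarray(precip, dtype=float)
    # moving sum over the chosen time scale (e.g., 12 months)
    agg = np.convolve(precip, np.ones(window), mode="valid")
    # fit a two-parameter gamma distribution (location fixed at 0)
    shape, _, scale = stats.gamma.fit(agg, floc=0)
    cdf = stats.gamma.cdf(agg, shape, loc=0, scale=scale)
    # equiprobability transform: gamma CDF -> standard normal quantile
    return stats.norm.ppf(cdf)

rng = np.random.default_rng(0)
monthly_precip = rng.gamma(2.0, 15.0, size=480)  # 40 years of synthetic data
spi12 = spi(monthly_precip, window=12)
print(round(float(np.mean(spi12)), 2))  # typically near 0 by construction
```

Because the transform standardizes against the fitted climatology, the resulting series has roughly zero mean and unit variance, which is what makes SPI values comparable across sites.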

### Wavelet transform

The continuous WT of a signal, *x*(*t*), is defined as Equation (1) (Mallat 1998):

$$W(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} x(t)\, g^{*}\!\left(\frac{t-b}{a}\right) dt \quad (1)$$

where * refers to the complex conjugate and *g* is called the wavelet function or mother wavelet. The parameter *a* is the scale parameter, indicating the range and duration of the desired time series, while *b* is the translation parameter, determining the wavelet position on the time axis. In this study, the WT is used to decompose the initial raw signal of the SPI time series into several sub-series using mother wavelet functions dilated and translated by the *a* and *b* parameters. This research deals with several mother wavelet functions, namely Haar, Db2, Sym3, Coif1, Mexican hat, and Morlet, which are illustrated in Figure 1.

### Support vector machine

SVM is a kernel-based structure generally applied for solving classification and regression problems. The SVM model was developed by Cortes & Vapnik (1995) and is formulated based on the structural risk minimization (SRM) principle. It has been shown that applying SRM in SVM leads to better efficiency than the traditional empirical risk minimization (ERM) principle used in conventional models: SRM minimizes an upper bound on the expected risk, whereas ERM minimizes the error on the training data (Haykin 2003).

The SVM model detects the optimal solution of the following primal problem (Zahraei & Nasseri 2014):

$$\min_{w,\,\xi,\,\xi^{*}} \ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{L} \left(\xi_{i} + \xi_{i}^{*}\right) \quad (2)$$

subject to

$$y_{i} - f(x_{i}) \le \varepsilon + \xi_{i}, \qquad f(x_{i}) - y_{i} \le \varepsilon + \xi_{i}^{*}, \qquad \xi_{i},\ \xi_{i}^{*} \ge 0$$

where *L* is the number of data points in the training data set, *C* is a model parameter, *x _{i}* is the *i*th data point in feature space, *w* is one of the optimization problem's decision variables, and the model residual is defined by *ξ _{i}* = *y _{i}* − *f*(*x _{i}*). *ξ _{i}* and *ξ _{i}** are positive slack variables, and *C* is a positive, predetermined real constant that specifies the degree to which deviations from *ɛ* are tolerated: deviations above *ɛ* are captured by *ξ _{i}*, whereas deviations below *ɛ* are captured by *ξ _{i}**. *C*, which is always positive, is the penalty parameter on the training error. As a result, the values of *C* and *ɛ* must be properly selected by the user.

In this research, the SVM method is employed to carry out regression and forecast the monthly SPI time series. The transformation of the initial data from the input space to a new space (the feature space) is another quality of SVM, leading to better recognition of the data structure and making linear regression possible (Figure 2). For this purpose, a non-linear transformation function, called the kernel function, is used to map the input space to a higher-dimensional feature space.
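The ε-SVR regression described above can be sketched with scikit-learn, which implements the same primal problem with the *C* and *ε* parameters discussed in the text. This is an illustrative sketch on synthetic data, not the authors' MATLAB setup; the input dimensions stand in for lagged SPI values.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 2))               # two lagged inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, 200)

# epsilon-SVR with an RBF kernel; C penalizes training errors and
# epsilon sets the width of the tolerance tube around f(x)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
model.fit(X[:150], y[:150])                         # calibration split
score = model.score(X[150:], y[150:])               # R^2 on verification split
print(round(score, 2))
```

Only the points lying outside the ε-tube become support vectors, which is why a well-chosen ε keeps the model sparse while C controls how strongly residuals beyond the tube are penalized.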

### Cuckoo search

The CS algorithm, proposed by Yang & Deb (2009), is inspired by the brood-parasitic behavior of some cuckoo species combined with Levy flights. Cuckoos lay their eggs in other birds' nests when the host birds leave the nest unguarded. Some of these eggs, which resemble the host bird's eggs, hatch and grow into adult cuckoos. If the host birds detect that the eggs are not their own, they either expel the alien eggs or abandon the nest and build a new one elsewhere. Each egg in a nest represents a solution, and a cuckoo egg represents a new solution. The purpose of the CS algorithm is to use the new and potentially better solutions (cuckoos) to replace the not-so-good solutions in the nests (Jiang *et al.* 2016). The CS algorithm follows three rules (Yang & Deb 2013):

- Each cuckoo lays only one egg (one solution) at a time and places it in a randomly chosen nest.
- The best nests, with high-quality eggs (solutions), carry over to the next generation.
- The total number of available host nests is fixed. A host bird can detect an alien egg with probability *p _{a}*; in this case, the host bird either expels the egg or abandons the nest and builds a new one in a new location.

New solutions are generated by a Levy flight (Equation (3)):

$$x_{i}^{(t+1)} = x_{i}^{(t)} + \alpha \oplus L(\lambda) \quad (3)$$

where $x_{i}^{(t)}$ denotes the *i*th nest at iteration *t*, the product ⊕ means entry-wise multiplication, *α* is the step size, which is subject to a normal distribution, and *L* is the Levy random search path.
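The Levy random search term is commonly realized with Mantegna's algorithm, which draws heavy-tailed steps from a ratio of two normal variates. The sketch below is an illustrative assumption (the paper does not specify how the Levy term was generated); the nest coordinates stand for an SVM parameter pair (*C*, *ε*).

```python
import numpy as np
from math import gamma, pi, sin

def levy_step(dim, beta=1.5, rng=None):
    """Levy-distributed step via Mantegna's algorithm, a common choice
    for the L(lambda) term of the cuckoo search update."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)   # numerator: normal with Mantegna sigma
    v = rng.normal(0, 1, dim)       # denominator: standard normal
    return u / np.abs(v) ** (1 / beta)

# one update of a nest: x(t+1) = x(t) + alpha (entry-wise) Levy step
rng = np.random.default_rng(3)
nest = np.array([500.0, 0.5])       # hypothetical (C, epsilon) candidate
alpha = 0.01 * np.abs(nest)         # step size scaled per dimension
new_nest = nest + alpha * levy_step(2, rng=rng)
print(new_nest.shape)
```

The heavy tails of the Levy distribution give the search occasional long jumps, which helps the algorithm escape local optima while mostly exploring near the current nests.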

### Model development

#### Wavelet-support vector machine model

The hybrid WSVM idea is proposed when the SVM model uses the pre-processed data generated with a WT. As shown in Figure 3, the WSVM and SVM model have the same process for the data modeling, but the WSVM model uses the decomposed input data (that are generated with a WT) instead of raw ones.

MATLAB programming was used to obtain the modeling results. In the proposed approach, the SPI signal was first decomposed into sub-signals at different scales, i.e., one large-scale sub-signal (SPI_{a}) and several small-scale sub-signals (SPI_{di}), in order to capture the temporal characteristics of the input time series (see Figure 3). Then, the SVM model performed the data modeling in a space created by the kernel function K(*x*, *x _{i}*) to forecast the SPI one time step ahead (SPI_{(t+1)}).

#### Cuckoo search-support vector machine model

The general procedure of CS-SVM is illustrated in the flowchart in Figure 4. The CS algorithm applied to optimize the SVM parameters *C* and *ε* is organized as follows (Jiang *et al.* 2016):

1. Initialize the CS algorithm and set the number of nests, *N*, the detection probability, *p _{a}*, the maximum number of iterations, *t _{max}*, and the ranges of *C* and *ɛ*.

2. Evaluate the fitness value of each nest, discover the current best solution, and record the minimum fitness value and its corresponding position.

3. Keep the best solutions from the previous generation, and update the positions of the other nests using Equation (3). Then, evaluate the fitness value of each new position.

4. Replace the best solution of the previous generation if the fitness value of the new generation is better, and record the position of the best nest.

5. Set a random number (*random*) as the probability of egg detection and compare it with *p _{a}*. If *random* > *p _{a}*, change the position of the nest randomly to obtain a new set of positions.

6. Find the best nest position. Stop searching when the maximum iteration limit is reached and output the best position to obtain the optimal parameter values; otherwise, return to Step 3.
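The loop above can be sketched compactly. In this illustrative version the fitness function is a toy surrogate standing in for the validation RMSE of a trained SVM (its optimum is placed near the *C* ≈ 900, *ε* ≈ 0.9 region the paper later reports), and the Levy flight is simplified to a heavy-tailed Cauchy step; both simplifications are assumptions, not the authors' implementation.

```python
import numpy as np

def fitness(params):
    """Toy surrogate for the validation RMSE of an SVM with these params."""
    C, eps = params
    return (np.log10(C) - np.log10(900)) ** 2 + (eps - 0.9) ** 2

rng = np.random.default_rng(4)
n_nests, p_a, t_max = 15, 0.25, 200
lo, hi = np.array([1.0, 0.01]), np.array([1000.0, 1.0])  # ranges of C, eps

nests = rng.uniform(lo, hi, size=(n_nests, 2))           # Step 1: init nests
fit = np.array([fitness(n) for n in nests])              # Step 2: evaluate

for _ in range(t_max):
    # Step 3: heavy-tailed random walk around current nests
    step = 0.01 * (hi - lo) * rng.standard_cauchy((n_nests, 2))
    cand = np.clip(nests + step, lo, hi)
    cand_fit = np.array([fitness(c) for c in cand])
    improve = cand_fit < fit                             # Step 4: greedy replace
    nests[improve], fit[improve] = cand[improve], cand_fit[improve]
    # Step 5: abandon detected nests (never the current best)
    abandon = rng.random(n_nests) < p_a
    abandon[np.argmin(fit)] = False
    k = int(abandon.sum())
    if k:
        nests[abandon] = rng.uniform(lo, hi, size=(k, 2))
        fit[abandon] = [fitness(n) for n in nests[abandon]]

best = nests[np.argmin(fit)]                             # Step 6: output best
print(np.round(best, 2))
```

With a real objective, `fitness` would train an SVM with the candidate (*C*, *ε*) and return its verification RMSE, which is why the iteration count is kept small in practice.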

### Case study

Urmia Lake is a lake in northwestern Iran and is reportedly the largest lake in the Middle East (located near 45°03′00″ E longitude and 37°40′00″ N latitude). It covers an area varying from 5,200 to 6,000 km^{2}. The lake is about 140 km long and 40–55 km wide, with a maximum depth of 16 m (Figure 5). The lake's water level is below the critical level, and groundwater levels in some parts of the basin have dropped by 16 m. The annual mean temperature of the basin is 11 °C around the lake and 2.5 °C in the mountainous areas. In this study, the average monthly precipitation from eight precipitation stations (shown in Figure 5) over the period 1969–2009 was prepared for developing the Urmia Lake simulation model. The 12-month SPI time series was then computed from the precipitation data (Figure 6).

### Efficiency criteria

The determination coefficient (*DC*) and root mean square error (*RMSE*) have been used to compare the efficiency of the different models:

$$DC = 1 - \frac{\sum_{i=1}^{N} \left(SPI_{obs_i} - SPI_{com_i}\right)^{2}}{\sum_{i=1}^{N} \left(SPI_{obs_i} - \overline{SPI_{obs}}\right)^{2}} \quad (5)$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(SPI_{obs_i} - SPI_{com_i}\right)^{2}} \quad (6)$$

where *N* denotes the number of observations, *SPI _{obsi}* is the observed data, *SPI _{comi}* is the forecasted value, and $\overline{SPI_{obs}}$ is the mean of the observed data. Moreover, Equation (7), analogous to Equation (5) for the total data, can be used to compare the ability of the different models in capturing the peak values of the SPI time series:

$$DC_{peak} = 1 - \frac{\sum_{i=1}^{N_{p}} \left(SPI_{po_i} - SPI_{pc_i}\right)^{2}}{\sum_{i=1}^{N_{p}} \left(SPI_{po_i} - \overline{SPI_{po}}\right)^{2}} \quad (7)$$

where *DC _{peak}* is the DC for the peak values, *N _{p}* is the number of peak values, and *SPI _{poi}*, *SPI _{pci}*, and $\overline{SPI_{po}}$ are the observed data, computed values, and mean of the observed data for the peak values, respectively. The *RMSE* measures forecast accuracy and is always positive because the errors are squared; it increases from zero for perfect forecasts to large positive values as the discrepancies between forecasts and observations grow. Obviously, a *DC* close to one and a small *RMSE* indicate high model performance.
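The two criteria can be implemented directly from their definitions; this short Python sketch uses toy observed and computed series for illustration.

```python
import numpy as np

def dc(obs, com):
    """Determination coefficient as defined in the text
    (a Nash-Sutcliffe-type efficiency)."""
    obs, com = np.asarray(obs, float), np.asarray(com, float)
    return 1.0 - np.sum((obs - com) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, com):
    """Root mean square error as defined in the text."""
    obs, com = np.asarray(obs, float), np.asarray(com, float)
    return float(np.sqrt(np.mean((obs - com) ** 2)))

obs = np.array([0.1, -0.4, 1.2, -1.5, 0.8, 0.3])   # toy observed SPI
com = np.array([0.2, -0.5, 1.0, -1.3, 0.9, 0.2])   # toy forecasted SPI
print(round(dc(obs, com), 3), round(rmse(obs, com), 3))  # → 0.974 0.141
```

A perfect forecast gives DC = 1 and RMSE = 0; a forecast no better than the observed mean gives DC ≤ 0.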

## RESULTS AND DISCUSSION

The SVM model involves several structural choices, such as the type of kernel function. Hence, in the first step, the purpose is to select the best kernel function. In the next step, the WT is coupled with the SVM model to choose the best wavelet function and decomposition level.

### Results of single SVM

The SVM model uses kernel mapping to map the data from the input space to a high-dimensional feature space in which the problem becomes linearly separable. In SVM training, the decision boundaries are determined directly by the training data. This learning strategy is based on statistical learning theory and minimizes the classification errors on both the training data and unknown data (Abe 2010). There are different kinds of kernel function; in this first step, the aim is to find the function with the highest data-modeling efficiency. Hence, the performance of four kernel functions in forecasting the SPI was evaluated in terms of the DC and RMSE criteria. The results for the different kernel functions are presented in Table 2. Because SPI data are normalized, dimensionless ratios, the RMSE values are dimensionless.

| Kernel function | Calibration DC | Calibration RMSE | Verification DC | Verification RMSE |
|---|---|---|---|---|
| RBF-kernel | 0.883 | 0.154 | 0.865 | 0.237 |
| Poly-kernel | 0.765 | 0.302 | 0.761 | 0.341 |
| MLP-kernel | 0.791 | 0.210 | 0.771 | 0.302 |
| Lin-kernel | 0.784 | 0.295 | 0.698 | 0.386 |


As is clear from Table 2, a comparison of the DC and RMSE values in the verification (simulation) phase reveals that the radial basis function (RBF) kernel performs better than the other kernel functions. Therefore, the RBF kernel was used in the subsequent modeling. This kernel function has suitable characteristics, as follows (Xu *et al.* 2012):

- a. In contrast with the linear kernel, the RBF kernel can handle cases in which the relation between class labels and attributes is non-linear.

- b. The RBF kernel has fewer hyper-parameters, which reduces the complexity of model selection, and it involves fewer numerical difficulties.

Now, the details of the best kernel function, namely the RBF kernel, are explained. In machine learning, the (Gaussian) radial basis function kernel, or RBF kernel, is a popular kernel function in SVM classification, and its value ranges between zero and one:

$$K(x, x_{i}) = \exp\!\left(-\frac{\|x - x_{i}\|^{2}}{2\sigma^{2}}\right) = \exp\!\left(-\gamma \|x - x_{i}\|^{2}\right)$$

so the value of *γ* in the first form is $\gamma = 1/(2\sigma^{2})$. Here, *x* and *x _{i}* denote two sample data points (Wang 2005), i.e., feature vectors in some input space. *σ* is a free parameter (and consequently so is *γ*) that needs to be tuned on a validation or tuning data set (Vert *et al.* 2004).

The antecedent monthly SPI values (*SPI*_{(t−1)}, *SPI*_{(t−2)}, *SPI*_{(t−3)}, …) of Urmia Lake were used to forecast *SPI*_{(t+1)} (i.e., the SPI of the next month). The goal of time series prediction or forecasting can be formulated as follows (Babovic & Keijzer 2000): given the series up to time *t*, estimate the value at time *t* + 1 as a function of the preceding observations.

It is supposed that the series is some sampling of a continuous system which may be either stochastic, chaotic, or deterministic. Thus, the following input combinations were defined as Combs. (1)–(6) as depicted below:

- Comb. (1): *SPI*_{(t)}
- Comb. (2): *SPI*_{(t)}, *SPI*_{(t−1)}
- Comb. (3): *SPI*_{(t)}, *SPI*_{(t−1)}, *SPI*_{(t−2)}
- Comb. (4): *SPI*_{(t)}, *SPI*_{(t−1)}, *SPI*_{(t−2)}, *SPI*_{(t−3)}
- Comb. (5): *SPI*_{(t)}, *SPI*_{(t−1)}, *SPI*_{(t−2)}, *SPI*_{(t−3)}, *SPI*_{(t−4)}
- Comb. (6): *SPI*_{(t)}, *SPI*_{(t−1)}, *SPI*_{(t−2)}, *SPI*_{(t−3)}, *SPI*_{(t−4)}, *SPI*_{(t−5)}
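Constructing these lagged input combinations from a single SPI series amounts to building a lag matrix; a minimal sketch (with a toy series in place of the real SPI data):

```python
import numpy as np

def make_lag_matrix(series, n_lags):
    """Rows: [SPI(t), SPI(t-1), ..., SPI(t-n_lags+1)]; target: SPI(t+1)."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack(
        [s[n_lags - 1 - k : len(s) - 1 - k] for k in range(n_lags)]
    )
    y = s[n_lags:]  # the one-step-ahead target
    return X, y

toy_spi = np.arange(10.0)            # toy series 0..9
X, y = make_lag_matrix(toy_spi, 2)   # Comb. (2): SPI(t), SPI(t-1)
print(X[0], y[0])                    # [1. 0.] 2.0
```

Each additional lag adds one column, so Comb. (6) yields a six-column input matrix; each extra lag also shortens the usable sample by one row.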

Like the previous step, to find the best result, each Comb. was modeled and its performance was assessed with DC and RMSE values (Table 3).

| Combs | Calibration DC | Calibration RMSE | Verification DC | Verification RMSE |
|---|---|---|---|---|
| Comb. (1) | 0.781 | 0.323 | 0.763 | 0.341 |
| Comb. (2) | 0.883 | 0.190 | 0.865 | 0.237 |
| Comb. (3) | 0.820 | 0.201 | 0.856 | 0.241 |
| Comb. (4) | 0.786 | 0.305 | 0.761 | 0.345 |
| Comb. (5) | 0.887 | 0.156 | 0.670 | 0.447 |
| Comb. (6) | 0.888 | 0.144 | 0.641 | 0.455 |


As Table 3 suggests, Comb. (2) shows the best efficiency compared with the other Combs. Table 3 also shows that the high-order input combinations, i.e., Comb. (5) and Comb. (6), resulted in over-fitting (over-training) in the calibration step, such that the accuracy of the results was reduced in the verification step. This finding is consistent with the results of other research in which the autoregressive rule is used (Nourani *et al.* 2011a, 2011b, 2012). The simulated and actual SPI time series of the Urmia watershed obtained by the RBF kernel with Comb. (2) input data are shown in Figure 7.

### Results of WSVM hybrid model

In this section, the effect of the WT on the data modeling is determined. For this purpose, the SPI data were pre-processed by a wavelet before being fed into the SVM model, and the performance of the model was assessed by the DC and RMSE criteria. The time series were decomposed at one to seven levels by four different mother wavelets: (1) Sym3, a wavelet with three sharp peaks; (2) Daubechies-2 (Db2), one of the most popular wavelets; (3) Haar, a simple wavelet; and (4) Coif1. The decomposed SPI sub-signals then served as the SVM inputs, and the performance of each case was measured by the mentioned criteria (Table 4).

| Mother wavelet type | Decomposition level | Calibration DC | Calibration RMSE | Verification DC | Verification RMSE |
|---|---|---|---|---|---|
| Sym3 | 1 | 0.926 | 0.058 | 0.890 | 0.072 |
| Sym3 | 2 | 0.926 | 0.058 | 0.890 | 0.072 |
| Sym3 | 3 | 0.926 | 0.058 | 0.892 | 0.071 |
| Sym3 | 4 | 0.926 | 0.058 | 0.892 | 0.071 |
| Sym3 | 5 | 0.924 | 0.059 | 0.890 | 0.072 |
| Sym3 | 6 | 0.932 | 0.057 | 0.377 | 0.172 |
| Sym3 | 7 | 0.933 | 0.057 | 0.375 | 0.174 |
| Db2 | 1 | 0.936 | 0.053 | 0.890 | 0.072 |
| Db2 | 2 | 0.940 | 0.052 | 0.909 | 0.065 |
| Db2 | 3 | 0.941 | 0.052 | 0.907 | 0.066 |
| Db2 | 4 | 0.942 | 0.051 | 0.909 | 0.065 |
| Db2 | 5 | 0.942 | 0.051 | 0.910 | 0.065 |
| Db2 | 6 | 0.953 | 0.051 | 0.020 | 0.216 |
| Db2 | 7 | 0.955 | 0.051 | 0.019 | 0.218 |
| Haar | 1 | 0.926 | 0.058 | 0.890 | 0.072 |
| Haar | 2 | 0.926 | 0.058 | 0.890 | 0.072 |
| Haar | 3 | 0.926 | 0.058 | 0.892 | 0.071 |
| Haar | 4 | 0.926 | 0.058 | 0.892 | 0.071 |
| Haar | 5 | 0.924 | 0.059 | 0.890 | 0.072 |
| Haar | 6 | 0.936 | 0.053 | 0.377 | 0.172 |
| Haar | 7 | 0.938 | 0.052 | 0.375 | 0.173 |
| Coif1 | 1 | 0.928 | 0.057 | 0.929 | 0.058 |
| Coif1 | 2 | 0.931 | 0.056 | 0.930 | 0.057 |
| Coif1 | 3 | 0.932 | 0.055 | 0.930 | 0.057 |
| Coif1 | 4 | 0.932 | 0.055 | 0.931 | 0.057 |
| Coif1 | 5 | 0.943 | 0.055 | 0.954 | 0.056 |
| Coif1 | 6 | 0.983 | 0.043 | 0.780 | 0.102 |
| Coif1 | 7 | 0.984 | 0.042 | 0.771 | 0.101 |


Based on Table 4, the Coif1 wavelet function with decomposition level 5 produced the best results. Because the shape of the Coif1 wavelet (Figure 1(d)) is similar to the SPI signal, it could capture the signal features, especially the peak values, and yield comparatively high efficiency. The superiority of decomposition level 5 may lie in the fact that this level includes a 2^{5}-month mode, which is close to the annual mode. This periodicity is very important in a hydrologic time series, reflecting the fact that drought growth depends on the annual changes in the Urmia Lake watershed. According to Table 4, high decomposition levels (6 and 7) resulted in over-fitting in the calibration step, so that the accuracy of the results was reduced in the verification step, as in the previous section. The calibration and verification time series of the best-performing WSVM models are shown in Figures 8 and 9.

Comparing the SVM and WSVM modeling results shows that using the multi-resolution SPI as input to the SVM algorithm leads to better results than using the raw SPI data. In other words, when the WT converts the SPI data into multi-resolution data, the SVM algorithm can perform the regression with minimal error. This process is carried out by the kernel function; Figure 10 schematically shows the regression on raw and decomposed SPI data by the SVM algorithm. As is clear, the regression accuracy on multi-resolution data may be better than on raw data.

The most important factor in drought management is the determination of the extreme values in the SPI time series, which indicate further potential for dryness and drought. Since feasible estimation of the peak values is usually the most important factor in any water resources management, a key point when comparing different models is their capability in estimating peak values. For this purpose, peak values were sampled by conventionally taking the top 5% of the data from the original SPI time series as the threshold. The results show that the *DC _{peak}* value was 0.62 for the SVM model, while for the WSVM model it was 0.93. According to the *DC _{peak}* values, it can be concluded that the seasonal model (i.e., WSVM) is more efficient than the autoregressive model (i.e., SVM) in capturing the extreme values. It is evident that the extreme values of the SPI time series, which occur in a periodic pattern, can be correctly calculated by the seasonal model.
### Results of CS-SVM model

The SVM hyper-parameters *C* and *ε* (Equation (2)) are the two main free parameters of the SVM algorithm and should be set appropriately by the user. To accomplish this, the SPI time series was modeled by the CS-SVM algorithm in MATLAB, with Comb. (2) and the RBF kernel selected based on the previous sections. The parameter optimization and SPI modeling were carried out for 15 iterations. As illustrated in Figure 11, the value of the RMSE, used as the fitness function, is plotted against the number of iterations. The fitness values converge to the desired (minimum) value, and the SVM parameters are finally optimized as *C* = 901 and *ɛ* = 0.95. It is notable that the over-fitting problem is likely to occur at more than 15 iterations.

The results of verifying the CS-SVM model are as follows: the DC and RMSE criteria are 0.899 and 0.093, respectively, in the calibration step and 0.912 and 0.065, respectively, in the verification step, whereas for the single SVM model, the DC and RMSE are about 0.865 and 0.237, respectively, in the verification step. This finding provides evidence that CS-SVM is more efficient than the SVM model owing to the use of the CS parameter optimizer in the SVM algorithm. Figure 12 illustrates the simulated and actual SPI time series of the Urmia Lake watershed modeled by CS-SVM.

In the final part, the single SVM, hybrid WSVM, and CS-SVM models are compared with each other, as shown in Table 5.

| Model | Model type | DC calibration | DC verification |
|---|---|---|---|
| SVM | Single | 0.883 | 0.865 |
| WSVM | Hybrid | 0.943 | 0.954 |
| CS-SVM | Parameters optimized | 0.899 | 0.912 |


*Note:* In this table the best result for each model is presented.

As a result, the hybrid model showed a better performance than the single one. In addition, the CS-SVM model is more efficient than the SVM model owing to the appropriate setting of the SVM parameters *C* and *ε*. The optimized parameters lead the model to fit the general patterns in the time series; in other words, a model trained with optimized parameters can recognize and reflect the fluctuations in the time series data, so over-fitting to the calibration data is unlikely. Since the verification period is shorter than the calibration period, the accuracy improvement in the verification step is greater than in the calibration step. However, this type of improvement cannot be taken as a general rule for every time series model; it depends on the patterns present in the calibration and verification data sets.

## CONCLUSIONS

In this paper, single SVM, CS-SVM, and WSVM models were employed for forecasting the SPI drought index at a one-month lead time for the Urmia Lake watershed. In the first step, the purpose was to select the best kernel function: four kinds of kernel functions were tested, and the modeling performance of each was determined by the DC and RMSE criteria. The results show that the RBF kernel is more accurate than the others, so it was used in the subsequent steps of the study. Afterwards, the input combinations Combs. (1)–(6) were used as inputs of the SVM model to forecast SPI_{(t+1)}, and the performance of each combination was examined; the results showed that Comb. (2) is the most suitable SVM input. In the next section of the study, a WT was applied: the SPI data were decomposed into sub-signals with different decomposition levels and WT functions, these sub-signals were used as inputs of the SVM algorithm, the DC and RMSE criteria were calculated, and the best-performing wavelet and decomposition level were identified. Accordingly, the Coif1 WT with a decomposition level of five presented good results. The overall results also revealed the superiority of WSVM in capturing the extreme values of the SPI time series and in tracking the long-term SPI by considering seasonality effects. The hybrid model proved more appropriate because it fed the multi-scale SPI time series into the SVM model. Meanwhile, the evolutionary CS algorithm was applied to optimize the *C* and *ε* parameters of the SVM model, which improved the accuracy of the SVM model.

As a suggestion for further studies, it is recommended to use the presented methodology to forecast the SPI at lead times of two, three, and more months, and also to model the SPI of the watershed by adding time series of other hydrologic variables, such as temperature. In addition, a further study could assess the effect of other evolutionary algorithms, such as ant colony optimization, on optimizing the constant parameters of the SVM model.
