## Abstract

The present study aimed to develop a hybrid model to predict the rainfall time series of Urmia Lake watershed. For this purpose, a model based on discrete wavelet transform, ARIMAX and least squares support vector machine (LSSVM) (W-S-LSSVM) was developed. The proposed model was designed to handle linear, nonlinear and seasonality of rainfall time series. In the proposed model, time series were decomposed into sub-series (approximation (a) and details (d)). Next, the sub-series were predicted separately. In the proposed model, sub-series were fed into SARIMAX to be predicted. The residual of predicted sub-series (error) of the rainfall time series was then fed into LSSVM to predict the residual components. Then, all predicted values were aggregated to rebuild the predicted time series. In order to compare results, first a classic modeling was performed by LSSVM. Later, wavelet-based LSSVM was used to capture the peak values of rainfall. Results revealed that Daubechies 4 and decomposition level 4 (db(4,4)) led to the best outcome. Due to the performance of db(4,4), it was selected to be applied in the proposed model. Based on results, it was observed that the W-S-LSSVM's performance was improved in comparison with other models.

## INTRODUCTION

In meteorological sciences, the prediction of rainfall over watersheds is an important research topic that has attracted much interest (Chang *et al.* 2016; Fadhel *et al.* 2016; Wang *et al.* 2016). Proper management of water resources involves planning, development, and distribution of water resources, which are associated with prediction of rainfall or spatial examination of the hydrologic cycle. Among hydrological cycle variables, rainfall prediction has a substantial role in runoff modeling and hence, water resource management (Valipour 2015). An accurate quantitative rainfall prediction can identify the potential for heavy rainfall and possible associated flash flooding, as well as providing information for hydrologic interests (Hall *et al.* 1998). This is particularly true in areas prone to flash flooding, where the prediction accuracy is highly dependent on the rapid availability of the rainfall distribution in advance (Ferraris *et al.* 2002).

Precipitation forecasting is a very difficult procedure. Calculating the response by applying conventional approaches in modeling rainfall time series is far from a trivial task, since the hydrologic processes are complex and involve various inherently complex predictors such as geomorphologic and climatic factors, which are as yet latent. Many researchers have carried out studies for quantitative precipitation forecasting using diverse techniques, including numerical weather prediction models and remote sensing observations (Yates *et al.* 2000; Ganguly & Bras 2003; Sheng *et al.* 2006; Davolio *et al.* 2008; Diomede *et al.* 2008), statistical models (Chu & He 1995; Chan & Shi 1999; DelSole & Shukla 2002; Munot & Kumar 2007; Li & Zeng 2008; Nayagam *et al.* 2008), non-parametric nearest neighbors method (Toth *et al.* 2000), and soft computing-based methods including artificial neural networks (ANN), support vector regression, and fuzzy logic (Venkatesan *et al.* 1997; Silverman & Dracup 2000; Toth *et al.* 2000; Pongracz *et al.* 2001; Brath *et al.* 2002; Guhathakurta 2008; Nasseri *et al.* 2008; Sedki *et al.* 2009; Talei *et al.* 2010). Chang *et al.* (2014) developed a neuro-fuzzy networks based technique called adaptive network-based fuzzy inference system (ANFIS) in order to forecast rainfall for a watershed. Bias correction was undertaken on the data used as input into the developed technique to enhance the accuracy of the data. The results of the study demonstrated that the ANFIS technique yielded reliable and stable predictions. Valipour *et al.* (2013) demonstrated how the accuracy of the ANN model improved when it was combined with an autoregressive model. Accurate and reliable predictions can also be achieved by decomposing the original data into components so that the direct use of the original data, which results in less accurate predictions, can be avoided. Recently, the discrete wavelet transform (DWT) type of wavelet analysis has been used to decompose the original data into bands (sub-series) and, as a result, prediction accuracy has been improved for short lead times. Partal & Cigizoglu (2009) predicted the daily precipitation (lead time of one day) from meteorological variables including temperature, humidity, and precipitation using the wavelet–neural network method. In their study, they combined DWT, which is used to decompose the original data into bands, and ANN, which is used as a modeling tool. The study showed that the wavelet–ANN model provided a good fit with the observed data and performed significantly better than those obtained by either the conventional ANN model or a multilinear regression model. Solgi *et al.* (2014) combined wavelet analysis with ANN to predict daily rainfall for a one-day lead time in Verayneh, Iran, and this combined approach was compared with an adaptive neuro-fuzzy system. The results showed that the combined wavelet–neural network model has a better performance (coefficient of determination of 0.9 for one-day lead time) than the adaptive neuro-fuzzy system.

It can be inferred from the literature that the accuracy of prediction can be enhanced when wavelet transform (WT) is used as a pre-processing approach by decomposing the original data into sub-series. Despite this achievement, the predictions are with very short lead times, so models could be developed to predict rainfall accurately for longer lead times.

In this paper, a new multivariate black box model based on AI techniques and DWT is proposed for the rainfall modeling of Urmia watersheds located in West Azerbaijan, Iran, which have different climatologic characteristics. The main idea of conjugating models in predictions is to employ each model's unique features to capture different patterns in the data. Both theoretical and empirical findings suggest that combining different methods can be an efficient way to improve forecasting accuracy (Zhang 2003). One of such hybrid models is the autoregressive integrated moving average (ARIMA)-ANN model, which was developed in recent years and is still being used in engineering and financial time series forecasting. This model was first proposed and evaluated by Zhang (2003) in order to forecast a univariate time series without considering seasonal effect. This technique has enjoyed a few applications in hydrological time series modeling, which usually show highly non-stationary seasonal behavior. In the proposed model, first, monthly rainfall time series of the watershed is decomposed into sub-signals in various resolution levels and periodicity scales considering the existence of seasonality in the time series. Next, a seasonal ARIMA model with exogenous inputs (i.e., rainfall), SARIMAX, was combined with the support vector machine (SVM) approach to construct the hybrid WT-SARIMAX-LSSVM (W-S-LSSVM) model.

## MATERIAL AND METHODS

### Study area and used data

Urmia Lake watershed is located in the northwest of Iran between longitude 44° 07′E to 47° 53′E and latitude 35° 40′N to 38° 30′N with a total area of about 52,679 km^{2} (Figure 1). Urmia Lake watershed is 51,876 km^{2} and is one the six largest watersheds in Iran. It is the largest lake in Iran and one of the world's salt-saturated lakes, and has a significant role in moderating the climate of a vast area containing East and West Azerbaijan and Kurdistan provinces. According to the 39-year period of daily precipitation data (1973–2011), the annual mean precipitation over the watershed is 352 mm. The river flow is the main source to supply domestic, industrial, and agricultural consumption as well as groundwater feeding along the river, which are increasing annually. The monthly rainfall data of 228 stations located inside and outside of Urmia Lake watershed were used to analyze and model the rainfall time series. The rainfall data used herein are on a monthly scale and cover the period from 1972 to 2014. For the goals of calibration and verification, time series were separated into calibration (70%) and verification (30%) purposes.

### Least squares support vector machine (LSSVM)

*et al.*2011). This results in the solution being transformed from one of solving a quadratic program to a set of linear equations known as the linear Karush–Kuhn–Tucker systems. The regression computation using LSSVM was later proposed in 2002. Multi-class classification can be considered as regression computation with multiple predefined threshold values. The LSSVM toolbox is also available at the web site <http://www.esat.kuleuven.be/sista/lssvmlab>. Suppose

*{(X*for

_{t},y_{t})}*t*=

*1*to

*N*is a given set of data where

*X*

_{t}*=*

*(x*, … ,

_{t1},x_{t2}*x*is the input vector with

_{tk})*k*multiple variables and

*y*is the corresponding price data at time

_{t}*t*which could be defined as: where denotes the dot product,

*W*is the weight vector,

*b*is the bias, and is the mapping function that transfers input vector

*X*into a much higher dimensional feature space (could be infinite). The corresponding optimization problem for LSSVM is formulated as: where

_{t}*e*is the error variable at time

_{t}*t*and

*γ*is a regulation constant. The Lagrange function can be obtained as: where

*α*is the Lagrange multipliers. The dual problem is then: An alternative formulation of (4) can be expressed as: where is called the kernel function and . The final LSSVM for nonlinear functions can be written as: where is called the kernel function. Gaussian radial basis kernel is the most effective one in nonlinear function estimation and expressed as:

_{t}### Basic concept of WT

*m*and

*n*are integers that control the wavelet dilation and translation, respectively;

*a*

_{0}is a specified fined dilation, step greater than 1; and

*b*

_{0}is the location parameter which must be greater than zero. The most common and simplest choice for parameters is

*a*

_{0}=

*2*and

*b*

_{0}=

*1*. This power-of-two logarithmic scaling of the dilation and translation is known as the dyadic grid arrangement. The dyadic wavelet can be written in more compact notation as (Mallat 1998): For a discrete time series,

*x*, the dyadic WT becomes (Mallat 1998): where

_{i}*T*is wavelet coefficient for the discrete wavelet of scale

_{m,n}*a*=

*2*and location

^{m}*b*=

*2*. Equation (10) considers a finite time series,

^{mn}*x*

_{i}, i*=*

*0,1,2,*…

*, N*; and

_{1}*N*is an integer power of

*2*:

*N*

*=*

*2M*. This gives the ranges of

*m*and

*n*as,

*0*

*<*

*n*

*<*

*2*and

^{M-m−1}*1*

*<*

*m*

*<*

*M*, respectively. The inverse discrete transform is given by Mallat (1998): or in a simple format as (Mallat 1998): where is called approximation sub-signal at level

*M*and

*W*are details of sub-signals at levels

_{m}(t)*m*=

*1, 2,*…

*, M*.

The wavelet coefficients, *W _{m}(t) (m*

*=*

*1, 2,*…

*, M)*, provide the detail signals, which can capture small features of interpretational value in the data; the residual term, , represents the background information of data.

### Selection of mother wavelets

One of the advantages of the AI-based wavelet conjunction model compared with the AI method is its ability to identify data components in a time series such as irregular components with multi-level wavelet decomposition. For this purpose, the rainfall signal was separated into different temporal scales (levels) by DWT. Selection of an appropriate wavelet function poses significant challenges and is governed largely by the problem at hand and some of the distinctive properties of the wavelet function, such as i) its region of support and ii) the number of vanishing moments (Maheswaran & Khosa 2012a).

#### Haar wavelet

These are symmetric and non-continuous wavelets, and the number of vanishing moments in this case is equal to one (1). Haar wavelets have been found appropriate for applications in signals that have sharp changes because of the relatively narrow span over which their energy is distributed.

#### Daubechies wavelet

These are a family of wavelets and include the Haar wavelet, written as db1, as a special case. These are compactly supported wavelets with extreme phase and highest number of vanishing moments for a given support width. The Daubechies wavelets have associated minimum-phase scaling filters, are both orthogonal and biorthogonal, and do not have an explicit analytic expression except for the db1 (or Haar) form.

#### Sym wavelet

Sym wavelet (or, in a compact form, symlet) is only near symmetric, and its other properties are similar to those of the general Daubechies wavelets, dbNs. Symlet has the highest number of vanishing moments for a given support width, and its associated scaling filters are near linear-phase filters. Order *N* can be *2, 3, y*. These wavelets can be both orthogonal and biorthogonal and provide compact support.

#### Coif wavelet

Coiflets are discrete wavelets that have scaling functions with vanishing moments. The wavelets are near symmetric, their wavelet functions have *N/3* vanishing moments and scaling functions *N/3−1*, and they have been used in many applications.

#### Meyer wavelet

The Meyer wavelet is an orthogonal wavelet that is indefinitely differentiable with infinite support.

### Proposed model

In the proposed model, the rainfall signal was first decomposed into sub-signals with different scales, i.e., a large scale sub-signal and several small scale sub-signals in order to obtain temporal characteristics of the input time series. For a given time series, the time series corresponding to *a(t)* (i.e., *Ia(t)*) is approximation sub-signal (large scale) of the original signal, and *i*th detailed sub-signal (small scale) is identified by I (i.e., *Idi(t)*), where *i* is the decomposition levels of the rainfall (*I(t)*) time series. In this study, some irregular mother wavelets such as Haar, db2 and db4 (Daubechies wavelet of order 2 and 4), sym2 and coif2 and Meyer were used. Mallat (1998) can be referred to for more information about these wavelets. Since rainfall time series are measured in point (such as rain gauges) and in a discrete form, in this study dyadic DWT was used. Also, ‘a trous’ is another alternative decomposition and reconstruction algorithm and can be considered for the modeling. It is a simple and fast algorithm, but it is a redundant algorithm, and its key is to determine an appropriate low-pass filter (usually a spline is selected); its determination is not usually easy for the non-expert.

The seasonal ARIMA with exogenous repressor (SARIMAX or structural SARIMA) process differs from the SARIMA process, ostensibly because it takes cognizance of an exogenous input, which consists of additional exogenous variables that could explain the behavior of the dependent variable. For construction of the wavelet-SARIMAX-LSSVM (W-S-LSSVM) model, the proposed methodology by Zhang (2003) in developing the ARIMA-ANN model is followed in this paper, but also considering two other components. First, because of the seasonality characteristics of the hydrological time series, and second, in order to employ the relevant variables (i.e., approximation and detail sub-series of precipitation) in the modeling, wavelet-SARIMAX-LSSVM is proposed to take advantage of SARIMAX for precipitation modeling instead of the ARIMA model, which considers just autoregressive property of the time series in the modeling, and also its formulation contains only one variable. According to the complex behavior of the hydrological time series (e.g., rainfall time series), they might include both linear and nonlinear components. Therefore, application of an individual linear model (such as ARIMA or SARIMA) for such signals may not be adequate. On the other hand, with all the advantages of the LSSVM, these models are not universal models suitable for all circumstances and may yield mixed results in modeling linear problems. Hence, it is not wise to apply LSSVM blindly to any type of data without any data pre/post-processing. For example, in a process which includes seasonal and non-stationary characteristics, temporal data pre-processing may improve the efficiency of the modeling (Zhang 2003; Cannas *et al.* 2006); but if the process shows trend in space, spatial data pre-processing such as spatial clustering should be employed prior to the main modeling (Nourani & Kalantari 2010). Therefore, since it is difficult to understand the characteristics of the data in a real problem completely, hybrid modeling can be a useful methodology for practical use, and by combining different models, different aspects of the underlying patterns may be captured.

The proposed hybrid W-S-LSSVM model consists of three steps. First, the rainfall time series are decomposed by DWT. In the next step, a SARIMAX model is initially fitted to model the linear components of DWT (approximations and details) of the rainfall time series so that the other input data are also used as external (regressor) variable in this exogenous modeling. The result of this stage is the estimation of actual value of rainfall time series for one-time ahead (i.e.,) via the used SARIMAX.

*e(t*

*+*1

*)*) may contain the nonlinear relationship, which relation can be detected by an LSSVM via the second step of the W-S-LSSVM modeling. (In the proposed model, the decomposed sub-series are predicted; as an instance,

*Ia(t*

*+*1

*)*is the approximation signal of rainfall as target data.) With

*N*input vectors in the LSSVM which include residual values of

_{N}*N*previous months and according to Equation (13), the LSSVM model for the residuals will be as follows: where

_{N}*f(t)*is a nonlinear function determined by the LSSVM and

*e(t*

*+*1

*)*is the random error of sub-series. Consequently, considering Equations (13) and (14), the forecasted value for the rainfall at time

*t*+ 1 will be computed by Equation (15). This procedure is illustrated in Figure 2.

### Efficiency criteria

In this study, the total data were separated into calibration and verification sets. Two different criteria were selected to evaluate the efficiency of the proposed forecasting methods; the root mean square error (*RMSE*) and the determination coefficient (*DC*). The *RMSE* and *DC* were applied to exhibit discrepancies between predicted and observed values (Adamowski & Chan 2011). *RMSE* is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are. In other words, it describes how concentrated the data are around the line of best fit. The *RMSE* is utilized to measure forecast exactness, which gives a positive value by squaring the errors. The *RMSE* decreases from large positive values for poor performance to zero for perfect forecasts.

On the other hand, the *DC* was used to quantify the modeling accuracy by generating a positive value. The *DC* grows from zero for perfect forecasts through large positive values as the differences between forecasts and observations become increasingly large. It describes how many data points fall within the results of the line formed by the regression. The higher the coefficient, the higher the percentage of points the line passes through when the data points and line are plotted. Values of 1 or 0 would indicate that the regression line represents all or none of the data, respectively. A higher coefficient is an indicator of a better goodness of fit for the observations. The usefulness of *DC* is its ability to find the likelihood of future events falling within the predicted outcomes. The idea is that if more samples were added, the coefficient would show the probability of a new point falling on the line.

Obviously, a high value for *DC* (up to 1) and a small value for *RMSE* indicate high efficiency of the model. Legates & McCabe (1999) indicated that a hydrological model can be sufficiently evaluated by these statistics. These measures are not oversensitive to extreme values (outliers) and are sensitive to additive and proportional differences between model predictions and observations. Therefore, correlation-based measures (e.g., *DC* statistic) can indicate that a model is a good predictor (Legates & McCabe 1999).

*RMSE*values are transformed between 0 and 1 (Nourani

*et al.*2016). Descriptions of

*DC*,

*RMSE*, and

*Normalization*are presented as follows: where

*N*, , and are the number of observations, observed data, computed values, and mean of observed data, respectively.

*L*is the actual value and

_{i}*l*is the respective normalized value.

_{i}*L*and

_{min}*L*are the minimum and maximum of all used values, respectively.

_{max}## RESULTS AND DISCUSSION

*et al.*2000) weighted-average method, as follows: where

*A*is the area,

*w*and

*p*refer to the watershed and Thiessen polygon,

*I*is the variable (i.e., precipitation), and

*i*refers to each climatological station of the watersheds, so that is the Thiessen weighted-average variable. First, monthly rainfall averages for each of the climatological stations were calculated (not shown). The watersheds' mean values per variable were then calculated using the Thiessen method (Figure 3). The monthly rainfall data of 228 stations located inside and outside of Urmia Lake watershed were used to analyze and model the rainfall time series. Due to the existence of numerous rain gauges in the study area, and stability of polygons in the period of study, the aforementioned method seems to be suitable in comparison to other interpolation methods in terms of time, cost, and accuracy. In the Thiessen method, a constant area and weight are calculated for each rain gauge during the period of study and these weights are applied to estimate the mean monthly rainfall. According to Figure 3, different shapes of polygons were calculated. The polygons inside the lake are larger, since there are fewer rain gauges covering the area inside the lake. On the other hand, polygons for the zones with more rain gauges had smaller areas, which led to a more accurate outcome. Figure 4 shows the obtained time series via Thiessen method over the Urmia watershed. Also, statistical analysis of the obtained time series is shown in Figure 4.

At first, LSSVM without any pre-processing was used to predict the monthly rainfall time series of the Urmia Lake watershed. A correct use of LSSVM is to select an appropriate kernel function and tune the related hyper-parameters. A number of kernels were discussed by researchers, but studies suggested the effectiveness of radial basis function (RBF) kernel in the case of SVM in the majority of civil engineering applications (Gill *et al.* 2006; Goel & Pal 2009; Nourani *et al.* 2016). For a fair comparison of results, a RBF kernel where *σ* is a kernel specific and *γ* is margin parameter was used with LSSVM in this study. The tuned parameters (*σ*, *γ*) were optimized via a grid search procedure. The structure of LSSVM is demonstrated in Table 1 (for different applications). According to Table 1, minimum, maximum, and increment values of parameters were identified and then optimized through a trial-and-error process to find the best structure of LSSVM for all the models.

In order to perform classic modeling by LSSVM, some input combinations were determined according to the antecedents of rainfall dataset. Since rainfall time series might behave as a Markovian process, it is important to understand how the chronological sequence of months can influence rain prediction. In other words, the value of features at the current time step could be related to the conditions at the previous time steps. Therefore, the following shows input combinations (1–4), in which different numbers of input variables (*I*) were considered as input features to predict the rainfall (*I(t**+* 1*)* for all stations).

*Comb. (1): I(t)**Comb. (2): I(t), I(t*−1*)**Comb. (3): I(t), I(t*−1*), I(t*−2*)**Comb. (4): I(t), I(t*−1*), I(t*−2*), I(t*−12*)*

For all defined combinations, *t* demonstrates the time step (monthly step) and *I* represents rainfall features. The output consists of sole variable, i.e., *I* at future time steps (*I(t**+* 1)). Comb. 4 was designed to take advantage of seasonality of rainfall features (*t*−12).

An LSSVM through RBF kernel was trained for all input features to predict one-month-ahead rainfall values employing normalized rainfall data. In order to determine the LSSVM results, the performance evaluation results based on statistical evaluation values are demonstrated for different input combinations in Table 2. According to Table 2, Comb. 1 did not perform adequately in the calibration and verification steps. It can be inferred that target rainfall could not be predicted with one-month lag values precisely. Of Combs. 2 and 3, Comb. 3 could predict the rainfall relatively better. Comb. 3 also employed one to three months' ago values of rainfall features. Comb. 4, which included a 12 months' lag, showed improvement in comparison to other antecedent-based combinations; nevertheless, generally, results were not assured. It was observed that Comb. 4, which was designed by considering the seasonality of the process (with a period of one year) in addition to the auto-regressive characteristic of the data, could yield better results. Also, due to the poor Markovian process in monthly modeling of precipitation, it was observed that Combs. 1 to 3 led to weak outcome and proved that monthly rainfall is conditioned by seasonal components. The outcome of classic modeling by LSSVM using Comb. 4 is demonstrated in Figure 5.

### Wavelet-based modeling results

*et al.*1998; Nourani

*et al.*2009): where

*L*and

*N*are decomposition level and number of time series data, respectively. For the case study

*N*= 504, thus,

*L*= 3. This experimental equation was derived for fully autoregressive signals, and only considering time series length without being concerned about seasonal signatures of the hydrologic process (Nourani

*et al.*2011). Hence, by use of decomposition level equal to 3, not all seasonalities in the main time series might be taken into account. Decomposition level 3 yields three detailed sub-series (i.e., 2

^{1}months mode, 2

^{2}months mode, 2

^{3}months mode, which is nearly an eight-month mode), but according to the hydrological base of the time series, there might be other dominant seasonalities with longer periods. Therefore, one other decomposition level (i.e., 4) was also examined, in which decomposition level 4 led to better results via the presented modeling and was selected as the optimum decomposing level. Level 4 sub-series contains four sets of details: 2

^{1}months mode, 2

^{2}months mode, 2

^{3}months mode, and 2

^{4}months mode, which is nearly a year mode. Therefore, the seasonality of the process up to one year could be handled by the model. On the other hand, it was observed that, for example,

*d*at some time

_{4}*t*included knowledge of past values back to

*x(t*−16

*)*; therefore, the first 15-odd data were not used in the training process (Aussem

*et al.*1998).

The treatment of boundary merits special attention when implementing wavelet-based approaches for forecasting applications. Estimation of wavelet coefficient at any time *t* uses observations on past states going back up to (*t−p*) and the future, unrealized states up to (*t**+**p*). In general wavelet applications, various kinds of boundary conditions such as i) periodic boundary, ii) reflective boundary extension, and iii) constant extension are usually used for extending the series up to (*t**+**p*) (Maheswaran & Khosa 2012b).

Daubechies-2 (db2), Meyer, Sym 2, and coif2 mother wavelets were applied to decompose the rainfall time series. To investigate the effect of form similarity between mother wavelet and main time series, another selection of mother wavelets was also examined. For this purpose, two other mother wavelets (Haar and Daubechies-4, db4) were chosen according to the formation of main signals. As the Haar wavelet has a pulsed shape, it could properly capture the signal features of rainfall time series and may yield comparatively high efficiency (Nourani *et al.* 2009). Also, due to the formation of a db4 wavelet that is similar to the hydrological signal, it could capture the signal features, especially peak points, efficiently and led to comparatively good results. For instance, Figure 6 shows approximation and details of sub-series of rainfall time series decomposed by the Haar and db4 mother wavelet at level 4. Boundary (edge) effect is one of the deficiencies in the application of WT, which happens due to application of the wavelet to the beginning and end of the time series (signals), at which point there are no data before and after. As a solution, the zero padding method was used (Addison *et al.* 2001). Therefore, since the time series were long enough for the case study, suitable amounts of data were neglected from the beginning and end parts of time series after wavelet application.

The time series were decomposed into three and four levels by different kinds of WTs, i.e.: 1) the Haar wavelet, a simple wavelet; 2) the Daubechies wavelet (db), the most popular wavelet; 3) the sym wavelet, with three sharp peaks; and 4) the coif1 wavelet and other mother wavelets (Mallat 1998). Results of modeling by wavelet-based LSSVM (W-LSSVM) are presented in Figure 7. Figure 7 shows that the db4 mother wavelet and decomposition level 4 had better performance, and the subsequent results of the W-LSSVM model are presented for this mother wavelet. Predicted values via db(4,4) and W-LSSVM are demonstrated in Figure 8.

It was observed that the larger errors of LSSVM were found to be in peak values, while the oscillations of the rainfall values are relatively well predicted. On the other hand, the W-LSSVM model predicted both extreme and average values more precisely in comparison to the ad hoc LSSVM model. It could be inferred that the better fitting of extreme and average values significantly reduced the *RMSE* and increased *DC* as well. Comparison of Figures 5 and 8 proved this statement; also, it was observed that the fitted trend line in Figure 8 (scatter plot) is closer to 45° (perfect prediction) in comparison to Figure 5.

### Results of proposed model

Results approved the capability of db(4,4) in forecasting the rainfall time series. Therefore, db(4,4) was selected to be applied in the proposed model. Accordingly, in the first step, rainfall data were decomposed into several sub-series using DWT (db4). In the second step, various input combinations of sub-series with different time lags were tried for predicting the sub-series separately in two stages (SARIMAX and LSSVM).

The proposed hybrid model (i.e., W-S-LSSVM) was examined for the rainfall modeling of the Urmia Lake watersheds. The results of the model for db(4,4) are presented in Table 3. At first, components of db(4,4) were predicted by SARIMAX. Next, residual values (error) were fed into LSSVM to predict the error and, finally, by aggregating the sub-series that obtain from modeling, predict the rainfall values. The best W-S-LSSVM structure in monthly modeling belongs to the SARIMAX (2, 1, 1) (1, 1, 1)_{3}*Ia(t)* model, in which the obtained residuals were entered into the LSSVM with a window format of three successive months' residuals (i.e., *eIa(t),eIa(t*−1*),eIa(t*−2)) in order to find the one time step-ahead residual (i.e., *e*(*t**+* 1)). In this model, at first a SARIMAX model was fitted to the calibration rainfall dataset so that the decomposed rainfall values at previous time steps were also considered as the exogenous input of the model. The fitted model gave the estimated values for the rainfall , and then the residual time series of the calibration data was computed by subtracting the decomposed observed data (*I(t)*) from the estimated decomposed values (i.e., ). This residual time series of the calibration dataset was then used to train an LSSVM model to detect the nonlinear relationship among the residuals. The best architecture of the LSSVM was obtained via a trial-and-error process. The trained network was then used to find the residual values of the verification dataset month by month. On the other hand, SARIMAX (2, 1, 1) (1, 1, 1)_{12}*I(t)* was separately used in order to find the linear estimation of the rainfall values of the verification dataset (Figure 9(a)). Finally, the values of the computed rainfall for the verification dataset were obtained by Equation (15) and can be compared with the observed rainfall in terms of the *DC* and *RMSE*. In addition to the results presented in Table 3, the plot of the prediction in the verification stage is shown in Figure 9 for monthly modeling. It can be seen that the proposed model could capture the peak values, and better results were obtained in comparison to other models. Table 4 represents the overall performance of proposed models for rainfall prediction in the verification period. Based on results, it was observed that the W-S-LSSVM's performance was improved by about 7–8% in comparison to classic models and about 5–6% in comparison to the W-LSSVM model. A possible reason for this may be the fact that single models that use only rainfall data as input values for modeling perform weakly, whereas wavelet conjunction models use sub-series that include various information about the studied phenomenon which improves the prediction capabilities. Using only one resolution component to model, the rainfall time series does not readily explain the internal mechanism of the time series (Chou & Wang 2002).

In order to validate the stability of the proposed model, the rainfall time series were separated into two parts equally and were predicted by the proposed model. The first 50% of data (1972–1993) resulted in *DC**=* 0.93 and *RMSE**=* 0.018, and the other 50% (1994–2014) resulted in *DC**=* 0.92 and *RMSE**=* 0.021 (overall results). It was observed that even if the number of points in the times series was changed, the outcome proved to be robust.

By considering the significance of rainfall predictions in hydro-environmental systems, in this study, the ability of the proposed W-S-LSSVM model was also investigated for multi-step-ahead predictions. Two, three, six, and twelve month-ahead predictions were performed by employing W-S-LSSVM (db(4,4)) as the best model of one-step-ahead prediction. The obtained results are presented in Table 5. Results indicated that two-month-ahead prediction led to acceptable outcome by the means of *RMSE* and *DC* values. Despite the results obtained for two-month-ahead prediction, the outcome of prediction for three-, six-, and twelve-month-ahead weakened relatively. It could be concluded from Table 5 that in multi-step-ahead predictions, the proposed model led to relatively poor performance in comparison to one-step-ahead modeling and lost its accuracy by increasing the prediction horizon. The reason is because in multi-step-ahead prediction, the Markovian property weakens by time step increase.

### Characteristics of applied approaches in proposed models

SARIMAX is a classic time series forecasting model, and therefore was used as a hybrid model to predict the rainfall time series. Although a linear model like SARIMAX is unlikely to model a complex nonlinear hydrological process, the hybrid SARIMAX-LSSVM model could handle both linearity and nonlinearity of rainfall time series.

For the case of LSSVM, there are some main advantages. First, it has a regularization parameter to handle the over-fitting issues. Second, it uses the kernel; therefore, expert knowledge about the problem could be built via engineering the kernel. Third, an SVM is defined by a convex optimization problem (no local minima) for which there are efficient methods. Finally, it is an approximation to a bound on the test error rate, and there is a substantial body of theory behind it that suggests it to be a solid approach (Noori *et al.* 2011).

The disadvantages are that the theory only really covers the determination of the parameters for a given value of the regularization and kernel parameters and choice of kernel. In a way, the SVM moves the problem of over-fitting from optimizing the parameters to model selection. Unfortunately, kernel models can be quite sensitive to over-fitting the model selection criterion.

Results proved the capability of DWT in prediction of rainfall time series by giving incredible insight into the proposed artificial intelligence (AI)-based model. However, when applying DWT to annual precipitation data, the decimation and filtering of some of the signals at each scale resulted in signals not being time invariant. If a signal is shifted in time, the DWT does not produce a shifted version of the non-shifted transform. Redundant non-decimated WT can be used to address the issue. This algorithm essentially follows the same decomposition tree in DWT, except that instead of filtering and decimating the signals, the filters are being upsampled by inserting zeros in between prior filtering at each scale (Maheswaran & Khosa 2012b).

## CONCLUSION

Data pre/post-processing are proven to be an efficient way to improve the prediction precision in most models. The current study presented an AI-based three-stage model to improve forecasting quality of rainfall time series. To validate the proposed model, the Urmia Lake watershed was selected. The monthly rainfall data of 228 stations located inside and outside of Urmia Lake watershed were used to analyze and model the rainfall time series by Thiessen polygon approach. The obtained time series, which represented the rainfall values over the watershed, was used in the modeling process. First, the time series were predicted via ad hoc LSSVM via antecedent-based inputs. Comb. 4, which included the previous months' rainfall features with lag of one, two, and twelve months, outperformed other models. Next, W-LSSVM via different mother wavelets and decomposition levels was used to amend the results. Based on outcome, db(4,4) led to the best prediction. Accordingly, db(4,4) was used in the proposed W-S-LSSVM model. In the W-S-LSSVM model, decomposed sub-series were forecast separately by ARIMAX. Next, residual values were predicted by LSSVM. Outcomes of the model were summed to build the time series of rainfall. Based on results, it was observed that the W-S-LSSVM's performance was improved by about 7–8% in comparison to classic models and about 5–6% in comparison to the W-LSSVM model. Also, due to the importance of prediction of rainfall for future time steps, two-, three-, six-, and twelve-month-ahead predictions were performed using the best model. It was concluded that the higher the prediction horizon, the lower modeling accuracy is obtained.

## ACKNOWLEDGEMENTS

The authors would like to acknowledge East Azerbaijan Regional Water Company and Iran Water Company for data preparation.