## Abstract

Accurate forecasting of rainfall, especially at a daily time step, remains a challenging task for hydrologists despite the existence of several deterministic, stochastic, and data-driven models. Several researchers have fine-tuned hydrological models by using pre-processed input data, but the improvement in prediction of daily time-step rainfall has not reached the expected level. There is still scope to improve the accuracy of rainfall predictions with an efficient data pre-processing algorithm. Singular spectrum analysis (SSA) is one such technique that has been found to be a very successful data pre-processing algorithm. The artificial neural network (ANN) model has emerged as one of the most successful data-driven techniques in hydrology because of its ability to capture non-linearity and the wide variety of available algorithms. This study assesses the advantage of using SSA as a pre-processing algorithm in ANN models. It also compares the performance of a simple ANN model with that of an SSA-ANN model in forecasting single time-step as well as multi-time-step (3-day and 7-day) ahead daily rainfall for the Koyna watershed, India. The model performance measures show that data pre-processing using SSA enhanced the performance of ANN models in both single and multi-time-step ahead daily rainfall prediction.

## INTRODUCTION

Rainfall is an important hydrologic phenomenon that is highly uncertain in nature. It is also one of the prime indicators used in many climate change studies. Accurate prediction of rainfall is crucial for planning an efficient water resource system and for efficient management of water distribution systems. Rainfall forecasting has remained a challenging task for hydrologists on account of its high temporal and spatial variation, non-linearity, non-stationarity, ergodicity, and other descriptive characteristic variabilities (Xu 1999; Cheng *et al.* 2016). As the time step of a rainfall time series becomes smaller, it becomes increasingly difficult to develop an efficient rainfall prediction model.

Conventional models used in prediction of hydrological processes include linear and non-linear deterministic regression models, stochastic models, such as ARMA, ARIMA models, etc., and data-driven techniques such as artificial neural networks (ANNs), genetic programming (GP), model tree (MT), support vector machine (SVM), etc. Azamathulla (2013) carried out a review of the application of soft computing techniques in water resources engineering and reported that soft computing techniques can be successfully applied to the prediction of hydraulic and hydrologic variables. There have been efforts by researchers to couple satellite data with several forecasting models in an attempt to predict rainfall more accurately (Kalsi 2003; Seto *et al.* 2016). More studies have been carried out by researchers by improving the resolution of satellite data to improve the prediction accuracy for smaller time-step rainfall (Sarumathi *et al.* 2015; Seto *et al.* 2016). Climate change models such as general circulation models and regional circulation models have been utilized in predicting annual, seasonal, and monthly rainfall data but lack accurate daily time-step rainfall prediction (Nearing 2001; Huntingford *et al.* 2003). The climate change models predict long-term variations in rainfall processes for various assumed scenarios. However, a hydrologist requires accurate rainfall prediction with smaller time steps ranging from hourly, daily, weekly, monthly, and seasonal rainfall. Thus, hydrologists are still in search of better models for smaller time-step rainfall prediction that can be effectively utilized in efficient water resource planning and management.

Data-driven techniques have proven to be very efficient in hydrological time series modeling. Babovic *et al.* (2000a) compared the performances of ANN, SVM, global linear models, and local linear models in correcting the errors made by a numerical model in forecasting water level. It is reported that the ANN and SVM models performed exceptionally better than the linear models and could efficiently reduce the errors in numerical modeling. Babovic & Keijzer (2002) utilized GP in order to carry out rainfall–runoff modeling and reported that data-driven models can provide best forecast results for hydrological processes. Vojinovic *et al.* (2003) utilized a hybrid model of neural networks and a deterministic model to combine the advantages of deterministic and stochastic approaches to forecast wet weather response in waste water systems. Yu *et al.* (2004) combined chaos theory with SVM to model runoff of two catchments and concluded that the decomposition of the time series could enhance the predictive capacity of SVM and could reduce the computational difficulty. Among the available data-driven time series models, ANN has proved to perform better especially in the case of non-linear time series such as rainfall prediction (Hsu *et al.* 1995; ASCE Task Committee 2000a; Tokar & Markus 2000; Solomatine & Dulal 2003; Zhang *et al.* 2011; Abhishek *et al.* 2012; Mandal & Jothiprakash 2012; Guven *et al.* 2013). Over the last two decades, the application of ANNs in the field of hydrology has gained wide attention (ASCE Task Committee 2000b). ANN is a data mining technique which is based on the idea of the human nervous system. ANN can be defined as an approximator based on the data given for training the network (Rumelhart *et al.* 1986; ASCE Task Committee 2000a). 
The advantages of ANN as a time series model include its ability to incorporate non-linearity, its data adaptive nature, its ability to extract the complex relationship between input and output and to generalize the relationship to produce output from the input. However, it is reported by researchers that there is a serious lag effect upon the usage of ANN while predicting climatic time series (Smith & Eli 1995; Dawson & Wilby 2001; Jain *et al.* 2004; de Vos & Rientjes 2005; Wu & Chau 2011). It is also indicated that in order to get accurate prediction results, training data should incorporate all the properties of the underlying physical process (Tokar & Johnson 1999; Solomatine & Ostfeld 2008). This is true in a sense, especially for rainfall prediction.

Navone & Ceccatto (1994) utilized hierarchical neural network with one output node to predict the Indian summer monsoon rainfall and reported that the ANN model performed better than the linear statistical model. Silverman & Dracup (2000) applied ANN for long range precipitation prediction of California. It was found that the ANN model predicted the phase of precipitation successfully but failed to predict magnitude accurately. It was suggested that by increasing the number of hidden nodes, the results can be enhanced. The study of Ramirez *et al.* (2005) in forecasting rainfall of the São Paulo region using ANN compared the performance of ANN with a multiple linear regression (MLR) model and a regional Eta model (a numerical atmospheric model). It was reported that the ANN model could outperform the MLR model especially when the rainfall values are zero, but is not as accurate as the regional Eta model. It has been suggested to use data pre-processing techniques to improve the prediction performance of ANN. Moustris *et al.* (2011) forecast the monthly mean, maximum, and minimum precipitation in specific regions of Greece using ANN and reported that the ANN model could not accurately predict the peak precipitation values. Thus, over the years, it has been found that ANN modeling in rainfall prediction displays a lag effect and the peak could not be accurately predicted even for monthly time-step rainfall data. It has also been suggested to adopt data pre-processing to overcome these difficulties.

Data pre-processing is necessary for most of the hydrologic time series to achieve better performance in prediction (Chou 2011; Sang 2013). The major reason for pre-processing of data is to eliminate the possible noise in the time series and thus to avoid undesirable training of the ANN, and also to reduce the effect of a large number of zeros in the time series (Sivapragasam *et al.* 2001; Wang *et al.* 2006; Hung *et al.* 2009; Wu *et al.* 2009; Fathima & Jothiprakash 2016). Wu *et al.* (2009) compared the performances of ANN, ANN coupled with wavelet transform, and ANN coupled with singular spectrum analysis (SSA) in predicting daily inflow. It is reported that ANN-SSA presented better prediction performance than ANN coupled with wavelet transform. Sun *et al.* (2010) applied a combination of chaos theory and time-delay neural networks to predict the model errors at the measurement stations in the Singapore Regional Model domain and reported that the approach could enhance the forecasting results of the Singapore Regional Model. Ramana *et al.* (2013) combined wavelet technique with ANN for single time-step prediction of monthly rainfall data and concluded that the prediction performance of wavelet neural network is better than the ANN model. Several other studies also concluded that data pre-processing helped to improve the performance of ANN (Lisi *et al.* 1995; Nourani *et al.* 2009; Chau & Wu 2010; Wu & Chau 2011; Xiao *et al.* 2014).

SSA is a non-parametric time series analysis technique that can be successfully applied as a data pre-processing algorithm for various time series applications. SSA works on the eigen values and corresponding eigen functions derived for a given window length and is data adaptive in nature. Since Broomhead & King (1986) developed mathematical expressions for the method of SSA, its possible applications in time series analysis and prediction have gained wide attention from researchers. The successful application of SSA in a wide range of fields, such as image processing, filling gaps in time series, change point detection, extracting various components of time series, data pre-processing, etc. (Allen & Smith 1996; Schoellhamer 2001; Moskvina & Zhigljavsky 2003; Golyandina 2010; Itoh & Kurths 2010; Kondrashov *et al.* 2010; Rodríguez-Aragón & Zhigljavsky 2010; Golyandina & Zhigljavsky 2013; Unnikrishnan & Jothiprakash 2015) has made the method more popular. Sivapragasam *et al.* (2001) proposed a combined SSA-SVM technique for rainfall and runoff forecasting and compared it with the non-linear prediction (NLP) technique. They observed that the SSA-SVM technique could predict rainfall and runoff with higher accuracy than the NLP method, and suggested that data pre-processing is necessary to remove the discrepancy caused by discontinuity of data due to zero values. Wu & Chau (2011) attempted rainfall–runoff modeling using ANN coupled with SSA and compared the prediction performance with that of ANN alone. The study showed that the ANN-SSA model performed better than ANN in rainfall–runoff modeling, and reported that applying SSA as a data pre-processing technique prior to any prediction model can be promising in terms of prediction performance.

In the current study, ANN (input data without data pre-processing) and SSA-ANN (ANN model with input data pre-processed using SSA) have been employed to forecast the daily rainfall of Koyna reservoir catchment in Maharashtra, India. There are already some good studies reporting pre-processing the data with SSA before forecasting using ANN (Chau & Wu 2010; Wu & Chau 2011). Nevertheless, the present work also has some novelties regarding the scientific background. (1) In the present study a new and different method of pre-processing the data was carried out based on a strong mathematical method of eigen values and eigen functions with proper evidence of elimination of noise and selection of significant components, and is explained in detail. (2) Two different approaches of utilization of pre-processed data (SSA1 and SSA2) have been used which have not been explored in any of the previous studies. (3) In the present study, prediction has been carried out both for single time-step and multi-time-step prediction and the performances are compared. The statistical performance indicators have been used to compare the performances of single and multi-time-step prediction using ANN and SSA-ANN models.

## DATA AND STATISTICAL ANALYSIS

Daily rainfall data for the Koyna catchment in Maharashtra, India, from 1st January 1961 to 31st December 2013 were utilized in the present study. The rainfall data were collected from the Koyna Dam Division Office, Department of Irrigation, India. The Koyna catchment area contains nine rain gauge stations. The data used in the study are the spatial average of the rainfall observed at the nine rain gauge stations, computed using the Thiessen polygon method. The region is monsoon fed and there is no rainfall during non-monsoon periods (November to May). Thus, out of 52 years of daily data collected, more than 60% is of zero magnitude. The details about the study area are given in Jothiprakash & Magar (2012), Unnikrishnan & Jothiprakash (2015), and Fathima & Jothiprakash (2016).

## DESCRIPTION OF METHODOLOGY

### Artificial neural network

ANN, based on the idea of the human nervous system, is one of the time series models that has received an overwhelming response from researchers in various fields. The main advantages of ANN over other conventional models are that it does not require any prior knowledge about the model and that no complex mathematical relation between input and output needs to be developed (ASCE Task Committee 2000b). The ANN model also works well with non-stationary and non-linear time series. The fundamental units of the ANN model are neurons, which are arranged in layers, namely, the input layer, hidden layer, and output layer. In the training phase, the ANN model is trained on input data and the corresponding target data. Based on the training, the ANN develops a non-linear relation between input and output in the form of weights and biases. It is also worth noting that an ANN is an over-parametrized set of non-linear equations (i.e., neurons). Thus, as the number of neurons increases, so does the chance of the ANN over-fitting. One of the deciding factors in the performance of an ANN model is the appropriate selection of the number of hidden layers and the number of neurons in each layer. There is no systematic way of selecting those numbers. As the number of hidden layers increases, the complexity of the model increases (ASCE Task Committee 2000a). As suggested by many researchers, a single layer of hidden neurons is found to be a better alternative than multiple layers of hidden neurons, achieving better accuracy with less complexity (Hornik *et al.* 1989; Minns & Hall 1996). In the present study, a single hidden layer has been utilized. The number of neurons in the hidden layer was selected by a trial and error procedure based on the model performance measures during the training, testing, and validation phases.

### Singular spectrum analysis

SSA of a time series involves splitting up of the series into eigen triples where the eigen triples are eigen functions, principal components, and eigen values. The details of the SSA method can be seen in various studies that have explained the mathematical description of the method (Elsner & Tsonis 1996; Golyandina *et al.* 2001; Golyandina & Zhigljavsky 2013; Unnikrishnan & Jothiprakash 2015). The steps of the SSA are briefly given below.

#### Embedding

Various embedding algorithms have been reported in the literature and are usually based on false nearest neighborhood and average mutual information (Weigend & Gershenfeld 1993; Abarbanel 1996; Babovic & Keijzer 2000, 2002; Babovic *et al.* 2000b; Sun *et al.* 2010).

In SSA, embedding is the first step, in which the given time series is transformed into a matrix called the trajectory matrix. The order of the matrix depends on the chosen window length: the matrix is of order *K* × *L*, where *L* is the window length and *K* is the lag parameter, equal to *N* − *L* + 1, where *N* is the length of the time series. Window length is the only parameter of the SSA method and its appropriate selection is crucial to the performance of the method in time series modeling (Golyandina & Zhigljavsky 2013).
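
The embedding step can be illustrated with a minimal sketch. The function name `trajectory_matrix` and the toy series are illustrative assumptions, not taken from the study; the only thing borrowed from the text is the *K* × *L* orientation with *K* = *N* − *L* + 1.

```python
def trajectory_matrix(series, window_length):
    """Build the K x L trajectory matrix of a series, with K = N - L + 1.

    Row i holds the lagged vector series[i : i + L], so entries with the
    same i + j (anti-diagonals) are all equal to the same series value.
    """
    n = len(series)
    if not 1 <= window_length <= n:
        raise ValueError("window length must satisfy 1 <= L <= N")
    k = n - window_length + 1
    return [list(series[i:i + window_length]) for i in range(k)]


# Toy series (hypothetical values): N = 6 and L = 3 give K = 4 rows
rainfall = [0.0, 2.5, 10.0, 4.0, 0.0, 1.5]
X = trajectory_matrix(rainfall, 3)
```

With the window length of 365 days used later in the study, each row would hold one year of consecutive daily values.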

#### Singular value decomposition

In this stage, the time series is split into a sum of eigen triples. Eigen triples include eigen function, principal components, and eigen values of the covariance matrix of the trajectory matrix. Singular value decomposition (SVD) is a mathematical procedure carried out on the covariance matrix of the trajectory matrix of the time series under consideration. Thus, at the end of SVD, there are a set of eigen triples representing the given time series.
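
As a rough sketch of this stage, the decomposition can be obtained from the SVD of the trajectory matrix itself, whose squared singular values equal the eigen values of the unnormalised lag-covariance matrix. This route, the use of NumPy, and all names below are assumptions made for illustration; the study may have computed the eigen triples differently.

```python
import numpy as np


def ssa_decompose(series, window_length):
    """Split a series into eigen triples via SVD of its K x L trajectory matrix."""
    x = np.asarray(series, dtype=float)
    k = x.size - window_length + 1
    traj = np.array([x[i:i + window_length] for i in range(k)])

    # traj = U @ diag(s) @ Vt, singular values sorted in decreasing order
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    eigenvalues = s ** 2                 # eigen values of traj.T @ traj
    eigenfunctions = vt                  # row i: i-th eigen function (length L)
    principal_components = traj @ vt.T   # column i: i-th principal component (length K)

    # One rank-one elementary matrix per eigen triple; they sum back to traj
    elementary = [s[i] * np.outer(u[:, i], vt[i]) for i in range(s.size)]
    return traj, eigenvalues, eigenfunctions, principal_components, elementary


# Toy series (hypothetical values) with window length 4
series = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0, 7.0, 1.0, 0.0, 2.0, 5.0, 3.0]
traj, eigenvalues, eigenfunctions, pcs, elementary = ssa_decompose(series, 4)
```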

#### Grouping

Identification of significant component matrices is carried out in this stage. There is no systematic way of grouping; it can be carried out in different ways of matrix analysis and based on the nature of time series undertaken.

#### Diagonal averaging

Based on the identified significant component in the stage of grouping, a time series is reconstructed by averaging the diagonals of the component matrices.
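
Diagonal averaging can be sketched as follows. The function name and the small matrices are illustrative assumptions; the input is taken to be a *K* × *L* component matrix (for example, the sum of the grouped elementary matrices), and the output is a series of length *K* + *L* − 1.

```python
def diagonal_average(matrix):
    """Average each anti-diagonal (i + j = const) of a K x L matrix.

    Returns a series of length N = K + L - 1.
    """
    k = len(matrix)
    l = len(matrix[0])
    n = k + l - 1
    sums = [0.0] * n
    counts = [0] * n
    for i in range(k):
        for j in range(l):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


# A trajectory (Hankel) matrix is mapped back to its original series
hankel = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
recovered = diagonal_average(hankel)

# A general matrix is mapped to the series of its anti-diagonal means
averaged = diagonal_average([[1, 2], [3, 4]])
```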

## ARCHITECTURE OF THE PROPOSED ANN AND SSA-ANN MODELS

In this study, three different time-step prediction models have been used, namely, ANN1, ANN3, and ANN7 for single time-step, 3-day, and 7-day ahead prediction of the daily rainfall series, respectively. The ANN1 network was constructed using three input nodes (selected using a trial and error procedure) and one output node for prediction of the next day's rainfall. ANN3 comprises three input nodes and three output nodes for prediction of rainfall over the next 3 days. ANN7 comprises seven input nodes and seven output nodes for prediction of rainfall over the next 7 days. For all three networks, the other model parameters are the same. Single hidden layer ANN models have been used, i.e., three-layered ANN models with ten neurons in the hidden layer (selected using a trial and error procedure). The number of hidden nodes was changed until a network with minimum mean absolute error (MAE) was obtained. The Levenberg–Marquardt algorithm was used as the learning algorithm for all the network models. It takes less time to train a network and gives better results than other algorithms such as back propagation, cascade correlation, conjugate gradient, etc. (Kişi 2007). The Levenberg–Marquardt back propagation algorithm is an iterative technique in which the weights of the connections between neurons are adjusted in the reverse direction by back propagating the errors. It seeks a local minimum of a non-linear function by the least squares method. The activation functions used in the hidden and output layers are log sigmoid and linear transfer functions, respectively.
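
To make the network structure concrete, the following is a minimal sketch of a forward pass through a single-hidden-layer network with a log-sigmoid hidden layer and a linear output layer. The random weights, the helper names, and the input values are assumptions for illustration only; Levenberg–Marquardt training is not shown.

```python
import math
import random


def logsig(x):
    """Log-sigmoid transfer function, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Create untrained weights and biases (small random values, an assumption)."""
    rng = random.Random(seed)
    draw = lambda: rng.uniform(-0.5, 0.5)
    return {
        "w_hidden": [[draw() for _ in range(n_in)] for _ in range(n_hidden)],
        "b_hidden": [draw() for _ in range(n_hidden)],
        "w_out": [[draw() for _ in range(n_hidden)] for _ in range(n_out)],
        "b_out": [draw() for _ in range(n_out)],
    }


def forward(net, inputs):
    """Hidden layer: log-sigmoid of weighted sums. Output layer: linear."""
    hidden = [logsig(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    return [sum(w * h for w, h in zip(ws, hidden)) + b
            for ws, b in zip(net["w_out"], net["b_out"])]


# 3-10-1 and 7-10-7 structures, as used for ANN1 and ANN7
ann1 = init_network(3, 10, 1)
ann7 = init_network(7, 10, 7)
y1 = forward(ann1, [0.0, 12.4, 30.1])                       # hypothetical rainfall (mm)
y7 = forward(ann7, [0.0, 0.0, 5.2, 12.4, 30.1, 8.0, 1.1])   # hypothetical rainfall (mm)
```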

For the utilization of SSA pre-processed data, two different approaches were used in the present study, namely, SSA1 and SSA2. In SSA1-ANN models, the SSA reconstructed data were given as input for the ANN model and as target the corresponding time step reconstructed data were given. Later, in order to get real-time forecast results, the predicted pre-processed data was back-transformed. In SSA2-ANN models, the SSA pre-processed data was given as input and observed rainfall of the next time step given as target to the ANN model. The details of various ANN models used in the study are shown in Table 1. From Table 1, it can be seen that in total, nine different ANN and SSA-ANN models were developed. In Table 1, the number and type of input for various models are described. *R*_{t} represents observed rainfall and *PR*_{t} represents SSA pre-processed rainfall data during time *t*.

Sl no. | Model Name | Structure | Time step of prediction | Input | Target | Remarks |
---|---|---|---|---|---|---|
1 | ANN1 | 3-10-1 | 1 | Observed rainfall data (*R*_{t−2}, *R*_{t−1}, *R*_{t}) | Observed rainfall data (*R*_{t+1}) | – |
2 | SSA1-ANN1 | 3-10-1 | 1 | SSA reconstructed data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Reconstructed data (*PR*_{t+1}) | Output from model is back transformed to *R*_{t+1} |
3 | SSA2-ANN1 | 3-10-1 | 1 | SSA reconstructed data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Observed rainfall data (*R*_{t+1}) | – |
4 | ANN3 | 3-10-3 | 3 | Observed rainfall data (*R*_{t−2}, *R*_{t−1}, *R*_{t}) | Observed rainfall data (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}) | – |
5 | SSA1-ANN3 | 3-10-3 | 3 | SSA reconstructed data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Reconstructed data (*PR*_{t+1}, *PR*_{t+2}, *PR*_{t+3}) | Output from model is back transformed to *R*_{t+1}, *R*_{t+2}, *R*_{t+3} |
6 | SSA2-ANN3 | 3-10-3 | 3 | SSA reconstructed data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Observed rainfall data (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}) | – |
7 | ANN7 | 7-10-7 | 7 | Observed rainfall data (*R*_{t−6}, *R*_{t−5}, *R*_{t−4}, *R*_{t−3}, *R*_{t−2}, *R*_{t−1}, *R*_{t}) | Observed rainfall data (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}, *R*_{t+4}, *R*_{t+5}, *R*_{t+6}, *R*_{t+7}) | – |
8 | SSA1-ANN7 | 7-10-7 | 7 | SSA reconstructed data (*PR*_{t−6}, *PR*_{t−5}, *PR*_{t−4}, *PR*_{t−3}, *PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Reconstructed data (*PR*_{t+1}, *PR*_{t+2}, *PR*_{t+3}, *PR*_{t+4}, *PR*_{t+5}, *PR*_{t+6}, *PR*_{t+7}) | Output from model is back transformed to *R*_{t+1}, *R*_{t+2}, *R*_{t+3}, *R*_{t+4}, *R*_{t+5}, *R*_{t+6}, *R*_{t+7} |
9 | SSA2-ANN7 | 7-10-7 | 7 | SSA reconstructed data (*PR*_{t−6}, *PR*_{t−5}, *PR*_{t−4}, *PR*_{t−3}, *PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) | Observed rainfall data (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}, *R*_{t+4}, *R*_{t+5}, *R*_{t+6}, *R*_{t+7}) | – |
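
The input and target layouts in Table 1 can be sketched as a sliding window over two series. The function name `make_samples` and the toy values are assumptions for illustration; the same routine covers all nine models by choosing which series feeds the inputs and which feeds the targets.

```python
def make_samples(input_series, target_series, n_inputs, n_outputs):
    """Pair the last n_inputs values up to time t with the next n_outputs values.

    ANN models:      inputs and targets both from observed rainfall (R).
    SSA1-ANN models: inputs and targets both from reconstructed rainfall (PR).
    SSA2-ANN models: inputs from PR, targets from R.
    """
    if len(input_series) != len(target_series):
        raise ValueError("series must have equal length")
    samples = []
    # t is the index of the most recent input value
    for t in range(n_inputs - 1, len(input_series) - n_outputs):
        inputs = list(input_series[t - n_inputs + 1:t + 1])
        targets = list(target_series[t + 1:t + 1 + n_outputs])
        samples.append((inputs, targets))
    return samples


# Hypothetical observed (R) and reconstructed (PR) values for ten days
R = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
PR = [0.1, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1, 9.1]

ann1_samples = make_samples(R, R, 3, 1)          # 3-10-1, observed to observed
ssa1_ann3_samples = make_samples(PR, PR, 3, 3)   # 3-10-3, reconstructed to reconstructed
ssa2_ann1_samples = make_samples(PR, R, 3, 1)    # 3-10-1, reconstructed to observed
```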

### Model performance measures

Several statistical measures were utilized in the study, along with time series and scatter plots, to assess the predictive performance of the models. The statistical measures used are described below.
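
As a hedged illustration, the measures named in this section can be computed as in the sketch below, using common textbook definitions. Several choices are assumptions rather than the study's stated formulas: the error is taken as predicted minus observed, the mean positive and mean negative errors are averaged over the positive and negative errors only, *R*^{2} is taken as the squared Pearson correlation, and %MP is taken as the relative difference between the predicted and observed maxima.

```python
import math


def performance_measures(observed, predicted):
    """Common definitions of the listed measures (conventions are assumptions)."""
    n = len(observed)
    errors = [p - y for y, p in zip(observed, predicted)]  # assumed sign convention
    positive = [e for e in errors if e > 0]
    negative = [e for e in errors if e < 0]

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sse = sum(e * e for e in errors)
    ss_obs = sum((y - mean_obs) ** 2 for y in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((y - mean_obs) * (p - mean_pred)
                for y, p in zip(observed, predicted))

    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(positive) / len(positive) if positive else 0.0,
        "MNE": sum(negative) / len(negative) if negative else 0.0,
        "RMSE": math.sqrt(sse / n),
        "R2": cross ** 2 / (ss_obs * ss_pred),   # squared Pearson correlation
        "E": 1.0 - sse / ss_obs,                 # Nash-Sutcliffe efficiency
        "%MP": (max(predicted) - max(observed)) / max(observed) * 100.0,
    }


# Hypothetical rainfall values (mm)
measures = performance_measures([0.0, 10.0, 20.0, 40.0], [2.0, 8.0, 20.0, 30.0])
```

Under these conventions, a negative %MP indicates that the peak rainfall is under-predicted.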

#### Mean absolute error

(… *et al.* 2005)

#### Mean positive error

#### Mean negative error

#### Root mean squared error

(… *et al.* 2005; Wang *et al.* 2009)

#### The coefficient of determination (R^{2})

The coefficient of determination (*R*^{2}) is a statistical measure that indicates the predictability of a dependent variable from a set of independent variables. Its value varies from 0 to 1 (Wang *et al.* 2009).

#### Nash–Sutcliffe efficiency coefficient (E)

#### Percentage mean error in estimating peak rainfall (%MP)

Here, *y*_{t} and *p*_{t} are observed and predicted rainfall values, respectively, during the time period *t*, *n* is the total number of rainfall observations, *ȳ* is the arithmetic mean of observed values, and *p*_{t−max} and *y*_{t−max} are maximum values of predicted and observed values, respectively.

## RESULTS AND DISCUSSION

### Data pre-processing using SSA

Data-driven algorithms are heavily prone to overfitting and significantly affected by the presence of outliers. In order to avoid the presence of any outliers and avoid overfitting, noise was eliminated from the data by means of SSA pre-processing. In the study, for SSA-ANN models, SSA pre-processed data was utilized as input for ANN models. In pre-processing the rainfall data, noise from the data was eliminated and signal contained in the data was given as input to the ANN model. Based on the recommendation for selection of window length (Elsner & Tsonis 1996; Golyandina & Zhigljavsky 2013) and the time series property, a window length equal to 365 days was selected (Unnikrishnan & Jothiprakash 2015, 2016) owing to the presence of a prevailing oscillatory component with a period of 365 days in the time series. Thus, after SVD, 365 eigen triples were formed. Selection of appropriate eigen triples for the reconstruction of time series is called grouping. There is no hard and fast rule available for grouping of eigen triples. Earlier, researchers suggested grouping according to the 1-D and 2-D plots of eigen functions and principal components by visual analysis (such as group p vertex polygons into one group). 2-D plots of eigen functions or principal components are plots in which one eigen function/principal component is plotted against the next eigen function or principal component. The 2-D plot of the first ten principal components is given in Figure 1. The 2-D plots of eigen functions and principal components are complex spirals and it is not practical to group them by visual analysis of those plots.

SSA pre-processed data should be free from noise. To this end, the eigen triples that correspond to noise need to be identified among the 365 eigen triples. The noise components can be identified based on the irregular behavior of the eigen functions and principal components. The separation of signal from noise may be accompanied by a gradual decrease in eigen values. The eigen values corresponding to all 365 eigen functions of the rainfall time series are shown in Figure 2. The horizontal line at an eigen value of zero in the figure shows that from eigen function 70 to 365, the eigen value is close to zero. Thus, the eigen values decrease steadily toward zero over eigen functions 70 to 365, and these eigen functions can be said to correspond to noise components with negligible eigen values. In order to ensure that the noise component was not mixed up with the signal, the weighted correlation (*W* correlation) between the noise and signal was utilized. *W* correlations can be very helpful in determining the separability of different reconstructed components (Rocco 2013). *W* correlation is the weighted correlation among the reconstructed time series components. The *W* correlation matrix reflects the structure of the series detected by SSA (Golyandina & Korobeynikov 2014). The *W* correlation value can vary from 0 to 1. The *W* correlation between reconstructed components corresponding to different components of the time series should be close to zero, and the *W* correlation among reconstructed components corresponding to the same component should be close to 1 (Golyandina *et al.* 2001). The *W* correlation between the eigen functions 1–69 and 70–365 is found to be very small (0.0006). Thus, the noise and signal components are well separated. Hence, eigen triples 70 to 365 were identified as noise in the time series and excluded from the reconstruction of the time series. From the remaining eigen triples 1–69, the signal components were selected based on the procedure explained below.
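
The weighted correlation used for this separability check can be sketched with the definition commonly given in the SSA literature, in which each time index is weighted by the number of times it appears in the trajectory matrix. The function names and toy series are assumptions; the sketch returns a signed value, so its absolute value corresponds to the 0 to 1 range mentioned above.

```python
import math


def w_weights(n, window_length):
    """Number of trajectory-matrix entries that contain each series element."""
    k = n - window_length + 1
    return [min(i + 1, window_length, k, n - i) for i in range(n)]


def w_correlation(a, b, window_length):
    """Weighted correlation between two reconstructed series of equal length."""
    w = w_weights(len(a), window_length)

    def inner(u, v):
        return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))

    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))


# Toy reconstructed components (hypothetical values), N = 6 and L = 3
signal = [1.0, 2.0, 3.0, 2.0, 1.0, 0.5]
other = [0.0, 1.0, 0.0, -1.0, 0.0, 0.5]
weights = w_weights(6, 3)
rho = w_correlation(signal, other, 3)
```

A value of `abs(rho)` close to zero would indicate that the two reconstructed components are well separated.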

The first eigen triple was found to correspond to the non-linear trend of the time series by Unnikrishnan & Jothiprakash (2015). In order to select trend components, periodogram analysis was carried out in the grouping stage by Unnikrishnan & Jothiprakash (2015). In order to select the oscillatory and cyclic components in the time series, grouping of eigen functions based on periods of the eigen functions was used. From 2 to 69 eigen functions, the eigen functions that have a period of a rational multiple of the prevailing periodic component of the time series, i.e., 365 days (based on the catchment property of being monsoon fed) were used as oscillatory and cyclic components. Thus, the trend component, oscillatory and cyclic components were utilized for the time series reconstruction. The reconstructed rainfall data were utilized in the SSA-ANN models. The SSA reconstructed series eliminated the noise from the series and had a better structure on which an ANN model could be fit which was capable of generalizing.

### Method of coupling SSA with ANN

In combining SSA with ANN, two different approaches were followed in the present study, namely, SSA1-ANN and SSA2-ANN. In SSA1-ANN models, the SSA pre-processed Koyna daily rainfall data for the period 1st January 1961 to 31st December 2012 are given as input to the ANN model and the corresponding SSA reconstructed values are given as the target of the model. A relation was worked out between the observed Koyna rainfall series and the reconstructed series for the period 1st January 1961 to 31st December 2012, so that after the SSA1-ANN forecast, the predicted reconstructed series can be back-transformed to rainfall values. For this purpose, various regression models were tried to get the best fit between observed and reconstructed values. After various trial and error procedures, a shape preserving interpolant fitted between observed values and reconstructed values was found to give the best fit. Shape preserving interpolation (Fontanella 1990; Costantini *et al.* 2001) preserves the monotonicity and shape of the data. There have been many attempts by researchers to approximate data by means of the shape preserving interpolation method (Kouibia & Pasadas 2001; Sim *et al.* 2008). In the current study, shape preserving interpolation utilized piecewise cubic Hermite interpolation for fitting the data. In the resulting fit, each pair of successive points is related by a different cubic polynomial described by four coefficients. Accordingly, a relation was established between the observed and reconstructed series. As there are many data points in the reconstructed and observed rainfall series in the present study, there are many piecewise polynomials in the resulting fit. The resulting shape preserving interpolant fit between reconstructed and observed rainfall values is therefore shown graphically in Figure 3. By means of the resultant relation, the predicted reconstructed values are transformed back to predicted rainfall values.
In the SSA2-ANN models, the SSA pre-processed data are given as input to the ANN models and the corresponding observed rainfall values are given as target to the ANN models. Therefore, back transformation is not required in SSA2-ANN models.
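
The back-transformation used in the SSA1-ANN approach can be illustrated with a small piecewise cubic Hermite sketch. It is not the study's implementation: interior slopes follow the weighted harmonic-mean rule commonly used for shape preserving cubic Hermite interpolation, the end slopes are simplified to one-sided secants, queries outside the data range are clamped, and the (reconstructed, observed) pairs are hypothetical and assumed to be sorted with strictly increasing reconstructed values.

```python
from bisect import bisect_right


def pchip_slopes(x, y):
    """Knot slopes: zero at local extrema, weighted harmonic mean elsewhere."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]  # simplified end slopes (assumption)
    for i in range(1, n - 1):
        if d[i - 1] * d[i] > 0:  # same sign on both sides: no local extremum
            w1 = 2 * h[i] + h[i - 1]
            w2 = h[i] + 2 * h[i - 1]
            m[i] = (w1 + w2) / (w1 / d[i - 1] + w2 / d[i])
    return m


def pchip_eval(x, y, m, xq):
    """Evaluate the piecewise cubic Hermite interpolant at xq."""
    xq = min(max(xq, x[0]), x[-1])  # clamp to the data range (assumption)
    i = min(bisect_right(x, xq) - 1, len(x) - 2)
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2   # cubic Hermite basis functions
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]


# Hypothetical pairs: reconstructed rainfall (x) and observed rainfall (y), in mm
reconstructed = [0.0, 1.0, 2.0, 4.0, 8.0]
observed = [0.0, 0.5, 3.0, 3.5, 20.0]
slopes = pchip_slopes(reconstructed, observed)

# Back-transform a predicted reconstructed value to a rainfall value
predicted_rainfall = pchip_eval(reconstructed, observed, slopes, 3.0)
```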

### Single and multi-time-step daily rainfall prediction using ANN and SSA-ANN models

Of the observed daily rainfall data collected from 1st January 1961 to 31st December 2013, the data from 1st January 1961 to 31st December 2012 were used for training the neural network, and the remainder was kept for revalidating the performance of the model. Thus, the input values for one complete year (1st January 2013 to 31st December 2013) are not known to the network a priori. From the input data (1st January 1961 to 31st December 2012), 70% was taken for training, 15% for validation, and 15% for testing, selected randomly. This is the calibration phase of neural network modeling, in which the input and target are given to the model, and the model is trained, validated, and tested against known output values. The performance of the models is then revalidated with values not previously given to the model, i.e., the rainfall data from 1st January 2013 to 31st December 2013 are used as input to predict the corresponding single as well as multi-time-step ahead rainfall values. This test gives a measure of the predictive ability of the ANN model in a real-life situation. This phase is known as the revalidation phase, in which the input values are not previously known to the network and target values are not given.
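
The data partition described above can be sketched as follows. The helper name, the fixed random seed, and splitting by day index rather than by input-target sample are assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def split_calibration(indices, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split calibration indices into training, validation, and testing."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Daily dates covering the full record
start, end = date(1961, 1, 1), date(2013, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Calibration period (1961 to 2012) and held-out revalidation year (2013)
calibration = [i for i, d in enumerate(dates) if d.year <= 2012]
revalidation = [i for i, d in enumerate(dates) if d.year == 2013]

train, validation, test = split_calibration(calibration)
```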

### Single time-step prediction

#### Forecasting using ANN model

The ANN1 model was used to forecast the 1-day ahead Koyna daily average rainfall. The input of the model was the rainfall that had occurred over the previous 3 days (*R*_{t−2}, *R*_{t−1}, *R*_{t}) and the target was the next day's rainfall (*R*_{t+1}). The performance measures of the model during the calibration and revalidation phases (for the unfamiliar time period) are given in Table 2. The performance of the model deteriorated in the revalidation phase, with very low performance measures. The predicted and observed rainfall and the corresponding scatter plot during the revalidation phase (from 1st January 2013 to 31st December 2013) are shown in Figure 4(a). The figure shows that the prediction performance is poor and that neither the peak values nor the low values are predicted accurately. The figure also shows a pronounced lag effect in the ANN prediction. These results are consistent with the lag effect and the underprediction of peak values previously reported for ANN prediction (Dawson & Wilby 2001; Jain *et al.* 2004; Wu & Chau 2011). The poor performance of the ANN1 model can be attributed to the fact that daily rainfall data vary more than monthly, seasonal, and annual rainfall data and hence are highly non-linear. In addition, the presence of noise in the data restricted the success of the ANN model.
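
The pairing of lagged inputs with future targets can be made concrete with a short sketch. It assumes the rainfall series is a plain list of daily values; the function name `make_windows` and the sample values are hypothetical. The optional `targets` argument is included so that the same helper can also pair one series as input with a different series as target.

```python
def make_windows(inputs, n_in, n_out, targets=None):
    """Build (input window, target window) pairs from daily series.

    For each day t, the input is inputs[t-n_in+1 .. t] and the target is
    targets[t+1 .. t+n_out]. If targets is None, the input series is used.
    """
    if targets is None:
        targets = inputs
    if len(targets) != len(inputs):
        raise ValueError("input and target series must have the same length")
    X, Y = [], []
    for t in range(n_in - 1, len(inputs) - n_out):
        X.append(list(inputs[t - n_in + 1:t + 1]))
        Y.append(list(targets[t + 1:t + 1 + n_out]))
    return X, Y


# Invented daily rainfall values (mm), for illustration only.
rain = [0.0, 2.0, 5.0, 1.0, 0.0, 7.0, 3.0]

# Three previous days as input, next day as target (ANN1-style pairing).
X1, Y1 = make_windows(rain, n_in=3, n_out=1)

# Three previous days as input, next three days as target (3-day ahead pairing).
X3, Y3 = make_windows(rain, n_in=3, n_out=3)
```

With `n_in=3` and `n_out=1`, the first pair is the input window `[0.0, 2.0, 5.0]` with target `[1.0]`.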

| Model | R^{2} (calibration) | RMSE (mm) (calibration) | MAE (mm) (calibration) | MPE (mm) (calibration) | MNE (mm) (calibration) | E (calibration) | %MP (%) (calibration) | R^{2} (revalidation) | RMSE (mm) (revalidation) | MAE (mm) (revalidation) | MPE (mm) (revalidation) | MNE (mm) (revalidation) | E (revalidation) | %MP (%) (revalidation) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN1 | 0.78 | 15.11 | 5.54 | 11.83 | −3.72 | 0.77 | −69.53 | 0.41 | 21.83 | 9.06 | 15.42 | −6.82 | 0.39 | −70.3 |
| SSA1-ANN1 | 0.99 | 0.51 | 0.053 | 0.05 | −0.05 | 0.99 | 0.2 | 0.84 | 11.58 | 5.29 | 3.25 | 0 | 0.86 | −3.74 |
| SSA2-ANN1 | 0.88 | 8.38 | 6.48 | 4.99 | −9.34 | 0.88 | 15.87 | 0.88 | 9.82 | 4.63 | 2.74 | −11.83 | 0.88 | −17.28 |
| ANN3, R_{t+1} | 0.77 | 15.21 | 5.71 | 3.43 | −17.68 | 0.77 | −60.39 | 0.53 | 19.36 | 7.94 | 3.59 | −26.95 | 0.52 | −53.3 |
| ANN3, R_{t+2} | 0.69 | 20.34 | 8.16 | 4.79 | −25.98 | 0.6 | 131.207 | 0.582 | 18.38 | 8.41 | 5.49 | −23.6 | 0.57 | −49.7 |
| ANN3, R_{t+3} | 0.64 | 22.78 | 9.63 | 5.69 | −30.28 | 0.49 | −210.33 | 0.54 | 19.73 | 9.57 | 6.41 | −23.9 | 0.5 | −83.7 |
| SSA1-ANN3, R_{t+1} | 0.99 | 0.2 | 0.12 | 0.12 | −0.11 | 0.99 | −0.004 | 0.82 | 12.02 | 5.61 | 3.12 | −13.7 | 0.82 | −3.6 |
| SSA1-ANN3, R_{t+2} | 0.99 | 0.66 | 0.39 | 0.39 | −0.41 | 0.99 | 0.088 | 0.81 | 12.56 | 5.79 | 3.03 | −14.76 | 0.8 | −8.63 |
| SSA1-ANN3, R_{t+3} | 0.99 | 1.56 | 0.94 | 0.89 | −0.99 | 0.99 | −0.187 | 0.81 | 13.2 | 6.17 | 3.22 | −14.91 | 0.76 | −37.63 |
| SSA2-ANN3, R_{t+1} | 0.88 | 5.25 | 6.69 | 5.27 | −9.05 | 0.77 | −5.87 | 0.82 | 7.17 | 5.08 | 2.63 | −14.27 | 0.82 | −32.2 |
| SSA2-ANN3, R_{t+2} | 0.86 | 7.03 | 6.47 | 5.15 | −8.71 | 0.78 | −6.84 | 0.83 | 9.55 | 5.12 | 2.89 | −11.01 | 0.83 | −29.6 |
| SSA2-ANN3, R_{t+3} | 0.86 | 9.02 | 6.55 | 5.29 | −8.66 | 0.78 | −15.55 | 0.79 | 8.04 | 5.56 | 2.86 | −15.35 | 0.79 | −31.05 |
| ANN7, R_{t+1} | 0.8 | 15.29 | 5.89 | 3.71 | −18.76 | 0.77 | −77.7 | 0.31 | 17.73 | 10.59 | 7.06 | −25.86 | 0.61 | −19.34 |
| ANN7, R_{t+2} | 0.78 | 20.26 | 8.22 | 5.13 | −27.31 | 0.59 | −99.02 | 0.4 | 18.92 | 10.03 | 6.7 | −28.41 | 0.55 | −51.98 |
| ANN7, R_{t+3} | 0.76 | 22.56 | 9.5 | 5.88 | −32.02 | 0.5 | −126.2 | 0.44 | 19.32 | 9.74 | 6.48 | −30.38 | 0.53 | −77.49 |
| ANN7, R_{t+4} | 0.78 | 23.83 | 10.26 | 6.28 | −34.8 | 0.45 | −150.11 | 0.47 | 20.57 | 9.66 | 6.46 | −29.92 | 0.47 | −102.81 |
| ANN7, R_{t+5} | 0.64 | 24.65 | 10.79 | 5.53 | −36.11 | 0.41 | −199.17 | 0.53 | 21.48 | 9.32 | 6.24 | −28.39 | 0.42 | −124.22 |
| ANN7, R_{t+6} | 0.51 | 25.37 | 11.23 | 6.78 | −37.79 | 0.38 | −193.17 | 0.4 | 22.92 | 9.11 | 5.99 | −31.49 | 0.34 | −133.35 |
| ANN7, R_{t+7} | 0.49 | 25.91 | 11.59 | 6.84 | −39.21 | 0.34 | −242.54 | 0.43 | 23.89 | 8.46 | 5.56 | −28.78 | 0.28 | −146.03 |
| SSA1-ANN7, R_{t+1} | 0.99 | 0.1 | 0.06 | 0.06 | −0.05 | 0.99 | −0.35 | 0.88 | 9.99 | 4.54 | 2.83 | −9.65 | 0.88 | 1.21 |
| SSA1-ANN7, R_{t+2} | 0.99 | 0.11 | 0.05 | 0.05 | −0.05 | 0.99 | 0.03 | 0.75 | 17.09 | 0.87 | 4.34 | −14.61 | 0.63 | 36.18 |
| SSA1-ANN7, R_{t+3} | 0.99 | 0.26 | 0.12 | 0.13 | −0.12 | 0.99 | −1.32 | 0.77 | 14.35 | 6.38 | 2.92 | −16.6 | 0.74 | −27.31 |
| SSA1-ANN7, R_{t+4} | 0.98 | 0.54 | 0.27 | 0.26 | −0.28 | 0.99 | −1.07 | 0.76 | 14.43 | 6.64 | 3.72 | −15.82 | 0.74 | 7.79 |
| SSA1-ANN7, R_{t+5} | 0.98 | 1.02 | 0.52 | 0.49 | −0.55 | 0.99 | −0.57 | 0.78 | 13.9 | 6.57 | 3.55 | −16.53 | 0.76 | 1.63 |
| SSA1-ANN7, R_{t+6} | 0.99 | 1.72 | 0.89 | 0.83 | −0.96 | 0.99 | 1.47 | 0.72 | 16.74 | 7.35 | 4.17 | −16.9 | 0.65 | −11.3 |
| SSA1-ANN7, R_{t+7} | 0.99 | 2.07 | 1.4 | 1.3 | −1.52 | 0.99 | 1.19 | 0.75 | 16.46 | 6.92 | 2.86 | −17.46 | 0.66 | −1.83 |
| SSA2-ANN7, R_{t+1} | 0.88 | 5.96 | 6.54 | 5.3 | −8.46 | 0.78 | −7.22 | 0.83 | 13.61 | 6.56 | 5.05 | −8.48 | 0.77 | −0.03 |
| SSA2-ANN7, R_{t+2} | 0.86 | 7.91 | 6.41 | 5.26 | −8.25 | 0.78 | −29.33 | 0.79 | 13.33 | 6.44 | 4.89 | −8.5 | 0.78 | −56.45 |
| SSA2-ANN7, R_{t+3} | 0.87 | 7.85 | 6.35 | 5.13 | −8.47 | 0.78 | −38.21 | 0.79 | 13.57 | 6.48 | 4.9 | −8.67 | 0.78 | −54.93 |
| SSA2-ANN7, R_{t+4} | 0.85 | 6.86 | 6.33 | 5.37 | −7.74 | 0.78 | −39.54 | 0.78 | 13.43 | 6.42 | 5 | −8.25 | 0.78 | −47.89 |
| SSA2-ANN7, R_{t+5} | 0.87 | 8.94 | 6.4 | 5.88 | −7.04 | 0.78 | −38.98 | 0.8 | 13.07 | 6.36 | 5.4 | −7.44 | 0.78 | −45.92 |
| SSA2-ANN7, R_{t+6} | 0.83 | 10.04 | 6.58 | 6.08 | −7.14 | 0.78 | −37.43 | 0.79 | 13.07 | 6.53 | 5.37 | −7.98 | 0.78 | −50.01 |
| SSA2-ANN7, R_{t+7} | 0.81 | 15.4 | 6.95 | 6.14 | −7.94 | 0.77 | −39.17 | 0.79 | 13.32 | 6.77 | 5.23 | −9.03 | 0.78 | −51.83 |
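
Some of the tabulated measures can be computed as in the following sketch, which assumes the standard definitions of RMSE, MAE, and the Nash-Sutcliffe efficiency for E. The peak error function is a loudly labelled assumption: it compares only the single largest predicted and observed values, which may differ from the %MP definition used in the study. All function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def peak_error_percent(observed, predicted):
    """ASSUMED peak measure: percentage error of the largest predicted value
    relative to the largest observed value (negative means underprediction)."""
    peak_obs = max(observed)
    return 100.0 * (max(predicted) - peak_obs) / peak_obs


# Invented values (mm), for illustration only.
obs = [0.0, 10.0, 20.0, 10.0]
pred = [0.0, 8.0, 18.0, 12.0]

scores = {
    "RMSE": rmse(obs, pred),
    "MAE": mae(obs, pred),
    "E": nash_sutcliffe(obs, pred),
    "peak_error_percent": peak_error_percent(obs, pred),
}
```

Under this sign convention, a negative peak error corresponds to an underpredicted peak, in line with the negative %MP values reported alongside the underprediction of peaks.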


#### SSA1-ANN1 model

In the SSA1-ANN1 model, the SSA pre-processed rainfall data of the past 3 days (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input to the ANN1 model instead of observed data, and the SSA reconstructed value of the next day's rainfall (*PR*_{t+1}) was given as the target of the model. The performance measures of the model during the calibration phase are given in Table 2. The performance measures are very high owing to the absence of noise in the data. The predicted reconstructed values along with the actual reconstructed series for the revalidation period (daily rainfall for the year 2013) are shown in Figure 4(b). The figure shows that the SSA1-ANN1 model can accurately predict the SSA reconstructed series. The scatter plot of predicted and reconstructed series is also given in Figure 4(b) and shows an excellent correlation between the two. The reason for the accurate prediction is that noise is removed from the data using SSA. As explained in the previous section, the predicted reconstructed values during the revalidation phase are transformed back to predicted rainfall values by means of shape preserving interpolation. The shape preserving interpolant used is given in Figure 3. The predicted (back-transformed) and observed rainfall values during the revalidation period along with the scatter plot are shown in Figure 4(c). The figure shows a very good prediction performance compared to the ANN1 model. The lag effect which was predominant in the ANN1 model is eliminated and the peak values are better predicted. The prediction performance measures of the SSA1-ANN1 model during the revalidation phase after back transformation are given in Table 2. The lower performance measures during the revalidation phase compared to the calibration phase are due to the addition of noise to the data during back transformation of the predicted reconstructed values. However, the performance measures during the revalidation phase are much better than those of the ANN1 model and peak values are better predicted, with a percentage error in peak prediction of −3.74%.

The MNE and MPE are zero and 3.25 mm, respectively, for the model in the revalidation phase, which implies that there is no significant underprediction or overprediction of values by the model. Moreover, the performance of the SSA1-ANN1 model is very good as far as daily rainfall prediction is concerned. Thus, it can be concluded that data pre-processing using SSA increases the predictive capacity of the ANN model and also removes the lag effect.

#### SSA2-ANN1 model

In the SSA2-ANN1 model, the SSA pre-processed data were given as input to the ANN1 model and the corresponding observed rainfall values were given as target. The SSA pre-processed rainfall data of the previous 3 days (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input to the ANN1 model with the next day's observed rainfall (*R*_{t+1}) as target. The prediction performance measures of the SSA2-ANN1 model during the calibration and revalidation phases are listed in Table 2. There is no significant variation in the performance of the model between the revalidation phase and the calibration phase. Therefore, the performance of the model does not deteriorate when the unfamiliar rainfall data of the Koyna catchment are given as input. The observed and predicted rainfall values using the SSA2-ANN1 model during the revalidation phase (daily rainfall data for the year 2013) and the corresponding scatter plot are given in Figure 4(d). The figure shows that the correlation between the observed and predicted rainfall is better than for the ANN1 model and, moreover, that there is no lag effect in the prediction of rainfall. Even though the correlation performance is better in the SSA2-ANN1 model, peak values are better predicted in the SSA1-ANN1 model, with a percentage mean error in peak prediction of −3.74% compared with −17.28% for SSA2-ANN1. The reason is that in the SSA1-ANN1 model the output is predicted without noise and noise is added through back transformation, whereas in SSA2-ANN1 the output with noise is predicted. Nevertheless, SSA2-ANN1 could predict peak rainfall values far better than the ANN1 model. The model tends towards underprediction of values, as implied by a significant MNE value, but the error is much less than that of the ANN1 model, and all the other performance measures also show that the SSA2-ANN1 model is a better rainfall prediction model than the ANN1 model. Thus, it can be concluded that providing SSA pre-processed input enhances the predictive capability of the ANN model and removes the lag effect.

### Multi-time-step prediction

Multi-time-step prediction of the daily Koyna rainfall data was carried out by means of six ANN models. Three-day ahead and 1-week (7-day) ahead predictions were carried out in this phase. For 3-day prediction, the ANN3 model with three input nodes and three output nodes was used, and for 7-day prediction the ANN7 model with seven input nodes and seven output nodes was used. For multi-time-step ANN models with data pre-processing, two models were used for each prediction horizon, namely, SSA1-ANN(3&7) and SSA2-ANN(3&7). The details of the architecture of the ANN models are given in the previous section and are tabulated in Table 1.

#### ANN models without data pre-processing

The architecture of the ANN3 and ANN7 models is as explained in the above sections. The previous 3-day rainfall values (*R*_{t−2}, *R*_{t−1}, *R*_{t}) were given as input to the ANN3 model and the next 3-day rainfall values (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}) were given as the target. The rainfall of the previous 7 days (*R*_{t−6}, *R*_{t−5}, *R*_{t−4}, *R*_{t−3}, *R*_{t−2}, *R*_{t−1}, *R*_{t}) was given as input to the ANN7 model and the corresponding 7-day ahead rainfall values (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}, *R*_{t+4}, *R*_{t+5}, *R*_{t+6}, *R*_{t+7}) as target. The performance of the models was revalidated for the period 1st January 2013 to 31st December 2013 for 3-day (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}) and 7-day predictions. The model performance measures of ANN3 and ANN7 during the calibration and revalidation phases are given in Table 2. The table shows that the performance measures of both models during the revalidation phase are very low compared to the calibration phase, with very low correlation and very high error, and that the result deteriorates as the prediction time step increases. The observed and predicted rainfall values along with the corresponding scatter plots during the revalidation phase for the ANN3 and ANN7 models are given in Figures 5 and 6, respectively. The figures show that there is a lag effect (in each output) in the prediction using the ANN model and that the peak is underpredicted in both models. The model performance measures are also low for all the prediction time steps.
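
A lag effect of the kind described above can be checked numerically by correlating the predictions with the observations at several time shifts. The sketch below is one possible diagnostic, not a procedure taken from the study; the function names `pearson` and `lag_profile`, the maximum lag, and the sample series are assumptions made for illustration. The example forecast simply repeats the previous day's observation, which is the extreme case of a one-day lag.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def lag_profile(observed, predicted, max_lag=3):
    """Correlation between predicted[t] and observed[t - lag] for each lag."""
    profile = {}
    for lag in range(max_lag + 1):
        obs_part = observed[:len(observed) - lag]
        pred_part = predicted[lag:]
        profile[lag] = pearson(obs_part, pred_part)
    return profile


# Invented daily rainfall (mm) and a forecast that repeats yesterday's value.
observed = [0, 0, 5, 20, 3, 0, 0, 12, 40, 6, 0, 1]
lagged_forecast = [0] + observed[:-1]

profile = lag_profile(observed, lagged_forecast)
best_lag = max(profile, key=profile.get)
```

If the correlation peaks at a lag greater than zero, the forecast is tracking past observations rather than anticipating them, which is the behaviour visible as a shifted hydrograph in the time series plots.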

#### SSA1-ANN(3&7) models

In the SSA1-ANN3 model, 3-day rainfall prediction of the Koyna daily rainfall data was carried out with the ANN3 model, where SSA pre-processed data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input and SSA reconstructed rainfall data (*PR*_{t+1}, *PR*_{t+2}, *PR*_{t+3}) as target. In the SSA1-ANN7 model, SSA reconstructed rainfall data for the previous 7 days (*PR*_{t−6}, *PR*_{t−5}, *PR*_{t−4}, *PR*_{t−3}, *PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input to the ANN7 model and the corresponding SSA reconstructed data for the 7-day ahead rainfall (*PR*_{t+1}, *PR*_{t+2}, *PR*_{t+3}, *PR*_{t+4}, *PR*_{t+5}, *PR*_{t+6}, *PR*_{t+7}) as target. The model performance measures during the calibration phase for the models are given in Table 2. The table shows a very good correlation between the reconstructed and predicted values, with an R^{2} value of 0.99 and very low prediction error for both models. The models were revalidated for the time period mentioned above. The output of the revalidation phase from the neural network is the SSA reconstructed data. Therefore, as explained in the previous section, shape preserving interpolation was used to back-transform the predicted reconstructed values to rainfall values. The back-transformed predicted values and the observed values for SSA1-ANN3 and SSA1-ANN7, along with the corresponding scatter plots, are given in Figures 7 and 8, respectively. The figures show that both models perform better than the ANN3 and ANN7 models in all the prediction time steps. In the SSA1-ANN(3&7) models, the lag effect of the ANN model is also eliminated and the peak values are better predicted. The model performance measures during the revalidation phase after back transformation are given in Table 2. The lower values of the model performance measures in the revalidation phase compared to the calibration phase are due to the approximation involved in the back transformation of the output. However, the model performance measures are very good for multi-time-step daily rainfall prediction, with high correlation and low error. Thus, it can be concluded that SSA pre-processed data also eliminate the lag effect of multi-time-step ahead ANN prediction models.

#### SSA2-ANN(3&7) models

In the SSA2-ANN3 model, the SSA pre-processed rainfall data (*PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input to the ANN3 model for 3-day prediction, with the corresponding observed rainfall values (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}) as target. In SSA2-ANN7, SSA pre-processed data for the previous 7-day rainfall (*PR*_{t−6}, *PR*_{t−5}, *PR*_{t−4}, *PR*_{t−3}, *PR*_{t−2}, *PR*_{t−1}, *PR*_{t}) were given as input to the ANN7 model and the next 7-day observed rainfall (*R*_{t+1}, *R*_{t+2}, *R*_{t+3}, *R*_{t+4}, *R*_{t+5}, *R*_{t+6}, *R*_{t+7}) as target. The performance measures during calibration and revalidation of the models are given in Table 2. The performance measures are similar in calibration and revalidation for both models. The correlation between observed and predicted rainfall and the efficiency (E) of the model are high in the SSA2-ANN(3&7) models as far as multi-time-step daily rainfall prediction is concerned. However, the percentage error in peak prediction is higher than that of the SSA1-ANN(3&7) models. Figures 9 and 10 compare the observed and predicted rainfall for SSA2-ANN3 and SSA2-ANN7, respectively, during revalidation of the model in the form of time series and scatter plots. The figures show that, unlike the ANN(3&7) models, there is no lag effect in the prediction and that non-peak values are predicted accurately.

The comparison between the six models in multi-time-step prediction of the daily rainfall time series shows that the prediction performance of ANN(3&7) is poorer than that of the models in which ANN is coupled with SSA. Both SSA-ANN(3&7) models showed better results than the ANN(3&7) models and could also eliminate the lag effect which is predominant in the ANN(3&7) models. Even though the correlation between observed and predicted values and the efficiency of the model are slightly better in the SSA2-ANN(3&7) models, the peak values are better predicted by the SSA1-ANN(3&7) models, as can be seen from their lower %MP values. However, both ANN models that utilized SSA pre-processed data outperformed the ANN models for all the time-step predictions in terms of all prediction performance measures.

## CONCLUSIONS

In the present study, the prediction performance of ANNs coupled with SSA (SSA-ANN) was assessed for single as well as multi-time-step daily rainfall prediction. Two SSA-ANN variants were used, namely, SSA1-ANN and SSA2-ANN, which differ in how the pre-processed data are used in the neural network. Three ANN architectures, named ANN1, ANN3, and ANN7, were developed for 1-day (single time-step), 3-day, and 7-day ahead prediction of the daily rainfall series. In total, nine models were developed. Koyna rainfall data from 1st January 1961 to 31st December 2012 were given as input to the ANN models and the rest of the available data (1st January 2013 to 31st December 2013) were used to test the prediction performance of the models. Even though the ANN models gave a good coefficient of determination during the calibration phase, their performance deteriorated when predicting the rainfall for the period which was not given to the network earlier (revalidation phase). Also, peak values were underpredicted and there was a lag effect in all the prediction time steps for single as well as multi-time-step prediction. SSA-ANN models outperformed the corresponding ANN models in single as well as multi-time-step prediction and eliminated the lag effect which is predominant in the ANN models. The results show that SSA2-ANN performed slightly better than the SSA1-ANN models. However, SSA1-ANN could predict peak values more accurately. In the case of SSA1-ANN, back transformation of the results from the neural network needed to be carried out, which was not required in SSA2-ANN. Overall, both SSA-ANN models outperformed the ANN models in terms of all prediction performance measures. Thus, data pre-processing using SSA could enhance the prediction performance of ANN in a very efficient way.

The number of neurons and hidden layers was selected by trial and error in the present study. As a future scope of the work, it is suggested to use other techniques, such as GP, chaos theory, etc., with the SSA-ANN model in order to select the appropriate number of inputs. A further direction for SSA-ANN models would be to extend the forecast horizon so that daily rainfall can be predicted for longer time steps.

## ACKNOWLEDGEMENTS

The authors gratefully acknowledge the Koyna Dam Authorities, Executive Engineer of Koyna Dam for providing necessary data for this research. We gratefully acknowledge the anonymous reviewers as well as the editor for their excellent review, comments and suggestions that have improved the quality and readability of the manuscript.

## REFERENCES
