Climate change has affected every component of the hydrologic cycle, especially extreme events. General circulation models (GCMs) are used to investigate climate change impacts, but because of their coarse resolution, downscaling methods have been developed to derive data with sufficiently high resolution for regional studies from GCM outputs. Rainfall downscaling methods generally preserve average characteristics acceptably, but they do not preserve the characteristics of extreme events, especially rainfall amount and distribution. In this study, a novel downscaling method called the synoptic statistical downscaling model is proposed for daily precipitation downscaling with an emphasis on preserving extreme event characteristics. The proposed model is applied to a region located in central Iran. The results show that the developed model can downscale all percentiles of precipitation events with acceptable performance, without assuming that future rainfall data resemble the historical observations. The outputs of the CCSM4 GCM for two representative concentration pathways (RCPs), RCP4.5 and RCP8.5, are used to investigate the climate change impacts in the study region. The results show a 40% and a 30% increase in the number of extreme rainfall events under RCP4.5 and RCP8.5, respectively.

## INTRODUCTION

The investigation of local rainfall event characteristics, especially extreme ones, is a key issue in hydrologic research in different fields such as urban drainage system management and flood control strategies. Owing to climate change impacts, the intensity and frequency of extreme rainfall events are increasing. According to the fifth report of the Intergovernmental Panel on Climate Change (IPCC 2014), the frequency and intensity of heavy precipitation events have increased in the northern hemisphere. General circulation models (GCMs) are the primary available tools for investigating the response of the hydro-climate system to climate change impacts (IPCC 2014).

^{2}) are the mesoscale phenomenon (Willems *et al.* 2012) and in terms of temporal scale, daily rainfall is the largest time scale applicable in urban flood management. Because of the spatial and temporal scale mismatch between the outputs of GCMs and the scale at which hydrological investigations of climate change impacts are commonly carried out, downscaling methods are applied (Wilby *et al.* 1998).

Two main approaches in downscaling are dynamical (numerical) downscaling and statistical (empirical) downscaling (Benestad *et al.* 2008). In dynamical downscaling, limited-area regional climate models are applied and the GCM outputs are used as their initial conditions (Feser *et al.* 2011). In the statistical downscaling (SD) methods, statistical or empirical models are used to relate the GCM outputs and local rainfall (Wilby & Dawson 2013). Different methods, techniques and approaches are proposed in the SD context including stochastic weather generators (Richardson 1981; Semenov & Barrow 1997; Semenov 2008; Baigorria & Jones 2010; Wilks 2010), transformation function (TF) approaches (Wilby *et al.* 2002; Dibike *et al.* 2008; Hessami *et al.* 2008; Pasini 2009; Tomassetti *et al.* 2009; Chau & Wu 2010; Chen *et al.* 2010; Najafi *et al.* 2011; Willems & Vrac 2011; Karamouz *et al.* 2013a, 2013b; Tavakol-Davani *et al.* 2013) and weather typing (WT) approaches (Hewitson & Crane 2006; Vrac & Naveau 2007; Arnbjerg-Nielsen 2008; Cheng *et al.* 2010, 2011; D'onofrio *et al.* 2010).

Previous studies have commonly focused on reproducing the mean behavior of the daily rainfall. However, some studies are devoted to downscaling daily precipitation with an emphasis on the extreme events (Hundecha & Bárdossy 2008; Benestad 2009; Cheng *et al.* 2011). It is difficult to preserve the characteristics of extreme daily rainfall during the downscaling process due to local phenomena that affect extreme daily rainfall amounts, the considerable number of dry days and a non-Gaussian statistical distribution on wet days (Benestad 2009). In many downscaling approaches, there is a tendency to underestimate the extreme precipitation events (Fealy & Sweeney 2007; Arnbjerg-Nielsen 2008; D'onofrio *et al.* 2010).

To overcome this shortcoming, some methods are proposed such as bias correction and change factor methods. Sarr *et al.* (2015) compared the performance of the delta-change method with a quantile–quantile transformation method in downscaling the amount of daily extreme rainfall events with 5 to 100 years return period in six stations in Senegal. The results showed that these methods in some cases overestimate and in some stations underestimate the extreme rainfall event amounts. A comparison between these methods can be found in Sunyer *et al.* (2015), Tryhorn & DeGaetano (2011) and Wang *et al.* (2015). According to their reviews, considering merely a limited number of extreme events in the bias correction method and inability to reproduce timing characteristics (chronology) of daily rainfall such as the length of dry/wet spells are some limitations of these approaches.

Another approach to overcome the limitations of ordinary SD models for downscaling daily precipitation is combining WT methods with SD models. The main idea in developing this approach is that the precipitation occurrence in different periods of time with a particular synoptic scale weather pattern follows a similar scheme (Willems *et al.* 2012).

Cheng *et al.* (2010) applied a synoptic WT using principal component analysis (PCA) and the logit and nonlinear regression procedures as the within-weather-type daily rainfall simulation models. Although the results show that the proposed method has a good potential to simulate daily rainfall occurrence and amount, it does not display a stable performance in all ranges of rainfall amounts. Hertig & Jacobeit (2013) also used the PCA on 700 hPa geopotential heights to determine circulation patterns (CP) in the synoptic scale and the Poisson regression model for daily precipitation simulation. Their model overestimated the precipitation below the 90% percentile and underestimated precipitations above this threshold.

Using the k-means clustering technique to determine synoptic CPs, D'onofrio *et al.* (2010) simulated daily rainfall according to the daily precipitation probability density function (pdf) in each CP. Their approach was fairly straightforward, but the results were overestimated for extreme values and wet/dry sequences. In addition, they assumed stationarity of the precipitation pdf in future climate conditions. Similarly, Haberlandt *et al.* (2015) developed a stochastic rainfall downscaling model whose parameters were conditioned on CPs. They investigated the non-stationarity of the relationship between rainfall and CPs and concluded that the modification of the parameters of the stochastic rainfall downscaling model for its future application is necessary.

Despite the improvements in SD precipitation downscaling methods in previous studies, preserving all statistical characteristics of the observed local daily rainfall by the downscaling model is still a challenge. The previous studies cannot simulate the daily precipitation pdf for all ranges of rainfall amount without the assumption of the constant precipitation pdf for current and future climate conditions.

As estimation of extreme rainfall events is a vital issue in urban areas in planning for flood events and because of climate change impacts on the characteristics of extreme rainfall events, it is necessary to consider climate change impacts in extreme rainfall evaluation. On the other hand, the common methods used for evaluation of climate change impacts on rainfall events do not provide satisfactory results for extreme rainfall event investigation. Therefore, the main aim of this study is to develop a downscaling model which can preserve the distribution frequency or percentile values of the observed daily precipitation for all ranges of rainfall with an emphasis on extreme events. The proposed method, synoptic-statistical downscaling model (SSDM), is based on a combination of synoptic WT of upper-level atmosphere conditions which affect the local atmosphere states, and SD models for simulating daily rainfall in each weather type. The performance of SD models is improved by applying some modifications in common methods including skewness reduction and weighted least squares (WLS). These changes have made SSDM capable of overcoming some of the reported challenges and disadvantages in the literature including reproducing extreme event frequencies, and inherent non-Gaussian statistical distribution of daily rainfall. In the proposed method, stability in relationships between predictors and local rainfall in each weather type as well as the adequacy of recognized WTs for future events are assumed. There is no assumption about the stationarity of the daily rainfall pdf in future conditions, and changes in the frequency distribution of predictors (because of climate change) can result in daily rainfall pdf changes. The proposed method is applied to an area located to the north of Tehran, the capital of Iran, as the study area. The historical flooding of this city has resulted in considerable damage to people and properties in the past. 
Therefore, the investigation of climate change impacts on the precipitation regime of this region with an emphasis on extreme events is of great importance.

The next section describes the proposed method in this study for rainfall downscaling, SSDM. Then, the case study and data used in the study are introduced. Next, SSDM's development for the study area is described and the results of its application are discussed. Afterwards, climate change impacts on rainfall characteristics are investigated for the study area by applying SSDM on outputs of a GCM. Finally, a summary and conclusion are given.

## METHOD

### SSDM algorithm

*et al.* 2010). In SSDM, days are classified according to the circulation type (CT) that occurs in each day. The SSDM algorithm includes two main phases. Figure 2 shows these two phases and their main steps and components as well as how they are related to each other. The detailed description of the SSDM algorithm is given in the following sections.

#### Phase I: synoptic scale classification model

Synoptic climatology was defined by Yarnal (1993) as the study of the linkage between the atmospheric circulation and an environmental response. A successful classification of atmospheric conditions is an essential step in the investigation of this linkage. Therefore, the first phase in the SSDM algorithm is clustering the large-scale atmospheric CTs that mainly influence the local surface weather conditions.

*Self-organizing maps (SOM).* In this study, the SOM technique is used for synoptic-based classification of CTs as a result of their successful application in climatological fields in the last decade (Hewitson & Crane 2002; Sheridan & Lee 2011). SOM was first presented in detail by Kohonen (1995). Inspired by neural networks, the SOM methodology utilizes an array or lattice of a low-dimensional (typically two-dimensional) matrix of neurons or nodes, which are trained to produce a discretized representation or map of the input space. Each node (considered a cluster) is defined by a reference vector of weighting coefficients, which has the same dimension as the input data vector. At first, nodes are randomly distributed in data space.

There are some standard SOM configurations used for its development including sheet, cylinder and toroid. The structure used here for SOM development is a sheet shape (matrix) of nodes with a rectangular lattice neighborhood structure. Since synoptic scale studies investigate the two-dimensional variations of climatic variables, the sheet shape is used, similar to previous studies such as Hewitson & Crane (2002) and Sheridan & Lee (2011). The cylinder and toroid map shapes of SOM are used when three-dimensional analysis of data is considered. For the neighborhood condition in SOM map development, two cases, hexagonal and rectangular, are available, both of which are considered in this study. The rectangular neighborhood is adopted because of its better performance. In addition to the SOM structure, the appropriate number of nodes in the SOM structure should be determined. Commonly, a subjective method is used for this purpose (Sheridan & Lee 2011). In many cases, different numbers of nodes are considered, and the number of nodes that results in the best model performance is then selected by evaluating the developed models.

In the training procedure, each input vector is presented to the SOM and the best matching unit (BMU) is determined. BMU is the node that has the least Euclidean distance from the input vector. Then the weighting coefficients of BMU and its neighboring nodes are modified in such a way that the distance from the input vector is minimized. The weighting coefficients modifications are controlled by some factors such as learning rate. Neighborhood adjustment is utilized to preserve the topological properties of the input space. The training can be sequential (introducing input data one by one to the SOM) or batch training (introducing all input data simultaneously to the SOM). The training process has two stages of rough training (with large neighborhood radius and high learning rate) and fine-tuning (with small radius and low learning rate). The termination criterion for the training process can be the achievement of the minimum value of SOM error measurement called average quantization error (Kohonen & Honkela 2007) or reaching a predefined number of training epochs. A practical public domain software package for implementing SOMs along with extensive references and documentation are provided by Helsinki University of Technology's Laboratory of Computer and Information Science (available at http://www.cis.hut.fi/projects/somtoolbox/).
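The sequential training loop described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the SOM Toolbox implementation: the function names (`train_som`, `best_matching_unit`, `quantization_error`), the Gaussian neighborhood function, the linear decay of learning rate and radius (standing in for the separate rough-training and fine-tuning stages), and the initialization from randomly chosen input vectors are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Return the lattice position whose reference vector is closest to x."""
    return min(nodes, key=lambda p: sum((a - b) ** 2 for a, b in zip(nodes[p], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Sequentially train a sheet-shaped SOM on a rectangular lattice.

    data: list of equal-length feature vectors (lists of floats).
    Returns a dict mapping lattice position (row, col) to its reference vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 if radius0 is not None else max(rows, cols) / 2.0
    # Assumption: initialise each node with a randomly chosen input vector.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(nodes, x)
            for pos, w in nodes.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # neighbourhood strength
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])       # pull node towards x
            step += 1
    return nodes


def quantization_error(nodes, data):
    """Average Euclidean distance between each input vector and its BMU."""
    total = 0.0
    for x in data:
        w = nodes[best_matching_unit(nodes, x)]
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(w, x)))
    return total / len(data)
```

With a daily 500 hPa pattern as each input vector, `best_matching_unit` would assign a day to its CT, and `quantization_error` gives the kind of error measure that can serve as a stopping criterion.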

*Relating CTs and daily precipitation.* In order to analyze the relationship between the daily precipitation and the driving large-scale atmospheric CTs, a performance index (PI) (Zhang *et al.* 1997) is utilized in addition to the interpretation of the composite maps (Yarnal *et al.* 2001). Using this PI will help to quantify this relationship and makes it easier to determine CTs which result in wet and dry days which is necessary in this study. PI will help to provide the same interpretation for CTs in different cases which is very important in reproducing the study result. PI compares the mean daily precipitation for the *i*th CT with the climatological daily mean precipitation as shown in Equation (1):

$$PI_i = \frac{R_i / n_i}{R / n} \qquad (1)$$

where $n_i$ is the number of days included in the *i*th CT, $R_i$ is the total amount of precipitation in the *i*th CT and *R* is the total amount of precipitation received in the study region in the considered period of *n* days. PI values close to one show that the given CT leads to precipitation amounts around the climatological mean. PI values close to zero mean that the corresponding CT does not considerably contribute to precipitation, or the CT is unconducive to rainfall formation while PI values greater than one mean that the given CT is conducive to considerable precipitation.
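As a minimal sketch of the index just described (the mean daily precipitation within a CT divided by the climatological daily mean), the following Python function computes PI for every CT from a list of daily CT labels and a matching list of daily precipitation amounts. The function name and the input layout are assumptions chosen for illustration.

```python
from collections import defaultdict


def performance_index(ct_labels, precip):
    """Return {CT: PI}, where PI = (mean precipitation in CT) / (overall mean).

    ct_labels: CT label of each day; precip: precipitation of each day.
    """
    n = len(precip)
    climatological_mean = sum(precip) / n
    days = defaultdict(int)      # number of days in each CT
    amount = defaultdict(float)  # total precipitation in each CT
    for ct, p in zip(ct_labels, precip):
        days[ct] += 1
        amount[ct] += p
    return {ct: (amount[ct] / days[ct]) / climatological_mean for ct in days}
```

For example, with four days labelled `["a", "a", "b", "b"]` and precipitation `[0, 0, 2, 6]`, the climatological mean is 2, so CT `a` gets a PI of 0 (unconducive) and CT `b` gets a PI of 2 (conducive).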

#### Phase II: local scale statistical model

In the second phase of the SSDM algorithm, two statistical models are developed for each CT. The first model, called the rainfall occurrence model (ROM), determines whether a day is wet or dry, and the second model, called the rainfall amount model (RAM), estimates the amount of precipitation for wet days. Both models are regression-based. A combination of subjective and automatic methods is used for the selection of the independent variables in both regression models. In order to improve the ability of the regression models to reproduce extreme rainfall event characteristics, transformation functions (TF) and WLS are used, respectively, for reducing daily precipitation skewness and estimating model parameters.

*Variable selection.* For developing a statistical rainfall simulation model, appropriate features of the atmosphere are chosen as the predictors. There are numerous features of the atmosphere that can be used for this purpose. The potential variables are initially screened based on the physical process of rain formation and suggestions available in the literature for the study region. This initial screening is followed by an automatic variable selection procedure that determines the best set of variables to be used in the statistical models. In this study, the stepwise linear regression is used for this purpose.

Stepwise regression develops a multivariate regression model in which the choice of predictive variables is carried out by an automatic procedure. At each iteration, predictors that improve the explanatory power of the model are added to the subset of predictive variables of the model. At the same time, variables whose elimination does not considerably affect the model's performance are removed from the subset. The null hypothesis states that the statistical relationship between a considered variable and the rainfall is arbitrary. For a variable that is not currently included in the model, if there is sufficient evidence to reject the null hypothesis, the variable is added to the model. Conversely, for a variable that is included in the model, if there is insufficient evidence to reject the null hypothesis, the variable is removed from the model. The null hypothesis is checked through an F-statistic and by computing the corresponding *p*-value. The *p*-value threshold sets the statistical significance level required for a variable to enter the model and can therefore be used indirectly to control the number of predictive variables in the regression model. A smaller threshold demands a higher level of statistical significance and results in fewer variables in the model. In this study, the *p*-value thresholds for ROMs and RAMs are set to 0.001 and 0.05, respectively. The procedure of determining predictive variables through the stepwise regression is stopped when no more predictors can be justifiably entered into or removed from the stepwise model (Ryan 1997).
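The text does not spell out the F-statistic used in the entry and removal tests. For orientation only, the textbook partial F-statistic for comparing a full regression model with a reduced one is shown below; the symbols ($SSE_{\mathrm{full}}$, $SSE_{\mathrm{reduced}}$, $q$, $p$) are introduced here for illustration and may differ from the notation of the original study.

```latex
F = \frac{\left( SSE_{\mathrm{reduced}} - SSE_{\mathrm{full}} \right) / q}
         {SSE_{\mathrm{full}} / (n - p - 1)}
```

Here $SSE$ denotes the sum of squared errors of each model, $q$ is the number of predictors by which the two models differ (one, when a single variable is tested in a stepwise iteration), $p$ is the number of predictors in the full model and $n$ is the number of observations. Under the null hypothesis, $F$ follows an $F$ distribution with $q$ and $n - p - 1$ degrees of freedom, and the resulting *p*-value is what would be compared with the entry or removal threshold.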

*Skewness reduction.* In linear regression modeling, there is an assumption that variables follow the normal distribution. Non-normally distributed variables (e.g. highly skewed or kurtotic variables) can distort relationships and significance tests. In such cases, the simple linear regression on the original data could provide misleading results, or may not be the best test available (Draper & Smith 1998). Daily precipitation is a highly skewed variable, especially in arid and semi-arid areas. Therefore, for skewness reduction of daily precipitation, in each submodel (ROM and RAM), a TF is applied on daily rainfall. The considered TF for the ROM submodel is the Box-Cox transformation which was selected after checking different common transformation functions. This TF is applied on daily rainfall values as shown in Equation (2). The TF of RAM is the Box-Cox transformation applied on the square root of daily precipitation as given in Equation (3) (Box & Cox 1964; Li 2005). Different power transformations can be considered in the TF of RAM but in this study the best results were obtained using square root transformation.

- (a) TF of ROM submodel
- (b) TF of RAM submodel
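To illustrate how such a transformation reduces skewness, the sketch below implements the Box-Cox transformation in its textbook form, its inverse, and a simple sample skewness measure. The parameter name `lam`, the sample values, and the value `lam = 0.2` are hypothetical; the exact parameterization used in the study, including how zero-rainfall days are handled in the ROM, is not recoverable from the text here.

```python
import math


def box_cox(x, lam):
    """Textbook Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Reverse transformation: recover x from its Box-Cox transform y."""
    return math.exp(y) if lam == 0 else (lam * y + 1.0) ** (1.0 / lam)


def skewness(values):
    """Sample skewness (third standardized moment, population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical wet-day amounts (mm) with one extreme event.
rain = [0.5, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 40.0]
# RAM-style transform: Box-Cox applied to the square root of precipitation.
transformed = [box_cox(math.sqrt(r), 0.2) for r in rain]
```

In this toy sample, `skewness(transformed)` is clearly smaller than `skewness(rain)`, which is the effect the transformation is meant to achieve before fitting a linear model.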

*Model parameters estimation.* Ordinary least squares (OLS) is a standard method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the differences between the observed and simulated values. The sum squared error or mean squared error (MSE) that are usually used as objective functions in OLS are calculated as shown in Equations (4) and (5) (Ryan 1997):

$$SSE = \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2 \qquad (4)$$

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(x_i, \beta) \right)^2 \qquad (5)$$

where $y_i$ is the observed value, *f* is the developed linear regression model, $x_i$ is the predictor variables vector, $\beta$ is the linear model parameters and *n* is the number of observations.

The observed daily precipitation histograms are commonly strongly skewed. The mean and the median of the precipitation data are different because the data are not normally distributed. As the majority of daily precipitation data are concentrated around the mean value, the effect of extreme rainfall amounts is underestimated in the estimation of MSE. As a result, the variance of the model errors is not independent of its value. Inconstant error variance is often termed ‘heteroscedasticity’ (Carroll & Ruppert 1988; Draper & Smith 1998). But, one of the assumptions of the regression model is that the error variance is the same everywhere. In the heteroscedasticity case, OLS measures are inefficient because they give equal weight to all observations regardless of the fact that those with large residuals contain less information about the regression.

*p*th percentile, is the number of observations in the *p*th percentile and is the mean daily rainfall amount. Then, for each percentile is estimated as follows: where is the initial weight of the *p*th percentile. This weight is used for all rainfall amounts between the *p*th and (*p* + 1)th percentiles. For more accuracy and flexibility of the model in reproducing extreme rainfall events values, the initial weights are modified during the model training process as follows:

- for high percentiles (above 98%) and low percentiles (below 50%) the initial weights are increased;
- for middle percentiles (between 50 and 98%) the initial weights are decreased.

The amount of change in initial weights in each of the above-mentioned categories is determined by trial and error to provide a better model performance.
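The effect of weighting observations can be illustrated with a small weighted least squares (WLS) fit. The sketch below fits a straight line with one predictor in closed form and includes a helper that raises or lowers weights by percentile class in the spirit of the adjustment described above. The function names and the multiplier values (1.5, 0.5, 3.0) are purely hypothetical; the study determines its weights from the data and by trial and error.

```python
def weighted_least_squares(x, y, w):
    """Fit y = a + b*x by minimising sum(w_i * (y_i - a - b*x_i)**2)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def percentile_class_weight(rank_fraction, low=1.5, middle=0.5, high=3.0):
    """Hypothetical multipliers: larger below the 50th and above the 98th
    percentile, smaller in between (values chosen only for illustration)."""
    if rank_fraction > 0.98:
        return high
    if rank_fraction < 0.50:
        return low
    return middle
```

As a usage example, fitting the points (0, 0), (1, 0), (2, 3) with equal weights predicts 2.5 at x = 2, whereas giving the last point a weight of 10 moves the prediction much closer to the observed value of 3. This is the mechanism by which heavier weights on extreme rainfall amounts keep them from being smoothed away.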

*Regression models.* The mathematical formulations of ROM and RAM are shown in Equations (9) and (10). Their inputs are the vectors of the selected predictive variables for the ROM and RAM submodels, respectively, resulting from the stepwise regression procedure. Their parameters are the coefficient vectors in the linear models of ROM and RAM for the *k*th CT, together with the rainfall event threshold that distinguishes between wet and dry days. If the output value of the regression model developed for the ROM for a day is greater than or equal to this threshold, that day is considered a wet day; otherwise it is considered a dry day. The value of this threshold is a model parameter that should be chosen so that the model better fits the distribution of the observed daily precipitation values. The formulations also involve the reverse transformation function. For each CT, ROM and RAM submodels are developed. The final statistical model is the product of ROM and RAM as shown in Equation (11); it acts as the transformation function that downscales the large-scale (mesoscale) atmospheric variables to local precipitation.
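How the occurrence and amount submodels combine into one daily estimate can be sketched as follows. This is a hedged illustration rather than the study's formulation: the dictionary layout, the key names (`rom_coef`, `ram_coef`, `threshold`, `inverse_tf`), the convention that the first predictor is a constant 1 for the intercept, and the choice to compare the raw ROM regression output with the threshold are all assumptions.

```python
def dot(coefficients, predictors):
    """Inner product of a coefficient vector and a predictor vector."""
    return sum(c * p for c, p in zip(coefficients, predictors))


def downscale_day(ct, x_rom, x_ram, models):
    """Estimate daily precipitation for one day that belongs to CT `ct`.

    models[ct] holds the (hypothetical) fitted pieces for that CT:
      rom_coef, ram_coef : linear coefficients of the two submodels
      threshold          : wet/dry threshold on the ROM output
      inverse_tf         : callable undoing the skewness-reducing transform
    """
    m = models[ct]
    # ROM: 1 for a wet day, 0 for a dry day.
    occurrence = 1 if dot(m["rom_coef"], x_rom) >= m["threshold"] else 0
    # RAM: linear prediction in transformed space, mapped back to rainfall.
    amount = m["inverse_tf"](dot(m["ram_coef"], x_ram))
    # Final model: product of occurrence and (non-negative) amount.
    return occurrence * max(amount, 0.0)


# Hypothetical single-CT model used only to exercise the function.
example_models = {
    "CT(1,4)": {
        "rom_coef": [0.0, 1.0],
        "ram_coef": [1.0, 2.0],
        "threshold": 0.5,
        "inverse_tf": lambda v: v ** 2,
    }
}
```

Because the two submodels are multiplied, a dry-day decision from the ROM forces the estimate to zero regardless of what the RAM predicts, while on wet days the RAM alone determines the amount.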

## CASE STUDY

### Data

#### Observed daily precipitation

In this study, the observed daily precipitation dataset provided by the APHRODITE project for the Middle East is utilized (Yatagai *et al.* 2012). Its state-of-the-art gridded daily precipitation datasets, with high-resolution grids over Asia, are based on interpolations of data collected from the rain-gauge observation networks. The data release for the APHRO_V1101_ME for the period of 1970 to 2006 (13,514 days) is used. The network grid of the APHRODITE dataset, around Tehran's urban area, and the considered grid in the north of Tehran are shown in Figure 3.

#### Large-scale atmosphere variables

Large-scale atmosphere variables on a 2.5° × 2.5° grid network over Iran and around Tehran's urban region from 1970 to 2006, obtained from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis archive (Kalnay *et al.* 1996; Kistler *et al.* 2001) were utilized in this study as the observed large-scale atmosphere variables to develop the SSDM. Data grids in the synoptic domain (from 42.5° to 60° longitudes and from 25° to 40° latitudes including 70 grids) and the mesoscale domain (from 50° to 52.5° longitudes and 35° to 37.5° latitudes including four grids around the study area) were utilized, respectively, for synoptic scale pattern classification and rainfall SD model development. This domain is determined based on previous studies in the study area such as Alijani (2002) and Raziei *et al.* (2012, 2013). These spatial domains are shown in Figure 3. NCEP reanalysis data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, are available at http://www.esrl.noaa.gov/psd/.

#### GCM outputs and scenarios

Climate change scenario simulation results of the Community Climate System Model, version 4 (CCSM4) GCM under two different representative concentration pathways (RCPs), namely RCP4.5 and RCP8.5, in a near future period (from 2006 to 2050) were utilized for evaluation of climate change impacts on daily precipitation in the study area as an experimental analysis. CCSM4, developed at the National Center for Atmospheric Research (NCAR), is one of the climate models included in both the Coupled Model Intercomparison Project's fifth phase (CMIP5) and the IPCC Fifth Assessment Report (AR5) (Flato *et al.* 2013). The CCSM4 exchanges state information and fluxes between four main components of atmosphere, land, ocean, and sea ice. The CCSM4 was introduced to the climate science community in April 2010 (Gent *et al.* 2011). Historical experiment from 1970 to 2005 was also utilized for the evaluation of CCSM4 model performance in representation of the observed daily rainfall pattern in the study area. CCSM4 and NCEP reanalysis data are used for the same grids.

### Developing the SSDM

#### Phase I: synoptic scale pattern classification

*Synoptic pressure patterns.* Upper-air pressure patterns provide insights into the atmospheric conditions that are more conducive (or more unconducive) to rainfall formation (Yarnal 1993; Lutgens & Tarbuck 2013). Previous studies have shown that changes in moisture delivery to the majority of Iran are domestically controlled by interrelated, synoptic scale atmospheric pressure patterns (Alijani & Harman 1985; Alijani 2002). Alijani & Harman (1985) concluded that the uplift mechanisms account for more than 70% of rainfall events in Iran based on a subjective analysis of daily surface and 500 hPa atmospheric pressure patterns.

Raziei *et al.* (2013) investigated the relationship between daily large-scale atmospheric CTs and wintertime daily precipitation over Iran during the period 1965–2000. In their previous study, they identified the dominant daily atmospheric CTs in the Middle East and studied their relationship with the occurrence of meteorological dry/wet spells during winter in western Iran (Raziei *et al.* 2012). They classified daily large-scale weather conditions during the period 1965–2000 into 12 CTs by applying PCA to the 500 hPa geopotential height fields. Results showed that only a limited number of the identified CTs affect the precipitation occurrence over the majority of the country, while the remaining CTs commonly provide regional or negligible contributions to precipitation. Therefore, in this study, patterns of 500 hPa geopotential height were used to determine the synoptic scale atmospheric conditions. For this purpose, the daily average 500 hPa geopotential heights on the synoptic domain scale over Iran (as mentioned above) from 1970 to 2006 (13,514 cases/days) were considered.

*SOM implementation.* The software used in this study for SOM implementation is the public domain software developed in the MATLAB environment by Helsinki University of Technology's Laboratory of Computer and Information Science, known as the SOM Toolbox (Vesanto *et al.* 2000). Here, the topology of SOMs is developed based on the sheet-shaped structures with a rectangular lattice neighborhood and the Euclidean distance is taken as distance measure in training and clustering steps. Each case (the daily pattern of 500 hPa geopotential height) is a vector of size 70 (number of grids in the study area). To eliminate the effect of data variability in days with the same pressure pattern, the data for each grid point were normalized as shown in Equation (12), which relates the normalized and original values of 500 hPa geopotential height in point *i* and day *t* through the mean and variance values of 500 hPa geopotential heights of all grids in day *t*.
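The idea behind this per-day normalization can be sketched as below: each daily field is centred and scaled using statistics computed over all grid points of that same day, so that days sharing a spatial pattern but differing in overall level or amplitude map to the same vector. Note the assumption: this sketch divides by the standard deviation (the square root of the variance), which is the common convention; the exact scaling used in the study is not recoverable from the text here.

```python
import math


def normalize_day(field):
    """Normalize one day's gridded values using that day's own mean and spread.

    field: list of 500 hPa geopotential heights, one per grid point.
    Assumption: scaling by the standard deviation of the day's field.
    """
    n = len(field)
    mean = sum(field) / n
    variance = sum((v - mean) ** 2 for v in field) / n
    std = math.sqrt(variance)
    if std == 0.0:
        return [0.0] * n  # a perfectly flat field carries no pattern
    return [(v - mean) / std for v in field]
```

For instance, the toy fields `[5500, 5600, 5700]` and `[5600, 5800, 6000]` have the same gradient pattern at different levels and amplitudes, and both normalize to the same vector, so a SOM would place them in the same node.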

*n* is the number of nodes in the SOM and $PI_i$ is the PI of the *i*th node of the SOM. A higher value of SI means more discrete stratification of daily precipitation. In Table 1, the values of SI for developed SOMs with different sizes are given. According to these results, SOMs with sizes of 4 × 4, 5 × 5, 5 × 6 and 6 × 6 have higher SI values and, therefore, provide a better performance.

| SOM (self-organizing maps) size | 3 × 3 | 3 × 4 | 3 × 5 | 3 × 6 | 4 × 4 | 4 × 5 | 4 × 6 | 5 × 5 | 5 × 6 | 6 × 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of nodes | 9 | 12 | 15 | 18 | 16 | 20 | 24 | 25 | 30 | 36 |
| SI (stratification index) | 0.493 | 0.476 | 0.467 | 0.471 | 0.527 | 0.521 | 0.513 | 0.555 | 0.546 | 0.583 |

By increasing the number of SOM nodes, based on SI, the distinction of wet days from dry days is improved, but choosing the SOM with a bigger size means that more statistical models should be developed in the second phase of the SSDM algorithm. Hence in this study the 4 × 4 SOM size is chosen for classification of synoptic scale pressure patterns.

*Detected CTs and their relationship with local daily precipitation*. Figure 5 shows the 16 CTs which were determined in this study by applying the SOM method. Each CT map is drawn by averaging the 500 hPa geopotential height values of all days that belong to that CT. To investigate the contribution of CTs to daily precipitation in the study area, some statistical and subjective evaluations were done.

In Table 2, the frequency of wet days, the proportion of total rainfall amount and the frequency of extreme events within the 16 CTs are presented, and in Table 3 the PI values associated with each CT are given. Based on these results, about 75% of rainy days occur in just 25% of the CTs (four CTs): CT(1,4), CT(2,4), CT(1,3) and CT(3,4). These CTs are the most conducive situations for rainfall formation in the study area. About 85% of the total rainfall amount falls under these four CTs in the study area. Additionally, extreme rainfall events are observed only in these CTs, with about 80% of extremes occurring in CT(1,4). The PI values of these CTs are also higher than those of other CTs. These results are in good agreement with Raziei *et al.* (2013), who concluded that only a limited number of the identified CTs affect the precipitation occurrence over the majority of Iran.

| Rain characteristics | CTs^{a} | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Rainfall occurrence percentages | 1 | 0.6% | 3.0% | 9.6% | 41.5% |
| | 2 | 0.6% | 2.9% | 5.0% | 16.9% |
| | 3 | 0.4% | 0.8% | 3.4% | 8.1% |
| | 4 | 0.5% | 0.8% | 2.3% | 3.7% |
| Proportion of total rainfall amount | 1 | 0.4% | 1.9% | 9.2% | 54.2% |
| | 2 | 0.4% | 1.6% | 3.6% | 15.5% |
| | 3 | 0.2% | 0.4% | 2.5% | 6.0% |
| | 4 | 0.2% | 0.4% | 1.4% | 2.0% |
| Proportion of number of extreme events | 1 | 0.0% | 0.0% | 10.8% | 78.4% |
| | 2 | 0.0% | 0.0% | 0.0% | 10.8% |
| | 3 | 0.0% | 0.0% | 0.0% | 0.0% |
| | 4 | 0.0% | 0.0% | 0.0% | 0.0% |

^{a}The given value in the *i*th row and the *j*th column corresponds to CT (*i*, *j*).

| CT row^{a} | Column 1 | Column 2 | Column 3 | Column 4 |
|---|---|---|---|---|
| 1 | 0.08 | 0.42 | 1.31 | 3.16 |
| 2 | 0.13 | 0.40 | 0.75 | 1.56 |
| 3 | 0.07 | 0.19 | 0.60 | 0.81 |
| 4 | 0.05 | 0.15 | 0.32 | 0.38 |

^{a}The given value in the *i*th row and the *j*th column corresponds to CT (*i*, *j*).

On the other hand, less than 4% of wet days and about 2% of the rainfall amount occur in six CTs: the four CTs in the first (leftmost) column and the last two CTs in the second column of the CT array (Table 2). Because the number of rainy days and the rainfall amount in these CTs are negligible, they can be considered unconducive to rainfall formation in the study area. The PI values of this group are also smaller than those of the other CTs. The remaining CTs have PI values less than 1, which means that they lead to precipitation amounts below the climatological mean. Nevertheless, precipitation occasionally occurs in the study area under these CTs.
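The group totals quoted above can be checked directly from the Table 2 values. The following is a small arithmetic sketch; the variable and function names are illustrative only.

```python
# Table 2 values (percent), indexed as table[i - 1][j - 1] for CT(i, j).
wet_day_share = [
    [0.6, 3.0, 9.6, 41.5],
    [0.6, 2.9, 5.0, 16.9],
    [0.4, 0.8, 3.4, 8.1],
    [0.5, 0.8, 2.3, 3.7],
]
rain_amount_share = [
    [0.4, 1.9, 9.2, 54.2],
    [0.4, 1.6, 3.6, 15.5],
    [0.2, 0.4, 2.5, 6.0],
    [0.2, 0.4, 1.4, 2.0],
]

# The four most conducive CTs named in the text.
conducive = [(1, 4), (2, 4), (1, 3), (3, 4)]
# First column plus the last two CTs of the second column.
unconducive = [(1, 1), (2, 1), (3, 1), (4, 1), (3, 2), (4, 2)]


def group_total(table, cts):
    """Sum the table entries of the listed CTs, rounded to one decimal."""
    return round(sum(table[i - 1][j - 1] for i, j in cts), 1)


print("conducive wet days (%):", group_total(wet_day_share, conducive))
print("conducive rainfall (%):", group_total(rain_amount_share, conducive))
print("unconducive wet days (%):", group_total(wet_day_share, unconducive))
print("unconducive rainfall (%):", group_total(rain_amount_share, unconducive))
```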

#### Phase II: local scale SD models for daily precipitation

After detecting CTs, days were categorized based on CTs and then statistical models were developed for each category to simulate daily rainfall. For each model, the first step is determining the predictors. The predictors considered for developing the ROMs and RAMs are listed in Table 4. These variables are used over a spatial mesoscale domain (four neighboring grids around the study area, as shown in Figure 3) and at three atmospheric levels: near surface, mid-level (700 hPa) and high level (500 hPa). Three kinds of variables are considered in the ROMs to represent different conditions of rain occurrence. Dew point is an indicator of the moisture content of the air, vorticity is an indicator of the tendency for upward motion in the atmosphere, and dew point depression is an indicator of the level of saturation. In developing the RAMs, the air temperature, geopotential height, relative humidity and sea-level pressure are considered in addition to the variables included in the ROMs.

| Submodel | Variable name | Unit | Variable abbreviation | Considered levels |
|---|---|---|---|---|
| ROM | Vorticity | s^{−1} | Vor | Surface, 700 hPa, 500 hPa |
| | Dew point temperature | °C | Dew | Surface, 700 hPa, 500 hPa |
| | Dew point depression | °C | Dep | Surface, 700 hPa, 500 hPa |
| RAM | Vorticity | s^{−1} | Vor | Surface, 700 hPa, 500 hPa |
| | Dew point temperature | °C | Dew | Surface, 700 hPa, 500 hPa |
| | Dew point depression | °C | Dep | Surface, 700 hPa, 500 hPa |
| | Air temperature | °C | Ta | Surface, 700 hPa, 500 hPa |
| | Geopotential height | m | hgt | 700 hPa, 500 hPa |
| | Relative humidity | % | Rhu | Surface, 700 hPa, 500 hPa |
| | Sea level pressure | N/m^{2} | Slp | Surface |

The predictor variables in the ROMs and RAMs were chosen using the stepwise regression procedure. The TFs (Box-Cox and square root) were also applied to the observed rainfall data to reduce skewness. For each model, 80% of the daily data were randomly selected for calibration and the remaining data were used for validation. Both ROMs and RAMs are linear models, and WLS was used to determine the RAM coefficients as described above. The selected variables in each model are given in Table 5. Even though some unconducive CTs include a limited number of wet days, developing SD models with appropriate performance for them is not feasible because of the limited data. Therefore, all days corresponding to these CTs are considered dry days and are not included in Table 5.

| CT | ROM predictors^{a} | RAM predictors^{a} |
|---|---|---|
| (1,2) | 700 hPa-Dep, 500 hPa-Dep | 700 hPa-Rhu |
| (1,3) | 700 hPa-Dep, 500 hPa-Dep | 700 hPa-Rhu, 500 hPa-Rhu, 700 hPa-Vor, 500 hPa-Vor |
| (1,4) | 700 hPa-Dep, 500 hPa-Dep, 700 hPa-Vor, 500 hPa-Vor | 700 hPa-Dew, 500 hPa-Ta, 500 hPa-Rhu, 700 hPa-hgt, Surface-Slp, Surface-Ta |
| (2,2) | 700 hPa-Dep, 500 hPa-Dep | 500 hPa-Rhu, 700 hPa-Vor, 700 hPa-hgt |
| (2,3) | 500 hPa-Dep | 700 hPa-Rhu |
| (2,4) | 500 hPa-Dew, 700 hPa-Dep, 500 hPa-Dep | 700 hPa-Dew, 700 hPa-Rhu, 500 hPa-Rhu, Surface-Ta |
| (3,2) | 500 hPa-Dep | 700 hPa-Vor |
| (3,3) | 500 hPa-Dep | 500 hPa-Rhu |
| (3,4) | 700 hPa-Dep, 500 hPa-Dep | 700 hPa-Rhu |
| (4,3) | 500 hPa-Dep | 500 hPa-Ta |
| (4,4) | 700 hPa-Dep, 500 hPa-Dep | 500 hPa-Dep, 500 hPa-Rhu, 700 hPa-Rhu |

^{a}The predictors' descriptions are given in Table 4.
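The calibration of a single rainfall amount model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one predictor, a square-root transformation, and a hypothetical weighting rule that gives larger weights to larger rainfall amounts (the actual WLS weights are defined earlier in the paper). The data are synthetic and the function name `wls_fit` is illustrative.

```python
import math
import random


def wls_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


# Synthetic wet-day sample for one CT (invented data): a predictor such as
# relative humidity (%) and the corresponding rainfall amount (mm).
random.seed(0)
rhu = [random.uniform(40.0, 100.0) for _ in range(200)]
rain = [(0.03 * r + random.gauss(0.0, 0.2)) ** 2 for r in rhu]

# Random 80/20 split into calibration and validation subsets.
idx = list(range(len(rhu)))
random.shuffle(idx)
cal, val = idx[:160], idx[160:]

# Square-root transformation to reduce skewness, then WLS.
# ASSUMPTION: weights grow with the rainfall amount to emphasize extremes.
x_cal = [rhu[i] for i in cal]
y_cal = [math.sqrt(rain[i]) for i in cal]
w_cal = [1.0 + rain[i] for i in cal]
a, b = wls_fit(x_cal, y_cal, w_cal)

# Back-transform the predictions and evaluate on the validation subset.
pred = [(a + b * rhu[i]) ** 2 for i in val]
rmse = math.sqrt(sum((p - rain[i]) ** 2 for p, i in zip(pred, val)) / len(val))
print(f"intercept={a:.3f}, slope={b:.4f}, validation RMSE={rmse:.2f} mm")
```

With several predictors, as in Table 5, the same idea applies with the weighted normal equations solved in matrix form.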

### Daily rainfall simulation results

Table 6 compares some statistical characteristics of the downscaled and observed daily rainfall values for the calibration and validation datasets as well as for the total dataset. The differences between the mean and standard deviation of the observed and simulated rainfall amounts are less than 5% and about 13% or less, respectively, in both the calibration and validation datasets. Although the difference between the skewness of the observed and simulated rainfall amounts in the calibration dataset is small (3.3%), in the validation dataset it is considerable (about 40%). For the total dataset, which combines the calibration and validation periods, the differences between the observed and simulated percentages of wet days, mean daily rainfall and standard deviation of wet days' rainfall amount are less than 10%, which shows a good agreement between the observed and modeled rainfall amounts. However, the model is not very successful in simulating the maximum rainfall value and the skewness of wet days' rainfall amount in the validation period.
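The percentage differences discussed above can be reproduced from the rounded values reported in Table 6. The sketch below assumes that each difference is expressed relative to the observed value, which is consistent with the 3.3% quoted for the calibration skewness; the names are illustrative.

```python
def rel_diff(observed, modeled):
    """Absolute difference in percent, relative to the observed value."""
    return abs(modeled - observed) / observed * 100.0


# (observed, modeled) pairs taken from Table 6.
table6 = {
    "calibration": {
        "wet_days": (23.6, 21.5), "mean": (4.3, 4.5),
        "std": (4.1, 3.7), "skewness": (3.0, 3.1),
    },
    "validation": {
        "wet_days": (24.6, 21.3), "mean": (4.7, 4.6),
        "std": (4.6, 4.0), "skewness": (2.6, 3.7),
    },
    "total": {
        "wet_days": (23.8, 21.5), "mean": (4.4, 4.5),
        "std": (4.2, 4.0), "skewness": (2.9, 3.2),
    },
}

for period, stats in table6.items():
    for name, (obs, mod) in stats.items():
        print(f"{period:12s} {name:9s} {rel_diff(obs, mod):5.1f}%")
```

Because the table entries are rounded to one decimal, the computed percentages are approximate.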

| Statistical characteristics | Calibration, observed | Calibration, modeled | Validation, observed | Validation, modeled | Total, observed | Total, modeled |
|---|---|---|---|---|---|---|
| Percentage of wet days (%) | 23.6 | 21.5 | 24.6 | 21.3 | 23.8 | 21.5 |
| Average wet days rainfall amount (mm) | 4.3 | 4.5 | 4.7 | 4.6 | 4.4 | 4.5 |
| Standard deviation of wet days rainfall amount (mm) | 4.1 | 3.7 | 4.6 | 4.0 | 4.2 | 4.0 |
| Skewness of wet days rainfall amount | 3.0 | 3.1 | 2.6 | 3.7 | 2.9 | 3.2 |
| Maximum daily rainfall amount (mm) | 47.1 | 46.3 | 32.0 | 45.8 | 47.1 | 46.3 |

| Goodness-of-fit measure | Calibration | Validation | Total |
|---|---|---|---|
| Wet days coefficient of correlation (R) | 0.44 | 0.44 | 0.44 |
| Root mean squared error (RMSE, mm) | 2.45 | 2.73 | 2.51 |

The Wilcoxon signed-rank test and Levene's test (… *et al.* 2012) are used, respectively, to test the equality of the mean and variance of observed and simulated monthly rainfall values as well as the number of wet days. Based on the Wilcoxon signed-rank test, the equality of the mean monthly rainfall values of the simulated and observed series is not rejected at the 95% confidence level. The largest differences between simulated and observed mean monthly rainfall values occur in March and January. The equality of the simulated and observed monthly mean number of wet days is rejected only in July. Based on Levene's test, the equality of the variance of the simulated and observed monthly rainfall is not rejected at the 95% confidence level, except in May, July and August. The same result is obtained for the monthly variance of the number of wet days. This may be due to the limited number of wet days in these months.

## CLIMATE CHANGE IMPACTS EVALUATION IN THE STUDY AREA

The developed SSDM for downscaling daily rainfall was used to evaluate the probable climate change impacts on the rainfall characteristics of the study area in the near-future horizon (2006–2050). For this purpose, the CCSM4 GCM outputs for the historical, RCP4.5 and RCP8.5 experiments were used.

The acceptability of the chosen GCM's performance in simulating the synoptic and mesoscale climate characteristics of the study area was evaluated by comparing the CCSM4 simulations for the historical period (1970–2005) with NCEP records. This evaluation was done in two ways:

1. Examining the differences between the rainfall occurrence frequencies of CTs in the CCSM4 results and the NCEP records, especially for the rainiest CTs.
2. Comparing the Q-Q plots of the predictors developed based on the CCSM4 outputs and the NCEP records in the study area.

| Source of considered data | CT row^{a} | Column 1 | Column 2 | Column 3 | Column 4 |
|---|---|---|---|---|---|
| NCEP (1970–2005) | 1 | 6.7% | 5.1% | 7.1% | 16.4% |
| | 2 | 3.9% | 4.5% | 4.9% | 9.9% |
| | 3 | 4.0% | 2.7% | 4.5% | 8.1% |
| | 4 | 7.6% | 3.4% | 4.9% | 6.4% |
| CCSM4-historical (1970–2005) | 1 | 5.7% | 4.8% | 6.3% | 16.6% |
| | 2 | 3.1% | 3.6% | 4.5% | 9.6% |
| | 3 | 3.2% | 2.8% | 4.1% | 8.1% |
| | 4 | 8.5% | 4.9% | 5.5% | 8.7% |
| CCSM4-RCP4.5 (2006–2050) | 1 | 5.0% | 5.1% | 7.0% | 16.0% |
| | 2 | 3.4% | 4.5% | 4.7% | 8.6% |
| | 3 | 3.3% | 3.7% | 4.0% | 7.4% |
| | 4 | 8.7% | 5.1% | 6.0% | 7.6% |
| CCSM4-RCP8.5 (2006–2050) | 1 | 5.9% | 5.0% | 7.1% | 15.9% |
| | 2 | 3.6% | 4.5% | 5.0% | 8.5% |
| | 3 | 3.4% | 3.6% | 4.0% | 7.3% |
| | 4 | 7.9% | 5.1% | 5.1% | 8.1% |

^{a}The given value in the *i*th row and the *j*th column corresponds to CT (*i*, *j*).
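The first of the two GCM checks, the comparison of CT frequencies, can be sketched with the NCEP and CCSM4-historical values from the table above. The function name and the choice of a simple percentage-point difference are illustrative assumptions, not the authors' stated metric.

```python
# CT frequencies (percent) for 1970-2005, indexed as [i - 1][j - 1] for CT(i, j).
ncep = [
    [6.7, 5.1, 7.1, 16.4],
    [3.9, 4.5, 4.9, 9.9],
    [4.0, 2.7, 4.5, 8.1],
    [7.6, 3.4, 4.9, 6.4],
]
ccsm4_hist = [
    [5.7, 4.8, 6.3, 16.6],
    [3.1, 3.6, 4.5, 9.6],
    [3.2, 2.8, 4.1, 8.1],
    [8.5, 4.9, 5.5, 8.7],
]


def frequency_differences(model, reference):
    """Model minus reference frequency (percentage points) for each CT."""
    return {
        (i + 1, j + 1): round(model[i][j] - reference[i][j], 1)
        for i in range(len(model))
        for j in range(len(model[i]))
    }


diffs = frequency_differences(ccsm4_hist, ncep)
# CT with the largest absolute disagreement between CCSM4 and NCEP.
worst_ct = max(diffs, key=lambda ct: abs(diffs[ct]))
print("difference for CT(1,4):", diffs[(1, 4)])
print("largest difference:", worst_ct, diffs[worst_ct])
```

For the rainiest CT, CT(1,4), the two sources differ by only 0.2 percentage points, while the largest difference occurs in CT(4,4).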

| Statistical characteristics | Historical (1970–2005) | RCP4.5 (2006–2050) | RCP8.5 (2006–2050) |
|---|---|---|---|
| Percentage of wet days (%) | 23.9 | 21.7 | 22.7 |
| Average wet days rainfall values (mm) | 4.2 | 4.5 | 4.4 |
| Standard deviation of wet days rainfall values (mm) | 3.9 | 4.6 | 4.4 |
| Skewness of wet days rainfall values | 3.6 | 4.0 | 3.9 |
| Maximum daily rainfall value (mm) | 53.3 | 56.6 | 55.1 |
| Average number of extreme events in a year | 1.0 | 1.4 | 1.3 |

The characteristics of wet days under climate change impacts are compared in Table 8. Based on this table, the percentage of wet days decreases from 23.9% in the historical period to 21.7% under RCP4.5 and 22.7% under RCP8.5, whereas the average wet-day rainfall increases from 4.2 mm to 4.5 mm and 4.4 mm, respectively. Furthermore, the increase in the standard deviation of daily rainfall (from 3.9 mm to 4.6 mm and 4.4 mm) indicates an increasing possibility of extreme rainfall values. The average number of extreme rainfall events per year also increases, from 1.0 to 1.4 under RCP4.5 and 1.3 under RCP8.5.
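The relative changes implied by Table 8 follow from simple arithmetic, sketched below with illustrative names. The changes in the number of extreme events correspond to the 40% and 30% increases reported in the abstract.

```python
# Selected Table 8 values for the historical period and the two scenarios.
historical = {"wet_days_pct": 23.9, "mean_mm": 4.2, "std_mm": 3.9, "extremes_per_year": 1.0}
rcp45 = {"wet_days_pct": 21.7, "mean_mm": 4.5, "std_mm": 4.6, "extremes_per_year": 1.4}
rcp85 = {"wet_days_pct": 22.7, "mean_mm": 4.4, "std_mm": 4.4, "extremes_per_year": 1.3}


def percent_change(future, baseline):
    """Percent change of each statistic relative to the baseline period."""
    return {
        key: round((future[key] - baseline[key]) / baseline[key] * 100.0, 1)
        for key in baseline
    }


change45 = percent_change(rcp45, historical)
change85 = percent_change(rcp85, historical)
print("RCP4.5:", change45)
print("RCP8.5:", change85)
```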

## SUMMARY AND CONCLUSION

In this study, SSDM is proposed for daily rainfall downscaling. It can preserve the statistical characteristics of the observed local daily rainfall, especially those of extreme rainfall events, which are used in hydrological investigations such as flood management. SSDM is an SD model conditioned on synoptic-scale CTs. The model includes two parts. The first part clusters synoptic weather patterns into CTs and then investigates the relationships between the identified CTs and local daily rainfall events. The SOM is trained to cluster synoptic patterns into CTs and is then used to classify each day's synoptic pattern into one of the determined CTs. The second part of SSDM develops occurrence and amount regression SD models in each CT to simulate local daily rainfall. The developed model is based on two main assumptions. The first is that the regression equations developed between predictors and rainfall will remain valid in the future. The second is that the considered CTs will cover all future events. Further investigations are necessary to evaluate the validity of these assumptions under climate change impacts.

The model is applied to the northern part of Tehran as the study area. In the first step, 16 CTs were determined using a SOM model developed based on 500 hPa geopotential height. The investigation of the relationship between the determined CTs and local daily rainfall demonstrates that the detected CTs can explain the main atmospheric features that result in dry and wet conditions in the study area. In the second step, two models for rainfall occurrence and amount simulation are developed. The model parameters are estimated using WLS and skewness reduction techniques to better cover the extreme events in the model development. Applying these methods to estimate the regression model coefficients has improved the ability of the developed models to reproduce local rainfall amounts. Despite this improvement, the model performance in simulating some ranges of rainfall amount is not satisfactory. This could be because of local phenomena that affect the rainfall variability and are not considered in the developed models. Future studies could compare the model developed here with other models and investigate methods that could overcome its weaknesses.

Then, the outputs of the CCSM4 GCM under the AR5 climate change scenarios (RCP4.5 and RCP8.5) were used to investigate climate change impacts in the study area. The results show that the number of extreme rainfall events increases under climate change impacts. It must be emphasized that, since only one GCM and one run of each climate change scenario were used in this study, the obtained results should be regarded as preliminary. More investigations considering different climate change scenarios and different GCMs are needed for a more reliable judgment about climate change impacts on rainfall regimes in the study area.

## ACKNOWLEDGEMENTS

We acknowledge the World Climate Research Program's Working Group on Coupled Modeling, which is responsible for CMIP, and we thank the National Center for Atmospheric Research (NCAR) for producing their model outputs (CCSM4) and making them available for us. For CMIP, the US Department of Energy's Program for Climate Model Diagnosis and Intercomparison provided coordinating support and led the development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.