Abstract
The present study proposed a time-space framework that uses a discrete wavelet transform-based multiscale entropy (DWE) approach to analyze and spatially categorize precipitation variation in Iran. To this end, historical monthly precipitation time series for 1960–2010 from 31 rain gauges were used. First, a wavelet-based de-noising approach was applied to diminish the effect of noise in the precipitation time series, which may affect the entropy values. Next, Daubechies (db) mother wavelets (db5–db10) were used to decompose the precipitation time series. Subsequently, the entropy concept was applied to the sub-series to measure the uncertainty and disorderliness at multiple scales. Spatial categorization of the rain gauges was performed using DWE values as input data to k-means and self-organizing map (SOM) clustering techniques. According to the evaluation criteria, k-means with five clusters (Silhouette coefficient = 0.33, Davies–Bouldin index = 1.18, and Dunn index = 1.52) performed better in determining homogeneous areas. Based on the pattern of entropy across scales, each cluster was assigned an entropy signature that provided an estimate of the entropy pattern of precipitation in that cluster. Finally, investigating the spatial structure of precipitation variation revealed that the DWE had a decreasing relationship with longitude and an increasing relationship with latitude in Iran.
INTRODUCTION
Assessment of precipitation variation over a large area (e.g., Iran) can provide valuable information for water resources management and engineering, particularly in a changing climate. The impact of global warming on different water cycle components is strongly variable across the globe and causes increases in average global precipitation, evaporation, and runoff (Clark et al. 1999; Pechlivanidis et al. 2017; Salvia et al. 2017; Sattari et al. 2017; Wei et al. 2017; Ba et al. 2018). Alteration of the hydrologic cycle will have significant impacts on the rate, timing, and distribution of rain, evaporation, temperature, snowfall, and runoff, which are the main causes of change in the accessibility of water resources (Mishra et al. 2009). As an example of precipitation variation in Iran, during 1966–2005 the rates of the significant trends in annual precipitation varied from (−)1.999 mm/year in the northwest to (+)4.261 mm/year in the west of the country. The significant negative trends mainly occurred in the northwest of Iran. These negative trends can affect the agriculture and water supply of the regions. In contrast, no significant trends were detected in the eastern, southern, and central parts of the country (Tabari & Hosseinzadeh Talaee 2011; Raziei 2017). Given the high spatial and temporal variability of precipitation, the frequent dry periods, and the increasing water demands of a growing population, industry, and economic development (including irrigation), aggravated water scarcity makes rational water management difficult. Hence, determination of sub-regions according to different precipitation regimes is important for water resources management and land use planning.
In recent decades, several studies have focused on precipitation across Iran (Domroes et al. 1998; Dinpashoh et al. 2004; Modarres 2006; Soltani et al. 2007; Raziei et al. 2008; Modarres & Sarhadi 2009; Tabari & Hosseinzadeh Talaee 2011). Domroes et al. (1998) applied principal component analysis (PCA) and cluster analysis (CA) to the mean monthly precipitation of 71 stations and classified the precipitation regimes into five different sub-regions. Dinpashoh et al. (2004), applying PCA and CA to 12 variables selected from 57 candidate variables for 77 stations distributed across the entire country, divided the country into seven climate sub-regions. Rainfall climates in Iran were also analyzed by Soltani et al. (2007) using monthly precipitation time series from 28 main sites. To determine regional climates, a hierarchical CA was applied to the autocorrelation coefficients at different lags, and three main climatic groups were found. Tabari & Hosseinzadeh Talaee (2011) analyzed precipitation trends over different sub-regions of Iran during 1966–2005. Raziei et al. (2008) analyzed the spatial distribution of seasonal and annual precipitation in western Iran using data from 140 stations covering the period 1965–2000. The intra-annual precipitation variability was also studied by applying the precipitation concentration index (PCI). The results suggest that five homogeneous sub-regions can be identified based on different precipitation regimes. Modarres & Sarhadi (2009) performed spatial and temporal trend analysis of the annual and 24-hr maximum rainfall at 145 precipitation gauging stations in Iran during 1955–2000. The study showed that the annual rainfall is decreasing at 67% of the stations while the 24-hr maximum rainfall is increasing at 50% of the stations.
Wavelet analysis (WA), which has been widely applied in hydrology and hydrogeology, is capable of elucidating the localized characteristics of non-stationary time series in both the temporal and frequency domains (Nourani et al. 2009, 2015; Kisi & Shiri 2012; Danandeh Mehr et al. 2015; Karimi et al. 2016; Danesh-Yazdi et al. 2017), which makes it well suited to hydrologic time series analysis. Wavelet entropy, which combines WA and information theory, is an important concept for describing the variability and complexity of hydrologic time series with non-stationary and multi-temporal characteristics (Zunino et al. 2007). In this approach, a time series is first analyzed by WA, such as the continuous wavelet transform (CWT) or multi-resolution analysis, and entropy measures are then calculated, mainly Shannon entropy (Jaynes 1957), mutual information (Molini et al. 2006), and relative entropy (Abramov et al. 2005). Various studies have demonstrated the better performance of wavelet entropy in analyzing the variability and complexity of hydrologic variables compared with traditional methods (Simpson's index, McIntosh index, Berger–Parker index, Brillouin index, etc.) (Mishra et al. 2009; Brunsell 2010).
The proposed technique combines a discrete wavelet transform (DWT)-based multiscale entropy approach with k-means and self-organizing map (SOM) clustering techniques. The discrete wavelet multiscale entropy (DWE), which is a measure of the degree of order/disorder of a signal and carries information associated with its multi-frequency content, can provide useful information about the underlying dynamic processes associated with the signal and can help in precipitation-based studies (Cazelles et al. 2008). Higher entropy reflects more random and complicated systems, and vice versa. Traditional entropy measures usually provide inaccurate or incomplete descriptions of climatic systems, which generally operate over multiple resolution scales (Li & Zhang 2008). Therefore, this study set out to develop a DWE-based precipitation regionalization approach and applied it to monthly precipitation data observed at 31 rain gauges in Iran. DWT was used to decompose each of the observed precipitation time series using the Daubechies (db) wavelet to capture the multiscale variability of the precipitation based on wavelet coefficients. Next, the wavelet coefficients for each scale were used to obtain the entropy for the respective scale (Sang 2012; Agarwal et al. 2016). The spatial organization of this multiscale variability in terms of DWE was identified using clustering methods.
MATERIAL AND METHODS
Case study and climatological dataset
This study used monthly climate data from 31 precipitation gauges across Iran for studying precipitation regionalization (1960–2010) (Figure 1 and Table 1). Because of the variety of information involved in hydrologic processes and the need for accurate models, monthly precipitation time series, which capture properties such as the seasonality of the process, were used. Iran is a large country (approximately 1,600,000 km²) located in Southwest Asia (25° to 40°N and 44° to 63°E), and its climate is mostly affected by its wide latitudinal extent. Iran borders three seas: the Caspian Sea in the north and the Persian Gulf and Oman Sea in the south (Araghi et al. 2014). The moisture coming from the Persian Gulf is usually trapped by the Zagros Mountains. The plateau is open to the cold (dry) continental currents flowing from the northeast, and the mitigating influence of the Caspian Sea is limited to the northern regions of the Alborz Mountains. The Zagros chain, which stretches from northwest to southeast, is the source of several large rivers such as the Karkheh, Dez, and Karoon. Lowland areas receive surface water from these basins and are of great importance for agricultural applications (Raziei et al. 2008). Iran's climate is generally recognized as arid or semi-arid, with an annual average precipitation of about 250 mm; however, its climate is very diverse, with annual precipitation and temperature varying over the country (Figure 1(a)). For instance, annual precipitation ranges from 0 to 2,000 mm in different areas of the country (Domroes et al. 1998; Dinpashoh et al. 2004). The Caspian Sea coastal areas, along with the northern and northwestern regions of the country, receive higher precipitation. On the other hand, the lowest values of annual precipitation are found in the southern, eastern, and central desert regions (Ashraf et al. 2013).
Generally, Iran is categorized as hyper-arid (35.5%), arid (29.2%), semi-arid (20.1%), Mediterranean (5%), and wet climate (10%). Also, temperature in Iran varies widely (−20 to +50 °C) (Saboohi et al. 2012). On the northern edge of the country (the Caspian coastal plain) temperatures rarely fall below freezing and the area remains humid for all of the year. Summer temperatures rarely exceed 29 °C (Nagarajan 2010; Weather & Climate Information 2015). To the west, settlements in the Zagros basin experience lower temperatures, severe winters with below zero average daily temperatures and heavy snowfall. The eastern and central basins are arid and have occasional deserts. Average summer temperatures rarely exceed 38 °C (Nagarajan 2010). The coastal plains of the Persian Gulf and Gulf of Oman in southern Iran have mild winters, and very humid and hot summers (Figure 1(a)). The dataset applied in this study was provided by the Iran Meteorological Organization (http://www.irimo.ir).
Selection of optimum number of clusters based on Dunn, Davies–Bouldin, and Silhouette indices
Clustering technique | Validity index | k = 2 | k = 3 | k = 4 | k = 5 | k = 6 | k = 7 | k = 8 | k = 9 | k = 10 |
---|---|---|---|---|---|---|---|---|---|---|
SOM | Silhouette index | 0.19 | 0.28 | 0.27 | 0.28 | 0.34 | 0.30 | 0.21 | 0.18 | 0.09 |
SOM | Davies–Bouldin index | 1.59 | 1.10 | 1.45 | 1.40 | 1.20 | 1.48 | 1.7 | 1.62 | 1.35 |
SOM | Dunn index | 0.96 | 1.12 | 1.07 | 1.36 | 1.49 | 1.24 | 1.14 | 0.99 | 0.97 |
K-means | Silhouette index | 0.22 | 0.23 | 0.26 | 0.33 | 0.21 | 0.25 | 0.25 | 0.22 | 0.22 |
K-means | Davies–Bouldin index | 1.34 | 1.33 | 1.33 | 1.18 | 1.18 | 1.51 | 1.14 | 1.42 | 1.34 |
K-means | Dunn index | 1.37 | 1.19 | 1.32 | 1.52 | 1.2 | 0.83 | 0.90 | 0.78 | 1.07 |
(a) Climate map of Iran and (b) geographic location of rain gauges used in this study.
Discrete wavelet transform (DWT)
Signal de-noising with wavelets
De-noising a signal using WT is based on the observation that in many signals (e.g., rainfall signals) the energy is mostly concentrated in a small number of wavelet dimensions. The coefficients of these dimensions are relatively large compared to those of the other dimensions or of noise, which has its energy spread over a large number of coefficients. Hence, by setting the coefficients smaller than a certain threshold to zero, noise can be eliminated nearly optimally while preserving the important information of the original signal (Donoho 1995). Because de-noising is performed on amplitude rather than on frequency, low-frequency noise can also be suppressed (Nourani & Partoviyan 2017).
In Equation (2), λ and D(i) (i = 1, 2, …, M) indicate the threshold value and the absolute value of the detail sub-series at the ith resolution level, respectively. The algorithm to de-noise a signal f(k) corrupted by a noise signal n(k) can be summarized by the following three steps:
Apply DWT to a noisy signal to obtain approximations A(i) and details D(i).
Apply a thresholding technique to detail coefficients D(i) to obtain the thresholded coefficients Dth(i).
Transform the signal back based on A(i) and Dth(i) to obtain the de-noised signal (reconstruction).
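The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: it uses the Haar wavelet instead of the db wavelets for brevity, hard thresholding, and the universal threshold σ√(2 ln N) with σ estimated from the median absolute value of the finest detail coefficients. The function names (`haar_dwt`, `haar_idwt`, `universal_threshold`, `denoise`) and the synthetic test signal are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def universal_threshold(finest_detail, n):
    """Universal threshold; sigma is estimated from the finest detail level (assumption)."""
    sigma = statistics.median(abs(d) for d in finest_detail) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))

def denoise(signal, levels=3):
    # Step 1: decompose into approximation A and details D(1)..D(levels).
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Step 2: hard-threshold the detail coefficients.
    lam = universal_threshold(details[0], len(signal))
    details_th = [[c if abs(c) > lam else 0.0 for c in d] for d in details]
    # Step 3: reconstruct from A and the thresholded details, coarsest level first.
    for d in reversed(details_th):
        approx = haar_idwt(approx, d)
    return approx

# Synthetic demonstration (not precipitation data): a step signal plus Gaussian noise.
rng = random.Random(1)
clean = [float(10 * (i // 8 % 2)) for i in range(64)]
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]
denoised = denoise(noisy, levels=3)
```

Hard thresholding keeps the surviving coefficients unchanged; soft thresholding, which also shrinks them toward zero, would be an equally plausible choice here.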
Time series decomposition via the discrete wavelet transform
The conventional discrete WA of time series was performed on each rain gauge using the multilevel 1-D wavelet decomposition function in MATLAB (MATLAB Wavelet Toolbox). This produces the WT of the time series of the interest at all dyadic scales. The monthly precipitation input time series are all one-dimensional. Decomposing the time series using specified filters (wavelet and scaling functions) produces two types of coefficients: the approximation or residual, and detail vectors (Chou 2007). These coefficients resulted from the convolution of the original time series with a low-pass filter and a high-pass filter. The low-pass filter is the scaling function and the high-pass filter is the wavelet function. The convolutions of time series with the low-pass filter produced the approximation coefficients, which represent the large-scale or low-frequency components of the original time series. Convolutions with the high-pass filter produced the detail coefficients, which represent the low-scale or high-frequency components (Bruce et al. 2002). The process of time series decomposition was repeated multiple times, decomposing the original time series into several different lower-resolution components (Partal 2010). The detail and approximation coefficients produced from the time series decomposition were then reconstructed since they are merely intermediate coefficients. These have to be re-adjusted to the entire one-dimensional signal in order to enable the investigation of their contribution to the original time series (Dong et al. 2008). This contribution may be reflected in the different time scales such as intra-annual, inter-annual, decadal, and multi-decadal.
Selection of an appropriate wavelet function poses significant challenges and is governed largely by the problem at hand and some of the distinctive properties of the wavelet function such as (i) its region of support and (ii) the number of vanishing moments (Maheswaran & Khosa 2012).
The region of support is the length span of a given wavelet, which affects its feature localization capabilities: a long, widely distributed wavelet function computes the instantaneous process amplitude while spanning a wider window of the underlying process, which results in a high degree of averaging of the process states. The number of vanishing moments, on the other hand, limits the wavelet's ability to suitably represent polynomial behavior or information in a time series. For example, the db2 wavelet encodes polynomials with two coefficients, i.e., a process having one constant and one linear time series component, and the db3 wavelet encodes a process having constant, linear, and quadratic time series components. Within each family of wavelets are wavelet subclasses distinguished by their respective number of coefficients and number of vanishing moments (Maheswaran & Khosa 2012).
Discrete wavelet-based multiscale entropy approach (DWE)
This study proposed an approach based on a hybrid of DWT, entropy, and k-means models to investigate the variation of precipitation in Iran and to regionalize it. Figure 2 shows the schematic of the modeling in this study. The monthly time series of the rain gauges were first pre-processed using DWT. To this end, a Daubechies (db) mother wavelet and the related parameters were selected for each precipitation time series.
Schematic of proposed DWE-based analysis and regionalization of rain gauges in this study.
The entropy of a random variable is a measure of the uncertainty of the random variable; it is a measure of the amount of information required on average to describe the random variable (Termini & Moramarco 2016; Werstuck & Coulibaly 2016). Next, entropy values of decomposed time series (detail (D1, D2 …Di) and approximation (Ai) components) were fed into k-means approaches in order to perform spatial clustering. Different statistical evaluation criteria were used to verify the validity of clustering, which is explained below.
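As a rough illustration of how an entropy value per sub-series can be obtained, the sketch below computes the Shannon entropy of the normalized energy distribution of the coefficients in each sub-series. This is one common formulation of wavelet entropy and is an assumption here; the exact equation used in the study is not reproduced in this section. The function names and the toy coefficients are hypothetical.

```python
import math

def shannon_entropy_of_coeffs(coeffs):
    """Shannon entropy (natural log) of the normalized energies c_k^2 / sum(c^2)."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0.0:
        return 0.0  # an all-zero sub-series carries no energy to distribute
    # Zero-energy coefficients contribute nothing (p * log p -> 0 as p -> 0).
    return -sum((e / total) * math.log(e / total) for e in energy if e > 0.0)

def dwe_feature_vector(subseries):
    """Map each sub-series name (e.g. 'D1', ..., 'A6') to its entropy value."""
    return {name: shannon_entropy_of_coeffs(c) for name, c in subseries.items()}

# Toy coefficients (hypothetical): evenly spread energy versus a single spike.
features = dwe_feature_vector({
    "D1": [1.0, -1.0, 1.0, -1.0],  # energy spread evenly -> maximum entropy ln(4)
    "D2": [0.0, 0.0, 3.0, 0.0],    # energy in one coefficient -> entropy 0
})
```

For a station, the resulting values for the detail and approximation components form the feature vector that a clustering method would receive.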
K-means clustering
Self-organizing maps (SOM)
The self-organizing map is a powerful method used to explore and extract the inter-relationships of high-dimensional multivariate systems, and it is beneficial for clustering and forecasting in a widespread range of disciplines (Kohonen 1997). One of the main advantages of the SOM is its ability to extract implicit patterns from high-dimensional input dataset and classify the obtained patterns into a low-dimensional output layer, where similar inputs remain close together in the output neurons while preserving data structure (Hsu & Li 2010; Nourani et al. 2015). The neurons in the output layer are commonly arranged in two-dimensional grids so that the constructed topology can be visualized to give an insight into the system under investigation. The SOM has gained increasing interest and been successfully applied to hydrology and water resources management (Kalteh et al. 2008; Hsu & Li 2010; Nourani et al. 2015; Chang et al. 2016; Iwashita et al. 2018).
Evaluation criteria
In the present study, three validation metrics, namely, Davies–Bouldin index (DBi), Dunn index, and Silhouette coefficient (SC) were utilized to validate the outcome of spatial clustering via k-means technique.
DBi is an internal evaluation criterion that is widely applied in hydrology (Davies & Bouldin 1979; Kasturi et al. 2003). It is used to identify the optimal number of clusters that are compact and well separated, given the content and specification of the dataset. A lower DBi value represents better clustering results. However, DBi has the disadvantage that a good DBi value does not necessarily imply the best information retrieval.
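A minimal sketch of the Davies–Bouldin index in its standard form is given below, assuming Euclidean distance and the mean distance to the centroid as the within-cluster scatter. The function names and the toy clusters are hypothetical and are not taken from the study.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of feature vectors. Lower is better."""
    cents = [centroid(c) for c in clusters]
    # Within-cluster scatter: mean distance of members to their centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Toy example (hypothetical data): two compact, well-separated clusters.
toy = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
dbi = davies_bouldin(toy)
```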
The Dunn index aims to identify sets of clusters that are compact, with a small variance among the members of each cluster, and well separated, where the means of the different clusters are sufficiently far apart compared to the within-cluster variance (Dunn 1973). A higher Dunn index indicates a better clustering outcome, as it reflects compact, well-separated clusters (Agarwal et al. 2016). A disadvantage of the Dunn index is that its computational cost increases as the number of clusters and the dimensionality of the data increase.
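For illustration, the sketch below implements the classic form of the Dunn index: the smallest distance between points of different clusters divided by the largest cluster diameter. This form is an assumption, since several variants exist and the study's exact definition is not shown here. The nested loops over point pairs also make the computational cost noted above visible. Names and toy data are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """clusters: list of clusters, each a list of feature vectors. Higher is better."""
    # Separation: smallest distance between two points in different clusters.
    min_sep = min(math.dist(p, q)
                  for a, b in combinations(clusters, 2)
                  for p in a for q in b)
    # Compactness: largest distance between two points in the same cluster.
    max_diam = max((math.dist(p, q)
                    for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return math.inf if max_diam == 0.0 else min_sep / max_diam

# Toy example (hypothetical data): separation 10, largest diameter 2.
toy = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
dunn = dunn_index(toy)
```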
The SC index shows how similar a member is to its own cluster (cohesion) in comparison to the other clusters (separation). The SC values vary from −1 to 1, where a high SC indicates that the member is well matched to its own cluster and poorly matched to neighboring clusters.
If most members have a high SC value, the clustering configuration is suitable. On the other hand, if many members have a low or negative SC value, the clustering configuration may have too many or too few clusters. Several studies have demonstrated the applicability of SC (Hsu & Li 2010; Nourani et al. 2015). Nevertheless, the present study evaluated the outcome of spatial clustering based on all three indices to take advantage of each of them.
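The cohesion and separation terms behind SC can be made concrete with a short sketch. It assumes Euclidean distance, the usual per-member score (b − a) / max(a, b), and a score of 0 for members of single-member clusters, which is a common convention rather than something stated in the study. Names and toy data are hypothetical.

```python
import math

def silhouette_coefficient(clusters):
    """Mean silhouette score over all members; clusters is a list of lists of vectors."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters (assumption)
                continue
            # Cohesion a: mean distance to the other members of the same cluster.
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            # Separation b: smallest mean distance to the members of another cluster.
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            denom = max(a, b)
            scores.append(0.0 if denom == 0.0 else (b - a) / denom)
    return sum(scores) / len(scores)

# Toy example (hypothetical data): two compact, well-separated clusters.
toy = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
sc = silhouette_coefficient(toy)
```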
RESULTS AND DISCUSSION
Precipitation time series pre-processing via DWT
The precipitation time series might include a degree of noise-contamination which could influence the calculation of wavelet-based entropy values. Hence, the noise in the time series was removed by WT de-noising approach; afterwards, the DWT was applied to the de-noised time series using the chosen db mother wavelet to decompose precipitation time series into approximation and detail components.
After selecting the proper mother wavelet, boundary extension, and decomposition level for each precipitation time series, an adequate threshold value should be chosen for the de-noising procedure. A range of threshold values in the vicinity of the universal threshold value given by Donoho's formula (Equation (3)) was examined to determine the ‘appropriate threshold value’ for precipitation decomposition via the db mother wavelet. As an illustration, the de-noised time series of rain gauge 4 (RG 4) is shown in Figure 3(a). Noise can corrupt a hydrologic time series and add uncertainty and complexity to it, and this issue becomes worse as the signal-to-noise ratio (SNR) decreases. Therefore, the existence of noise in precipitation time series can significantly affect the results of the proposed model in both the temporal pre-processing and spatial clustering stages. Moreover, because the energy of noise is mainly concentrated at small temporal scales, it has a more severe influence on the entropy values at small temporal scales than on those at large temporal scales (Sang et al. 2011). Figure 3(b) and 3(c) show the power spectra of the original and de-noised time series, respectively. The power of the de-noised time series at higher frequencies (smaller temporal scales) has decreased markedly in comparison to the original time series. Although WA as a time series pre-processing method can also handle some degree of noise in the time series, as demonstrated in Figure 3, the wavelet de-noising (WD) approach handles the noise better, especially at higher frequencies.
(a) De-noised precipitation time series of rain gauge 4 (RG 4) via DWT, (b) power spectrum of original time series, and (c) power spectrum of de-noised time series.
Figure 4 shows the results of decomposing RG 4 precipitation time series using db9 mother wavelet and zero-padding boundary extension. Each monthly precipitation dataset was decomposed into six lower resolution levels via the DWT approach. The detail components represent the 2-month periodicity (D1), 4-month periodicity (D2), 8-month periodicity (D3), 16-month periodicity (D4), 32-month periodicity (D5), and 64-month periodicity (D6). The A6 represents the approximation component (including the trend) at the sixth level of decomposition.
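The periodicities listed above follow the dyadic rule that detail level j of a DWT corresponds to a scale of 2^j sampling intervals. The helper below (a hypothetical name, shown only to make the rule explicit) reproduces the mapping for monthly data.

```python
def detail_periodicities(levels, sampling_interval=1):
    """Nominal periodicity of each detail level, in units of the sampling interval."""
    return {f"D{j}": (2 ** j) * sampling_interval for j in range(1, levels + 1)}

# Six levels of monthly data: D1 -> 2 months, ..., D6 -> 64 months.
monthly = detail_periodicities(6)
```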
Decomposition of monthly precipitation time series via db(9,6) and zero padding boundary extension for RG 4 (Ramsar).
It was observed that as the transform progressed from low to high scale (short to long time scale), more boundary points became distorted due to the decimation process. In other words, as filter length increased, more points at the boundary become affected. For higher scales (trend), the distortion becomes visibly worse. Application of boundary extension can cause inconsistency in computations of sub-series captured from DWT since it can add some uncertainties into time series (Mun 2004). This inconsistency could affect the performance of the proposed model. Hence, it was attempted to minimize the effect of applied boundary extension by using the MRE and er criteria in order to select the efficient border extension (see the ‘Time series decomposition via the DWT’ section).
Regionalization of rain gauges using the proposed model
At this stage, the entropy values of the decomposed components of the precipitation time series were calculated to be used as the input layer of k-means. The spatial distributions of the seven entropy values of the sub-series (A6, D6, D5 … D1) in Iran are shown in Figure 5. Based on Figure 5, the highest DWE values were generally found for the A6 and D3 sub-series, whereas the D1 sub-series had the lowest DWE values. Densely packed contour lines can be observed over the northern and northwestern parts of Iran. This means that the DWE values at various scales change rapidly over the northern and northwestern parts of Iran, which are the rainy and cold regions of the country. Spatial changes of DWE become smoother for the western and southern zones, which are semi-arid and arid regions of Iran. In general, rapid changes in the entropy pattern are observed for the northwestern parts of Iran, which are mostly cold areas.
Spatial distribution of discrete wavelet multiscale entropy (DWE) values over Iran.
These seven values, as the signature of the decomposed time series, were used as input data to SOM and k-means in order to perform precipitation regionalization. The number of clusters for the dynamic features of the monthly precipitation time series was determined from the values of the three validity indices. Table 1 shows the validity index values for different numbers of clusters for the two DWE-based clustering approaches.
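To make the selection from Table 1 reproducible, the sketch below reads off the best cluster number under each index (highest Silhouette, lowest Davies–Bouldin, highest Dunn) and then takes a simple majority vote. The vote is an assumption used here for illustration; the study does not state how the three indices were combined. With the Table 1 values, Silhouette and Dunn agree on 5 clusters for k-means and 6 for SOM, while the Davies–Bouldin minimum on its own falls at 8 (k-means) and 3 (SOM).

```python
from collections import Counter

K_VALUES = list(range(2, 11))  # cluster numbers 2..10, as in Table 1

# Values transcribed from Table 1.
TABLE_1 = {
    "som": {
        "silhouette": [0.19, 0.28, 0.27, 0.28, 0.34, 0.30, 0.21, 0.18, 0.09],
        "davies_bouldin": [1.59, 1.10, 1.45, 1.40, 1.20, 1.48, 1.7, 1.62, 1.35],
        "dunn": [0.96, 1.12, 1.07, 1.36, 1.49, 1.24, 1.14, 0.99, 0.97],
    },
    "kmeans": {
        "silhouette": [0.22, 0.23, 0.26, 0.33, 0.21, 0.25, 0.25, 0.22, 0.22],
        "davies_bouldin": [1.34, 1.33, 1.33, 1.18, 1.18, 1.51, 1.14, 1.42, 1.34],
        "dunn": [1.37, 1.19, 1.32, 1.52, 1.2, 0.83, 0.90, 0.78, 1.07],
    },
}
HIGHER_IS_BETTER = {"silhouette": True, "davies_bouldin": False, "dunn": True}

def best_k_per_index(scores):
    """Best cluster number under each index (first occurrence on ties)."""
    best = {}
    for name, values in scores.items():
        pick = max if HIGHER_IS_BETTER[name] else min
        best[name] = K_VALUES[values.index(pick(values))]
    return best

def majority_k(scores):
    """Cluster number favored by most indices (illustrative combination rule)."""
    votes = Counter(best_k_per_index(scores).values())
    return votes.most_common(1)[0][0]

kmeans_choice = majority_k(TABLE_1["kmeans"])
som_choice = majority_k(TABLE_1["som"])
```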
As discussed, for spatial clustering of the 31 rain gauges in Iran, the DWE values of each rain gauge were used as input data to the k-means clustering technique. First, the k-means approach was run with 1,000 trials on the DWE values. The optimal number of clusters was then determined using the validation indices. Five clusters, with SC = 0.33, DBi = 1.18, and Dunn = 1.52, showed a better performance in determining homogeneous areas than the other cluster numbers for the k-means approach. Therefore, five was selected as the optimum number of clusters to categorize the rain gauges.
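A minimal k-means sketch with repeated random initializations is shown below. It interprets the 1,000 trials as independent restarts from which the solution with the lowest within-cluster sum of squares is kept; that interpretation, the function names, and the toy data are assumptions, and the study's actual implementation is not shown.

```python
import math
import random

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen data points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    inertia = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return labels, centers, inertia

def kmeans(points, k, trials=1000, seed=0):
    """Keep the run with the lowest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((kmeans_once(points, k, rng) for _ in range(trials)),
               key=lambda run: run[2])

# Toy feature vectors (hypothetical, not DWE values): two obvious groups.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers, inertia = kmeans(toy, k=2, trials=20)
```

In the study's setting, each point would be the seven-element DWE vector of one rain gauge.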
In addition, SOM models were used to cluster the 31 rain gauges into a visible two-dimensional topology of regional RGL maps. To this end, map sizes of 2 × 2 to 10 × 10 were tried. The constructed topological maps, coupled with the related key features, showed that six clusters, with SC = 0.34, DBi = 1.20, and Dunn = 1.49, led to a better performance in determining homogeneous areas of precipitation variation than the other cluster numbers for the SOM approach. However, the SOM solution with six clusters was poorly balanced: two clusters contained only one rain gauge each, and two clusters contained more than ten rain gauges each. In terms of both the evaluation criteria and the distribution of rain gauges among clusters, the k-means results were judged better than those of SOM, and k-means was therefore applied for further analysis.
Some studies have used DWT-based clustering as a modeling approach. For example, Hsu & Li (2010) used a WT and self-organizing map (WTSOM) framework to spatially cluster precipitation time series. In their approach, WT was used to extract dynamic and multiscale features of the non-stationary precipitation time series, and SOM was employed to objectively identify spatially homogeneous clusters in the high-dimensional wavelet-transformed feature space. Haar and Morlet wavelets were selected in the data pre-processing stage to preserve the desired characteristics of the precipitation data. In the present study, decomposition was performed using smoother db mother wavelets (db5–db10) along with optimum parameters. The entropy-based dynamic features of the time series could improve the performance of the clustering approach. The sub-series (i.e., Ai, Di; i = 1, 2, …, 6) represent various monthly scales. Nevertheless, some of these components might not be sufficiently correlated with the original rainfall time series. For this reason, DWE was calculated and used as input to k-means to perform spatial clustering.
The geographic location of the rain gauges in each cluster, obtained with DWE as input to the k-means approach, is shown in Figure 6, together with the cluster centers (CC) based on the validity indices. Some of the stations in a given cluster are spread across the study area, revealing that the basis of clustering is not geographic proximity. For example, the rain gauges located near the Caspian Sea (rain gauges 19, 18, 4, and 11), which have the highest precipitation values and are geographically close to each other, are assigned to different clusters because of the differences in the entropies calculated for each rain gauge. The stations in each of these clusters are further examined for any common characteristics (in terms of multiscale entropy) they may share.
Geographic location of rain gauges (RG) in each cluster along with cluster centers (CC) for WT-k-means approach.
Raziei et al. (2008) regionalized the precipitation of the western part of Iran and found five zones based on the behavior of precipitation. As can be seen from Figure 7, the rain gauges located in the west of Iran were placed in five different clusters. In contrast to the spatial classification of rain gauges obtained here, Raziei (2017) found eight sub-regions of precipitation in Iran, namely, the mountainous regime (covering the Zagros and some parts of the Alborz mountains), central Alborz regime, monsoonal southeastern regime, Caspian regime, northwestern regime, central-eastern regime, south and southwestern regime, and coastal southeastern regime (geographic neighborhood). Also, Domroes et al. (1998), Modarres (2006), and Raziei (2017) separated Iran's rainfall regions into eight groups. The outcomes of these studies are very similar to each other. As in the present study, precipitation in the west of Iran showed varied precipitation changes in these studies (Figure 5). However, the results of the clustering in this study are very different, since the aforementioned studies classified the precipitation regime in Iran based on geographic proximity. The clustering here shows that there is hydrologic similarity (in terms of multiscale variation of precipitation) within the clusters that is independent of geographic neighborhood: some of the rain gauges in a given cluster are spread across the study area, showing that the basis of clustering is not geographic contiguity.
Multiscale entropy values for five clusters (k-means): (a) Cluster 1, (b) Cluster 2, (c) Cluster 3, (d) Cluster 4, and (e) Cluster 5.
As can be seen in Figure 7, the multiscale entropy values are, to a great extent, similar within any given cluster and the basis of the clustering is the entropy signature of the precipitation observed at all the rain gauges for all clusters. For example, in Cluster 1 (Figure 7(a)), the entropy signatures for all the rain gauges are similar and the peaks in the plots indicate high values of entropy, which corresponds to high variability of the precipitation features at the specific scale across time. In addition, the pattern of the entropy in a given cluster across all scales for the rain gauges is unique for that cluster (homogeneity) but also different from every other cluster.
To simplify the analysis, the average entropy of each cluster was used instead of the entropy values of single rain gauges, and this average was taken as the representative entropy of the cluster at a specific scale. Figure 8 shows the DWE values obtained for the detail and approximation components. The entropy values of D3 and the approximation component were the highest and showed the lowest variation, whereas D1 had the lowest entropy values with the highest variation among all components. The DWE values increased from D1 to D3, then decreased from D3 to D4. However, the variation of DWE from D4 to D6 and A6 (Figures 7 and 8) did not follow a constant pattern. To corroborate the outcome of Figure 8, the wavelet power spectra of the central rain gauges of each cluster are presented in Figure 9. For all the rain gauges, there are very rapid changes for the periods of 1 to 16 months. For the period of 1 month, mostly low powers were observed; for periods up to 8 months, the power spectrum changes rapidly from low to high and vice versa. These changes become smoother for the 16-month band in comparison to the 8-month band. For the bands beyond 16 months, the changes become smoother still and the power spectrum values are higher than in the 1-month band. Therefore, it can be inferred that the entropies of the D1 to D4 sub-series are the key variables in precipitation regionalization. It can also be stated that the precipitation variation is affected by different variables, such as timing, amount, and temporal distribution.
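The scale-by-scale entropy values discussed above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: a Haar wavelet is used in place of the db5–db10 wavelets so that the example needs only the standard library, entropy is taken as the Shannon entropy of the normalized coefficient energies (one common definition), de-noising and boundary extension are omitted, and the input series is synthetic.

```python
import math
import random


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    n = len(signal) - len(signal) % 2  # drop a trailing sample if the length is odd
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def shannon_entropy(coeffs):
    """Shannon entropy (in nats) of the normalized energies of the coefficients."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0.0:
        return 0.0
    return -sum((e / total) * math.log(e / total) for e in energies if e > 0.0)


def multiscale_entropy(series, levels=6):
    """Entropy of each detail sub-series D1..D<levels> and of the final approximation."""
    result = {}
    approx = list(series)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        result[f"D{level}"] = shannon_entropy(detail)
    result[f"A{levels}"] = shannon_entropy(approx)
    return result


# Synthetic 612-month series (51 years) with an annual cycle; NOT real gauge data.
random.seed(0)
series = [max(0.0, 30 + 25 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 10))
          for t in range(612)]

dwe = multiscale_entropy(series)
for name, value in dwe.items():
    print(name, round(value, 3))
```

Because each decomposition level halves the number of coefficients, entropy values from different levels have different upper bounds (the logarithm of the number of coefficients), which is one reason for normalizing DWE values before comparing them across scales.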
Based on these observations, three distinct bands were determined for further analysis. Band 1 considered the features up to 8-month scales (D1, D2, and D3). Band 2 considered the features from 8 to 16 months (D4). The features having a scale beyond 16 months were categorized as Band 3 (D5 and D6). Figure 10 shows the average normalized DWE for all the clusters at different bands (the DWE values were normalized for better comparison). There is a clear distinction in the values of DWE for different clusters in the first two bands and, in view of this, information from the first two bands was further analyzed. The DWE of each cluster was classified as ‘High’, ‘Medium’, or ‘Low’ by considering the position of the individual DWE plot relative to the mean level for that band. For instance, if the DWE of a cluster in a specific band fell below the mean DWE of all clusters, then that cluster was assigned a signature of ‘Low’. Using this classification, an entropy signature was given to each cluster based on the entropy values in the first two scale-based bands. For notational simplicity, the classifications ‘High’, ‘Medium’, and ‘Low’ were represented by ‘1’, ‘0’, and ‘−1’, respectively. This means, for example, that an entropy signature of (0, −1) indicates that the cluster had relatively moderate entropy up to 8 months and low entropy for 8–16 months. Based on these notations, the entropy signature for each of the five clusters is given in Table 2.
Entropy signature of five clusters for 31 rain gauges in Iran
Cluster no. | Comparative observation of DWE (Band 1) | Comparative observation of DWE (Band 2) | Entropy signature |
---|---|---|---|
1 | H | M | (1, 0) |
2 | L | M | (−1, 0) |
3 | H | L | (1, −1) |
4 | M | L | (0, −1) |
5 | L | L | (−1, −1) |
Comparison of normalized DWE values for each scale for all clusters according to segregated bands: (a) Band 1: 2–8 months, (b) Band 2: 8–16 months, and (c) Band 3: 16–64 months.
As a further step, the DWE values at different scale-based bands were related to the mean monthly precipitation of the rain gauges. Boxplots of mean monthly precipitation (Figure 11) suggest that the clusters with ‘Low’ entropy in Band 1 (scales up to 8 months; i.e., Clusters 2 and 5 in Table 2) had smaller precipitation values, whereas clusters characterized by ‘Medium’ or ‘High’ entropy in the same band (i.e., Clusters 1, 3, and 4) had larger precipitation values. Hence, mean monthly precipitation values and relative entropy showed a direct relationship.
Distribution of mean monthly precipitation values for the five clusters.
The connection of the DWE with latitude and longitude was also investigated to characterize the spatial structure of the precipitation variation (Figure 12). For DWE versus latitude, R² = 0.227 and P-value = 0.37 were calculated and an upward relation was observed; for DWE versus longitude, R² = 0.22 and P-value = 0.34 were calculated and a downward relation was observed. Neither relation was statistically significant. Figure 12 therefore suggests only a weak spatial tendency: DWE, and hence multiscale precipitation variability, tends to decrease with longitude from west to east and to decrease from north to south.
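A relation of this kind between DWE and a geographic coordinate can be examined with an ordinary least-squares fit. The sketch below computes the slope, intercept, and R² using only the standard library. The coordinates and DWE values are hypothetical placeholders rather than data from the study, and the P-value is not computed.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical latitudes (degrees N) and DWE values; NOT data from the study.
latitude = [27.2, 30.3, 32.6, 35.7, 38.1]
dwe_values = [0.61, 0.66, 0.58, 0.70, 0.69]

slope, intercept, r_squared = linear_fit(latitude, dwe_values)
print("slope:", slope)
print("intercept:", intercept)
print("R^2:", r_squared)
```

The sign of the slope gives the direction of the relation (upward or downward), while R² measures how much of the variation in DWE the coordinate explains. Testing significance additionally requires the t-distribution; `scipy.stats.linregress`, for example, returns a P-value together with the slope and correlation coefficient.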
Spatial structure of DWE values (in latitude and longitude directions) and related trends in Iran.
Results showed the capability of the present methodology for precipitation regionalization. When accessibility to recorded precipitation time series is limited at the region of interest, regionalization methods might lead to incorrect results. For the case of the discrete wavelet, although the wavelet power spectrum has successfully been used for capturing hydrological time series behavior, it becomes difficult to use the wavelet spectrum in cases of limited or incomplete time series. Nevertheless, entropy provides information about the uncertainty at a given scale, which can highlight the level of variation present at that scale. Further, entropy enables the determination of least-biased probability distributions with limited time series knowledge. Entropy theory can serve as a useful approach to study hydrologic and meteorological processes (Mishra et al. 2009; Agarwal et al. 2016). Sang (2012) also showed the usefulness of applying DWE in precipitation-based studies.
The obtained results are applicable at the local scale, since various factors can affect the outcome of the proposed model (e.g., geographic location, precipitation variation, effect of climatic phenomena, and the precipitation gauge network). Given these uncertainties, it is suggested to apply the proposed model to other case studies and to compare the outcomes. It is also suggested to validate the capability of the proposed model on other hydrologic and climatic variables (e.g., evapotranspiration, temperature, and runoff) at various time scales (e.g., daily and annual).
CONCLUSION
In this study, the spatio-temporal variability of monthly precipitation in Iran during 1960–2010 was investigated using DWE, and the pattern of DWE changes along with regionalization of rain gauges were further analyzed. In order to meet the objectives of this study 31 rain gauges were selected.
To decompose the precipitation time series appropriately, smoother db mother wavelets (db5–db10) were applied, together with the optimal decomposition level and boundary extension treatment. To classify the rain gauges, SOM and k-means clustering models were used. The methodology based on the DWE approach combined with the k-means clustering technique proved to be robust for hydrologic regionalization.
Wavelet-based multiscale entropy values showed the distinct variation of precipitation dynamics at each rain gauge and allowed for the establishment of homogeneous areas (with no prior assumptions). Most of the previous studies in precipitation regionalization delineated the rain gauges based on geographic proximity; however, the present study categorized rain gauges according to the uncertainties (entropy) in a multiscale approach. The DWE proved useful in capturing the precipitation characteristics. The 31 rain gauges studied were clustered into five groups, each one having a unique DWE pattern across different time scales. Based on the pattern of mean DWE for each cluster, a characteristic signature was assigned, which provided an estimation of the DWE of a cluster across scales 2–8, 8–16, and 32–64 months relative to the other clusters. Fluctuations in DWE at different scales in this study were related to monthly precipitation.