Abstract

In the present study, a classical method and a proposed method were used to investigate the monthly precipitation characteristics of 30 stations in the southeastern United States during 1968–2018. The proposed method combines the maximal overlap discrete wavelet transform (MODWT), used as a preprocessing step, with K-means clustering. First, the monthly precipitation time series of the stations were decomposed into several subseries using MODWT with a Daubechies (db) mother wavelet. Then, the energy values of these subseries were calculated and used as inputs to the K-means and radial basis function (RBF) methods. The optimum number of clusters for the considered stations was five in both the classical and proposed methods. Before the data were used as input to the RBF method, their correlation was evaluated with a variogram. Based on the clustering results, and in accordance with the latitude and longitude variations of the stations, it was found that as the energy of the clusters increased, the amount of precipitation at the stations decreased, and vice versa. The silhouette coefficient was 0.3 for the classical method and 0.8 for the proposed method, which indicates better clustering of the selected area with the proposed method.

HIGHLIGHTS

  • Discussing important climatic elements which affect the hydrological cycle.

  • Discussing temporal-spatial variations of precipitation.

  • A classical method and a proposed method were used to investigate the monthly precipitation characteristics.

  • Maximal Overlap Discrete Wavelet Transform (MODWT) as pre-processing method and K-means clustering method were used.

  • The data correlation was evaluated by variogram and covariance graphs.

INTRODUCTION

Precipitation variation assessment over a large area can provide valuable information for water resources management and engineering issues, particularly in a changing climate (Mishra et al. 2009; Wei et al. 2017). Because precipitation participates in the global hydrologic and energy cycles, its variability is important for understanding the behavior of, and changes in, the Earth's climate system (Lettenmaier et al. 1994; Karl & Knight 1998; Lins & Slack 1999; Douglas et al. 2000; Mauget 2003; Zhang & Mann 2005; Small et al. 2006). Also, understanding the spatial and temporal variation of precipitation informs the planning and management of precipitation-dependent activities (Kolivras & Comrie 2007; Nischitha et al. 2014). Many researchers have studied global and regional precipitation variations and tried to find the impacts of climate change on the hydrological cycle and human life. Buytaert et al. (2006) investigated rainfall data from 14 rain gauges in the western mountain range of the Ecuadorian Andes. They studied spatial and temporal rainfall patterns and showed that spatial variability in average rainfall was very high. Also, significant correlations were found between average daily rainfall and geographical location. Cheng et al. (2008) evaluated the rain-gauge network using geostatistical methods to calculate the mean precipitation in areas without stations. They showed that annual rainfall exhibits a significant orographic effect and less spatial variability, whereas hourly rainfall exhibits higher variability in space, and the spatial variation structures vary among different storm types. Chu et al. (2010) investigated the temporal–spatial distribution of precipitation in several watersheds in China. The resulting distribution maps showed that, across the watersheds, precipitation decreased during 1958–2007 in every season except spring, and the declining trend was significant in summer. The annual and seasonal precipitation amounts and trends were found to differ among regions and seasons.

Unal et al. (2012) analyzed annual, wet and dry seasons' precipitation records for the period of 1961 to 2008 from 271 stations in Turkey via the rotated empirical orthogonal function (REOF), the Mann–Kendall trend test, and the continuous wavelet transform (WT) method. The obtained results showed that the decreasing wet/dry season precipitation that was observed throughout the country, except the northeast coasts and eastern parts of Turkey, had a strong impact on the economic livelihood of the region, especially on agricultural production, drinking water supply, and hydroelectricity production.

Zhang et al. (2017) investigated the spatial and temporal patterns of rainy season precipitation from 1960 to 2014 using principal component analysis. The results showed that Niño-3.4 SST in winter positively impacts the subsequent year's rainy season precipitation. Mengmeng & Baoxiang (2019) analyzed the trend of precipitation change in Shandong Province through the climate trend rate. Using multiple methods, they identified interannual and seasonal abrupt change points. The results showed that, spatially, the annual precipitation in each region of Shandong Province decreased to varying extents, and the southeast coastal region had a larger reduction than the northwest inland region. Aydin & Raja (2020) evaluated the changing spatial and temporal characteristics of the precipitation events of August 24, 2015 and November 11–12, 2015 that triggered flash floods and landslides in Artvin, Turkey. They used spatiotemporal (ST) kriging to investigate the ST patterns. They showed that analyzing the spatiotemporal characteristics of heavy rainfall events is an indispensable basis for ensuring that the necessary precautions against flash floods in the city are taken.

Since most hydrological time series are non-stationary, trend-dependent, or subject to seasonal fluctuations, other methods are needed to identify their dominant periodicities. Wavelet analysis (WA) is one of the most common approaches used by researchers for this aim (Kumar & Foufoula 1993; Adamowski et al. 2009; Chou 2013; Roushangar et al. 2018). Hybrid wavelet analysis has been used to improve the ability of models to capture the multiscale features of hydrological time series (e.g., Agarwal et al. 2017; Farajzadeh & Alizadeh 2017). Applying WA to a signal transforms it into time–frequency space. Such a transformation can be useful for identifying the dominant periodicities of variability and how they change over time (Adamowski et al. 2013; Joshi et al. 2016; Farajzadeh & Alizadeh 2017; Roushangar et al. 2019). Therefore, WA is a beneficial method for analyzing localized changes in the power of a specific signal. WA has been used as a useful method for decomposing and mining complex, periodic, and irregular hydrological time series (Roushangar & Alizadeh 2018). According to Adamowski et al. (2013), the wavelet transform (WT) is able to provide information on the localized characteristics of non-stationary time series in both the temporal and frequency domains. Different methods have been developed for applying the WT in modeling and analysis, and some of them have resulted in wrong application and incorrect outcomes (Du et al. 2017). According to Quilty & Adamowski (2018), the primary mistake made by some researchers occurs when the time series is decomposed into subseries with the WT, that is, when relevant features are isolated and extracted from the series. An incorrect choice of boundary extension at this stage adds error to the wavelet and scaling coefficients (Maheswaran & Khosa 2012; Roushangar & Alizadeh 2019).

On the other hand, clustering techniques can be applied to identify structure in an unlabeled precipitation dataset. Clustering techniques objectively organize data into homogeneous groups in which the within-group-object similarity is maximized and the between-group-object similarity is minimized. Most classical clustering methods use static data, where features do not change with time. The static data clustering methods are partitioning, hierarchical, density-based, grid-based, and model-based methods (Jain et al. 1999). Clustering analysis is the same as the homogeneity test (Viglione et al. 2007; Mousavi et al. 2015). Recently, time series clustering methods have been developed, including raw data-based, feature-based, and model-based methods (Liao 2003; Zhou & Chan 2014; Barton et al. 2016). Since time series clustering considers features in both the temporal and spatial domains, the analysis results show the transient changes and local characteristics that usually occur in climate data. The two most common cluster analysis methods are the K-means method and Ward's method. In both cases, the number of clusters needs to be determined in advance.

In general, input data for clustering have a significant effect on the output results. However, not all properties of the input variables are required in area clustering. Therefore, appropriate methods are needed to extract the desired characteristics from the input data. The original precipitation data contain a large amount of information, so a multiscale approach is needed to select the desired data and eliminate undesirable or redundant information. Because of the dynamic characteristics and non-uniform distribution of precipitation data, and the need to identify homogeneous precipitation areas in water resources management, a temporal–spatial model is proposed to investigate the precipitation characteristics. In this regard, monthly precipitation data from 30 precipitation stations in the United States during the period 1968–2018 were used, and the temporal–spatial characteristics of precipitation were investigated using two methods, classical and proposed. In the proposed method, the data were first decomposed into several subseries using the maximal overlap discrete wavelet transform (MODWT), and the energy value of each subseries was calculated. To avoid the WT misapplication described above, the precipitation datasets were analyzed using MODWT and suitable boundary extensions were selected. These data were then used as inputs to the K-means clustering method. The main objectives of this study are as follows:

  • Investigating the benefits of the MODWT in evaluating the spatial–temporal characteristics of precipitation.

  • Clustering the study area based on the most dominant subseries and identifying the homogeneous precipitation areas based on the subseries energy values.

  • Finding the relationship of precipitation with subseries energy values and latitude and longitude of the selected stations.

MATERIALS AND METHODS

Study area

In this study, precipitation data from 30 stations in the southeastern United States, around the city of Atlanta in the state of Georgia, were used to investigate the temporal–spatial variability of precipitation. Most of the southeastern United States is dominated by a humid subtropical climate and receives uniform precipitation throughout the year. Atlanta has hot and humid summers and mild winters. The mean annual precipitation in Atlanta is 50.2 inches (1,280 mm). Atlanta's area is 347.1 square kilometers, of which 344.9 square kilometers is land and 2.2 square kilometers is water. Atlanta is located among the foothills of the Appalachian Mountains. Among major cities east of the Mississippi River, Atlanta has the highest elevation. Figure 1 and Table 1 show the location of the selected stations.

Table 1

Information of the selected stations in the study

| Station number | Station name | State | Latitude (°) | Longitude (°) | Station number | Station name | State | Latitude (°) | Longitude (°) |
|---|---|---|---|---|---|---|---|---|---|
| S1 | Anderson Faa airport | SC | 34.495 | −82.71 | S16 | Cornelia | GA | 34.513 | −83.527 |
| S2 | Anderson | SC | 34.532 | −82.328 | S17 | Coweeta Experiment station | NC | 35.073 | −83.422 |
| S3 | Appling 2 NW | GA | 33.557 | −82.328 | S18 | Eastman 1 W | GA | 32.205 | −83.197 |
| S4 | Athens Ben Epps Airport | GA | 33.956 | −83.317 | S19 | Gainesville | GA | 34.309 | −83.864 |
| S5 | Athens | TN | 35.434 | −83.575 | S20 | Hawkinsville | GA | 32.281 | −83.452 |
| S6 | Atlanta Hartsfield International Airport | GA | 33.645 | −84.441 | S21 | Jasper 1nnw | GA | 34.47 | −84.441 |
| S7 | Calhoun Falls | SC | 34.099 | −82.583 | S22 | Jonesboro | GA | 33.532 | −84.336 |
| S8 | Carnesville | GA | 34.365 | −83.25 | S23 | Louis Ville 1e | GA | 33.019 | −82.388 |
| S9 | Cartersville Number 2 | GA | 34.173 | −85.778 | S24 | Macon Middle Ga Regional Airport | GA | 32.692 | −83.647 |
| S10 | Chattanooga Airport | TN | 35.42 | −85.197 | S25 | Montezuma 2nw | GA | 32.325 | −84.059 |
| S11 | Clarks Hill 1 W | SC | 33.663 | −82.186 | S26 | Taylorsvilee | GA | 34.074 | −84.098 |
| S12 | Clayton 1ssw | GA | 34.883 | −83.385 | S27 | Toccoa | GA | 34.581 | −83.34 |
| S13 | Clemson university | SC | 34.661 | −82.823 | S28 | Walhalla | SC | 34.753 | −83.077 |
| S14 | Cleveland Filter Plant | TN | 35.214 | −84.785 | S29 | Washington | GA | 33.719 | −82.71 |
| S15 | Cleveland | GA | 34.593 | −83.782 | S30 | Woodbury | GA | 32.987 | −84.598 |
Figure 1

Location of the study area and selected stations.

Maximal overlap discrete wavelet transform (MODWT)

MODWT is a modified version of the discrete wavelet transform (DWT). Both DWT and MODWT allow a multi-resolution analysis, which is a scale-based additive decomposition, to be performed. MODWT has several merits in comparison with DWT. For example, MODWT is well defined for an arbitrary signal length, whereas the DWT requires the signal length to be an integer multiple of a power of two.

According to Percival & Walden (2000), MODWT projects a time series onto a collection of non-orthogonal basis functions (wavelets) to generate a set of wavelet coefficients. MODWT makes it possible to develop a multi-resolution analysis (MRA), which is a scale-based additive decomposition. For a time series $P$ with $n$ samples, the MODWT-based decomposition can be expressed as:
$$P = \sum_{j=1}^{N} W_j + V_N \qquad (1)$$
where $W_j$ denotes the wavelet coefficients, which capture local fluctuations over the whole period of the time series at scale $j$; $V_N$ denotes the scaling coefficients, which show the overall trend of the original signal; and $N$ is the number of decomposition levels. For a discrete signal $P = \{P_t,\ t = 0, 1, \ldots, n-1\}$, the elements of the $j$th level MODWT wavelet and scaling coefficients, $W_j$ and $V_j$, can be written as Equation (2):
$$W_{j,t} = \sum_{l=0}^{n-1} \tilde{h}^{\circ}_{j,l}\, P_{(t-l) \bmod n}, \qquad V_{j,t} = \sum_{l=0}^{n-1} \tilde{g}^{\circ}_{j,l}\, P_{(t-l) \bmod n} \qquad (2)$$
where $W_{j,t}$ is the $t$th element of the $j$th level MODWT wavelet coefficient; $V_{j,t}$ is the $t$th element of the $j$th level MODWT scaling coefficient; $\{\tilde{h}^{\circ}_{j,l}\}$ and $\{\tilde{g}^{\circ}_{j,l}\}$ are the $j$th level MODWT high- and low-pass filters (wavelet and scaling filters) obtained by periodizing $\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{j,l}\}$ to length $n$, respectively; $\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{j,l}\}$ are the $j$th level MODWT high-pass filter ($\tilde{h}_{j,l} = h_{j,l}/2^{j/2}$) and low-pass filter ($\tilde{g}_{j,l} = g_{j,l}/2^{j/2}$); $\{h_{j,l}\}$ and $\{g_{j,l}\}$ are the $j$th level DWT high- and low-pass filters; and $L$ is the highest decomposition level (denoted $N$ in Equation (1)), so that $j = 1, \ldots, L$. The filters are determined by the mother wavelet, as in the DWT. Figure 2 shows a flowchart of a three-level MODWT. As seen in Figure 2, the MODWT-based MRA decomposes an original signal $P$ into scaling coefficients ($V_3$) and wavelet coefficients ($W_1$, $W_2$, and $W_3$) (Maslova et al. 2016). According to Cornish et al. (2006), unlike the DWT, the MODWT does not decimate the coefficients, so the number of wavelet and scaling coefficients at each level of the transform equals the number of sample observations. In fact, the MODWT coefficients can be regarded as the result of a simple modification of the pyramid algorithm used to compute DWT coefficients: the output is not downsampled at each scale, and zeros are inserted between the coefficients of the scaling and wavelet filters. Details on MODWT can be found in Percival & Walden (2000).
Figure 2

The steps of a time series decomposition into detail (D) and approximation (A) subseries.
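
The pyramid algorithm described above can be illustrated with a minimal sketch. It is not the configuration used in this study: for brevity it assumes the Haar filter rather than a db filter, circular (periodic) boundary handling, and a short made-up series; the function name `modwt_haar` and all values are hypothetical.

```python
def modwt_haar(x, levels):
    """Haar MODWT via the undecimated pyramid algorithm (circular boundaries).

    Assumption: Haar filters are used for brevity; a db filter would only
    change the two coefficient tuples below.
    """
    h = (0.5, -0.5)  # MODWT Haar wavelet (high-pass) filter
    g = (0.5, 0.5)   # MODWT Haar scaling (low-pass) filter
    n = len(x)
    v = list(x)      # V_0 is the signal itself
    wavelet = []
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)  # gaps (implicit zeros) between filter taps at level j
        w_j = [sum(h[l] * v[(t - step * l) % n] for l in range(2)) for t in range(n)]
        v_j = [sum(g[l] * v[(t - step * l) % n] for l in range(2)) for t in range(n)]
        wavelet.append(w_j)  # no downsampling: len(w_j) == n at every level
        v = v_j
    return wavelet, v


# Hypothetical monthly totals; the length (10) is deliberately not a power of two.
series = [80.0, 120.0, 95.0, 60.0, 150.0, 110.0, 70.0, 130.0, 90.0, 100.0]
wavelet, scaling = modwt_haar(series, levels=3)

for level, coeffs in enumerate(wavelet, start=1):
    print(f"W{level}: {len(coeffs)} coefficients")
print(f"V3: {len(scaling)} coefficients")
```

Every level returns as many coefficients as there are observations, which is the non-decimation property noted above, and the total energy of the coefficients equals the energy of the input series.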

Calculating the energy of decomposed subseries via MODWT

The wavelet and scaling coefficients obtained from the MODWT method were used as inputs in the clustering and geostatistical methods in the proposed model. In the proposed model, in order to reduce the numbers of inputs and prevent possible errors in the output of the used models, the energy values of the subseries decomposed by MODWT were calculated for each of the wavelet and scaling coefficients using Equation (3). Then, the obtained energy values were used as a representative of each coefficient in clustering methods.
$$E = \sum_{n} \left| P(n) \right|^{2} \qquad (3)$$
where P is the precipitation amount, n represents the month of the time series, and E denotes the energy value.
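
As a minimal sketch of this step, the snippet below reduces each subseries to a single energy value. It assumes that the energy is the sum of squared values over all months, which is the usual signal-energy definition; the helper names and the toy subseries are hypothetical.

```python
def subseries_energy(values):
    """Energy of one subseries, assumed here to be the sum of squared values."""
    return sum(v * v for v in values)


def energy_features(subseries):
    """Map each named subseries (e.g. 'W1', 'V4') to a single energy value."""
    return {name: subseries_energy(values) for name, values in subseries.items()}


# Toy coefficients for one station (made-up numbers, not study data).
station_subseries = {
    "W1": [1.0, -1.0, 2.0],
    "V4": [3.0, 3.0, 3.0],
}
features = energy_features(station_subseries)
print(features)
```

Each station is then represented by a handful of energy values instead of its full monthly record, which is how the number of clustering inputs is reduced.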

K-means clustering method

Clustering is the process of dividing a set of patterns into separate groups such that similar patterns remain in the same cluster and dissimilar patterns fall into other clusters. K-means clustering is used for vector quantization and, more generally, for cluster analysis in data mining. The aim of k-means clustering is to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This leads to a partitioning of the data space into Voronoi cells. According to Pham et al. (2005), k-means minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. Given an initial set of k means, the algorithm proceeds by alternating between two steps:

  • Assignment step: assigning each observation to the cluster whose mean has the least squared Euclidean distance.

  • Update step: calculating the new means (centroids) of the observations in the new clusters. When the assignments no longer change, the algorithm has converged (see Pham et al. 2005 for more details).
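
The two alternating steps can be sketched in a few lines. This is an illustrative implementation only, assuming explicitly supplied initial centroids and toy two-dimensional points; the function names and data are hypothetical and are not taken from the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: alternate assignment and update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy feature vectors (e.g. two energy values per station; made-up numbers).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Because the result depends on the initial means, practical use typically repeats the procedure from several starting points and keeps the partition with the lowest within-cluster variance.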

Radial basis functions (RBF) method

Geostatistical methods can be used to extend the obtained results to the whole study area. The radial basis function (RBF) method is a geostatistical model that has its origin in methods for exact interpolation of data points in a multi-dimensional space (Powell 1987). RBF interpolation is an advanced method in approximation theory for constructing high-order accurate interpolants of unstructured data, possibly in high-dimensional spaces. The interpolant takes the form of a weighted sum of radial basis functions. RBF interpolation is often spectrally accurate and stable for large numbers of nodes, even in high dimensions (Buhmann & Dyn 1993). According to Flyer & Wright (2009), RBF interpolation has been used to approximate differential operators, integral operators, and surface differential operators. The RBF mapping gives an interpolating function that passes exactly through every data point. If noise is present in the data, an interpolating function that averages over the noise gives the best extension.
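
The exact-interpolation property can be shown with a small sketch. The kernel used in this study is not stated, so a Gaussian kernel with a shape parameter `epsilon` is assumed here; the node coordinates, values, and function names are hypothetical.

```python
import math


def gaussian_rbf(r, epsilon=1.0):
    """Gaussian radial basis function (an assumed kernel choice)."""
    return math.exp(-((epsilon * r) ** 2))


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_rbf(nodes, values, epsilon=1.0):
    """Return an interpolant s(p) = sum_i w_i * phi(||p - node_i||)."""
    matrix = [[gaussian_rbf(math.dist(p, q), epsilon) for q in nodes] for p in nodes]
    weights = solve_linear_system(matrix, values)

    def interpolant(point):
        return sum(
            w * gaussian_rbf(math.dist(point, q), epsilon)
            for w, q in zip(weights, nodes)
        )

    return interpolant


# Made-up station coordinates and values (not study data).
nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [10.0, 12.0, 11.0, 15.0]
surface = fit_rbf(nodes, values)
print(surface((0.5, 0.5)))  # estimate at an ungauged location
```

The weights are found by requiring the weighted sum to reproduce the observed value at every node, which is why the fitted surface passes through all data points.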

Considered methodology

The main aim of this study was to investigate the capability of new methods in temporal and spatial analysis of precipitation in an area to reduce computational time, cost, errors, and noise in temporal and spatial investigation of precipitation. The MODWT–K-means framework was used for this aim. MODWT was used to extract dynamic and multiscale features of the non-stationary precipitation time series, and K-means was applied to objectively identify spatially homogeneous clusters on the high-dimensional wavelet-transformed feature space. During the modeling process, an attempt was made to reduce the number of input data in the analyzing process. The following points could be considered to reduce the noise values:

  1. The number of inputs in modeling process should be reduced.

  2. The role of chance in selecting inputs should be reduced (i.e., the selection of inputs should not be done by chance).

  3. The methods with higher accuracy should be used.

Figure 3 illustrates the stages of the research used to estimate the temporal–spatial characteristics of monthly precipitation with both the classical and proposed models. In this study, monthly precipitation datasets from 30 rainfall stations for the period 1968–2018 were used, giving 612 monthly values per station (51 years × 12 months). In the classical method, all data were used as inputs for area clustering without any preprocessing. The silhouette coefficient was used to validate the results. In the proposed model, to investigate the precipitation variations over the selected period, the precipitation time series were first decomposed using the MODWT method, which was used to reduce the number of inputs. Among the subseries obtained from MODWT, the best input combination was then used for clustering.

Figure 3

Schematic view of the considered modeling in the study.

RESULTS AND DISCUSSION

Decomposition of precipitation time series and calculating the subseries energy values

To investigate the temporal–spatial variations of precipitation in the study area, the time series were first decomposed into several subseries via MODWT. In this study, the Daubechies (db) wavelet was used as the mother wavelet, since this type of mother wavelet has been widely used in hydrological studies. The db wavelets have compact support, meaning that their basis functions are non-zero only over a finite interval. In the proposed model, to find the best decomposition level and the best db number, the monthly time series were decomposed at levels two to six with db numbers two to five. Among the resulting 20 combinations (five levels × four db numbers), the best case was selected according to the root mean square error (RMSE). On this basis, a decomposition level of four and a db number of three were selected for time series decomposition. To use the MODWT method, some data must be removed from the left side of the time series, whereas in the DWT method data must be removed from both the left and right sides. In fact, the MODWT needs an infinite signal $P_t$, with $t = \ldots, -1, 0, 1, \ldots, N-1, N, \ldots$, while in reality the data are measured over a finite interval at discrete times. To use this method, the time series must therefore be extended before preprocessing so that the unobserved values are determined. In particular, the right end of the series (i.e., $P_{N+1}, P_{N+2}, \ldots$) should be extended properly, and special attention should be paid to the values affected by the boundary conditions. There are two approaches for handling the boundary effect: data modification and wavelet modification. In this study, boundary conditions are handled following Percival et al. (2011). The extension of data is done using the following equation:
$$L_j = (2^{j} - 1)(L - 1) + 1 \qquad (4)$$
where $j$ is the decomposition level, $L$ is twice the db number, and $L_j$ is the number of omitted data. Figure 4 shows the decomposition of the station S2 time series with and without data removal.
Figure 4

Comparing the precipitation time series decomposition with and without data removing for station S2.
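
To give a sense of how many values the boundary handling affects, the short calculation below assumes the standard expression for the width of the level-j filter, L_j = (2^j − 1)(L − 1) + 1 (Percival & Walden 2000), together with the settings reported in this study (decomposition level four, db number three, 612 monthly values). The function name is hypothetical.

```python
def boundary_count(level, filter_length):
    """Assumed width of the level-j filter: (2**j - 1) * (L - 1) + 1."""
    return (2 ** level - 1) * (filter_length - 1) + 1


db_number = 3
filter_length = 2 * db_number  # L is twice the db number
level = 4
series_length = 612            # monthly values per station, 1968-2018

omitted = boundary_count(level, filter_length)  # (2**4 - 1) * (6 - 1) + 1 = 76
remaining = series_length - omitted             # 612 - 76 = 536

print(f"boundary-affected values: {omitted}, retained values: {remaining}")
```

Under these assumptions, roughly one-eighth of each record lies in the boundary-affected region, which illustrates why the choice of decomposition level and filter length matters for short series.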

Signal decomposition via MODWT yields two types of coefficients: wavelet and scaling coefficients. The scaling coefficients (V) are the large-scale wavelet transform coefficients, which show smooth trends in the series, and the wavelet coefficients (W1, W2, W3, …) provide detailed information on trends in hydrological time series. Each of the W components corresponds to a specific period. For example, in monthly data, W1, W2, W3, and W4 indicate 2-, 4-, 8-, and 16-month periods, respectively. After time series decomposition, the energy of the subseries was calculated using Equation (3). Figure 5 shows the energy values of each subseries decomposed by MODWT for all stations. The energy ranged from 3.53 × 10^20 to 5.78 × 10^20 in W1, from 1.84 × 10^20 to 3.58 × 10^20 in W2, from 9.36 × 10^19 to 1.73 × 10^20 in W3, and from 6.74 × 10^19 to 1.01 × 10^20 in W4. Also, V4 was between 19.04 × 10^−4 and 19.35 × 10^−4. As can be seen, V4 and W2 had the least variations and W4 had the most variations among the subseries.

Figure 5

The variations of energy values of decomposed subseries by MODWT.

Clustering the study area

After calculating the subseries energy values, clustering was performed for both the classical and proposed models and the results were compared. Because the number of clusters must be specified before clustering, K-means was run with the number of clusters varied between 2 and 10. The best number of clusters was selected based on the silhouette coefficient (SC) and the dispersion of stations, and was found to be 5. In the classical method, all monthly precipitation data were used as input, that is, 612 values for each station. The silhouette coefficient obtained for the classical method was 0.31, which indicated relatively poor correlation and clustering. Figure 6(a) shows the silhouette coefficient of each cluster in the classical method. As can be seen, cluster 5 is a single-member cluster that showed dissimilarity to the other stations. Also, clusters 2 and 3 had negative silhouette coefficients. In the proposed method, to select the best input data for clustering with 5 clusters, the energy values of the decomposed subseries and their combinations (15 different states) were considered, and the silhouette coefficient and spatial distribution of the stations were assessed. The results are presented in Table 2 and Figure 6. When the variables W2 and V4 were used as inputs, the silhouette coefficient improved to 0.8. Another reason for selecting W2 and V4 as the best inputs was their lower data variability in comparison with the other variables. According to the results, stations S18 and S24 had the highest SC values (0.98 and 0.97, respectively), and stations S30 and S22 had the lowest (0.11 and 0.37, respectively). In Figure 6(b), the clusters are marked on the map with different shapes. The results suggest that the stations are clustered appropriately. The clusters were located as follows: cluster 2 (rhombus shape) in the southern part, cluster 3 (square shape) in the north and northwest parts, cluster 5 (triangle shape) in the east part, and clusters 1 and 4 (star and circle shapes, respectively) from the east to the west parts. Also, Figure 7 shows the energy values of the decomposed subseries. According to this figure, the subseries energy values varied widely overall; however, within each cluster the subseries energy variations were similar to each other. For all stations, the W1 subseries had the highest energy and the W4 and V4 subseries had the lowest energy. Also, the most similarity was obtained for cluster 1 and the least for clusters 3 and 4.
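
The silhouette coefficient used to compare the partitions can be computed as sketched below. The sketch assumes Euclidean distances and the common convention that a single-member cluster receives a score of zero; the points and labels are toy values, not station data.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs at least two clusters."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for a single-member cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_values(points, [0, 0, 1, 1])  # compact, well-separated groups
poor = silhouette_values(points, [0, 1, 0, 1])  # groups that cut across the gap

print(sum(good) / len(good), sum(poor) / len(poor))
```

Values close to 1 indicate that a point sits well inside its cluster, while negative values indicate that it is, on average, closer to another cluster, which is how the negative coefficients reported for the classical method should be read.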

Table 2

Cluster information

| Station number | Cluster number | Monthly mean precipitation variation | Energy variation |
|---|---|---|---|
|  |  | 1,051.4–951.4 | 2.41 × 10^20–2.28 × 10^20 |
|  |  | 978.2–959.4 | 3.58 × 10^20–3.3 × 10^20 |
|  |  | 1,525.8–1,081.8 | 1.98 × 10^20–1.84 × 10^20 |
|  |  | 1,266.7–988.1 | 2.25 × 10^20–2.07 × 10^20 |
Figure 6

(a) The SC values of each station in clusters using classical and proposed methods and (b) spatial clustering of stations.

Figure 7

The energy values of decomposed subseries in each cluster.

Hsu & Li (2010) used the wavelet transform self-organizing map (WTSOM) framework to cluster and explore the spatial–temporal characteristics of 22 years of precipitation data for Taiwan. They showed that the hybrid methods performed successfully in recognizing homogeneous hydrologic regions. In the present study, comparing the results of the classical and proposed models showed the high efficiency of the considered methodology in clustering the selected area. From the MODWT, the best input combination (i.e., W2 + V4) was used for clustering. This reduced the number of input data to one-fifth. The silhouette coefficient for this case was almost three times that of the classical method, which shows the efficiency of the selected method in zoning the area. Three factors increased the accuracy of the proposed method: reducing the number of inputs; not using the usual classical methods to select the mother wavelet type and the decomposition level in the MODWT method (which create additional subseries); and testing different numbers of clusters and selecting the most appropriate k by trial and error.

Figure 8 shows the relationship between mean precipitation and energy values for each cluster and for the central stations of the clusters. The results indicated that as the mean precipitation of the clusters increased, the energy values decreased. The highest precipitation was measured at the cluster 3 (square shape) stations, which are the northern stations of the study area, north of Atlanta. The lowest precipitation was measured at the cluster 2 (rhombus shape) stations, which are the southern stations, south of Atlanta. Thus, for stations and clusters in the northern part of the selected area, precipitation increased and energy decreased, whereas for those in the southern part, precipitation decreased and energy increased. The second-highest precipitation, after cluster 3, was obtained for cluster 4 (circle shape). Each cluster had several member stations; among them, one station was selected as the cluster central station, namely the one with the lowest distance from the center of the cluster and the highest silhouette coefficient among the members. Details of the central stations for each cluster are given in Table 3. Figure 8(b) likewise shows that as the energy values increased the precipitation values decreased, and vice versa.

Table 3

Information of the central stations of each cluster

| Cluster number | Station number | Silhouette coefficient | Central station distance from the cluster center |
|---|---|---|---|
|  | S2 | 0.94 | 3.95 × 10^35 |
|  | S18 | 0.98 | 3.58 × 10^35 |
|  | S12 | 0.96 | 2.29 × 10^36 |
|  | S13 | 0.88 | 1.91 × 10^36 |
|  | S3 | 0.85 | 1.85 × 10^36 |
Figure 8

(a) Relationship of mean precipitation of each cluster with energy values and (b) relationship of mean precipitation of central stations of each cluster with energy.

Radial basis functions (RBF) method results

In this study, two parameters, precipitation and energy, were used as inputs for the RBF method, which is an interpolation method. Before interpolating precipitation and energy values at locations without data, the data distribution should be investigated. A useful tool for this purpose is the QQ plot; here, the QQ plot showed that the data were normally distributed. After this preliminary analysis, the semi-variance graphs were plotted. Semi-variance analysis reveals the correlations, trends, and covariance (similarity) between points. Figure 9(a) shows the semi-variance and covariance graphs for monthly precipitation and energy, and Table 4 lists the characteristics of the precipitation and energy semi-variance models. The strength of a variable's spatial structure can be expressed by the ratio C0/(C + C0), where C0 is the nugget effect and C is the partial sill; this ratio indicates how much of the total variability is accounted for by the nugget effect. Since the values obtained for both monthly precipitation and energy were less than 1/2, the unstructured component plays a smaller role than the structured component, and the investigated parameters therefore had strong spatial structure. According to Figure 9(b), the covariance between mean monthly precipitation and station distance indicated a significant positive correlation between these two variables. This correlation was higher at short distances and, as the distance increased, the effect of station distance on mean monthly precipitation decreased. At a distance of about 142 km, the similarity reached zero and, beyond this point, distance had approximately no effect on the variation of monthly precipitation.
The covariance between energy and station distance likewise showed a significant positive correlation between these two variables. The correlation was higher at short distances and, as the distance increased, the effect of station distance on energy decreased.
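As a worked check of the ratio C0/(C + C0), the short sketch below evaluates it with the nugget and partial sill values reported in Table 4. The function name `nugget_ratio` is illustrative, and the 1/2 threshold is the one stated in the text.

```python
def nugget_ratio(nugget, partial_sill):
    """Share of the total sill (C + C0) that is due to the nugget effect C0."""
    return nugget / (nugget + partial_sill)

# (nugget C0, partial sill C) taken from Table 4.
models = {
    "precipitation": (0.225, 0.891),
    "energy": (0.378, 1.470),
}
ratios = {name: nugget_ratio(c0, c) for name, (c0, c) in models.items()}
for name, ratio in ratios.items():
    # Both ratios are about 0.20, i.e. below the 1/2 threshold used in the text.
    print(f"{name}: {ratio:.3f} (below 1/2: {ratio < 0.5})")
```

With these values the nugget accounts for roughly one-fifth of the total variability for both variables, which is consistent with the conclusion that the structured component dominates.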

Table 4

Characteristics of precipitation and energy semi-variance models

Parameter              Precipitation   Energy
Effect amplitude (R)   192,932 (m)     479,313 (m)
Partial sill (c)       0.891 (mm)      1.470 (mm)
Nugget effect (c0)     0.225 (mm)      0.378 (mm)
Figure 9

(a) Semi-variance and (b) covariance of mean precipitation and energy of stations, and (c) comparison of predicted and observed values of precipitation and energy using RBF with the spline with tension model.

The RBF method has five models, which can be evaluated using different performance criteria to select the most appropriate one. In this study, different models with different parameters were tested and, after interpolation via the RBF method, the model with the lowest error (RMSE) and the highest coefficient of determination (R²) was considered the best model. The results are shown in Figure 9(c). The best model for both the mean monthly precipitation and energy variables was spline with tension, with R² = 0.93 and RMSE = 4.15 for precipitation and R² = 0.77 and RMSE = 2.53 × 10¹⁹ for energy.
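The selection rule described above (lowest RMSE, highest R²) can be sketched in a few lines. The observed and predicted values are invented, the candidate names `model_a` and `model_b` are placeholders rather than actual RBF kernels, and R² is assumed here to be the squared Pearson correlation between observed and predicted values.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def select_model(observed, candidates):
    """Rank candidates by lowest RMSE, breaking ties by highest R^2."""
    scores = {name: (rmse(observed, pred), r_squared(observed, pred))
              for name, pred in candidates.items()}
    best = min(scores, key=lambda name: (scores[name][0], -scores[name][1]))
    return best, scores

# Invented cross-validation results for two hypothetical candidate models.
observed = [100.0, 110.0, 120.0, 130.0]
candidates = {
    "model_a": [101.0, 109.0, 121.0, 129.0],
    "model_b": [90.0, 115.0, 110.0, 140.0],
}
best_model, model_scores = select_model(observed, candidates)
print(best_model, model_scores[best_model])
```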

The study area was zoned for the W2 subseries and mean monthly precipitation values using the spline with tension model, which was the best model. Because the W2 subseries varied less than the other subseries, it was selected as the best input for the RBF method. According to Figure 10, monthly precipitation was lowest at stations in the south and southeast of the study area and increased toward the north and northwest, where it reached its highest values. In the zoning based on energy values, the south and southeast of the area had the highest energy values and the northern part had the lowest. The results thus showed an inverse relationship between energy and monthly precipitation values.
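How subseries energies such as those of W2 and V4 can be obtained is illustrated by the minimal sketch below. It makes several assumptions that are not taken from the study: it uses the Haar (db1) filter with circular boundary handling rather than the db wavelet actually applied, it takes the energy of a subseries to be the sum of its squared coefficients, and it runs on an invented 16-month series. The names `haar_modwt` and `energy` are hypothetical.

```python
import math

def haar_modwt(series, levels):
    """Haar MODWT with circular boundaries.
    Returns ([W1, ..., WJ], VJ), each with the same length as the input."""
    n = len(series)
    smooth = list(series)
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)  # filter taps are spread apart by 2^(j-1) at level j
        wavelet = [(smooth[t] - smooth[(t - lag) % n]) / 2 for t in range(n)]
        smooth = [(smooth[t] + smooth[(t - lag) % n]) / 2 for t in range(n)]
        details.append(wavelet)
    return details, smooth

def energy(coefficients):
    """Sum of squared coefficients (assumed definition of subseries energy)."""
    return sum(c * c for c in coefficients)

# Invented monthly precipitation values (mm), for illustration only.
series = [80.0, 95.0, 110.0, 70.0, 60.0, 120.0, 130.0, 90.0,
          85.0, 100.0, 105.0, 75.0, 65.0, 115.0, 125.0, 95.0]
details, smooth = haar_modwt(series, levels=4)

# A station's feature vector in the spirit of the W2 + V4 input combination.
features = (energy(details[1]), energy(smooth))

# The MODWT partitions energy: the W1..W4 and V4 energies sum to the series energy.
total = sum(energy(w) for w in details) + energy(smooth)
energy_preserved = math.isclose(total, energy(series), rel_tol=1e-9)
print(features, energy_preserved)
```

The energy-partition check at the end is a general property of the MODWT and is a convenient sanity test when swapping in a different filter.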

Figure 10

Zoning of the studied area based on the precipitation and energy values using RBF method.

Figure 11 was drawn to investigate the relationship between precipitation and the latitude and longitude of the selected stations. It shows that as station longitude increased, precipitation increased and energy decreased. Likewise, as station latitude decreased, precipitation increased and energy decreased. This verified the results obtained with the proposed model.

Figure 11

Precipitation and energy variation of stations related to their latitude and longitude.

CONCLUSION

In this study, a classical and a proposed method were used to investigate the monthly precipitation characteristics of the selected area in the United States. In the proposed method, the time series were decomposed via MODWT, and different combinations of the wavelet (W) and scaling (V) coefficients were used to determine the input dataset as a basis for spatial clustering. These combinations were chosen to cover all possible scales captured by MODWT. The proposed model's efficiency in the spatial clustering stage was verified using the silhouette coefficient. The results demonstrated the superior performance of MODWT–K-means compared with the historical-based K-means approach. The clusters captured by the MODWT–K-means approach delineated homogeneous precipitation areas very well (based on physical analysis). In the classical method, monthly precipitation data were used as input for clustering the study area. In the proposed method, clustering based on the combination of the W2 and V4 subseries led to better results. The best number of clusters was 5. The silhouette coefficient was 0.3 for the classical method and 0.8 for the proposed method, which indicated appropriate clustering of the selected area using the proposed method. In the RBF modeling, the data distribution was first evaluated and the correlation of the data was verified. Then, the five most commonly used RBF models were fitted and the best model was selected based on RMSE and R². The best model for both the mean monthly precipitation and energy variables was the spline with tension model. According to the results, an inverse relationship was obtained between monthly precipitation and subseries energy. The southeast of the selected area had the highest energy and the lowest precipitation values, and the northern parts had the highest precipitation and the lowest energy values.
Variations of the precipitation and energy parameters were also investigated in terms of the stations' latitude and longitude. With increasing longitude and decreasing latitude, precipitation increased and energy decreased. In general, the proposed model yielded better results than the classical model owing to its higher silhouette coefficient and station similarities. It also required less input data and less computational time than the classical method.

DATA AVAILABILITY STATEMENT

All relevant data are available from an online repository or repositories (https://gis.ncdc.noaa.gov/maps/ncei/summaries/monthly).

REFERENCES

Adamowski K., Prokoph A. & Adamowski J. 2009 Development of a new method of wavelet aided trend detection and estimation. Hydrological Processes 23 (18), 2686–2696.
Adamowski J., Adamowski K. & Prokoph A. 2013 Quantifying the spatial temporal variability of annual streamflow and meteorological changes in eastern Ontario and southwestern Quebec using wavelet analysis and GIS. Atmospheric Research 499, 27–40.
Agarwal A., Maheswaran R., Sehgal V., Khosa R., Sivakumar B. & Bernhofer C. 2017 Hydrologic regionalization using wavelet-based multiscale entropy method. Journal of Hydrology 538, 22–32.
Aydin O. & Raja N. B. 2020 Spatial-temporal analysis of precipitation characteristics in Artvin, Turkey. Theoretical and Applied Climatology 142 (1), 729–741.
Barton Y., Giannakaki P., Von Waldow H., Chevalier C., Pfahl S. & Martius O. 2016 Clustering of regional-scale extreme precipitation events in southern Switzerland. Monthly Weather Review 144 (1), 347–369.
Buhmann M. & Dyn N. 1993 Spectral convergence of multiquadric interpolation. Proceedings of the Edinburgh Mathematical Society 36 (2), 319–333.
Buytaert W., Celleri R., Willems P., Bièvre B. D. & Wyseure G. 2006 Spatial and temporal rainfall variability in mountainous areas: a case study from the south Ecuadorian Andes. Journal of Hydrology 329 (3), 413–421.
Cheng K., Lin S.-h. & Liou J. J. 2008 Rain-gauge network evaluation and augmentation using geostatistics. Hydrological Processes 22, 2554–2564.
Chou C. M. 2013 Enhanced accuracy of rainfall–runoff modeling with wavelet transform. Journal of Hydroinformatics 15 (2), 392–404.
Chu J., Xia J., Xu C., Li L. & Wang Z. 2010 Spatial and temporal variability of daily precipitation in Haihe river basin, 1958–2007. Journal of Geographical Sciences 20 (2), 248–260.
Cornish C. R., Bretherton C. S. & Percival D. B. 2006 Maximal overlap wavelet statistical analysis with application to atmospheric turbulence. Boundary-Layer Meteorology 119, 339–374.
Douglas E. M., Vogel R. M. & Kroll C. N. 2000 Trends in floods in the United States: impact of spatial correlation. Journal of Hydrology 240, 90–105.
Flyer N. & Wright G. B. 2009 A radial basis function method for the shallow water equations on a sphere. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 465 (2106), 1949–1976.
Jain A. K., Murty M. N. & Flynn P. J. 1999 Data clustering: a review. ACM Computing Surveys 31 (3), 264–323.
Joshi N., Gupta D., Suryavanshi S., Adamowski J. & Madramootoo C. A. 2016 Analysis of trends and dominant periodicities in drought variables in India: a wavelet transform based approach. Atmospheric Research 182, 200–220.
Karl T. R. & Knight R. W. 1998 Secular trends of precipitation amount, frequency, and intensity in the United States. Bulletin of the American Meteorological Society 79 (2), 231–241.
Kolivras K. N. & Comrie A. C. 2007 Regionalization and variability of precipitation in Hawaii. Physical Geography 28 (1), 76–96.
Lettenmaier D. P., Wood E. W. & Wallis J. R. 1994 Hydro-climatological trends in the continental United States, 1948–1988. Journal of Climate 7, 586–607.
Liao S. H. 2003 Knowledge management technologies and applications-literature review from 1995 to 2002. Expert Systems with Applications 25 (2), 155–164.
Lins H. F. & Slack J. R. 1999 Streamflow trends in the United States. Geophysical Research Letters 26 (2), 227–230.
Maheswaran R. & Khosa R. 2012 Comparative study of different wavelets for hydrologic forecasting. Computers & Geosciences 46, 284–295.
Maslova I., Ticlavilca A. M. & McKee M. 2016 Adjusting wavelet-based multiresolution analysis boundary conditions for long-term streamflow forecasting. Hydrological Processes 30 (1), 57–74.
Mengmeng M. & Baoxiang Z. 2019 Analysis of spatial-temporal distribution characteristics of precipitation in Shandong Province from 1961 to 2017. IOP Conference Series: Earth and Environmental Science 376, 012021.
Mishra A. K., Özger M. & Singh V. P. 2009 An entropy-based investigation into the variability of precipitation. Journal of Hydrology 370, 139–154.
Mousavi M., Bakar A. A. & Vakilian M. 2015 Data stream clustering algorithms: a review. International Journal of Advances in Soft Computing 7 (3), 13–20.
Nischitha V., Ahmed S., Varikoden H. & Revadekar J. 2014 The impact of seasonal rainfall variability on NDVI in the Tunga and Bhadra river basins, Karnataka, India. International Journal of Remote Sensing 35 (23), 8025–8043.
Percival D. B. & Walden A. T. 2000 Wavelet Methods for Time Series Analysis, Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK.
Percival D. B., Lennox S. M., Wang Y. G. & Darnell R. E. 2011 Wavelet-based multiresolution analysis of Wivenhoe Dam water temperatures. Water Resources Research 47 (5), 1–19.
Pham D. T., Dimov S. S. & Nguyen C. D. 2005 Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 219 (1), 103–119.
Powell M. J. D. 1987 Radial basis functions for multivariable interpolation: a review. In: Algorithms for Approximation (Mason J. C. & Cox M. G., eds). Clarendon Press, Oxford, UK, pp. 143–167.
Small D., Islam S. & Vogel R. 2006 Trends in precipitation and streamflow in the eastern U.S. Paradox or perception? Geophysical Research Letters 33, L03403.
Unal Y. S., Deniz A., Toros H. & Incecik S. 2012 Temporal and spatial patterns of precipitation variability for annual, wet, and dry seasons in Turkey. International Journal of Climatology 32 (3), 392–405.
Viglione A., Laio F. & Claps P. 2007 A comparison of homogeneity tests for regional frequency analysis. Water Resources Research 43, W03428.
Wei Q., Sun C., Wu G. & Pan L. 2017 Haihe River discharge to Bohai Bay, North China: trends, climate, and human activities. Hydrology Research 48 (4), 1058–1070.
Zhang Z. & Mann M. E. 2005 Coupled patterns of spatiotemporal variability in northern hemisphere sea level pressure and conterminous U.S. drought. Journal of Geophysical Research 110, D03108. doi:10.1029/2004JD004896.
Zhou P. Y. & Chan K. C. 2014 A model-based multivariate time series clustering algorithm. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining (Tseng V. S., Ho T. B., Zhou Z.-H., Chen A. L. P. & Kao H.-Y., eds). Springer, Cham, pp. 805–817.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).