The statistical downscaling of global circulation models presents a significant challenge in selecting appropriate input variables from a vast pool of predictors. To address this issue, we developed an ensemble approach based on Combining Multiple Clusters via Similarity Graph (COMUSA), which integrates the k-means and self-organizing map (SOM) methods with the mutual information (MI)-random sampling approach. This feature extraction technique demonstrated a 21% improvement in the classification efficacy of large-scale climatic variables. When comparing feature extraction methods, the combination of MI-random sampling and ensemble clustering yielded more accurate results than SOM clustering alone. The most efficient artificial neural network (ANN)-based downscaling model was employed to project near- and mid-future precipitation and temperature (2025–2035 and 2035–2045), revealing varied outcomes under the SSP3-7.0 and SSP5-8.5 scenarios. Annual mean precipitation is projected to decrease by 2–3% under SSP3-7.0 and by 4–5% under SSP5-8.5, while projected annual mean temperature values indicate increases of 21–27% and 29–35%, respectively. Integrating COMUSA ensemble clustering with MI-random sampling enhances the estimation accuracy of the ANN downscaling model, contributing to accurate projections of future precipitation and temperature values.

  • Integrating k-means and self-organizing map (SOM) as feature extraction methods to enhance the precision of the artificial neural networks downscaling model.

  • Combining Multiple Clusters via Similarity Graph (COMUSA) ensemble clustering approach to integrate k-means and SOM feature selection approaches.

  • The MI-random sampling method for selecting the dominant predictors from each cluster as the representative of the clusters.

  • Comparison of COMUSA ensemble clustering and SOM approaches.

AI: artificial intelligence

ANN: artificial neural network

ARIL: average relative length

BP: back propagation

CC: correlation coefficient

COMUSA: combining multiple clusters via similarity graph

DL: distance of lower bands

DU: distance of upper bands

FFNN: feed-forward neural network

GCMs: global circulation models

GT: gamma test

IPCC: Intergovernmental Panel on Climate Change

KGE: Kling–Gupta efficiency

LM: Levenberg–Marquardt

MI: mutual information

NSE: Nash–Sutcliffe efficiency

PCA: principal component analysis

RCMs: regional climate models

RMSE: root mean square error

SC: Silhouette coefficient

SOM: self-organizing maps

SSP: shared socioeconomic pathway

TS: tangent sigmoid

The emission of greenhouse gases is a significant factor contributing to climate change. According to the Intergovernmental Panel on Climate Change (IPCC), human activities are the primary drivers behind the increasing levels of greenhouse gases, which in turn lead to global warming and significant disruptions in the water cycle (IPCC 2018). Consequently, accurate forecasting of regional variations in precipitation and temperature is essential for developing effective strategies to adapt to and mitigate the adverse effects of climate change (Mirdashtvan et al. 2019; Zhang et al. 2022). Global circulation models (GCMs) are widely regarded as the most reliable frameworks for projecting future climate change scenarios (Rahimi et al. 2021). However, their coarse spatial resolution often fails to capture the fine-scale processes necessary for accurate regional climate predictions (Wilby et al. 2002). Therefore, GCM outputs cannot be directly used for simulating and projecting the impacts of climate change on land surface variables.

To address this limitation, downscaling methods have been developed to enhance the resolution of projected climatic variables, including precipitation, air temperature, and humidity (Mora et al. 2014; Elkiran et al. 2021; Mirdashtvan et al. 2021; Chen et al. 2023). Downscaling techniques are generally classified into two broad categories: dynamical and statistical methods (Mirdashtvan et al. 2018). In dynamical downscaling, high-resolution regional climate models (RCMs) use boundary conditions from GCMs to derive finer climate variables at the local scale. In contrast, statistical downscaling techniques establish empirical relationships between local climate variables (predictands) and large-scale GCM outputs (predictors) (Mirdashtvan & Malekian 2020).

Statistical downscaling methods are increasingly favored for their simplicity and lower computational costs (Tavakol-Davani et al. 2013). Among various statistical techniques, the use of artificial intelligence (AI) methods, such as artificial neural networks (ANNs), is increasingly successful. This is attributed to their effectiveness in capturing the nonlinear dynamics of hydro-climatic variables across different spatiotemporal scales (Nourani et al. 2018; Haji Hosseini et al. 2020; Wang et al. 2020; Rabezanahary Tanteliniaina et al. 2021; Gumus et al. 2023).

The use of ANN-based downscaling models has demonstrated various advantages and disadvantages, highlighting both their effectiveness and limitations in processing GCM outputs (Snell et al. 2000; Hosseini Baghanam et al. 2019). The differing results across studies utilizing ANN-based downscaling can often be attributed to the quality and quantity of GCM data used as input. The inclusion of irrelevant or insignificant data can create significant challenges during the training of AI-based models (Wang et al. 2024). Thus, selecting the most relevant features as potential input variables is crucial for enhancing the efficiency of ANN-based downscaling models (Ahmadi et al. 2015; Asghari & Nasseri 2015; Ang et al. 2023; Ghimire et al. 2023).

Given the importance of robust feature selection methods in data mining-based downscaling approaches, several commonly used techniques have proven effective in identifying the most relevant input variables for statistical downscaling. These techniques include the correlation coefficient (CC) (Mehta et al. 2023; Nourani et al. 2023), principal component analysis (Haji Hosseini et al. 2020), mutual information (MI) (Nasseri et al. 2013), decision trees (Nourani et al. 2018), and the gamma test (Ahmadi et al. 2015). In recent decades, several studies have utilized various clustering-based feature selection methods coupled with different pre-processing techniques (e.g., MI and CC) to identify the dominant inputs of AI-based models in hydro-environmental contexts (Bowden et al. 2002; Chang et al. 2016; Feng et al. 2023) and the statistical downscaling of GCMs (Sehgal et al. 2018; Hosseini Baghanam et al. 2019).

The process of selecting clustering methods is influenced by their underlying assumptions, and there is no consensus on which technique performs best. To address the issue, researchers have proposed a range of solutions, including the use of ensemble clustering approaches. One such method, Combining Multiple Clusters via Similarity Graph (COMUSA), developed by Mimaroglu & Erdil (2011), has been widely used in various studies, coupled with filter-based feature selection via MI to identify dominant inputs for AI-based hydrological models (Nourani et al. 2022; Sharghi et al. 2022).

Given the dynamic nature of hydro-climatic data, the hybrid MI-random sampling technique (MI-random sampling) offers a preferred alternative to MI for representative selection within each cluster. This technique increases the number of dominant data points and fills the gap between dominant and secondary data, potentially better representing the entire time series during clustering.

To the authors' best knowledge, prior research has primarily focused on the effectiveness of ensemble clustering methods as feature extraction approaches for AI-based hydrological models. However, their application as feature selection methods for identifying and grouping similar parameters in the statistical downscaling of GCMs is relatively limited. Furthermore, the use of the MI-random sampling technique to represent the entire time series during clustering is underexplored, particularly when coupled with clustering-based feature extraction approaches. The innovation of the current study lies in addressing the following research objectives:

  • Integrating k-means and self-organizing map techniques with the COMUSA ensemble clustering approach as feature extraction methods for an AI-based downscaling model.

  • Coupling MI-random sampling with COMUSA ensemble clustering in the statistical downscaling of predictands.

Case study and data explanation

The primary goal of this study is to integrate ensemble clustering with the MI-random sampling technique as a feature extraction method to develop a robust AI-based statistical downscaling model for projecting monthly precipitation and temperature at the Ardabil station. The Ardabil plain (38° 22′ N, 48° 30′ E) is located in the northwest of Iran (Figure 1). The plain encompasses three primary rivers: the Gharasu River, the Balikhli River, and the Ghoorichay River. The Yamchi Dam, with an effective storage capacity of 80 million cubic meters, is located on the Balikhli River in the southwest of Ardabil city. This dam currently plays a crucial role in meeting a significant portion of the agricultural and domestic water needs of Ardabil city, making it vital for water storage and ensuring a stable supply during dry periods. However, climate change, characterized by frequent and severe droughts and increased flood events, poses significant risks to water resources, agriculture, and infrastructure in the Ardabil plain. Therefore, understanding these changes and projecting future climatic variables are essential for developing effective adaptation strategies to mitigate the impacts of climate change.
Figure 1

Case study location and the selected GCM grid points around the Ardabil station.


In this study, the monthly observed precipitation and temperature data of the Ardabil synoptic station were collected from the Iran Meteorological Organization for the period 1981–2014. Different GCMs from the IPCC's 6th Assessment Report (CMIP6) were considered in this study (see Table 1). Among the 25 GCMs considered, three GCMs (i.e., ACCESS-CM2, FGOALS-g3, and CanESM5-CanOE) were selected based on the CC metric. The CC was calculated by assessing the linear relationships between the precipitation and temperature data from each GCM and the observed historical data, all at a monthly temporal resolution. This analysis evaluated the performance of each GCM in simulating monthly precipitation and temperature patterns. The GCMs with the highest CCs, indicating the best agreement with observed historical data, were selected for the modeling procedure. The monthly historical GCM dataset (1981–2014) and their projections under the different shared socioeconomic pathway (SSP) scenarios (i.e., SSP1-2.6, SSP3-7.0, and SSP5-8.5) were obtained from the Copernicus Climate Change Service (https://cds.climate.copernicus.eu/).
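The GCM screening step described above reduces to ranking candidate models by their correlation with the observed record and keeping the top few. The sketch below illustrates this under stated assumptions: the function names (`pearson_cc`, `rank_gcms`) and the synthetic monthly series are illustrative, not the study's actual data or code.

```python
import numpy as np

def pearson_cc(sim, obs):
    """Pearson correlation coefficient between two monthly series."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.corrcoef(sim, obs)[0, 1])

def rank_gcms(gcm_series, observed, top_n=3):
    """Rank GCMs by their CC with the observed record; keep the best top_n."""
    scores = {name: pearson_cc(series, observed) for name, series in gcm_series.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example with synthetic monthly series (the study uses 1981-2014 records).
rng = np.random.default_rng(0)
obs = rng.normal(size=120)
gcms = {
    "ACCESS-CM2": obs + rng.normal(scale=0.3, size=120),  # closely tracks obs
    "FGOALS-g3": obs + rng.normal(scale=0.8, size=120),
    "INM-CM4-8": rng.normal(size=120),                    # unrelated series
}
best = rank_gcms(gcms, obs, top_n=2)
```

In the study this ranking is done separately for precipitation and temperature (Table 2); the GCM names above reuse models from that table purely as labels.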

Table 1

GCM predictors used in the present study

No.  Predictor  Description
1    pr         Precipitation
2    ta^a       Air temperature
3    hur^a      Relative humidity
4    hus^a      Specific humidity
5    ua^a       Eastward wind
6    zg^a       Geopotential height
7    va^a       Northward wind
8    uas        Eastward near-surface wind
9    evspsbl    Evaporation including sublimation and transpiration
10   tas        Near-surface air temperature
11   huss       Near-surface specific humidity
12   vas        Northward near-surface wind
13   psl        Sea level pressure
14   tauv       Surface downward northward wind stress
15   rsds       Surface downwelling shortwave radiation
16   ts         Surface temperature
17   hfls       Surface upward latent heat flux
18   rlus       Surface upwelling longwave radiation
19   rsdt       TOA incident shortwave radiation
20   rsut       TOA outgoing shortwave radiation
21   tasmax     Daily maximum near-surface air temperature
22   tasmin     Daily minimum near-surface air temperature
23   hurs       Near-surface relative humidity
24   sfcWind    Near-surface wind speed
26   ps         Surface air pressure
27   tauu       Surface downward eastward wind stress
28   rlds       Surface downwelling longwave radiation
29   snw        Surface snow amount
30   hfss       Surface upward sensible heat flux
31   rsus       Surface upwelling shortwave radiation
32   rlut       TOA outgoing longwave radiation
33   clt        Total cloud cover percentage

^a Predictors at the 100, 500, 1,000, 2,000, 3,000, 5,000, 7,000, 10,000, 15,000, 20,000, 25,000, 30,000, 40,000, 50,000, 60,000, 70,000, and 85,000 Pa pressure levels.

Some previous studies have highlighted the advantages of using data from multiple grid points surrounding the study area (Tavakol-Davani et al. 2013; Beecham et al. 2014). Consequently, predictors were derived from four grid points (see Figure 1). The selected predictors used in the statistical downscaling process for both temperature and precipitation are listed in Table 2.

Table 2

The CC values of the candidate GCMs; the three models with the highest CCs (CanESM5-CanOE, ACCESS-CM2, and FGOALS-g3) were selected

Model              Precipitation CC  Temperature CC
CanESM5-CanOE      0.69              0.95
INM-CM4-8          0.06              0.71
FGOALS-f3-L        0.30              0.82
FGOALS-g3          0.51              0.94
AWI-CM-1-1-MR      0.29              0.81
AWI-ESM-1-1-LR     0.34              0.90
HadGEM3-GC31-LL    0.30              0.87
ACCESS-CM2         0.54              0.92
ACCESS-ESM1-5      0.29              0.85
EC-Earth3-CC       0.36              0.90
EC-Earth3-Veg-LR   0.38              0.88
EC-Earth3-AerChem  0.40              0.89
IPSL-CM6A-LR       0.31              0.87
MIROC-ES2L         0.43              0.90
E3SM-1-1-ECA       0.42              0.90
BCC-ESM1           0.27              0.81
CESM2-WACCM        0.40              0.89
GFDL-ESM4          0.30              0.88
MPI-ESM1-2-HR      0.25              0.84
MPI-ESM1-2-LR      0.41              0.91
KACE-1-0-G         0.30              0.85
CMCC-CM2-HR4       0.36              0.82
CMCC-CM2-SR5       0.29              0.81
CMCC-ESM2          0.40              0.90
CESM2-WACCM-FV2    0.44              0.92

k-means and SOM clustering

k-means is commonly used as a linear unsupervised clustering technique in the fields of climate and water science due to its effectiveness and simplicity (Nasseri & Zahraie 2011; Kissi et al. 2023). This algorithm categorizes a given dataset into k clusters, aiming to group data points within each cluster as closely as possible while maximizing the separation from points in other clusters. The objective function of the k-means algorithm minimizes the sum of the distances between data points within each cluster and their respective centroids. Additionally, self-organizing maps (SOMs) are a reliable and widely used nonlinear unsupervised clustering technique that has proven effective in climatic research (Hosseini Baghanam et al. 2019; Takong & Abiodun, 2023). SOM groups homogeneous data with similar patterns by transforming high-dimensional data into simpler geometric relationships on a two-dimensional lattice represented by nodes. Each node in the lattice is assigned a weight vector with the same dimensionality as the input vector. To identify the node with the closest weight vector to a given n-dimensional input vector, the Euclidean distance is calculated, as shown in Equation (1) (Kohonen 2001). The SOM technique offers a valuable approach for identifying and analyzing patterns in complex climatic data (Huang & Chang 2021):
\( D_i = \lVert x_i - w \rVert = \sqrt{\sum_{j=1}^{n} (x_{ij} - w_j)^2} \)   (1)

where \( x_i \) is the ith data set and \( w \) denotes the weight vector. The weight vector with the highest similarity to the input vector is the conqueror node, recognized as the best matching unit (BMU). To further shorten the distance between the weights and the BMU, the weights are updated in every training iteration as follows (Kohonen 2001):

\( w(t+1) = w(t) + \beta(t)\, h(t)\, \big[ x_i(t) - w(t) \big] \)   (2)

where β represents the learning rate, which typically ranges from 0 to 1, and \( h(t) \) is the proximity function.

In this study, both k-means and SOM clustering techniques were employed for their respective advantages in handling different aspects of the data. k-means was chosen for its straightforward application and rapid convergence, with its simple linear structure making it easy to implement and understand. In contrast, SOM was utilized for its ability to transform complex, nonlinear statistical associations among high-dimensional attributes into simple geometric relationships on a low-dimensional map, while preserving the dataset's structure. By pairing the linear capabilities of k-means with the nonlinear approach of SOM, we employed two complementary methods to effectively identify patterns in the data.
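The two base clustering methods can be sketched briefly: `kmeans_labels` wraps scikit-learn's k-means, while `som_step` is a minimal one-dimensional SOM iteration implementing the BMU search (Eq. (1)) and the weight update (Eq. (2)). The function names and the Gaussian choice of proximity function are illustrative assumptions, not the study's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_labels(X, k, seed=0):
    """Cluster the predictor vectors (rows of X) into k groups with k-means."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)

def som_step(weights, x, beta=0.5, sigma=1.0):
    """One SOM training iteration on a 1-D lattice: find the best matching
    unit (BMU) by Euclidean distance (Eq. (1)), then pull every node toward
    x, weighted by a Gaussian proximity function around the BMU (Eq. (2))."""
    dists = np.linalg.norm(weights - x, axis=1)               # Eq. (1)
    bmu = int(np.argmin(dists))
    lattice = np.arange(len(weights))
    h = np.exp(-((lattice - bmu) ** 2) / (2 * sigma ** 2))    # proximity function
    return weights + beta * h[:, None] * (x - weights), bmu   # Eq. (2)

# Toy run: three well-separated groups of 2-D points and a 4-node SOM.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 1.0, 2.0)])
labels = kmeans_labels(X, 3)
w0 = rng.normal(size=(4, 2))
w1, bmu = som_step(w0, X[0])
```

In a full SOM training run the learning rate β and the neighbourhood width σ would decay over iterations; a single step suffices to show the BMU moving toward the input.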

Ensemble of k-means and SOM clustering

The ensemble clustering technique aims to integrate multiple base clustering algorithms to create a robust and accurate clustering approach that produces reliable results. Several ensemble clustering methods have been proposed, including factor graphs (Huang et al. 2016), weighted co-association matrices (Berikov & Pestunov 2017), and density-based similarity matrices (Beauchemin 2015). However, there is currently no consensus on the best technique among these approaches. In this study, the similarity graph (SG) approach was employed to integrate k-means and SOM clustering methods for the input dataset (D). This approach is regarded as one of the most accurate and straightforward methods for ensemble clustering. It combines the outputs of k-means and SOM, resulting in a unified clustering solution. The SG approach, as described by Mimaroglu & Erdil (2011), is outlined as follows:

  • 1. In the first step, each individual clustering approach is defined as \( P = \{C_1, C_2, \ldots, C_k\} \), where \( C_i \) is a cluster of \( P \), and \( P^{*} = \{P_1, P_2, \ldots, P_m\} \) is the set of the best cluster sets of the individual clustering approaches.

  • 2. In the second step, the similarity matrix (SM) is developed using:
    \( SM_{i,j} = \dfrac{votes_{i,j}}{m} \)   (3)
  • where \( votes_{i,j} \) denotes the number of times objects i and j are allocated to the same clusters, and m is the number of base clusterings.

  • To demonstrate the SM, the SG can be utilized as an undirected and weighted graph, where SG = (D, E) and each edge \( (d_i, d_j) \) has a weight related to \( SM_{ij} \) in the SM.

  • 3. In the third step, the attachment index, which attempts to form new clusters, is defined as:
    \( at(d_i) = \dfrac{sw(d_i)}{df(d_i)} \)   (4)
  • where \( df(d_i) \) denotes the degree of \( d_i \) (the number of edges connected to it) and \( sw(d_i) \) is the sum of the weights of the edges connected to \( d_i \). It should be noted that the member with the maximum attachment index is chosen as the pivot (initial member).

  • 4. In the fourth step, all nearby neighbors are considered to expand the pivot item within each singular cluster. A neighbor is incorporated into the cluster of a pivot if it exhibits the highest similarity to that pivot. Once a neighbor is added, it becomes a new pivot and evaluates its own nearby neighbors for further expansion. The cluster expansion in COMUSA stops when the current pivots can no longer incorporate additional objects. If there are any remaining unassigned objects in the input dataset, COMUSA initiates the creation of a new cluster by selecting a new pivot. This clustering process continues until all objects in the dataset are assigned to a cluster. Once all objects have been assigned, COMUSA terminates its operation. For more details on COMUSA, please refer to the work by Mimaroglu & Erdil (2011).
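The steps above can be sketched as follows: the co-association ("votes") similarity matrix of Equation (3), an attachment index in the spirit of Equation (4), and pivot-based cluster expansion. The edge threshold and the breadth-first expansion are simplifications introduced here, not a faithful reimplementation of Mimaroglu & Erdil's algorithm.

```python
import numpy as np

def similarity_matrix(labelings):
    """Eq. (3): SM[i, j] = fraction of base clusterings that place
    objects i and j in the same cluster (the 'votes')."""
    labelings = np.asarray(labelings)          # shape (n_clusterings, n_objects)
    n = labelings.shape[1]
    sm = np.zeros((n, n))
    for labels in labelings:
        sm += (labels[:, None] == labels[None, :])
    return sm / len(labelings)

def comusa(labelings, threshold=0.5):
    """Pivot-based cluster expansion over the similarity graph (steps 3-4).
    Edges weaker than `threshold` are ignored; the attachment index
    sw(d)/df(d) (Eq. (4)) selects each new pivot."""
    sm = similarity_matrix(labelings)
    np.fill_diagonal(sm, 0.0)
    adj = sm >= threshold                       # similarity-graph edges
    sw = (sm * adj).sum(axis=1)                 # sum of edge weights
    df = np.maximum(adj.sum(axis=1), 1)         # degree of each node
    attach = sw / df
    n = sm.shape[0]
    assigned = np.full(n, -1)
    cluster = 0
    while (assigned == -1).any():
        # New pivot: the unassigned object with the highest attachment index.
        candidates = np.where(assigned == -1)[0]
        pivot = candidates[np.argmax(attach[candidates])]
        assigned[pivot] = cluster
        frontier = [pivot]
        while frontier:
            p = frontier.pop()
            for q in np.where(adj[p] & (assigned == -1))[0]:
                assigned[q] = cluster
                frontier.append(q)
        cluster += 1
    return assigned

# Two base clusterings that agree on the grouping {0,1} vs {2,3}.
res = comusa([[0, 0, 1, 1], [1, 1, 0, 0]])
```

With two agreeing base clusterings, the ensemble recovers the shared grouping regardless of the arbitrary label values each base method assigned.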

Artificial neural networks

In the present study, the statistical downscaling of GCM data was performed using a three-layer feed-forward neural network (FFNN). Previous research has shown that FFNNs equipped with the back propagation (BP) algorithm are commonly employed to establish regression-based relationships between hydro-climatologic predictors and predictands (Maier & Dandy 2000). To achieve the highest efficiency of the three-layer FFNN-BP model, the Levenberg–Marquardt (LM) scheme was utilized for training the ANNs due to its faster convergence rate (Haykin 1994).

In this study, the tangent sigmoid activation function was selected as the nonlinear kernel for the ANNs. The training process of the network was terminated when the error rate on the test data increased, indicating the completion of the training phase. It is important to note that a crucial aspect of ANN modeling is the design of appropriate architectures, including determining the number of hidden neurons and the number of iterations (epochs). The optimal network structures were obtained through a trial-and-error process. For a more comprehensive explanation of the mathematical principles underlying ANNs, readers are advised to see the work by Haykin (1994).

In this study, the ANN model was utilized for statistical downscaling due to its widespread popularity, ease of application, and proven accuracy.
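A hedged sketch of the trial-and-error architecture search described above: scikit-learn's `MLPRegressor` stands in for the three-layer FFNN, with `tanh` approximating the tangent sigmoid kernel. Note that scikit-learn does not offer the Levenberg-Marquardt solver, so L-BFGS is used instead, and the candidate hidden sizes are illustrative, not the study's settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

def best_ffnn(X, y, hidden_sizes=(2, 4, 8), seed=0):
    """Trial-and-error search over the number of hidden neurons for a
    three-layer FFNN; returns the network with the best held-out R^2.
    NOTE: uses L-BFGS, not the Levenberg-Marquardt scheme of the paper,
    and 'tanh' stands in for the tangent sigmoid activation."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    best, best_score = None, -np.inf
    for h in hidden_sizes:
        net = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                           solver="lbfgs", max_iter=2000, random_state=seed)
        net.fit(X_tr, y_tr)
        score = net.score(X_te, y_te)   # R^2 on held-out data
        if score > best_score:
            best, best_score = net, score
    return best, best_score

# Toy usage: recover a smooth nonlinear mapping from two predictors.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1]
model, r2 = best_ffnn(X, y)
```

Holding out part of the data mirrors the paper's stopping rule of watching the error on unseen data rather than the training set.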

Mutual information

MI is a statistical dependency metric derived from Shannon's entropy. As a commonly used feature extraction method, MI detects nonlinear relationships between predictors and predictands while reducing computational costs. Shannon information content is mathematically formulated using data probability distributions, applicable in both discrete and continuous forms depending on the data and problem context (Macedo et al. 2022). In this study, the MI approach is employed to compute nonlinear relationships between the predictors of each cluster and the predictands (i.e., precipitation and temperature). For a deeper understanding of the fundamental mathematics of MI, readers are referred to Shannon (1948).
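For illustration, MI-based screening of predictors can be sketched with scikit-learn's nearest-neighbour MI estimator. The synthetic predictors below are hypothetical and constructed so that one driver is purely nonlinear (nearly invisible to CC, but not to MI):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mi_scores(predictors, predictand, seed=0):
    """MI between each candidate predictor (column) and the predictand;
    unlike CC, MI also captures nonlinear dependence."""
    return mutual_info_regression(predictors, predictand, random_state=seed)

rng = np.random.default_rng(0)
n = 500
x_lin = rng.normal(size=n)          # linear driver of y
x_nonlin = rng.uniform(-2, 2, size=n)  # purely nonlinear driver (quadratic)
x_noise = rng.normal(size=n)        # irrelevant predictor
y = x_lin + x_nonlin ** 2
X = np.column_stack([x_lin, x_nonlin, x_noise])
scores = mi_scores(X, y)
```

The quadratic driver has near-zero linear correlation with y, yet its MI score clearly exceeds that of the irrelevant predictor — the property motivating MI over CC in this study.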

Evaluation metrics

Cluster evaluation metrics

The Silhouette coefficient (SC) is a commonly used indicator for evaluating the performance of clustering methods by assessing consistency within clusters (Rousseeuw 1987). It is computed using the Silhouette index formula:
\( s(i) = \dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \)   (5)

where a(i) is the average dissimilarity of member i to the rest of the members in the same cluster and b(i) is the lowest average dissimilarity of member i to the members of a different cluster. The s(i) ranges from −1 to 1, indicating the accuracy of cluster assignment: a value nearing 1 indicates proper cluster assignment, −1 indicates misclassification, and a value close to zero indicates that the member lies between two clusters. The SC, as a vital criterion to assess clustering quality and determine the optimal number of clusters, is computed for each candidate number of clusters as:

\( SC = \dfrac{1}{N} \sum_{i=1}^{N} s(i) \)   (6)

where N represents the number of members and k represents the number of clusters. The greatest average Silhouette index across different cluster numbers (i.e., k = 2, 3, …, N) indicates the optimal number of clusters.
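The SC-based selection of the cluster count (Eqs. (5) and (6)) can be sketched as follows; the toy data, helper name, and candidate range are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def optimal_k(X, k_range=range(2, 11), seed=0):
    """Pick the cluster count with the greatest mean Silhouette
    coefficient (Eq. (6)); mean values above 0.5 indicate a
    well-structured clustering."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)   # mean s(i) over all members
    best = max(scores, key=scores.get)
    return best, scores

# Toy data with three well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in (0.0, 2.0, 4.0)])
k, scores = optimal_k(X, range(2, 7))
```

In the study the same sweep (k = 2 to 10) is run for k-means, SOM, and the COMUSA ensemble, and the resulting SC values are compared (Tables 3 and 4).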

Model evaluation metrics

In model evaluation, comparing simulated and observed data is essential. In this study, four commonly used criteria, namely Nash–Sutcliffe efficiency (NSE), CC, root mean square error (RMSE), and Kling–Gupta efficiency (KGE), were utilized to assess the performance of the downscaling model. NSE (Equation (7)) measures match quality (ranging from −∞ to 1), CC (Equation (8)) assesses linear correlation (ranging from −1 to +1), RMSE (Equation (9)) evaluates prediction accuracy (ranging from 0 to ∞), and KGE (Equation (10)) combines several statistical components (ranging from −∞ to 1). Ideal values are 1 for NSE, CC, and KGE, and 0 for RMSE, indicating optimal model performance (Moriasi et al. 2007; Knoben et al. 2019):
\( NSE = 1 - \dfrac{\sum_{t=1}^{N} (Obs_t - Sim_t)^2}{\sum_{t=1}^{N} (Obs_t - \overline{Obs})^2} \)   (7)

\( CC = \dfrac{\sum_{t=1}^{N} (Sim_t - \overline{Sim})(Obs_t - \overline{Obs})}{\sqrt{\sum_{t=1}^{N} (Sim_t - \overline{Sim})^2 \sum_{t=1}^{N} (Obs_t - \overline{Obs})^2}} \)   (8)

\( RMSE = \sqrt{\dfrac{1}{N} \sum_{t=1}^{N} (Obs_t - Sim_t)^2} \)   (9)

\( KGE = 1 - \sqrt{(CC - 1)^2 + \left(\dfrac{\sigma_{Sim}}{\sigma_{Obs}} - 1\right)^2 + \left(\dfrac{\overline{Sim}}{\overline{Obs}} - 1\right)^2} \)   (10)

where N, \( Sim_t \), \( \overline{Sim} \), \( Obs_t \), and \( \overline{Obs} \) are the total number of time steps, simulated data at time t, mean simulated data, observed data at time t, and mean observed data, respectively. \( \sigma_{Sim} \) and \( \sigma_{Obs} \) are the standard deviations of the simulated and observed data, respectively.
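The four criteria translate directly into code. This sketch follows the standard definitions cited above (Moriasi et al. 2007; Knoben et al. 2019); the function names are illustrative.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency (Eq. (7)); ideal value 1."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def cc(sim, obs):
    """Pearson correlation coefficient (Eq. (8)); ideal value 1."""
    return float(np.corrcoef(sim, obs)[0, 1])

def rmse(sim, obs):
    """Root mean square error (Eq. (9)); ideal value 0."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def kge(sim, obs):
    """Kling-Gupta efficiency (Eq. (10)): combines correlation, the
    variability ratio, and the bias ratio; ideal value 1."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```

A perfect simulation scores NSE = CC = KGE = 1 and RMSE = 0, matching the ideal values stated above.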

Uncertainty evaluation metrics

In the uncertainty analysis of a downscaling model based on different input selection approaches, several uncertainty evaluation metrics should be assessed. Jin et al. (2010) proposed two indices for uncertainty assessment. The first index (i.e., Pcl) indicates the percentage of observed data that lie within the confidence levels of simulated data. The Pcl index is given as follows:
\( P_{cl} = \dfrac{NQ_{in}}{N} \times 100 \)   (11)

where \( NQ_{in} \) is the number of observed data that are located within the confidence level of the simulated data and N denotes the number of time steps. The average relative length (ARIL), as the second index, assesses the width of the simulated confidence level versus the observed data. ARIL is defined as:

\( ARIL = \dfrac{1}{N} \sum_{t=1}^{N} \dfrac{UP\,limit_t - LOW\,limit_t}{Obs_t} \)   (12)

where \( UP\,limit_t \) denotes the upper confidence bound of the simulated data for the tth month, \( LOW\,limit_t \) denotes the lower confidence bound of the simulated data for the tth month, \( Obs_t \) is the observed data for the tth month, and N denotes the number of time steps. A higher value of \( P_{cl} \) (i.e., 100) and a lower value of ARIL (i.e., 0) signify better performance (Jin et al. 2010; Lu et al. 2010).
In uncertainty analysis, \( P_{cl} \) has upper and lower limits. Due to its indirect dependency on the mean of the observed data, this index has only been utilized to assess mean observation points grouped by the simulated confidence interval. However, a significant limitation arises when all mean observed data fall within the simulated confidence level. In such cases, \( P_{cl} \) alone does not provide a complete picture of the uncertainty assessment. To address this limitation, we have introduced two new indices, namely the distance of upper (DU) bands and the distance of lower (DL) bands. These metrics evaluate the distribution of the observed data lying relative to the upper and lower simulated confidence levels, providing a more nuanced assessment of model performance and helping to identify the most suitable model. It should be noted that DU represents the difference between the upper simulated confidence level and the upper observed confidence interval, while DL represents the difference between the lower simulated confidence level and the lower observed confidence interval. DU and DL can be estimated using Equations (13) and (14), respectively:

\( DU = \left| U_{Sim} - U_{Obs} \right| \)   (13)

\( DL = \left| L_{Sim} - L_{Obs} \right| \)   (14)

where \( U_{Obs} \) and \( L_{Obs} \) represent the upper and lower bands of the observed confidence interval, respectively, and \( U_{Sim} \) and \( L_{Sim} \) denote the upper and lower bands of the simulated confidence level, respectively. A lower value for both DU and DL indicates superior performance.
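The two established uncertainty indices (Eqs. (11) and (12)) can be sketched as follows; the toy confidence band is illustrative, and the study's own DU/DL indices are omitted here since their exact band construction is specific to that analysis.

```python
import numpy as np

def p_cl(obs, low, up):
    """Pcl (Eq. (11)): percentage of observations falling inside the
    simulated confidence band."""
    obs, low, up = map(np.asarray, (obs, low, up))
    return 100.0 * float(np.mean((obs >= low) & (obs <= up)))

def aril(obs, low, up):
    """ARIL (Eq. (12)): average band width relative to the observation."""
    obs, low, up = map(np.asarray, (obs, low, up))
    return float(np.mean((up - low) / obs))

# Toy band of half-width 1 around a positive observed series.
obs = np.array([10.0, 12.0, 9.0, 11.0])
low, up = obs - 1.0, obs + 1.0
```

A band that contains every observation gives Pcl = 100; narrowing the band lowers ARIL but risks lowering Pcl, which is the trade-off these two indices capture together.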
The proposed methodology consists of three primary steps, as illustrated in Figure 2. The first step focuses on feature selection using an ensemble of k-means and SOM clustering methods based on the COMUSA approach. Dominant predictors are then selected using the MI-random sampling method. The second step involves developing an ANN-based downscaling model, which will be trained using the input data obtained from the previous step. In the final step, the statistical downscaling model developed in the second step is used to project future climate variables for the Ardabil meteorological station under the SSP3–7.0 and SSP5–8.5 scenarios. A detailed explanation of each step follows.
Figure 2

Graphical representation of the proposed methodology. *In the first step, the marked terms denote the selected predictors from the Nth cluster (N = 1–7) of the ensemble clustering approach obtained by the MI and the Zth random sampling (Z = 1–50) methods, respectively. GCMi denotes the selected predictors belonging to the ith GCM of the multi-GCM ensemble (i = ACCESS-CM2, FGOALS-g3, and CanESM5-CanOE).


First step: Feature selection and screening of dominant inputs

Considering that various GCMs have distinct resolutions and utilize different modeling specifications, a multi-GCM ensemble (i.e., ACCESS-CM2, FGOALS-g3, and CanESM5-CanOE) was utilized. Appropriate GCMs were selected based on the highest CC values between the GCM predictors and observed precipitation and temperature datasets. This ensemble approach aims to reduce uncertainties and encompass both the advantages and limitations of multiple GCMs.

Given that prevailing climatic conditions in each region significantly influence the climate of adjacent areas, the predictors were evaluated across the four grid points surrounding the research area. The four grid points closest to the Ardabil synoptic station (i = 1–4) are illustrated in Figure 1 for FGOALS-g3, CanESM5-CanOE, and ACCESS-CM2.

In general, predictors do not uniformly influence the predictands. While some may exhibit a strong correlation, others may show little relevance. Moreover, using a large set of predictors can diminish the accuracy of the ANN downscaling model. Feature selection in this context involves identifying, via a clustering approach coupled with pre-processing techniques, the subset of variables (features) that contributes most effectively to the prediction of a target outcome, such as precipitation and temperature. Therefore, it is essential to group similar predictors and select dominant ones from each cluster. To achieve this, the COMUSA ensemble clustering algorithm, which leverages the strengths of both k-means and SOM, was employed to identify optimal cluster structures. The similarity metrics (linear, nonlinear, and multi-linear) between a predictor and predictand within a cluster may not always yield the highest correlation but can still exceed the maximum similarity values found in other clusters. Consequently, a hybrid MI and random sampling technique was used to select dominant predictors from the clusters. In this process, MI was applied alongside random sampling to choose a representative from each cluster.
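One plausible reading of the MI-random sampling selection within clusters is sketched below: the highest-MI member of each cluster is kept as the dominant predictor, and additional members are drawn at random to better represent the cluster. The helper name, the number of random draws, and this exact combination rule are assumptions for illustration, not the authors' documented procedure.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def cluster_representatives(X, y, labels, n_random=1, seed=0):
    """From each cluster of predictors (columns of X grouped by `labels`),
    keep the member with the highest MI to the predictand y, plus
    `n_random` additional randomly sampled members (hypothetical reading
    of the MI-random sampling hybrid)."""
    rng = np.random.default_rng(seed)
    mi = mutual_info_regression(X, y, random_state=seed)
    chosen = set()
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        top = members[np.argmax(mi[members])]   # dominant predictor via MI
        chosen.add(int(top))
        extras = members[members != top]        # random sampling among the rest
        if extras.size:
            picks = rng.choice(extras, size=min(n_random, extras.size), replace=False)
            chosen.update(int(p) for p in picks)
    return sorted(chosen)

# Toy example: six candidate predictors in two clusters; the predictand
# depends only on predictors 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = X[:, 0] + X[:, 3]
labels = np.array([0, 0, 0, 1, 1, 1])
reps = cluster_representatives(X, y, labels, n_random=1)
```

The random draws add secondary members alongside each MI-dominant predictor, in the spirit of "filling the gap between dominant and secondary data" described earlier.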

Second step: AI-based statistical downscaling model

The second step involves developing the ANN-based downscaling models. These models are trained using the dominant predictors identified in the first step. It is worth noting that standardizing the GCM outputs is highly recommended (Wilby & Dawson 2004).

Third step: Future precipitation and temperature projection

In the final step, the calibrated downscaling model was employed to project future precipitation and temperature for the Ardabil synoptic station. Projections were conducted under two SSP scenarios: SSP3–7.0 and SSP5–8.5, for the periods 2025–2035 and 2035–2045. These scenarios are considered to yield more realistic and appropriate outcomes; thus, they are recommended for inclusion in CMIP6 to assess the impacts of climate change (Hausfather & Peters 2020; Nourani et al. 2023).

The purpose of this study was to evaluate the effectiveness of coupled ensemble clustering methods with MI-random sampling as a robust pre-processing technique in ANN-based downscaling to project precipitation and temperature at the target station. Since the proposed methodology consists of three phases, the results are accordingly presented in three sections as follows.

Results of feature selection

Considering the correlation between the historical precipitation and temperature of 25 GCMs and the predictands over the period 1981–2014, suitable GCMs were selected. Consequently, FGOALS-g3, ACCESS-CM2, and CanESM5-CanOEP5 were employed in the modeling procedure. The grid points with the highest CCs with the predictands are listed in Table 2. After identifying the main GCMs, the proposed feature extraction approach was applied to select the most dominant predictors from the high-dimensional input matrix across the multiple GCMs. Each GCM contains 119 predictors at 17 pressure heights, resulting in a total of 119 × 3 × 4 = 1,428 predictors to evaluate in the downscaling process.
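A minimal sketch of this correlation-based GCM screening (the dictionary of candidate series, the `top_n` parameter, and the function name are assumptions for illustration):

```python
import numpy as np

def rank_gcms_by_cc(candidates, predictand, top_n=3):
    # Pearson CC between each candidate GCM's historical series and the
    # observed predictand; keep the top_n best-correlated models
    ccs = {name: float(np.corrcoef(series, predictand)[0, 1])
           for name, series in candidates.items()}
    return sorted(ccs, key=ccs.get, reverse=True)[:top_n], ccs
```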

In the first step, all predictors from the three GCMs were clustered using two distinct methods, k-means and SOM. Evaluating cluster numbers from 2 to 10, the optimal number of clusters was determined based on the SC metric (Table 3). Mean SC values greater than 0.5 indicate well-structured clusters, while values below 0.5 suggest poorly structured ones. According to Table 3, within the k-means results, seven clusters gave the best structure for more precise differentiation. The mean SC value for the SOM clustering technique reached a maximum of 0.571 with six clusters. A comparison of SC values between the two methods shows that SOM outperformed k-means across all cluster numbers. This superiority can be attributed to the nonlinear nature of SOM and its ability to manage complex interactions within datasets. After clustering the input variables using the distinct methods, the COMUSA ensemble clustering technique was employed as a pre-processing approach to integrate and enhance the best outcomes of k-means and SOM. The evaluation metrics for the ensemble clustering approach are presented in Table 4. A comparison of the SC values between ensemble clustering (Table 4) and the individual k-means and SOM methods (Table 3) demonstrates that ensemble clustering can improve the efficacy of these methods by up to 21%, effectively recognizing patterns and features of the climatic variables.
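The SC-based scan over candidate cluster numbers can be sketched with a from-scratch k-means and mean silhouette coefficient (Rousseeuw 1987). This illustrates only the k-means/SC part of the procedure, not SOM or COMUSA; all function names and the restart count are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, rng, iters=100):
    # plain Lloyd's algorithm, centroids seeded from random data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def mean_silhouette(X, labels):
    # mean silhouette coefficient: s = (b - a) / max(a, b), where a is the
    # mean intra-cluster distance and b the mean distance to the nearest
    # other cluster
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i, li in enumerate(labels):
        same = labels == li
        same[i] = False
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == lj].mean() for lj in set(labels) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def best_cluster_number(X, rng, k_range=range(2, 11), restarts=5):
    # scan candidate cluster numbers (2-10, as in Table 3) and keep the
    # best silhouette over a few random restarts per k
    scores = {k: max(mean_silhouette(X, kmeans(X, k, rng))
                     for _ in range(restarts)) for k in k_range}
    return max(scores, key=scores.get), scores
```

On well-separated data the scan recovers the true cluster count with a mean SC well above the 0.5 threshold used in the text.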

Table 3

Evaluation of candidate cluster numbers based on the mean SC (selected cluster numbers in bold and underlined)

Method     Number of clusters
           2      3      4      5      6      7      8      9      10
k-means    0.51   0.41   0.41   0.34   0.40   0.51   0.38   0.30   0.29
SOM        0.53   0.53   0.52   0.47   0.57   0.52   0.44   0.41   0.40
Table 4

Evaluation results of the ensemble clustering method based on the mean SC and the number of predictors in each cluster

Ensemble clustering (mean SC = 0.634)

Cluster                           1     2     3     4     5     6     7
Number of predictors in cluster   15    229   67    302   454   329   33

Figure 3 illustrates three of the COMUSA clusters (clusters 1, 3, and 7) as examples. The figure shows that COMUSA has created novel clusters with a higher level of differentiation, characterized by greater dissimilarity between clusters and greater similarity within each cluster. The red node in the figure represents the initial pivot item of each cluster, which has the maximum attachment value, the highest sum of weights, and the minimum degree of freedom. The blue nodes in the similarity graph (SG) indicate predictors that are near neighbors of the pivot node.
Figure 3

Similarity graphs of predictors using COMUSA ensemble clustering technique: (a) cluster 1, (b) cluster 2, and (c) cluster 3.


As the final phase of pre-processing, dominant predictors from each cluster were selected using a hybrid approach that combines MI and random sampling. These selected predictors are considered the best representatives of the clusters. Given the superior performance of ensemble clustering compared to individual clustering methods, the MI-random sampling feature selection method was applied 50 times to determine the dominant predictors for ensemble clustering. Additionally, to compare the results of ensemble clustering with those of distinct methods in the downscaling process, the dominant predictors from SOM (due to their superior SC values) were also selected using the MI-random sampling feature extraction method. The multi-GCM dominant predictors selected by the MI-random sampling technique from the clusters of both ensemble clustering and SOM methods are presented in the Supplementary Materials, Tables S1 and S2, respectively.

Results of AI-based statistical downscaling model

In the second step, an ANN-based downscaling model was employed as the statistical downscaling method. The main predictors selected in the previous step through MI-random sampling from the two clustering approaches (ensemble clustering and SOM) were standardized over the baseline period from 1981 to 2014. To calibrate and validate the developed model, the predictors and predictands dataset was divided into calibration (75% from 1981 to 2006) and validation (25% from 2006 to 2014) sets. This data division has been widely used in AI-based hydro-climatological modeling studies (Chau 2007; Komasi & Sharghi 2016; Nourani et al. 2018). To expedite the training process, both input and output data were normalized before training.
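The chronological split and pre-training normalization can be sketched as follows. This is a minimal sketch: the min-max form of normalization is an assumption (the text only states that inputs and outputs were normalized), and the function names are illustrative. Fitting the bounds on the calibration set only avoids leaking information from the validation period.

```python
import numpy as np

def chronological_split(data, frac=0.75):
    # 75/25 calibration-validation split that preserves time order
    cut = int(round(len(data) * frac))
    return data[:cut], data[cut:]

def minmax_fit(cal):
    # normalization bounds taken from the calibration set only
    lo, hi = cal.min(axis=0), cal.max(axis=0)
    return lambda x: (x - lo) / (hi - lo)
```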

A three-layer FFNN using a BP algorithm was employed to downscale the precipitation and temperature for the Ardabil synoptic station. To identify the optimal number of hidden layer neurons, a sequential search was performed over 1,000 epochs. The evaluation metrics indicated that the best training epoch and the optimal number of hidden neurons were found within the ranges of 70–240 and 3–8, respectively. Subsequently, the four statistics (NSE, RMSE, CC, and KGE) of the downscaled precipitation and temperature (using predictors obtained through MI and MI-random sampling from ensemble clustering) are reported in Table 5. The results suggest that ensemble clustering combined with MI-random sampling outperforms the combination of ensemble clustering with MI. This superiority may stem from MI-random sampling's ability to explore different sections of the feature space, thereby introducing diversity into the feature selection process.
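The four statistics reported in Table 5 have standard definitions; a compact numpy version (function names are illustrative, and KGE follows the formulation discussed by Knoben et al. 2019):

```python
import numpy as np

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 minus residual variance over obs variance
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def rmse(obs, sim):
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def cc(obs, sim):
    # Pearson correlation coefficient
    return float(np.corrcoef(obs, sim)[0, 1])

def kge(obs, sim):
    # Kling-Gupta efficiency: correlation, variability ratio, and bias
    # ratio combined into one score
    r = cc(obs, sim)
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```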

Table 5

ANN-based downscaling results of precipitation and temperature utilizing prominent inputs extracted by MI and MI-random sampling from ensemble clustering approach

                     Precipitation                                        Temperature
                     Train                     Test                       Train                     Test
Approach             NSE   RMSE^a  CC    KGE   NSE   RMSE^a  CC    KGE    NSE   RMSE^a  CC    KGE   NSE   RMSE^a  CC    KGE
MI-random sampling   0.73  0.07    0.90  0.72  0.59  0.07    0.78  0.62   0.98  0.02    0.99  0.98  0.97  0.03    0.98  0.94
MI                   0.68  0.16    0.79  0.67  0.50  0.14    0.70  0.58   0.95  0.03    0.98  0.93  0.94  0.04    0.97  0.91

^a Normalized RMSE results.

To provide broader insight into the effects of the feature extraction methods on the downscaling models, the outcomes of combining MI-random sampling with both the ensemble and SOM clustering methods are illustrated in Figures 4 and 5. The boxplots in these figures compare the distributions of statistics for downscaled precipitation and temperature based on the dominant predictors obtained from the SOM and ensemble clustering approaches. As shown in Figure 4(a)–4(c), the ensemble clustering feature extraction approach exhibited the highest median values for the similarity metrics in both the training and testing processes, as well as the lowest median RMSE values in both processes (Figure 4(d)). Similarly, for temperature downscaling, the ensemble clustering approach demonstrated the highest median values of NSE, CC, and KGE (Figure 5(a)–5(c)) and the lowest median value of RMSE (Figure 5(d)), indicating superior performance compared to SOM clustering.
Figure 4

Boxplots of four statistics of the downscaled precipitation by dominant inputs based on hybrid MI-random sampling with ensemble clustering and SOM methods.

Figure 5

Boxplots of four statistics of the downscaled temperature by dominant inputs based on hybrid MI-random sampling with ensemble clustering and SOM methods.


The higher SC value of ensemble clustering compared to SOM (Tables 3 and 4) further confirms the superiority of ensemble clustering as a feature extraction method for ANN downscaling models. The robustness of ensemble clustering may be attributed to its ability to leverage the strengths of various clustering algorithms, thus addressing the limitations of individual methods. It is important to note that, because SOM outperformed k-means in terms of SC value, the evaluation metrics of the downscaling model based on inputs from the ensemble clustering feature extraction approach were compared with those based on inputs from SOM. In summary, the results indicate that the combination of ensemble clustering, which capitalizes on the strengths of various clustering algorithms, and MI-random sampling, which selects representative features from different parts of the feature space, significantly enhances the accuracy of ANN-based downscaling models for precipitation and temperature projections.

Considering that uncertainties propagate from the input variables to the model outcomes, it is imperative to evaluate the uncertainty inherent in the developed ANN-based downscaling model under the different input screening approaches. Consequently, confidence levels for the model were constructed. Figure 6 shows the uncertainty of the downscaled precipitation and temperature using the two feature extraction approaches. As shown in the figure, all long-term monthly observed values lie within the 95% simulated confidence level (i.e., Pcl = 100%).
Figure 6

Uncertainty of the long-term mean monthly precipitation during 1981–2014 based on (a) ensemble clustering, (b) SOM feature extraction methods; uncertainty of the long-term mean monthly temperature during 1981–2014 based on (c) ensemble clustering and (d) SOM feature extraction methods.


In addition, Table 6 shows the results of four uncertainty evaluation metrics (Pcl, ARIL, DU, and DL) for the downscaled precipitation and temperature using both the ensemble clustering and SOM methods. According to Table 6, the ARIL value for downscaled precipitation is 1.60 with the ensemble clustering feature extraction approach, versus 1.50 with the SOM method. For downscaled temperature, however, the ARIL value of ensemble clustering (2.05) is lower than that of the SOM feature selection approach (2.15), meaning the observed data lie within a narrower uncertainty band. To assess the uncertainty results more fully, DL and DU were considered as additional uncertainty metrics alongside Pcl and ARIL. For both downscaled precipitation and temperature, the DU and DL values of the ensemble clustering feature extraction method represent more appropriate uncertainty outcomes than those of the SOM approach (Table 6). The favorable uncertainty results of the ensemble clustering feature extraction method for the downscaled precipitation and temperature variables can be attributed to the effective clustering of the input dataset.
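Pcl and ARIL can be computed from the observed series and the band limits as follows. This is a sketch following the standard definitions (e.g., Jin et al. 2010); the function names and array layout are illustrative.

```python
import numpy as np

def p_cl(obs, lower, upper):
    # percentage of observations bracketed by the confidence band
    return 100.0 * float(np.mean((obs >= lower) & (obs <= upper)))

def aril(obs, lower, upper):
    # average relative interval length: mean band width relative to the
    # observations; a narrower band gives a smaller ARIL
    return float(np.mean((upper - lower) / obs))
```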

Table 6

Comparison of uncertainty metrics for downscaled precipitation and temperature and two feature extraction approaches

                 Ensemble clustering               SOM
Variable         Pcl    ARIL   DU      DL          Pcl    ARIL   DU      DL
Temperature      100    2.05   0.73    1.74        100    2.15   0.82    2.86
Precipitation    100    1.60   11.34   -2.98       100    1.50   11.79   -3.17

Results of the projected precipitation and temperature

In the final step, projections for precipitation and temperature at the Ardabil synoptic station under the climate change scenarios SSP3–7.0 and SSP5–8.5 were examined for the near-future (2025–2035) and mid-future (2035–2045) periods. Based on the results from the second step, the best-performing ANN downscaling model (the calibrated ANN model using inputs from ensemble clustering combined with the MI-random sampling feature extraction approach) was employed for these projections. Figure 7(a) and 7(b) show the variation in mean monthly projected precipitation and temperature for the near- and mid-future periods relative to observations under the SSP3–7.0 and SSP5–8.5 scenarios. The projected precipitation indicates reductions of 2–5% relative to the baseline period for both the near and mid-future. Figure 7(a) highlights noticeable decreasing trends in mean monthly precipitation across both projected time frames. Additionally, mean precipitation values for fall (October to December) and spring (April to June) are expected to decline under both climate change scenarios.
Figure 7

Projected monthly (a) precipitation and (b) temperature variation compared to the baseline period.


Conversely, projected winter and summer precipitation shows increasing trends in the mid-term under both scenarios, indicating a shift in precipitation patterns compared to historical data. However, the increase in winter and summer precipitation is less pronounced than the decrease in spring and fall precipitation. In the near future, anticipated precipitation is expected to exceed that of the mid-future in all seasons except summer under the SSP3–7.0 scenario. Under SSP5–8.5, a decrease in precipitation is projected for spring, fall, and winter in the near future, while summer shows an increasing trend, as illustrated in Figure 7(a). Furthermore, the analysis of projected mean temperatures under both scenarios reveals a substantial rise in mean monthly and seasonal temperatures for both the near and mid-future periods, as shown in Figure 7(b). Mean temperature values under the SSP3–7.0 and SSP5–8.5 scenarios exhibit increasing trends of 21–27 and 29–35%, respectively, compared to the baseline period.

Figure 8(a) and 8(b) illustrates the annual variation of precipitation and temperature at the Ardabil station, showcasing both baseline and projected values under the SSP3–7.0 and SSP5–8.5 scenarios. The figures reveal contrasting trends: a decrease in precipitation and an increase in temperature. This observed pattern aligns with conclusions from prior research (Malekian & Kazemzadeh 2016; Nourani et al. 2023). The projected decrease in precipitation corresponds with the increase in temperature. The Caspian Sea, located to the east of Ardabil, significantly influences the station's precipitation patterns by providing moisture. As temperatures rise, the saturation vapor pressure increases more rapidly, enhancing the air's capacity to hold water vapor compared to colder air. Consequently, this temperature rise is associated with reduced annual precipitation. These findings are consistent with recent research by Shang et al. (2023). The mean annual precipitation at the Ardabil station during the baseline period is 291 mm. Under the SSP3–7.0 scenario, the projected mean annual precipitation decreases to 281 mm, while under the SSP5–8.5 scenario, it further declines to 271 mm (Figure 8(a)). Regarding temperature changes shown in Figure 8(b), the mean annual temperature during the baseline period is approximately 9 °C. Under the SSP3–7.0 scenario, the projected mean annual temperature rises to 11 °C, and under the SSP5–8.5 scenario, it increases to 12 °C. Both greenhouse gas emissions trajectories, SSP3–7.0 and SSP5–8.5, exhibit similar trends in future precipitation and temperature projections for the region. However, the extreme emissions and rapid economic growth associated with SSP5–8.5 result in higher mean annual temperatures compared to SSP3–7.0. Despite lower temperatures in SSP3–7.0, this pathway shows higher mean annual precipitation.
Figure 8

Mean annual (a) precipitation and (b) temperature variation during 1981–2045.


To assess the potential variability of projected precipitation and temperature values, uncertainty bands were calculated, representing two standard deviations around the yearly mean of the models. In Figure 8(a), the uncertainty band for projected precipitation ranges from 219 to 345 mm under SSP3–7.0 and from 215 to 325 mm under SSP5–8.5. In Figure 8(b), the uncertainty bands for projected temperature vary from 7 to 16 °C for SSP3–7.0 and from 7 to 11 °C for SSP5–8.5. A comparison of the uncertainty band widths reveals that the band is wider for SSP3–7.0 in both projected precipitation and temperature, indicating greater uncertainty associated with projections under this scenario compared to SSP5–8.5.
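The two-standard-deviation band described above amounts to the following sketch (the array layout, with rows as ensemble runs and columns as years, is an assumption):

```python
import numpy as np

def two_sigma_band(runs):
    # uncertainty band of two standard deviations around the yearly mean
    # of an ensemble of simulations (rows = model runs, columns = years)
    mean = runs.mean(axis=0)
    sd = runs.std(axis=0)
    return mean - 2.0 * sd, mean + 2.0 * sd
```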

In this study, we assessed the impact of climate change on precipitation and temperature at the Ardabil synoptic station for the near (2025–2035) and mid-future (2035–2045) periods using an ANN-based downscaling model. Three different GCMs from CMIP6 were incorporated into the downscaling models under the SSP3–7.0 and SSP5–8.5 climate change scenarios, employing a multi-GCM input technique. To identify the most informative predictors across multiple grid points, we applied the innovative coupled COMUSA ensemble clustering method along with MI-random sampling for feature selection.

The SC indicated that ensemble clustering yielded more accurate results, with SOM clustering outperforming other methods like k-means. To select a representative predictor from each cluster, we utilized MI and MI-random sampling techniques. Comparing MI and MI-random sampling methods with ensemble clustering revealed that the combination of ensemble clustering and MI-random sampling outperformed MI alone by 18 and 3% in terms of NSE during the testing of precipitation and temperature downscaling models, respectively. Furthermore, when comparing ensemble clustering with MI-random sampling to SOM clustering with MI-random sampling, the former demonstrated superior performance by 16 and 2% in mean NSE for precipitation and temperature downscaling, respectively. Overall, the projected trends indicate a decline in precipitation and an increase in temperature in the coming years. Specifically, precipitation is expected to decrease by 2–3 and 4–5% under the SSP3–7.0 and SSP5–8.5 scenarios, while temperatures are projected to rise by 21–27 and 29–35% under these scenarios, respectively.

The findings provide strong evidence for the effectiveness of statistical downscaling, particularly highlighting the advantages of COMUSA ensemble clustering combined with MI-random sampling for selecting large-scale climatic predictors. The integration of additional clustering methods, such as Ward, SOM, and k-means, within the COMUSA approach presents a promising avenue for future research. We also suggest incorporating a decision tree input screening approach with multi-linear entities alongside MI-random sampling to select dominant predictors from the generated clusters. Additionally, a committee of other AI and statistical learning methods (such as support vector machines) for statistical downscaling could enable accuracy comparisons with the ANN model.

Z.R. contributed to conceptualization, methodology, data curation, formal analysis; investigated the work; wrote the original draft; and also reviewed and edited the manuscript. M.N. contributed to conceptualization, methodology, formal analysis; validated and supervised the work and also wrote, reviewed, and edited the manuscript. M.T. contributed to conceptualization; wrote, reviewed, and edited the manuscript; and also supervised the work. F.M. supervised the work; contributed to data curation; and also wrote and edited the manuscript.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

Ahmadi
A.
,
Han
D.
,
Kakaei Lafdani
E.
&
Moridi
A.
(
2015
)
Input selection for long-lead precipitation prediction using large-scale climate variables: A case study
,
Journal of Hydroinformatics
,
17
(
1
),
114
129
.
doi:10.2166/hydro.2014.138
.
Ang
Y. K.
,
Talei
A.
,
Zahidi
I.
&
Rashidi
A.
(
2023
)
Past, present, and future of using neuro-fuzzy systems for hydrological modeling and forecasting
,
Hydrology
,
10
(
2
),
36
.
doi:10.3390/hydrology10020036
.
Asghari
K.
&
Nasseri
M.
(
2015
)
Spatial rainfall prediction using optimal features selection approaches
,
Hydrology Research
,
46
(
3
),
343
355
.
doi:10.2166/nh.2014.178
.
Beauchemin
M.
(
2015
)
A density-based similarity matrix construction for spectral clustering
,
Neurocomputing
,
151
,
835
844
.
doi:10.1016/j.neucom.2014.10.012
.
Beecham
S.
,
Rashid
M.
&
Chowdhury
R. K.
(
2014
)
Statistical downscaling of multi-site daily rainfall in a south Australian catchment using a generalized linear model
,
International Journal of Climatology
,
34
(
14
),
3654
3670
.
doi:10.1002/joc.3933
.
Berikov
V.
&
Pestunov
I.
(
2017
)
Ensemble clustering based on weighted co-association matrices: Error bound and convergence properties
,
Pattern Recognition.
,
63
,
427
436
.
doi:10.1016/j.patcog.2016.10.017
.
Bowden
G. J.
,
Maier
H. R.
&
Dandy
G. C.
(
2002
)
Optimal division of data for neural network models in water resources applications
,
Water Resources Research
,
38
(
2
),
2-1
2-11
.
doi:10.1029/2001WR000266
.
Chang
F. J.
,
Chang
L. C.
,
Huang
C. W.
&
Kao
I. F.
(
2016
)
Prediction of monthly regional groundwater levels through hybrid soft-computing techniques
,
Journal of Hydrology.
,
541
,
965
976
.
doi:10.1016/j.jhydrol.2016.08.006
.
Chau
K.
(
2007
)
A split-step particle swarm optimization algorithm in river stage forecasting
,
Journal of Hydrology
,
346
(
3–4
),
131
135
.
doi:10.1016/j.jhydrol.2007.09.004
.
Chen
G.
,
Zhang
K.
,
Wang
S.
,
Xia
Y.
&
Chao
L.
(
2023
)
Ihydroslide3d v1.0: An advanced hydrological-geotechnical model for hydrological simulation and three-dimensional landslide prediction
,
Geoscientific Model Development
,
16
(
10
),
2915
2937
.
doi:10.5194/gmd-16-2915-2023
.
Elkiran
G.
,
Nourani
V.
,
Elvis
O.
&
Abdullahi
J.
(
2021
)
Impact of climate change on hydro-climatological parameters in North Cyprus: Application of artificial intelligence-based statistical downscaling models
,
Journal of Hydroinformatics
,
23
(
6
),
1395
1415
.
doi:10.2166/hydro.2021.091
.
Feng
Z. K.
,
Niu
W. J.
,
Zhang
T. H.
,
Wang
W. C.
&
Yang
T.
(
2023
)
Deriving hydropower reservoir operation policy using data-driven artificial intelligence model based on pattern recognition and metaheuristic optimizer
,
Journal of Hydrology
,
624
.
doi:10.1016/j.jhydrol.2023.129916
.
Ghimire
S.
,
Nguyen-Huy
T.
,
Prasad
R.
,
Deo
R. C.
,
Casillas-Pérez
D.
,
Salcedo-Sanz
S.
&
Bhandari
B.
(
2023
)
Hybrid convolutional neural network-multilayer perceptron model for solar radiation prediction
,
Cognitive Computation
,
15
(
2
),
645
671
.
doi:10.1007/s12559-022-10070-y
.
Gumus
V.
,
Moçayd
N. E.
,
Seker
M.
&
Seaid
M.
(
2023
)
Evaluation of future temperature and precipitation projections in Morocco using the ANN-based multi-model ensemble from CMIP6
,
Atmospheric Research
,
292
.
doi:10.1016/j.atmosres.2023.106880
.
Haji Hosseini
R.
,
Golian
S.
&
Yazdi
J.
(
2020
)
Evaluation of data-driven models to downscale rainfall parameters from global climate models outputs: The case study of Latyan watershed
,
Journal of Water and Climate Change
,
11
(
1
),
200
216
.
doi:10.2166/wcc.2018.191
.
Hausfather
Z.
&
Peters
G. P.
(
2020
)
Emissions – The business as usual story is misleading
,
Nature
,
577
,
618
620
.
Haykin
S.
(
1994
)
Neural Networks A Comprehensive Foundation
.
New York
:
MacMillan College Publishing Co
.
Hosseini Baghanam
A.
,
Nourani
V.
,
Keynejad
M. A.
,
Taghipour
H.
&
Alami
M. T.
(
2019
)
Conjunction of wavelet-entropy and SOM clustering for multi-GCM statistical downscaling
,
Hydrology Research
,
50
(
1
),
1
23
.
doi:10.2166/nh.2018.169
.
Huang
D.
,
Lai
J.
&
Wang
C. D.
(
2016
)
Ensemble clustering using factor graph
,
Pattern Recognition
,
50
,
131
142
.
doi:10.1016/j.patcog.2015.08.015
.
Intergovernmental Panel on Climate Change
. (
2018
)
Global warming of 1.5°C
.
Jin
X.
,
Xu
C. Y.
,
Zhang
Q.
&
Singh
V. P.
(
2010
)
Parameter and modeling uncertainty simulated by GLUE and a formal Bayesian method for a conceptual hydrological model
,
Journal of Hydrology
,
383
,
147
155
.
doi:10.1016/j.jhydrol.2009.12.028
.
Kissi
A. E.
,
Abbey
G. A.
&
Villamor
G. B.
(
2023
)
Perceptions of climate change risk on agriculture livelihood in Savanna Region, Northern Togo
,
Climate
,
11
(
4
),
86
.
doi:10.3390/cli11040086
.
Knoben
W. J. M.
,
Freer
J. E.
&
Woods
R. A.
(
2019
)
Technical note: Inherent benchmark or not? Comparing nash-Sutcliffe and Kling-Gupta efficiency scores
,
Hydrology and Earth System Sciences
,
23
(
10
),
4323
4331
.
doi:10.5194/hess-23-4323-2019
.
Kohonen
T.
(
2001
)
Self-Organizing Maps
.
New York
:
Springer Inc.
Komasi
M.
&
Sharghi
S.
(
2016
)
Hybrid wavelet-support vector machine approach for modelling rainfall–runoff process
,
Water Science and Technology
,
73
(
8
),
1937
1953
.
doi:10.2166/wst.2016.048
.
Lu
L.
,
Xia
J.
,
Xu
C. Y.
&
Singh
V. P.
(
2010
)
Evaluation of subjective factors of the GLUE and comparison with the formal Bayesian method in uncertainty assessment of hydrological models
,
Journal of Hydrology
,
390
(
3–4
),
210
221
.
doi:10.1016/j.jhydrol.2010.06.044
.
Macedo
F.
,
Valadas
R.
,
Carrasquinha
E.
,
Oliveira
M. R.
&
Pacheco
A.
(
2022
)
Feature selection using decomposed mutual information maximization
,
Neurocomputing
,
513
,
215
232
.
doi:10.1016/j.neucom.2022.09.101
.
Maier
H. R.
&
Dandy
G. C.
(
2000
)
Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications
,
Environmental Modeling and Software
,
15
(
1
),
101
124
.
doi:10.1016/S1364-8152(99)00007-9
.
Malekian
A.
&
Kazemzadeh
M.
(
2016
)
Spatio-Temporal analysis of regional trends and shift changes of autocorrelated temperature series in Urmia Lake basin
,
Water Resources Management
,
30
,
785
803
.
doi:10.1007/s11269-015-1190-9
.
Mehta
D.
,
Yadav
S.
,
Ladavia
C.
&
Caloiero
T.
(
2023
)
Drought projection using GCM & statistical downscaling technique: A case study of Sirohi District
,
Results in Engineering
,
20
.
doi:10.1016/j.rineng.2023.101605
.
Mimaroglu
S.
&
Erdil
E.
(
2011
)
Combining multiple clusterings using similarity graph
,
Pattern Recognition
,
44
(
3
),
694
703
.
doi:10.1016/j.patcog.2010.09.008
.
Mirdashtvan
M.
&
Malekian
A.
(
2020
)
A regional assessment of wet/dry spells characteristics using RCPs scenarios in a semiarid region
,
Arabian Journal of Geosciences
,
13
,
781
.
doi:10.1007/s12517-020-05778-w
.
Mirdashtvan
M.
,
Najafinejad
A.
,
Malekian
A.
&
Sa'doddin
A.
(
2018
)
Downscaling the contribution to uncertainty in climate-change assessments: Representative concentration pathway (RCP) scenarios for the South Alborz Range, Iran
.
Meteorological Applications
,
25
,
414
422
.
doi:10.1002/met.1709
.
Mirdashtvan
M.
,
Najafinejad
A.
,
Malekian
A.
&
Sa'doddin
A.
(
2019
)
Regional analysis of trend and non-stationarity of hydro-climatictime series in the Southern Alborz Region, Iran
,
International Journal of Climatology
,
40
(
4
),
1979
1991
.
doi:10.1002/joc.6313
.
Mirdashtvan
M.
,
Najafinejad
A.
,
Malekian
A.
&
Sa'doddin
A.
(
2021
)
Sustainable water supply and demand management in semi-arid regions: Optimizing water resources allocation based on RCPs scenarios
,
Water Resources Management
,
35
,
5307
5324
.
doi:10.1007/s11269-021-03004-0
.
Mora
D. E.
,
Campozano
L.
,
Cisneros
F.
,
Wyseure
G.
&
Willems
P.
(
2014
)
Climate changes of hydrometeorological and hydrological extremes in the Paute basin, Ecuadorean Andes
,
Hydrology and Earth System Sciences
,
18
(
2
),
631
648
.
doi:10.5194/hess-18-631-2014
.
Moriasi
D. N.
,
Arnold
J. G.
,
Van Liew
M. W.
,
Bingner
R. L.
,
Harmel
R. D.
&
Veith
T. L.
(
2007
)
Model evaluation guidelines for systematic quantification of accuracy in watershed simulations
,
Transactions of the ASABE
,
50
(
3
),
885
900
.
doi:10.13031/2013.23153
.
Nasseri
M.
&
Zahraie
B.
(
2011
)
Application of simple clustering on space-time mapping of mean monthly rainfall pattern
,
International Journal of Climatology
,
31
(
5
),
732
741
.
doi:10.1002/joc.2109
.
Nasseri
M.
,
Tavakol-Davani
H.
&
Zahraie
B.
(
2013
)
Performance assessment of different data mining methods in statistical downscaling of daily precipitation
,
Journal of Hydrology
,
492
,
1
14
.
doi:10.1016/j.jhydrol.2013.04.017
.
Nourani
V.
,
Razzaghzadeh
Z.
,
Hosseini Baghanam
A.
&
Molajou
A.
(
2018
)
ANN-based statistical downscaling of climatic parameters using decision tree predictor screening method
,
Theoretical and Applied Climatology
,
137
,
1729
1746
.
doi:10.1007/s00704-018-2686-z
.
Nourani
V.
,
Ghaneei
P.
&
Kantoush
S. A.
(
2022
)
Robust clustering for assessing the spatiotemporal variability of groundwater quantity and quality
,
Journal of Hydrology
,
604
,
127272
.
doi:10.1016/j.jhydrol.2021.127272
.
Nourani
V.
,
Hasanpour Ghareh Tapeh
A.
,
Khodkar
K.
&
Huang
J. J.
(
2023
)
Assessing long-term climate change impact on spatiotemporal changes of groundwater level using autoregressive-based and ensemble machine learning models
,
Journal of Environmental Management
,
336
.
doi:10.1016/j.jenvman.2023.117653
.
Rabezanahary Tanteliniaina
M. F.
,
Rahaman
M. H.
&
Zhai
J.
(
2021
)
Assessment of the future impact of climate change on the hydrology of the Mangoky River, Madagascar using ANN and SWAT
,
Water
,
13
(
9
),
1239
.
doi:10.3390/w13091239
.
Rahimi
R.
,
Tavakol-Davani
H.
&
Nasseri
M.
(
2021
)
An uncertainty-based regional comparative analysis on the performance of different bias correction methods in statistical downscaling of precipitation
,
Water Resources Management
,
35
,
2503
2518
.
doi:10.1007/s11269-021-02844-0
.
Rousseeuw
P. J.
(
1987
)
Silhouettes: A graphical aid to the interpretation and validation of cluster analysis
,
Journal of Computational and Applied Mathematics
,
20
,
53
65
.
doi:10.1016/0377-0427(87)90125-7
.
Sehgal
V.
,
Lakhanpal
A.
,
Maheswaran
R.
,
Khosa
R.
&
Sridhar
V.
(
2018
)
Application of multi-scale wavelet entropy and multi-resolution Volterra models for climatic downscaling
,
Journal of Hydrology
,
556
,
1078
1095
.
doi:10.1016/j.jhydrol.2016.10.048
.
Shang
K.
,
Xu
L.
,
Liu
X.
,
Yin
Z.
,
Liu
Z.
,
Li
X.
,
Yin
L.
&
Zheng
W.
(
2023
)
Study of urban heat island effect in Hangzhou metropolitan area based on SW-TES algorithm and image dichotomous model
,
SAGE Open
,
13
(
4
).
doi:10.1177/21582440231208851
.
Shannon, C. E. (1948) A mathematical theory of communication, Bell System Technical Journal, 27, 379–423.
Sharghi, E., Nourani, V., Zhang, Y. & Ghaneei, P. (2022) Conjunction of cluster ensemble-model ensemble techniques for spatiotemporal assessment of groundwater depletion in semi-arid plains, Journal of Hydrology, 610, 127984. doi:10.1016/j.jhydrol.2022.127984.
Snell, S. E., Gopal, S. & Kaufmann, R. K. (2000) Spatial interpolation of surface air temperatures using artificial neural networks: Evaluating their use for downscaling GCMs, Journal of Climate, 13(5), 886–895. doi:10.1175/1520-0442(2000)013<0886:SIOSAT>2.0.CO;2.
Takong, R. R. & Abiodun, B. J. (2023) Projected changes in precipitation characteristics over the Drakensberg Mountain range, International Journal of Climatology, 43(6), 2541–2567. doi:10.1002/joc.7989.
Tavakol-Davani, H., Nasseri, M. & Zahraie, B. (2013) Improved statistical downscaling of daily precipitation using SDSM platform and data-mining methods, International Journal of Climatology, 33(11), 2561–2578. doi:10.1002/joc.3611.
Wang, J., Hu, L., Li, D. & Ren, M. (2020) Potential impacts of projected climate change under CMIP5 RCP scenarios on streamflow in the Wabash River basin, Advances in Meteorology, 2020. doi:10.1155/2020/9698423.
Wang, P., Wei, Z., Qi, H., Wan, S., Xiao, Y., Sun, G. & Zhang, Q. (2024) Mitigating poor data quality impact with federated unlearning for human-centric metaverse, IEEE Journal on Selected Areas in Communications, 42(4), 832–849. doi:10.1109/JSAC.2023.3345388.
Wilby, R. L. & Dawson, C. W. (2004) Using SDSM Version 3.1 — A Decision Support Tool for the Assessment of Regional Climate Change Impacts, User Manual. Available at: sdsm.org.uk/sdsmmain.html.
Wilby, R. L., Dawson, C. W. & Barrow, E. M. (2002) SDSM – A decision support tool for the assessment of regional climate change impacts, Environmental Modelling and Software, 17(2), 145–157. doi:10.1016/S1364-8152(01)00060-3.
Zhang, K., Li, Y., Yu, Z., Yang, T., Xu, J., Chao, L., Ni, J., Wang, L., Gao, Y., Hu, Y. & Lin, Z. (2022) Xin'anjiang nested experimental watershed (XAJ-NEW) for understanding multiscale water cycle: Scientific objectives and experimental design, Engineering, 18(11), 207–217. doi:10.1016/j.eng.2021.08.026.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).

Supplementary data