Selection of representative GCM scenarios preserving uncertainties

Climate change studies usually rely on many projections, and selecting a small, essential number of them is very important because using all Global Climate Model (GCM) scenarios is impossible in practice. Furthermore, climate change impact assessment is often sensitive to the choice of GCM scenarios. This study suggests that selecting the best-performing scenarios based on a historical period should be avoided in nonstationary cases such as climate change, and proposes a new approach that preserves the uncertainty that all scenarios contain. The new approach groups all GCM scenarios into several clusters and then selects a single representative scenario from the member scenarios of each cluster, based on their skill scores. The proposed approach is termed 'selecting the principal scenarios' and is applied to select five principal GCM scenarios for the South Korean Peninsula from 17 GCM scenarios of the 20C3M emission scenario. Uncertainty preservation is measured with the maximum entropy theory. The case study shows that the principal scenarios preserve the full range of total uncertainty, compared to less than 65% for the best scenarios, confirming that preserving uncertainty with the principal scenarios is more adequate than selecting the best-performing scenarios in climate change studies. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
doi: 10.2166/wcc.2017.101
Jae-Kyoung Lee
Innovation Center for Engineering Education, Daejin University, Pocheon-si, Gyeonggi-do 487-711, Republic of Korea
Young-Oh Kim (corresponding author)
Department of Civil and Environmental Engineering, Seoul National University, 599 Gwanak-ro, Gwanak-gu, Seoul 151-742, Republic of Korea
E-mail: yokim05@snu.ac.kr


INTRODUCTION
Since climate change studies forecast an uncertain future, 'uncertainty' is a keyword of all climate change assessments. According to the four assess- Therefore, it is inevitable that future water resource projections will be strongly affected by which GCM scenario is used. Existing studies have recommended projecting and assessing the impact of climate change with either a single GCM scenario under a few different initial conditions or a small number of GCM scenarios. However, it is not convincing that a single GCM scenario selected at present would actually occur some decades in the future. On the other hand, one may employ all the GCM scenarios provided by the IPCC to cover more future possibilities, but this would be impractical for decision-making, even if it were possible for research purposes. Therefore, it is necessary to select several scenarios that together show as many of the possible cases as they can.
Until now, there have been no clear standard criteria for selecting climate change scenarios. GCM scenarios have been selected by arbitrary means or by the subjective judgment of the researcher. For example, climate impact assessments have mainly been conducted using a GCM scenario developed by the researchers' own country, or using high-resolution GCM scenarios. The most objective method of selecting a GCM scenario is to compare the performance of various GCM scenarios over a certain past period (i.e. baseline simulation) and to select the GCM scenario with the best accuracy. This implicitly assumes that the best projection occurs 100% of the time and that the others never occur. One may instead assign a different weight to each scenario to create a combined scenario, but all GCM scenarios in the 4th Assessment Report of the IPCC are considered to have equal possibilities of occurrence, because there is too little information to make reliable estimates of which are more or less likely to occur. Moreover, the nonstationarity concept in climate change implies that the future will not be a repetition of the past, and thus a few best-performing GCM scenarios cannot explain the uncertainties of all GCM scenarios in the future.
The purpose of this study is to propose a new approach for selecting some representative GCM scenarios (called the 'principal' scenarios). Here we emphasize preserving the uncertainty that the IPCC GCM scenarios show. Using the maximum entropy (ME) theory, this study also quantifies the uncertainty of the selected GCM scenarios, to show that the principal scenarios, which attempt to preserve the whole range of uncertainty of all GCM scenarios, are more effective than the 'best' individual scenarios. Finally, based on the proposed approach, this study proposes some GCM scenarios that are more appropriate for future water resources management in the Korean Peninsula.

A NEW APPROACH FOR THE SELECTION OF REPRESENTATIVE GCM SCENARIOS
As mentioned in the introduction, this study proposes selecting the 'principal' scenarios, which best represent the uncertainty range of all IPCC GCM scenarios, rather than the 'best' scenarios, which performed best in the past with respect to a certain accuracy measure. The best scenario approach is unconvincing because the best scenarios selected based on a historical period may not perform well in the future (Raisanen), due to the 'non- Therefore, in such a nonstationary world with deep uncertainty, more focus should be placed on preserving future uncertainty.
In this study, the principal scenario approach consists of two steps: first, 'clustering' scenarios; and then, 'selecting' a representative scenario from each cluster.

GCM scenarios clustering
The scenario clustering step groups the entire set of GCM scenarios into several clusters according only to the statistical properties of each GCM scenario, not the GCM structure, because even GCMs have different scenarios under identical research. Among many cluster analysis methods, the characteristic-based K-means cluster analysis method is able to cluster the data after reducing the dimensionality (e.g. the number of data points) of higher-order time series, such as GCM scenarios, using only statistical characteristics (Wang & Smith). In this study, the uncertainties are all assumed to be captured by six statistical characteristics (i.e. average for the total period, average for each month, standard deviation for the total period, standard deviation for each month, trend for each month, and lag-1 autocorrelation). In other words, it is assumed that the GCM scenarios placed in the same cluster by the characteristic-based K-means cluster method are statistically the most similar.
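The clustering step can be illustrated with a minimal sketch. It assumes monthly series that start in January, standardizes each characteristic before clustering, and uses a deterministic farthest-point initialization; these choices, the function names, and the synthetic data are illustrative assumptions and not details taken from this study.

```python
import math
import random
import statistics as st

def slope(y):
    """Least-squares slope of y against its index 0..len(y)-1."""
    x_mean, y_mean = (len(y) - 1) / 2, st.fmean(y)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    den = sum((i - x_mean) ** 2 for i in range(len(y)))
    return num / den

def lag1_autocorr(x):
    """Lag-1 autocorrelation of the whole series."""
    m = st.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    return num / sum((a - m) ** 2 for a in x)

def characteristics(series):
    """Six characteristics as one vector: 1 + 12 + 1 + 12 + 12 + 1 = 39 values."""
    months = [series[m::12] for m in range(12)]  # values of each calendar month
    return ([st.fmean(series)] + [st.fmean(v) for v in months]
            + [st.pstdev(series)] + [st.pstdev(v) for v in months]
            + [slope(v) for v in months] + [lag1_autocorr(series)])

def standardize(rows):
    """Rescale every characteristic to zero mean and unit variance (assumption)."""
    cols = list(zip(*rows))
    mu = [st.fmean(c) for c in cols]
    sd = [st.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Plain K-means (Lloyd iterations) with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [st.fmean(col) for col in zip(*members)]
    return labels

def synthetic_series(scale, trend, seed, years=20):
    """Hypothetical monthly precipitation: seasonal cycle + yearly trend + noise."""
    rng = random.Random(seed)
    return [scale * (100 + 60 * math.sin(2 * math.pi * m / 12) + trend * y + rng.gauss(0, 10))
            for y in range(years) for m in range(12)]

# Three "drier, stationary" and three "wetter, trending" hypothetical scenarios.
scenarios = ([synthetic_series(1.0, 0.0, s) for s in range(3)]
             + [synthetic_series(3.0, 2.0, s) for s in range(3, 6)])
labels = kmeans(standardize([characteristics(s) for s in scenarios]), k=2)
print(labels)
```

Because clustering works on the 39-value characteristic vectors rather than on the raw series, scenarios of different lengths could in principle be compared, which is the dimensionality reduction the characteristic-based method relies on.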

Selection of representative scenarios from each cluster
The second step of the principal scenario approach selects a representative scenario from each cluster by evaluating the performance of the GCM scenarios belonging to that cluster. In this study, the probability density function (PDF) method (Perkins et al.), which compares the PDFs of GCM scenarios and observations, was adopted because the overall likelihood of the tested scenarios is more important than time-to-time matching with the observed series. If the PDF of a GCM scenario and that of the corresponding observed series are identical, the skill score becomes one. The skill score of the PDF method is calculated as follows (refer to Figure 1):

$$S_{score} = \sum_{i=1}^{n} \min\left( p_{X_{GCM}}(x_i),\; p_{X_{Obs}}(x_i) \right) \quad (1)$$

where $S_{score}$ is the skill score, $x_i$ is the value of the ith quantile, $p_{X_{GCM}}(\cdot)$ is the empirical PDF of scenarios of a GCM for a historical period, $p_{X_{Obs}}(\cdot)$ is the empirical PDF of the observed series, and $n$ is the total number of quantiles.
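A minimal sketch of such an overlap-based skill score follows. It assumes the empirical PDFs are approximated by relative frequencies over equal-width bins shared by both series; the function name, the binning, and the default number of bins are illustrative assumptions rather than the settings used in this study.

```python
def pdf_skill_score(model, observed, n_bins=20):
    """Sum over bins of the smaller of the two relative frequencies (0 to 1)."""
    lo = min(min(model), min(observed))
    hi = max(max(model), max(observed))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero range

    def rel_freq(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)  # top edge goes to last bin
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return sum(min(m, o) for m, o in zip(rel_freq(model), rel_freq(observed)))

obs = [float(v) for v in range(10)]
print(pdf_skill_score(obs, obs))                         # identical distributions
print(pdf_skill_score(obs, [v + 100.0 for v in obs]))    # non-overlapping distributions
```

The score depends only on the two distributions, not on the order of the values, which matches the stated preference for overall likelihood over time-to-time matching.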

ME for uncertainty quantification
Introduced by Shannon (), entropy was utilized to quantify uncertainties of random variable X with probabilities p.
In the field of hydrology, for example, there exist a number of studies (e.g. Singh ; Koutsoyiannis ; Deng & Pandey ; Gay & Estrada ). Jaynes () further extended the Shannon's entropy theory to the 'ME', to maximize uncertainty subject to given available information. A basic equation of the ME is as follows (Jaynes ): subject to moment À consistency constraint: where H is the entropy of X, X is the random variable with probabilities p, x is the value of X, p X (•) is the probability mass function of X, f k (•) is the moment constraint, N is the sample size, K is the number of moments. After applying the Lagrangian multiplier to Equation (2) under unconstrained conditions, the estimated p X (x) can be expressed as where λ k is the weight of constraint f k (•) in kth moment. In Equation (3), because the probability p X (x) of the ME is not in closed form, numerical analysis techniques are required to estimate parameter λ and probability p X (x).
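To illustrate why a numerical technique is needed, the sketch below solves the simplest case: a discrete variable with a single moment constraint (a prescribed mean). It assumes the exponential form with a normalizing sum and finds the single multiplier by bisection; the function names, the search interval, and the example values are illustrative assumptions and not the solver used in this study.

```python
import math

def maxent_pmf(xs, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy pmf on support xs with a prescribed mean.

    The pmf is proportional to exp(-lam * x); lam is found by bisection,
    using the fact that the mean decreases as lam increases.
    """
    def pmf(lam):
        expo = [-lam * x for x in xs]
        shift = max(expo)                      # avoid overflow in exp
        w = [math.exp(e - shift) for e in expo]
        z = sum(w)
        return [v / z for v in w]

    def mean(lam):
        return sum(p * x for p, x in zip(pmf(lam), xs))

    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) > target_mean:
            lo = mid                           # mean too large: increase lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = (lo + hi) / 2
    return lam, pmf(lam)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

faces = [1, 2, 3, 4, 5, 6]
lam_u, p_u = maxent_pmf(faces, 3.5)   # mean equals the midpoint: uniform pmf
lam_b, p_b = maxent_pmf(faces, 4.5)   # larger mean: pmf tilted to high values
print(entropy(p_u), entropy(p_b))
```

When the constraint carries no information beyond the support (mean at the midpoint), the multiplier vanishes and the result is uniform, which is the largest entropy attainable on that support.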
The above ME theory implies that the distribution with the largest entropy should be chosen if nothing is known about a distribution except the given information. For example, when values of a maximum and a minimum are given, the uniform distribution has the ME. In this study, the ME theory was applied to quantify the uncertainty of the selected projections.
More specifically, the selected scenarios provide the maximum and minimum values and Equation (3) calculates the entropy of the selected scenarios as the uncertainty measure.

Performance of IPCC GCM scenarios
In this study, GCM scenarios were selected to study the climate change impact on water resources for the South Korean Penin-  in A1B and A2, respectively, while the best GCM in A1B is ranked sixth in 20C3M. This would be a sign of nonstationarity, although the short record length in A1B and A2 would also cause sampling uncertainty. In other words, Figure 4 warns that a 'best' selection based on a certain period may be meaningless in other periods in the climate change era.

Bias correction of GCM scenarios
Climate change impact assessments attempt to quantify the future risks to water resource systems, particularly at regional

The flood season:

$$P_{cor} = P_p \times \frac{\bar{P}_{Obs}}{\bar{P}_h} \quad (4)$$

The dry season:

where $P$ is the precipitation, $P_{cor}$ is the corrected precipitation of GCMs, $P_p$ is the projected precipitation of GCMs, $\bar{P}_{Obs}$ is the average of the observed precipitation, $\bar{P}_h$ is the average of the simulated precipitation of GCMs for the historical period, $\bar{P}_p$ is the average of the projected precipitation of GCMs, $\sigma_{Obs}$ is the standard deviation of the observed precipitation, and $\sigma_p$ is the standard deviation of the projected precipitation of GCMs.
The above equations were applied to the precipitation scenarios of each GCM, month by month. Table 2 reports averages and standard deviations over all the GCM scenarios, before and after the bias correction, compared with the observation. Note that Equations (4) and (5) were calibrated using the period from January 1960 to December 1979, while the verifications shown in Table 3 were based on the period from January 1980 to December 1999.
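The month-by-month application of the flood-season scaling in Equation (4) can be sketched as follows. The sketch covers only Equation (4); months outside the flood season are left unchanged here, and the choice of June to September as the flood season, the function names, and the data layout (monthly series starting in January) are assumptions made for illustration.

```python
import statistics as st

def scale_correction(projected, observed_hist, simulated_hist):
    """Equation (4) for one calendar month: P_cor = P_p * mean(P_Obs) / mean(P_h)."""
    factor = st.fmean(observed_hist) / st.fmean(simulated_hist)
    return [p * factor for p in projected]

def correct_flood_months(projected, observed_hist, simulated_hist,
                         flood_months=(5, 6, 7, 8)):
    """Apply the scaling separately to each flood-season month (0 = January).

    flood_months is a hypothetical default; other months are returned unchanged.
    """
    corrected = list(projected)
    for m in flood_months:
        corrected[m::12] = scale_correction(
            projected[m::12], observed_hist[m::12], simulated_hist[m::12])
    return corrected

# Two years of hypothetical monthly data: the model simulates half the observed rain.
observed = [200.0] * 24
simulated = [100.0] * 24
projected = [150.0] * 24
fixed = correct_flood_months(projected, observed, simulated)
print(fixed[:12])
```

Computing the factor per calendar month, rather than once for the whole series, keeps a bias in one month from distorting the correction of another.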

Selection of representative GCM scenarios
This section illustrates how to select five representative GCM scenarios for the South Korean Peninsula. One can select the five best, or the five principal scenarios, as described in the following two subsections.
(1) Five best GCM scenarios
The five best-performing GCM scenarios were selected using the PDF selection method.
(2) Five principal GCM scenarios
The characteristic-based K-means method was applied to the 17 IPCC GCM scenarios, based on the 20C3M emission scenario. Table 4 presents the resulting five clusters, their member GCMs, and their overall averages and standard deviations for the six statistical characteristics used in the cluster analysis. Table 4 shows that the averages and standard deviations are very heterogeneous across the clusters. A single scenario was then selected from each cluster using the PDF method, in the same way as for the selection of the five best scenarios. The above two-step selection procedure finally resulted in the five principal scenarios, namely MIUB-ECHO-G (Germany/Korea), MPI-ECHAM5 (Germany), GISS-AOM (USA), MRI-CGCM232 (Japan), and CSIRO-Mk30 (Australia). Table 5 compares the best and principal GCM scenarios for the South Korean Peninsula. The two approaches share MRI-CGCM232, while four out of the five GCMs in each approach are different.
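The difference between the two selections can be made concrete with a small sketch. The scenario names, skill scores, and cluster labels below are hypothetical placeholders, not values from Tables 4 and 5; only the selection logic follows the description above.

```python
def select_best(scores, n):
    """The n scenarios with the highest skill scores, regardless of cluster."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def select_principal(scores, clusters):
    """The highest-scoring scenario within each cluster, one per cluster."""
    best_by_cluster = {}
    for name, c in clusters.items():
        if c not in best_by_cluster or scores[name] > scores[best_by_cluster[c]]:
            best_by_cluster[c] = name
    return [best_by_cluster[c] for c in sorted(best_by_cluster)]

# Hypothetical skill scores and cluster memberships for six scenarios.
scores = {"GCM-A": 0.95, "GCM-B": 0.93, "GCM-C": 0.90,
          "GCM-D": 0.80, "GCM-E": 0.70, "GCM-F": 0.65}
clusters = {"GCM-A": 0, "GCM-B": 0, "GCM-C": 0,
            "GCM-D": 1, "GCM-E": 2, "GCM-F": 2}

print(select_best(scores, 3))              # all three come from cluster 0
print(select_principal(scores, clusters))  # one scenario from each cluster
```

In this toy case the best selection draws every scenario from a single cluster, while the principal selection is forced to span all clusters, which is the mechanism by which the range of the ensemble is retained.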

Comparison of the best and the principal GCM scenarios in uncertainty
Figure 4 | Skill rank changes between 20C3M, A1B, and A2 emission scenarios.

This study also quantified the uncertainties of various sets of scenarios, namely the all, the best, and the principal scenario sets, using ME. Since a GCM scenario is a time series whose values vary as a function of time, it is necessary to represent the simulation with a single summary statistic before the ME is applied. In this study, we adopt the average or the standard deviation of each GCM scenario over time as the summary statistic. As a result, the all, the best, and the principal scenario sets have 17, 5, and 5 summary statistics, respectively. Among the summary statistics, their maximum and minimum values are denoted 'a' and 'b', respectively. The equation of ME is then expressed as:

$$\max_{p_X} \; H(X) = -\int_{b}^{a} p_X(x) \ln p_X(x)\, dx \quad \text{subject to} \quad \int_{b}^{a} p_X(x)\, dx = 1$$

The optimum of the above constrained optimization problem is

$$H_{opt} = \ln(a - b)$$

$H_{opt}$ measures the uncertainty of the all, the best, and the principal scenario sets when the time average or the time standard deviation is employed, as summarized in Table 6. When the average is used as the summary statistic, the ME (= 4.052) of the principal scenarios approach fully covers the uncertainty that all GCM scenarios contain (i.e. the total uncertainty), but the best five scenarios approach (ME = 2.474) covers only 61.07% of the total uncertainty.
When the standard deviation is used as the summary statistic, the ME (= 4.373) of the principal scenarios also covers the whole range of the total uncertainty, but the ME (= 2.791) of the best five scenarios accounts for only 63.83%. Our results confirm that the principal GCM scenarios approach is superior to the best five scenarios approach in terms of uncertainty preservation. Selecting an essential number of scenarios among the many available is very important, because using all the scenarios is impossible in practice. The proposed approach can be applied to any GCM scenario selection that is dominated by nonstationarity with large uncertainty.
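The uncertainty comparison can be reproduced in outline with the sketch below. It assumes that, when only a maximum and a minimum are known, the ME distribution is uniform and its entropy is the natural logarithm of the range, and that coverage is the ratio of a subset's ME to the ME of all scenarios. The function names and the summary statistics in the first example are hypothetical; the ME values in the second example are those reported in the text.

```python
import math

def range_entropy(stats):
    """ME of a set of summary statistics, assuming H = ln(max - min).

    Requires at least two distinct values, otherwise the range is zero.
    """
    return math.log(max(stats) - min(stats))

def coverage(h_subset, h_all):
    """Share of the total uncertainty preserved by a subset of scenarios."""
    return h_subset / h_all

# Hypothetical summary statistics: the principal set keeps both extremes,
# whereas the best set sits in a narrow band.
all_stats = [60.0, 75.0, 82.0, 90.0, 101.0, 117.0]
principal_stats = [60.0, 82.0, 117.0]
best_stats = [82.0, 90.0, 101.0]
h_all = range_entropy(all_stats)
print(coverage(range_entropy(principal_stats), h_all))
print(coverage(range_entropy(best_stats), h_all))

# Coverage implied by the ME values reported for the best five scenarios.
print(100 * coverage(2.474, 4.052))   # average as the summary statistic
print(100 * coverage(2.791, 4.373))   # standard deviation as the summary statistic
```

Under this measure a subset preserves the total uncertainty exactly when it contains the scenarios with the largest and smallest summary statistics, so a selection spread across clusters is more likely to reach full coverage than one concentrated on the historically best performers.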