Global climate models (GCMs) are developed to simulate past climate and to produce projections of future climate. Their role in identifying regional issues and possible solutions in water resources planning and management is appreciated across the world. However, there is substantial uncertainty in GCM projections for practical, regional implementation, which has attracted criticism from water resources planners. The present paper reviews the selection of GCMs, focusing on performance indicators, ranking of GCMs and ensembling of GCMs, and covers different geographical regions. It also proposes future research directions.
Global climate models (GCMs) are numerical models describing natural mechanisms in the atmosphere, land surface and ocean. GCMs represent the climate system on a 3D grid with a coarse horizontal resolution of 250–600 km over the globe, 10–20 vertical layers in the atmosphere and around 30 layers in the oceans. They are developed to represent atmospheric physics and dynamics and to simulate past climate so that future climate change can be analysed. GCMs follow conservation laws (momentum, mass, energy, moisture), fluid dynamics, the equation of state and more. Parameters and boundary conditions considered in GCMs include the rotation speed of the Earth, thermodynamic and radiation constants of atmospheric gases and clouds, surface elevation, the total mass of the atmosphere and its composition, soil type and surface albedo (Schmidt et al. 2006). However, incomplete knowledge of atmospheric processes, approximations during numerical modelling, spatio-temporal scales, coarser or finer resolution, different feedback mechanisms (cloud and solar radiation, greenhouse gases, aerosols, natural and anthropogenic sources, ocean circulation, water vapour and warming, ice and snow albedo) and different modelling choices (physical parameterisations, initialisations and model structures) cause uncertainties that lead to either overestimation or underestimation of the considered climate variable relative to observations. As a result, different GCMs produce different outcomes for the same forcing (Sood & Smakhtin 2015; Jain et al. 2019).
Mandal et al. (2019) identified six sources of uncertainty, namely, (i) the selection process of GCMs, (ii) the choice of GCMs, (iii) emission scenarios, (iv) downscaling models, (v) hydrologic model parameters and (vi) model structures. Liu et al. (2012), in their studies on a headwater catchment in China, demonstrated that the uncertainty associated with GCM outputs was crucial in assessing climate change impacts. Bosshard et al. (2013) found that GCMs were the dominant source of uncertainty. Xuan et al. (2017), based on their studies in Zhejiang Province, Southeast China, concluded that most GCMs could not reproduce the observed spatial patterns because of insufficient resolution. Benedict et al. (2019) analysed the spatial resolution of GCMs and global hydrological models for the Rhine and Mississippi basins. Higher-resolution GCMs yielded an improved precipitation budget for the Rhine, whereas no substantial improvement was found for the Mississippi. In addition, the lack of precise observed data with which to assess the simulating ability of GCMs is another concern (https://www.tau.ac.il/∼colin/courses/CChange/CC5.pdf; https://www.climate.gov/maps-data/primer/climate-models; https://www.ipcc-data.org/guidelines/pages/gcm_guide.html). Keeping this in view, Tian et al. (2015) suggested assessing uncertainty along the whole climate modelling chain.
Hughes et al. (2014) discussed the inherent errors and uncertainties that arise from simplifying highly complex atmospheric physics in GCMs. They found that a multi-model ensemble (MME) suited the situation better than individual GCMs, mainly because individual errors compensate one another. Ahmed et al. (2020), in their study over Pakistan, presented similar views. Yan et al. (2015), in their study on the Xinjiang Basin, China, suggested an MME to minimise the biases and uncertainties of future climate simulations (details are available as supplementary data). Tebaldi & Knutti (2007) observed that ‘combining models generally increases the skill, reliability and consistency of model forecasts’. Hughes et al. (2011) suggested including all GCMs in the MME for effective representation of climate change, whereas Basharin et al. (2016) suggested choosing for the MME those GCMs that describe the present climate more precisely, which would help decision-makers use the predictions effectively. However, Bannister et al. (2017) cautioned that ‘MME does not consider the relative strengths and weaknesses of each model as an ensemble invariably hides the substantial variations between the individual models’. They warned ‘If the GCMs collectively misrepresent some component of the forcing or partially cancel each other out, then the future natural variability in an ensemble will be inherently suppressed’. Complementarily, Raghavan et al. (2018), in their studies on South East Asia, stated ‘although the ensemble mean of the models is a better representation of the observed climate, the spread among the individual models is large’. Another problem is that some GCMs partly share the same code; in other words, similar ocean and atmosphere components can result in identical forecasts (Raju & Nagesh Kumar 2016), which may mislead the MME process. More information on uncertainty and MMEs is available in Knutti et al. (2010), Najafi et al. (2011), Weiland et al. (2012), Knutti & Sedlácek (2013), Miao et al. (2014), Northrop & Chandler (2014), Lutz et al. (2016), Song & Chung (2016), You et al. (2018), Salman et al. (2018) and Jobst et al. (2018). Duan et al. (2019) carried out an extensive review of multi-model analysis that provides guidelines for robust climate change research. Overall, researchers question the reliability and simulating ability of GCMs because of the uncertainties involved at every level of modelling, as explained earlier (Wilby & Harris 2006; Mujumdar & Nagesh Kumar 2012).
Basharin et al. (2016) observed significant improvements in Coupled Model Intercomparison Project 5 (CMIP5)-GCM simulations compared with the previous generation, Coupled Model Intercomparison Project 3 (CMIP3)-GCMs, in terms of the ‘effect of aerosols, the interaction at the land–ice boundary, stratosphere–troposphere interactions, the carbon cycle, runoff, and biochemical interactions between ecosystems and other processes’. The Coupled Model Intercomparison Project 6 (CMIP6) represents a considerable expansion over CMIP5 in terms of (a) 100 distinct climate models from 40 different modelling groups, (b) eight future scenarios representing Shared Socioeconomic Pathways (SSPs) and (c) the experiments conducted. CMIP6 models have a notably higher climate sensitivity than CMIP5 models (Tokarska et al. 2020; https://www.carbonbrief.org/cmip6-the-next-generation-of-climate-models-explained). However, Jain et al. (2019) were critical of GCM development and suggested focusing on improving GCMs so that they perform efficiently over many regions of the world. They discouraged the proliferation of GCMs that may not perform competitively at continental/regional scales. Allan Hollander expressed similar views on the number of GCMs (http://climate.calcommons.org/article/why-so-many-climate-models). Details of the developing centres/institutions and names of the global climate models are available at: CMIP3 (http://www.ipcc-data.org/sim/gcm_monthly/SRES_AR4/index.html#Acknowledge), CMIP5 (https://pcmdi.llnl.gov/mips/cmip5/availability.html) and CMIP6 (https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6).
Pierce et al. (2009) raised questions about how the choice of GCMs for regional climate studies affects downscaled outcomes, and about the relevance of a blueprint for selecting GCMs (Legates & McCabe 1999; Mujumdar & Nagesh Kumar 2012; Bhattacharjee & Zaitchik 2015; Raju & Nagesh Kumar 2018). Hargreaves & Annan (2014) discussed in detail the origin of GCMs, the interpretation and evaluation of ensembles, and the intricacies of climate modelling. They raised concern about the gap between actual and potential performance, and about limited benefits despite substantial investment. Sikder et al. (2016) opined that GCMs were yet to be improved enough for prime-time operational use in multiscale water management decisions. Knutti (2008) explained the necessity of evaluating GCMs and opined that skilful simulations of the past might not produce skilful predictions of the future. However, Knutti (2008) also cautioned that a lack of skill in simulating the past might translate into a lack of skill in future predictions. Hence, it is important to analyse the efficacy of each chosen GCM, that is, to assess how realistic its results are compared with observed climate records (https://www.climate.gov/maps-data/primer/climate-models). Otieno & Anyah (2013) suggested a cautious approach to choosing GCMs. Zhang & Yan (2016) recommended assessing the effectiveness of GCMs in simulating conservation-based climate zones. Cook et al. (2017) opined that it is essential to select a GCM appropriate to the intended application and its performance, to ascertain differences between temporal-scale and spatial-scale outputs, and to infer results for engineering design. Hence, performance evaluation of GCMs to select the best GCM or suitable GCM(s) is essential, and it gives policymakers and planners confidence in using them for impact assessment studies and other purposes (Perez et al. 2014; Aloysius et al. 2016).
The present paper reviews the selection of GCMs, focusing on performance indicators, ranking of GCMs and the related ensembling of GCMs, covering different geographical regions. MMEs of GCMs formed without explicit evaluation, covering different geographical regions, are also presented. Although these topics are discussed separately, they are interrelated to a considerable extent. The review covers the period from 2006 to 2020, with the selected literature focusing on the CMIP5 and CMIP3 repositories. Only representative studies are discussed, to keep the focus on the theme.
The present paper describes the general structure for selecting GCMs; reviews performance measures, ranking and ensembling of GCMs covering different geographical regions, as well as MMEs formed without explicit evaluation of GCMs; and presents observations and discussion, followed by a summary and conclusions. Acronyms and performance measures used in this paper are presented in Tables A1 and A2 of the Appendix, respectively, which are available as supplementary data.
GENERAL STRUCTURE FOR SELECTION OF THE BEST GCM OR AN ENSEMBLE OF GCMS
A review of more than 200 papers on climate modelling shows that the following procedure is generally followed to select the best or suitable GCM(s), or an ensemble of GCMs, irrespective of geographical region:
(1) Selection of appropriate climate variables for the chosen region.
(2) Selection of an appropriate number, X, of GCMs from repository Y (either CMIP3 or CMIP5) for the chosen region.
(3) Collection of observed data and of the values simulated by the X GCMs for the chosen climate variables.
(4) Identification of evaluation criteria/metrics/indicators to ascertain how well the X GCMs simulate the observed data.
(5) Comparison of the outputs of the X GCMs with observed data for a historical period in terms of the chosen evaluation criteria.
(6) Selection, from the X GCMs, of suitable GCM(s) that represent the climate system, based on (5).
(7) Formulation of an ensemble, if required, either with the subset of suitable GCMs from (6) or with the full GCM ensemble.
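The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any reviewed study: the GCM names and series are hypothetical, RMSE alone stands in for the evaluation criteria of steps (4) and (5), step (6) keeps a fixed number `top_k` of the best-ranked GCMs, and step (7) uses an equal-weight ensemble mean.

```python
import math

def rmse(obs, sim):
    """Root mean square error between two equal-length series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def select_and_ensemble(obs, gcm_runs, top_k):
    """Score each GCM against observations, rank, keep top_k, average them.

    gcm_runs maps a (hypothetical) GCM name to its historical simulation,
    which must have the same length as obs.
    """
    # Steps (4)-(5): one illustrative criterion, RMSE (lower is better)
    scores = {name: rmse(obs, sim) for name, sim in gcm_runs.items()}
    # Step (6): rank from best to worst and keep the top_k GCMs
    ranked = sorted(scores, key=scores.get)
    chosen = ranked[:top_k]
    # Step (7): equal-weight multi-model ensemble (MME) of the chosen subset
    mme = [sum(gcm_runs[name][t] for name in chosen) / len(chosen)
           for t in range(len(obs))]
    return ranked, chosen, mme

# Hypothetical observed and simulated series (e.g. seasonal rainfall anomalies)
obs = [1.0, 2.0, 3.0, 4.0]
gcm_runs = {
    "GCM_A": [1.1, 2.1, 2.9, 4.2],
    "GCM_B": [0.5, 2.5, 3.5, 3.0],
    "GCM_C": [1.0, 1.8, 3.1, 4.1],
}
ranked, chosen, mme = select_and_ensemble(obs, gcm_runs, top_k=2)
print(ranked)  # ['GCM_C', 'GCM_A', 'GCM_B']
```

In practice several indicators are usually combined, for example with the multicriteria techniques discussed in this review, and `top_k` is a modelling choice rather than a fixed rule.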
PERFORMANCE INDICATORS, RANKING OF GCMS AND RELATED ENSEMBLING OF GCMS
A performance indicator is a metric that measures how well a GCM simulates observed data (Gómez-Navarro et al. 2012). Meaningful indicators are required to evaluate GCMs (Tebaldi & Knutti 2007; Knutti et al. 2010; Raju & Nagesh Kumar 2014b). Gleckler et al. (2008) suggested developing metrics to characterise GCM performance, which may facilitate identification of optimal subsets of GCMs for various applications. Guilyardi et al. (2009) held that metrics ‘should be concise, physically informative, societally relevant & easy to understand, compute and compare’. Gu et al. (2015) stated that metrics should (a) be able to distinguish good performance from poor performance, and (b) be computationally efficient, bounded and dimensionless. Similar views were presented by Moise & Delage (2011) and McMahon et al. (2015). Fahimi et al. (2017) summarised the evaluation indicators employed by various researchers in hydrological modelling. The following sections present studies over different geographical areas.
Gu et al. (2015) evaluated 27 CMIP5-GCMs for seasonal and annual surface air temperature and precipitation over five climate-based regions of China, using the Mielke measures M1, M2 and M3. BCC-CSM1.1(m), CanESM2, CMCC-CMS and CMCC-CM were preferred for precipitation, while CMCC, BCC-CSM1.1(m), IPSL-CM5A-MR, NCAR and MPI were preferred for temperature. Finer resolution improved the simulating ability of GCMs, especially for temperature. Bao & Feng (2016) conducted a similar study using 16 CMIP5-GCMs for the Yellow River and Yangtze River basins in China for precipitation, evaporation and water vapour transport. Most GCMs tended to overestimate precipitation in the Yellow River basin, whereas their simulating capability was satisfactory in the Yangtze River basin. Jiang et al. (2016) evaluated 77 GCMs from the Third Assessment Report (TAR), CMIP3 and CMIP5 in simulating the mean state and year-to-year climate variability over China and the East Asian monsoon. Simulating ability improved from TAR to CMIP3 for both temperature and precipitation. From CMIP3 to CMIP5, however, it remained stable for temperature and decreased for precipitation.
Bannister et al. (2017) evaluated 47 CMIP5-GCMs for the Sichuan basin in China for mean, minimum and maximum temperature, using skill score (SS) as the performance indicator. They also formulated an MME of all GCMs with equal weights. MIROC4h, IPSL-CM5A-MR, CESM1(FASTCHEM), MPI-ESM-MR and MIROC5 were the top five GCMs for mean temperature; MIROC4h, GISS-E2-R-CC and GISS-E2-H-CC for minimum temperature; and CESM1(BGC), CCSM4 and CESM1(FASTCHEM) for maximum temperature. However, the SS of the MME was substantially lower than that of the top GCMs for mean, maximum and minimum temperature. They concluded that determining the best overall GCM was difficult. Wu et al. (2018) evaluated six CMIP5-GCMs for precipitation over the Huaihe River basin in China. The indicators used were standard deviation (σ), deterministic coefficient (DC), correlation coefficient (CC), relative error (RE) and root mean square error (RMSE). MMEs were formed by arithmetic mean (MME-AM) and by a backpropagation neural network (MME-BP). In terms of simulating precipitation, the GCMs and MME-AM ranked in the order BNU-ESM, MME-AM, CNRM-CM5, MRI-CGCM3, MIROC-ESM, BCC-CSM1.1 and MPI-ESM-LR.
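The indicators listed for Wu et al. (2018) and the arithmetic-mean MME can be illustrated as follows. The formulas are common textbook forms and are assumptions here, since the exact definitions used by the authors are not given: DC is taken as the Nash–Sutcliffe-type deterministic coefficient, RE as the relative error of the series totals in per cent, and σ as the population standard deviation of the simulation. The GCM names and series are hypothetical; the point is that MME-AM can be ranked alongside the individual GCMs, as in the order reported above.

```python
import math

def mean(x):
    return sum(x) / len(x)

def indicators(obs, sim):
    """Illustrative (assumed) forms of sigma, DC, CC, RE and RMSE."""
    mo, ms = mean(obs), mean(sim)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mo) ** 2 for o in obs)
    sss = sum((s - ms) ** 2 for s in sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return {
        "sigma": math.sqrt(sss / len(sim)),              # spread of the simulation
        "DC": 1.0 - sse / sst,                           # assumed NSE-type definition
        "CC": cov / math.sqrt(sst * sss),                # Pearson correlation
        "RE": 100.0 * (sum(sim) - sum(obs)) / sum(obs),  # relative error of totals, %
        "RMSE": math.sqrt(sse / len(obs)),
    }

def arithmetic_mean_mme(runs):
    """MME-AM: equal-weight average of the member series at each time step."""
    return [mean(values) for values in zip(*runs)]

obs = [2.0, 4.0, 6.0, 8.0]
runs = {
    "GCM_X": [2.5, 4.5, 6.5, 8.5],
    "GCM_Y": [1.0, 3.0, 5.0, 7.0],
    "GCM_Z": [3.0, 5.0, 8.0, 9.0],
}
runs["MME-AM"] = arithmetic_mean_mme(list(runs.values()))
# Rank individual GCMs and the MME together, here by RMSE alone
order = sorted(runs, key=lambda name: indicators(obs, runs[name])["RMSE"])
print(order)  # ['MME-AM', 'GCM_X', 'GCM_Y', 'GCM_Z']
```

In this made-up example the opposite biases of GCM_X and GCM_Y largely cancel, so MME-AM ranks first; with real data an equal-weight MME can also rank below the best single GCMs, as Bannister et al. (2017) found.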
He et al. (2019) evaluated nine CMIP5-GCMs, BCC-CSM1.1(m), CMCC-CMS, CNRM-CM5, FGOALS-g2, GFDL-ESM2G, INM-CM4, IPSL-CM5A-MR, HadGEM2-AO and MPI-ESM-MR, over China for temperature, using RMSE and CC as indicators. CMCC-CMS and MPI-ESM-MR were better able to simulate the spatial pattern of climate zones and its decadal change. An MME of seven GCMs (all except FGOALS-g2 and INM-CM4) was formulated and found preferable to any single GCM; the authors preferred the MME as a way to reduce GCM uncertainty.
Chhin & Yoden (2018) evaluated 43 GCMs with 36 performance metrics for precipitation over the Indo-China region and examined different patterns of ensemble averaging. During the historical period, optimal ensemble subsets significantly improved the simulation of monthly precipitation compared with both the full-model ensemble and the best single GCM.
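One simple way to search for an optimal ensemble subset is exhaustive enumeration, shown below for three hypothetical GCMs. This is not necessarily the procedure of Chhin & Yoden (2018); it assumes a single criterion (the RMSE of the equal-weight ensemble mean) and only makes concrete the idea of comparing subsets with the full ensemble and with single GCMs.

```python
import itertools
import math

def rmse(obs, sim):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def best_subset(obs, runs):
    """Brute-force search over all non-empty GCM subsets.

    Returns (subset, rmse) for the equal-weight ensemble mean that best
    matches obs. The cost grows as 2**N - 1, so this only suits small N.
    """
    names = sorted(runs)
    best = None
    for k in range(1, len(names) + 1):
        for subset in itertools.combinations(names, k):
            ensemble = [sum(runs[n][t] for n in subset) / k for t in range(len(obs))]
            score = rmse(obs, ensemble)
            if best is None or score < best[1]:
                best = (subset, score)
    return best

obs = [1.0, 2.0, 3.0]
runs = {"A": [2.0, 3.0, 4.0], "B": [0.0, 1.0, 2.0], "C": [1.5, 2.5, 3.5]}
print(best_subset(obs, runs))  # (('A', 'B'), 0.0): the +1 and -1 biases cancel
```

Here the two-member subset beats both the best single GCM (C) and the full three-member ensemble. Because the number of subsets is 2^N - 1, enumeration is impractical for 43 GCMs (about 8.8 × 10^12 subsets), so larger pools need ranking-based or heuristic selection.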
Jena et al. (2016) evaluated 20 CMIP5-GCMs for the Indian summer monsoon; CCSM4, CESM1(CAM5), GFDL-CM3 and GFDL-ESM2G were the preferred GCMs. Sarthi et al. (2016) evaluated 34 CMIP5-GCMs for the Indian summer monsoon using the Taylor diagram, SS, CC and RMSE. MPI-ESM-MR, CESM1(WACCM), CESM1(CAM5), CESM1(BGC), BCC-CSM1.1(m) and CCSM4 captured precipitation effectively. Jain et al. (2019) evaluated 28 CMIP5-GCMs for the Indian summer monsoon using pattern correlation (PC). All GCMs simulated seasonal mean surface air temperature well, whereas performance was relatively poor for precipitation. MIROC-4h was preferred over the other GCMs. Raju & Nagesh Kumar (2014b) evaluated 11 CMIP3-GCMs for precipitation at 73 grid points over India, using CC, average absolute relative error (AARE), normalised root mean square error (NRMSE), absolute normalised mean bias error (ANMBE) and SS. Indicator weights were obtained by the entropy method; equal weights were also used. PROMETHEE-2 (Preference Ranking Organisation METHod of Enrichment Evaluation) was employed to compute the preference of GCMs, and GCMs were also ranked for the Cauvery, Godavari, Mahanadi and Krishna river basins. No single GCM was found suitable for India as a whole or for any of the river basins. An ensemble of HadGEM1, MPI-ECHAM4, HadCM3, BCCR-BCCM2.0, MIROC3 and GFDL2.0 was suggested for India; ensembles of GFDL2.0, CGCM2, GISS and HadCM3 for the Cauvery river basin; GFDL2.0, MIROC3, BCCR, HadCM3, CGCM2 and GFDL2.1 for the Godavari river basin; GFDL2.0, HadCM3, GFDL2.1, CGCM2 and ECHAM for the Mahanadi river basin; and GFDL2.0, BCCR, CGCM2, MIROC3 and GISS for the Krishna river basin.
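The entropy weighting and PROMETHEE-2 ranking used by Raju & Nagesh Kumar (2014b) can be sketched as below. This is an illustrative version, not the authors' implementation: it assumes the 'usual' (step) preference function, treats CC and SS as benefit criteria and NRMSE as a cost criterion, and uses a hypothetical three-GCM score matrix in which GCM_1 dominates, so the resulting order is easy to check.

```python
import math

def entropy_weights(matrix):
    """Entropy weights of criteria (columns); all entries must be positive."""
    m, n = len(matrix), len(matrix[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [x / total for x in column]
        e = -sum(p * math.log(p) for p in shares) / math.log(m)
        divergences.append(1.0 - e)  # more spread across GCMs -> more weight
    s = sum(divergences)
    return [d / s for d in divergences]

def promethee2_net_flows(matrix, weights, maximize):
    """PROMETHEE II net outranking flows with the 'usual' preference function."""
    m = len(matrix)

    def preference(a, b):
        total = 0.0
        for j, w in enumerate(weights):
            diff = matrix[a][j] - matrix[b][j]
            if not maximize[j]:
                diff = -diff  # for cost criteria such as NRMSE, smaller is better
            if diff > 0:
                total += w
        return total

    flows = []
    for a in range(m):
        others = [b for b in range(m) if b != a]
        leaving = sum(preference(a, b) for b in others) / (m - 1)
        entering = sum(preference(b, a) for b in others) / (m - 1)
        flows.append(leaving - entering)
    return flows

# Hypothetical scores: columns are CC (max), NRMSE (min), SS (max)
matrix = [
    [0.90, 0.20, 0.85],  # GCM_1
    [0.80, 0.30, 0.80],  # GCM_2
    [0.70, 0.40, 0.60],  # GCM_3
]
maximize = [True, False, True]
weights = entropy_weights(matrix)
flows = promethee2_net_flows(matrix, weights, maximize)
ranking = sorted(range(len(matrix)), key=lambda i: -flows[i])
print(ranking)  # [0, 1, 2]
```

With real indicator values the GCMs usually trade off against each other across criteria, and it is then that the entropy weights change the final order.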
Raju & Nagesh Kumar (2015a) evaluated 11 CMIP3-GCMs for precipitation and temperature over India and the Krishna and Mahanadi basins, using SS as the indicator. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) was employed to prioritise GCMs. An ensemble of GFDL2.1, BCCR-BCCM2.0, MIROC3, HadCM3, INGV-ECHAM4 and GFDL2.0 was found preferable for India. This result differed from that of Raju & Nagesh Kumar (2014b), as the suggested ensemble was different (only four of the six GCMs are common to both). The difference may be due to the additional variable (temperature), the number of indicators chosen and the decision-making technique used.
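TOPSIS ranks each GCM by its relative closeness to an ideal solution (the best value of every criterion) and its distance from an anti-ideal one. The sketch below uses vector normalisation and Euclidean distances, which is the common textbook form; the SS values for precipitation and temperature, the equal weights and the GCM rows are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def topsis(matrix, weights, maximize):
    """Relative closeness of each alternative to the ideal solution (0 to 1)."""
    m, n = len(matrix), len(weights)
    # Vector-normalise each criterion, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = [[v[i][j] for i in range(m)] for j in range(n)]
    ideal = [max(c) if maximize[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if maximize[j] else max(c) for j, c in enumerate(columns)]
    closeness = []
    for row in v:
        d_plus = math.dist(row, ideal)   # distance to the ideal solution
        d_minus = math.dist(row, anti)   # distance to the anti-ideal solution
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness

# Hypothetical SS for precipitation and temperature (both benefit criteria)
matrix = [[0.80, 0.90], [0.60, 0.95], [0.50, 0.70]]
closeness = topsis(matrix, weights=[0.5, 0.5], maximize=[True, True])
ranking = sorted(range(len(matrix)), key=lambda i: -closeness[i])
print(ranking)  # [0, 1, 2]
```

The third GCM is worst on both criteria, so it coincides with the anti-ideal point and gets a closeness of 0.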
Anandhi & Nanjundiah (2015) evaluated 19 CMIP3-GCMs for daily precipitation over an Indian region consisting of six zones, using SS for three categories: annual, June–October and non-monsoon (JFMAMND: January to May, November and December). No single GCM was found suitable for all categories and zones. Raju et al. (2017) evaluated 36 CMIP5-GCMs for maximum and minimum temperature over India, using CC, NRMSE and SS. The compromise programming (CP) technique was employed to prioritise GCMs, and group decision-making was used to obtain an aggregate ranking for India. They also proposed an MME, which they described as simple and effective. Meher et al. (2017) evaluated 13 CMIP3-GCMs and 42 CMIP5-GCMs for precipitation over the Western Himalayan region. The indicators were signal-to-noise ratio, mean annual cycle, spatial patterns, trends and annual cycles of interannual variability. The CMIP3-GCMs GFDL-CM2.1, GFDL-CM2.0 and MIROC3.2 (hires) and the CMIP5-GCMs INM-CM4, MIROC5 and CESM1(BGC) were graded the most competent. Interestingly, some coarser-resolution GCMs showed better skill than finer-resolution GCMs.
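Compromise programming ranks GCMs by a weighted L_p distance from the ideal point. The sketch below normalises each criterion by its ideal-to-worst range and uses p = 2; both choices, the equal weights and the score matrix are assumptions for illustration and may differ from those of Raju et al. (2017).

```python
def compromise_programming(matrix, weights, maximize, p=2.0):
    """Weighted L_p distance of each GCM from the ideal point (smaller = better).

    L_p(a) = (sum_j (w_j * |f_j* - f_j(a)| / |f_j* - f_j**|) ** p) ** (1/p),
    where f_j* is the best and f_j** the worst value of criterion j.
    """
    n = len(weights)
    ideal, worst = [], []
    for j in range(n):
        column = [row[j] for row in matrix]
        if maximize[j]:
            ideal.append(max(column))
            worst.append(min(column))
        else:
            ideal.append(min(column))
            worst.append(max(column))
    distances = []
    for row in matrix:
        total = 0.0
        for j in range(n):
            span = abs(ideal[j] - worst[j]) or 1.0  # guard against constant criteria
            total += (weights[j] * abs(ideal[j] - row[j]) / span) ** p
        distances.append(total ** (1.0 / p))
    return distances

# Hypothetical scores: columns are CC (max), NRMSE (min), SS (max)
matrix = [
    [0.90, 0.25, 0.70],  # GCM_1
    [0.85, 0.20, 0.80],  # GCM_2
    [0.70, 0.40, 0.60],  # GCM_3
]
distances = compromise_programming(matrix, [1 / 3] * 3, [True, False, True])
ranking = sorted(range(len(matrix)), key=distances.__getitem__)
print(ranking)  # [1, 0, 2]
```

Raising p puts more emphasis on a GCM's single worst criterion, so the choice of p is part of the decision-maker's preferences rather than a neutral setting.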
Panjwani et al. (2019) evaluated 12 CMIP5-GCMs for precipitation and minimum and maximum temperature over India. They employed the fuzzy analytic hierarchy process (FAHP) and a reliability index to assess the GCMs, with agreement index (AI), RMSE and CC as indicators, and found FAHP suitable for ranking GCMs. NorESM1-M was found suitable for maximum temperature; MIROC5, GFDL-CM3, FIO-ESM and IPSL-CM5A-LR for minimum temperature; and IPSL-CM5A-LR, GFDL-ESM2M, HadGEM2, MIROC5 and CSIRO for precipitation. Pandey et al. (2019) evaluated 24 CMIP5-GCMs for six climate variables over the Upper Narmada river basin (UNB), India, using SS, RMSE and total index (TI). Three GCMs, MIROC5, CNRM-CM5 and MPI-ESM-LR, were found suitable. Sreelatha & Raj (2020) evaluated GCMs for average temperature over the Telangana region, South India, using SS, CC, normalised root mean square deviation (NRMSD), Nash–Sutcliffe efficiency (NSE) and absolute normalised mean bias deviation (ANMBD). CP and group decision-making were used to rank the GCMs. MIROC5, CNRM-CM5, ACCESS1.0 and BCC-CSM1.1(m) were found to be suitable.
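Group decision-making combines rankings obtained for different variables, grid points or stations into one aggregate ranking. A Borda count, sketched below with hypothetical rankings, is one simple way to do this; it is an assumed stand-in and not necessarily the aggregation scheme used in the studies above.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate best-to-worst rankings: of n GCMs, position 0 earns n-1 points."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, gcm in enumerate(ranking):
            points[gcm] += n - 1 - position
    # Highest total first; ties broken alphabetically for a reproducible order
    return sorted(points, key=lambda gcm: (-points[gcm], gcm))

# Hypothetical rankings of three GCMs for three variables or stations
rankings = [
    ["GCM_A", "GCM_B", "GCM_C"],
    ["GCM_B", "GCM_A", "GCM_C"],
    ["GCM_A", "GCM_C", "GCM_B"],
]
print(borda_aggregate(rankings))  # ['GCM_A', 'GCM_B', 'GCM_C']
```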
Khan et al. (2018) evaluated 31 CMIP5-GCMs for precipitation and minimum and maximum temperature over Pakistan, with symmetrical uncertainty (SU) as the main indicator. Six GCMs, MIROC5, HadGEM2-ES, HadGEM2-CC, CMCC-CM, CESM1(BGC) and ACCESS1.3, were the top ranked, and the authors proposed them for the MME. Mahmood et al. (2018) used CC and RMSE to evaluate five CMIP5-GCMs for precipitation over the Jhelum River basin, Pakistan and India, and found that the GCMs could not capture the variation in the precipitation pattern. Latif et al. (2018) evaluated 36 CMIP5-GCMs for precipitation over the Indo-Pakistan subcontinent using partial correlation (PAC). Three GCMs, HadGEM2-AO, CNRM-CM5 and CCSM4, were found relevant and were expected to give relatively reliable future projections.
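Symmetrical uncertainty is an information-theoretic score, SU = 2 I(X;Y) / (H(X) + H(Y)), that equals 1 when the two discretised series determine each other completely and 0 when they are independent. Computing it requires discretising continuous data; the equal-width binning with five bins below is an assumption, as are the example series, and the cited studies may have discretised differently.

```python
import math
from collections import Counter

def discretise(values, bins):
    """Equal-width binning into integer labels 0..bins-1 (an assumed choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # a constant series falls entirely in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def entropy(labels):
    """Shannon entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def symmetrical_uncertainty(obs, sim, bins=5):
    """SU = 2 * I(X; Y) / (H(X) + H(Y)), from 0 (independent) to 1."""
    x, y = discretise(obs, bins), discretise(sim, bins)
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (hx + hy)

obs = [0.0, 1.2, 3.5, 2.2, 4.8, 0.4, 3.9, 2.7]   # hypothetical monthly values
flat = [2.0] * len(obs)                           # a GCM with no variability
print(symmetrical_uncertainty(obs, obs))   # 1.0
print(symmetrical_uncertainty(obs, flat))  # 0.0
```

Unlike CC, SU also rewards non-linear correspondence between observed and simulated values, which is one motivation for using it to rank GCMs.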
Ahmed et al. (2018) evaluated 20 CMIP5-GCMs for precipitation over Pakistan. The spatial indicators were Goodman–Kruskal's lambda, Kling–Gupta efficiency, Mapcurves, fractional skill score (FSS), spatial efficiency metric and Cramer's V. GFDL-ESM2G, GFDL-CM3, CESM1(CAM5) and NorESM1-M were preferred and used to form MMEs by random forest (RF) and by simple mean; the RF-based MME was preferred. They advocated ensembling as a way to reduce uncertainty in climate projections. Ahmed et al. (2019) applied wavelet-based skill score (WSS), SU and CP to rank 20 CMIP5-GCMs for precipitation and minimum and maximum temperature over Pakistan. SU preferred CESM1(CAM5), HadGEM2-AO, NorESM1-M and HadGEM2-ES; CP preferred CESM1(CAM5), HadGEM2-AO, NorESM1-M and GFDL-CM3; and WSS preferred CCSM4, CESM1(CAM5), GFDL-ESM2G and HadGEM2-ES. The MME of the better-performing GCMs captured the return periods of observed moderate and severe droughts. Ahmed et al. (2020) evaluated 36 CMIP5-GCMs for precipitation and minimum and maximum temperature over Pakistan using the Taylor skill score (TSS). HadGEM2-AO, CMCC-CM and CESM1(CAM5) were the preferred GCMs, and an optimum ensemble of the 18 top-ranked GCMs was suggested.
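Kling–Gupta efficiency, one of the spatial indicators used by Ahmed et al. (2018), combines the correlation r, the variability ratio α = σ_sim/σ_obs and the bias ratio β = μ_sim/μ_obs as KGE = 1 − √((r − 1)² + (α − 1)² + (β − 1)²). The sketch assumes this original formulation rather than later modified variants; applied spatially, `obs` and `sim` would hold the values of all grid cells flattened into one list. The example fields are hypothetical.

```python
import math

def kling_gupta_efficiency(obs, sim):
    """KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2); 1 is a perfect match."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * so * ss)
    alpha = ss / so   # variability ratio
    beta = ms / mo    # bias ratio (assumes a non-zero observed mean)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs_field = [1.0, 2.0, 3.0, 4.0]   # e.g. grid-cell mean precipitation
sim_field = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated but doubled
print(kling_gupta_efficiency(obs_field, sim_field))  # about -0.414
```

The example shows why KGE is stricter than correlation alone: a field with r = 1 still scores poorly if its mean and variability are both doubled.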
In summary, for Pakistan, Khan et al. (2018), Latif et al. (2018), Ahmed et al. (2018), Ahmed et al. (2019) and Ahmed et al. (2020) considered six, two, three, four and 18 top-ranked GCMs, respectively, for forming MMEs. Ahmed et al. (2020) carried out an extensive and informative literature review on the optimum number of GCMs for an MME.
Middle East (Iran, Iraq, Syria)
Farzaneh et al. (2012) evaluated four CMIP3-GCMs, CCSR, CGCM2, CSIRO and HadCM3, for precipitation and temperature over the Northern Karoon region, Iran, and found HadCM3 suitable because of its closer correlation with the observed data. Afshar et al. (2017) analysed precipitation and temperature over the Kashafrood mountainous watershed, Iran. They evaluated 14 CMIP5-GCMs using the ratio of the root mean square error to the standard deviation of measured data (RSR), percent bias (PBIAS), NSE and the coefficient of determination (R²). The preferred GCMs were GFDL-ESM2G, IPSL-CM5A-MR, MIROC-ESM and NorESM1-M.
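The four goodness-of-fit statistics used by Afshar et al. (2017) are related: RSR is the RMSE divided by the standard deviation of the observations, so RSR² = 1 − NSE when both are computed over the same period. The sketch below uses common hydrological conventions, which are assumptions here: PBIAS = 100 Σ(obs − sim)/Σobs, so positive values indicate underestimation, and R² is the squared Pearson correlation. The series are hypothetical.

```python
import math

def goodness_of_fit(obs, sim):
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))   # squared errors
    sst = sum((o - mo) ** 2 for o in obs)                # observed variability
    sss = sum((s - ms) ** 2 for s in sim)                # simulated variability
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    return {
        "NSE": 1.0 - sse / sst,
        "RSR": math.sqrt(sse / sst),                     # RMSE / SD of observations
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
        "R2": cov ** 2 / (sst * sss),
    }

obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
stats = goodness_of_fit(obs, sim)
print(stats)  # NSE 0.948, PBIAS 0.0, R2 about 0.951
```

Because over- and under-estimates cancel in this example, PBIAS is zero even though NSE and RSR register the errors, which is why these indicators are usually reported together.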
Zamani & Berndtsson (2019) evaluated 20 CMIP5-GCMs for temperature and precipitation and applied TOPSIS to rank them for the Bakhtegan (BKH), Zard River (ZR) and Ghareso (GH) basins in western and southwestern Iran. The indicators were NRMSE, TSS, Brier score (BS) and SS. MIROC-ESM, MPI-ESM-MR, MPI-ESM-LR and GFDL-ESM2M were preferred for the ZR basin; BCC-CSM1.1, CanESM2, MIROC5 and ACCESS1.0 for the BKH basin; and BCC-CSM1.1, CanESM2, ACCESS1.0 and NorESM1-M for the GH basin.
Abbasian et al. (2019) evaluated 37 CMIP5-GCMs for precipitation and temperature over Iran. The evaluation statistics were the Kolmogorov–Smirnov (KS) statistic, CC, NSE, mean bias, RMSE and Sen's slope estimator. CMCC-CMS and MRI-CGCM3 were the preferred GCMs. Ehteram et al. (2018) evaluated five CMIP3-GCMs for precipitation and temperature over the Dez basin, Iran, using RMSE, mean absolute error (MAE) and CC. HadCM3 was the preferred GCM.
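Two of the statistics used by Abbasian et al. (2019) are easy to state exactly: Sen's slope is the median of all pairwise slopes, which makes it robust to outliers, and the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between the empirical distribution functions of observed and simulated values. The sketch below computes only the statistics, not p-values, and uses hypothetical data.

```python
from statistics import median

def sens_slope(series):
    """Median of all pairwise slopes (x_j - x_i) / (j - i) for j > i."""
    slopes = [(series[j] - series[i]) / (j - i)
              for i in range(len(series)) for j in range(i + 1, len(series))]
    return median(slopes)

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    def ecdf(data, x):
        return sum(1 for v in data if v <= x) / len(data)
    # The supremum is attained at one of the pooled sample values
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x))
               for x in list(sample_a) + list(sample_b))

print(sens_slope([1.0, 2.0, 3.0, 4.0, 100.0]))         # 1.0, unaffected by the outlier
print(ks_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 1.0, distributions do not overlap
```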
Doulabian et al. (2020) evaluated 25 CMIP5-GCMs for precipitation and surface air temperature (SAT) for six synoptic stations in Iran. Indicators chosen were RMSE and SS. They suggested suitable GCMs for each station, variable and indicator. GCMs performed better in simulating SAT compared to precipitation. As part of the literature review, they also discussed various studies related to Iran.
Homsi et al. (2020) evaluated 20 CMIP5-GCMs for precipitation in a case study of Syria. The indicator chosen was the occurrence frequency of each GCM, and the ranking techniques were SU and a multicriteria decision technique. HadGEM2-AO, CSIRO-Mk3.6.0, NorESM1-M and CESM1(CAM5) were the preferred GCMs, and an RF-based ensembling algorithm was used to generate an MME of these four GCMs. Khayyun et al. (2020) evaluated 20 CMIP5-GCMs for precipitation in a case study of Iraq, with SU as the indicator. Four GCMs, HadGEM2-AO, HadGEM2-ES, CSIRO-Mk3.6.0 and MIROC5, were found to be suitable.
South Asia, East Asia and South East Asia
Le & Bae (2013) evaluated 14 CMIP3-GCMs for temperature and precipitation over the South Korean peninsula, using CC and RMSE. CNRM-CM5, HadCM3, CSIRO-Mk3.0, IPSL-CM4, NCAR-CCSM3 and CCCMA-CGCM3_T47 were the preferred GCMs. Prasanna (2015) assessed the capability of 12 CMIP5-GCMs over South Asia using the mean and coefficient of variation (Cv). NORESM, MPI-ESM, GISS, GFDL-ESM2M, CanESM, MIROC5, HadGEM2-ES, CNRM and ACCESS were the preferred GCMs.
Tan et al. (2014) chose six of 18 CMIP5-GCMs to project precipitation and temperature over the Johor River basin, Malaysia, for hydrological impact assessment. They strongly discouraged the use of projections produced by a single GCM. Hussain et al. (2018) evaluated 20 CMIP5-GCMs for precipitation over the Rajang River basin (RRB), Sarawak, Malaysia, using MAE, RMSE, CC and normalised standard deviation (NSD). GFDL-ESM2M, ACCESS1.3 and ACCESS1.0 were the preferred GCMs, and the authors discouraged the use of a single GCM for climate change assessment. They also provided an overview of recent studies over the South and South East Asian region. Noor et al. (2019) evaluated 58 CMIP5-GCMs for precipitation projection over Malaysia, using the ratio of standard deviations (rSD), FSS, NRMSE, modified index of agreement and PBIAS. Four GCMs, HadGEM2-ES, CCSM4, BCC_CSM1.1(m) and CSIRO-Mk3.6.0, were preferred, and an MME of these four GCMs was formed with RF.
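Two of the indicators listed for Noor et al. (2019) can be written compactly. The sketch assumes Willmott's modified index of agreement in its absolute-value form, d1 = 1 − Σ|sim − obs| / Σ(|sim − mean(obs)| + |obs − mean(obs)|), and rSD as the ratio of simulated to observed population standard deviations; the authors' exact variants may differ, and the data are hypothetical.

```python
import math

def modified_index_of_agreement(obs, sim):
    """Willmott's d1: 1 = perfect agreement, 0 = no agreement."""
    mo = sum(obs) / len(obs)
    numerator = sum(abs(s - o) for o, s in zip(obs, sim))
    denominator = sum(abs(s - mo) + abs(o - mo) for o, s in zip(obs, sim))
    return 1.0 - numerator / denominator

def ratio_of_sd(obs, sim):
    """rSD = sd(sim) / sd(obs); 1 means the simulated variability matches."""
    def sd(x):
        m = sum(x) / len(x)
        return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return sd(sim) / sd(obs)

obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]   # correct variability, constant +1 bias
print(modified_index_of_agreement(obs, sim), ratio_of_sd(obs, sim))  # 0.555... 1.0
```

The pair is complementary: the biased series has a perfect rSD but a d1 well below 1.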
Ruan et al. (2018) analysed 34 CMIP5-GCMs for precipitation over the Lower Mekong basin, South East Asia. They used an improved rank score method that aggregates RMSE, PBIAS, BS, linear correlation coefficient, Sen's slope, the Mann–Kendall (MK) statistic and the significance score (SIS). The five preferred GCMs were MPI-ESM-LR, IPSL-CM5A-MR, CMCC-CMS, CESM1(CAM5) and BNU-ESM. Raghavan et al. (2018) analysed ten CMIP5-GCMs for precipitation over South East Asia. The indicators were annual cycles, mean climatological spatial distributions, regional area averages, RMSE, CC, empirical orthogonal functions (EOFs) and interannual variability. No GCM was found suitable for the chosen case study. Kamworapan & Surussavadee (2019) evaluated 40 CMIP5-GCMs for precipitation and temperature over South East Asia using 19 performance metrics, including RMSE. The performance of the preferred GCM was compared with that of the ensemble of the six best GCMs and the 40-GCM ensemble for four categories. They recommended CNRM-CM5-2 and the six-GCM ensemble for climate studies in South East Asia. Sridhar et al. (2019) discussed the choice of GCMs for precipitation and temperature in a case study of the Mekong basin. MIROC was preferred for the later part of the century, and IPSL and GFDL were preferred for projections up to 2040.
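The Mann–Kendall statistic that enters the rank score of Ruan et al. (2018) counts increasing minus decreasing pairs in a series. The sketch computes S and its normal-approximation Z score; it omits the tie correction of Var(S), which is an assumption suitable only for series without tied values, and the annual series is hypothetical.

```python
import math

def mann_kendall(series):
    """Return the Mann-Kendall statistic S and its normal-approximation Z score."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)    # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # no correction for ties (assumption)
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

annual_precip = [812.0, 790.5, 845.2, 860.1, 831.7, 875.4, 890.3]  # hypothetical, mm
print(mann_kendall(annual_precip))
```

Comparing S or Z computed from observations with the values computed from each GCM's historical run is one way to check whether a GCM reproduces the observed trend direction and strength; |Z| > 1.96 corresponds to significance at the 5% level in a two-sided test.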
Perkins et al. (2007) evaluated 14 GCMs of CMIP3 for precipitation, 13 for minimum temperature and ten for maximum temperature for 12 regions of Australia employing SS to evaluate the efficacy of GCMs. MIROC-m, CSIRO and ECHO-G were the top three GCMs for all three chosen variables. They suggested omitting weak GCMs from MME as these strongly biased the skill of MME. Suppiah et al. (2007) judged the performance of 23 GCMs for mean sea level pressure, precipitation and temperature over the Australian continent. Indicators were RMSE and spatial correlation (SC) thresholds; accordingly, they identified 15 GCMs. Smith & Chandler (2010) evaluated 22 GCMs for precipitation for Australia.
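The skill score of Perkins et al. (2007) measures the overlap between observed and simulated probability density functions: SS = Σ min(Z_obs, Z_sim) over all bins, where Z is the fraction of values in each bin, so SS = 1 for identical distributions and 0 for no overlap. The bin width is a user choice and, like the example data, is an assumption in this sketch.

```python
import math

def perkins_skill_score(obs, sim, bin_width):
    """Overlap of binned empirical PDFs: 1 = identical, 0 = no overlap."""
    def binned_fractions(values):
        counts = {}
        for v in values:
            b = math.floor(v / bin_width)
            counts[b] = counts.get(b, 0) + 1
        return {b: c / len(values) for b, c in counts.items()}

    z_obs, z_sim = binned_fractions(obs), binned_fractions(sim)
    bins = set(z_obs) | set(z_sim)
    return sum(min(z_obs.get(b, 0.0), z_sim.get(b, 0.0)) for b in bins)

obs = [0.5, 0.5, 1.5, 1.5]
sim = [0.5, 1.5, 1.5, 2.5]
print(perkins_skill_score(obs, sim, bin_width=1.0))  # 0.75
```

Because SS compares whole distributions rather than paired values, it does not require the simulated and observed series to be in phase, which suits free-running GCM simulations.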
Johnson & Sharma (2009) proposed the variable convergence score (VCS) for Australia and used it to test the outputs of nine CMIP3-GCMs for eight climate variables and two emission scenarios. Pressure, temperature and humidity, which influence the climate of Australia, had high VCS. Johnson et al. (2011) evaluated 23 CMIP3-GCMs for Southeastern Australia, using wavelet-based skill scores (WSS) to compare GCM performance for sea surface temperature, precipitation and surface pressure anomaly. MPI-ECHAM5 performed best for all the chosen climate variables. Fu et al. (2013) evaluated 25 CMIP3-GCMs for air temperature, mean sea level pressure and precipitation over Southeastern Australia. They used % RE, NRMSE, CC, trend magnitude, the MK test, EOF, BS and SIS as indicators, and aggregated all ranks of each GCM for each variable into a rank score (RS). CSIRO, MIROC-m and IPSL-CM4 were found to be suitable. They also compared the 25 CMIP3-GCMs with 40 CMIP5-GCMs; of these 65 GCMs, a CMIP3-GCM was found to be the best. Wang et al. (2016) evaluated 28 CMIP5-GCMs for temperature using TSS, selected the seven best GCMs and used them to form an MME for the New South Wales (NSW) wheat belt in Southeastern Australia. Two MME approaches, arithmetic mean (AM) and independence weighted mean (IWM), were compared.
Hughes et al. (2014) studied the efficacy of nine CMIP3-GCMs in simulating precipitation for 15 catchments in five regions of Africa. Evaluation measures included seasonal skill, statistical skill and serial correlation skill. GISS, CNRM and MPI were more skilful, with some exceptions. However, GCM skill differed between inland and coastal regions. They suggested exploring these GCMs as ensembles. Aloysius et al. (2016) evaluated 25 CMIP5-GCMs for precipitation and mean, minimum and maximum surface air temperature over Central Africa. Mean square error (MSE), the spatial pattern correlation (SPC) coefficient and a spatial skill score were the indicators. The GCMs simulated temperature better than precipitation. Agyekum et al. (2018) evaluated 18 CMIP5-GCMs for precipitation in the Volta basin, West Africa. Indicators were , CC, RMSE. CESM1(BGC), CCSM4, NorESM1-M, MPI-ESM-MR and the ensemble mean of all 18 GCMs performed relatively well, with small biases over most parts of the basin.
Ongoma et al. (2019) evaluated 22 CMIP5-GCMs for historical simulations of precipitation over East Africa. CC, , bias, PBIAS, RMSE and trend were the indicators. The performance of individual GCMs varied. Eight GCMs, MIROC5, INM-CM4, EC-Earth, CSIRO-Mk3.6.0, CNRM-CM5, CMCC-CESM, CESM1(CAM5) and CanESM2, performed relatively well. However, the authors suggested that rainfall-related processes in GCMs needed improvement. An MME of all GCMs was formulated with equal weights, and individual GCMs were found to perform better than the MME. Similar views were expressed by Joubert & Hewitson (1997) for Southern Africa.
Jury et al. (2015) evaluated a total of 81 realisations of 20 CMIP5-GCMs for reproducing near-surface variables over the European domain of the Coordinated Regional Climate Downscaling Experiment (EURO-CORDEX). The indicator was the model performance index (MPI). MIROC4h was found to be suitable for the chosen region. Basharin et al. (2016) evaluated 12 CMIP5-GCMs for the European region for precipitation and temperature. The GCMs reproduced the historical trends of regional warming well. CNRM-CM5, GFDL-CM3, HadGEM2-ES, MIROC5, CanESM2 and MPI-ESM-LR were the preferred GCMs.
Walsh et al. (2008) analysed 15 CMIP3-GCMs for the seasonal cycle of precipitation, mean sea level pressure and temperature in Alaska and Greenland, North America, with RMSE as the indicator. No single GCM outperformed the others across all regions or all variables. GFDL-CM2.1, MPI-ECHAM5, MIROC3 and HadCM3 were the preferred GCMs for future projections. They also suggested using a subset of GCMs to narrow uncertainty and to obtain more robust projections. Radić & Clarke (2011) evaluated 22 CMIP3-GCMs for a number of climate variables. Indicators were RMSE, the model variability index (MVI) and the model climate performance index (MCPI). MRI-CGCM2.3.2, ECHAM5–MPI-OM and MIROC3.2 (hires) were the top three ranked GCMs.
Anandhi et al. (2011) evaluated 41 GCMs/realisations from the CMIP3 repository for snow water equivalent in the Catskill Mountain watersheds, New York, using SS as the indicator. GFDL 2.0 was found suitable. They also classified GCMs into three groups based on SS: high (0.83–0.93), medium (0.72–0.83) and low (0.26–0.72). Rupp et al. (2013) evaluated 41 CMIP5-GCMs and 24 CMIP3-GCMs for precipitation and temperature in the Pacific Northwest and surrounding regions in the US. Metrics were the amplitude of the seasonal cycle, long-term persistence, annual- to decadal-scale variance, diurnal temperature range, variance of mean seasonal spatial patterns, correlation and regional teleconnections to the El Niño Southern Oscillation (ENSO). CNRM-CM5, CESM1(CAM5) and CanESM2 were the favoured CMIP5-GCMs, and few differences existed between CMIP5 and CMIP3 with respect to the analysed statistics.
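Since SS is described in this review as a skill score based on probability density functions, a minimal sketch is given below that assumes the common PDF-overlap form: the sum, over bins, of the smaller of the observed and simulated relative frequencies. A helper then applies the SS bands reported by Anandhi et al. (2011). The bin edges are the user's choice, and assigning the shared boundary values 0.83 and 0.72 to the higher band is an assumption made here.

```python
def pdf_skill_score(observed, simulated, bin_edges):
    """PDF-overlap skill score: sum over bins of min(freq_obs, freq_sim).

    Returns 1.0 for identical empirical distributions and 0.0 for no overlap.
    The bins are assumed to cover all values; relative frequencies are taken
    with respect to the full sample size.
    """
    def relative_frequencies(data):
        counts = [0] * (len(bin_edges) - 1)
        for x in data:
            for k in range(len(counts)):
                if bin_edges[k] <= x < bin_edges[k + 1]:
                    counts[k] += 1
                    break
        return [c / len(data) for c in counts]

    z_obs = relative_frequencies(observed)
    z_sim = relative_frequencies(simulated)
    return sum(min(a, b) for a, b in zip(z_obs, z_sim))


def ss_group(score):
    """Group a GCM as high / medium / low using the SS bands of Anandhi et al. (2011)."""
    if score >= 0.83:
        return "high"
    if score >= 0.72:
        return "medium"
    return "low"
```

With observed values `[1, 2, 2, 3]`, simulated values `[1, 2, 3, 3]` and bin edges `[0.5, 1.5, 2.5, 3.5]`, the overlap is 0.75, which falls in the medium band.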
Gulizia & Camilloni (2015) evaluated 19 CMIP5-GCMs and 19 CMIP3-GCMs, as well as an MME of eight GCMs, for South America. Variables were summer, winter and annual precipitation. Indicators were RMSE, SC, RE and relative bias. The MME best represented the observed patterns in most seasons and regions. However, MIROC4h of the CMIP5 repository and MIROC3.2 (hires) of the CMIP3 repository performed better than the ensemble in some regions and seasons. Venkataraman et al. (2016) analysed 21 CMIP5-GCMs for minimum, maximum and average surface air temperature and monthly precipitation in the state of Texas. The performance indicators chosen were MAE and NSD. GCMs simulated historical temperature better than precipitation. They recommended an MME of all 21 GCMs rather than a subset of GCMs.
Bhowmik et al. (2017) evaluated ten CMIP5-GCMs for precipitation over the climate regions of the conterminous United States, with RMSE as the indicator. No GCM was dominant across all regions, although ACCESS and BCC-CSM emerged as preferred GCMs. They developed an MME of the ten GCMs using equal weighting and percentile- and non-percentile-based optimal weighting, and also performed a sensitivity analysis on the ranking. Ahmadalipour et al. (2017) evaluated 20 CMIP5-GCMs for temperature and precipitation in the Columbia River basin (CRB) in the Pacific Northwest US. Indicators used were (i) mean, (ii) , (iii) Cv, (iv) relative change (variability), (v) Mann–Kendall (MK) trend and (vi) KS test. The GCMs, in decreasing order of rank, were BCC_CSM1.1, GFDL-ESM2M, CCSM4, GFDL-ESM2G, MIROC5, CanESM2, IPSL-CM5A-MR, IPSL-CM5B-LR, IPSL-CM5A-LR and MIROC-ESM. Cheng et al. (2017) evaluated six CMIP5-GCMs and their ensemble mean for the Athabasca River basin, Canada, for precipitation, minimum temperature and maximum temperature. The ensemble mean did not consistently outperform the individual GCMs, even though its overall accuracy was higher. They suggested that GCMs be combined according to their variations in accuracy. Anandhi et al. (2019) evaluated 20 CMIP3-GCMs for New York City for precipitation, minimum, maximum and average temperature and wind speed, using SS as the indicator. No single GCM was identified as superior, because different GCMs performed differently for different variables.
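The Mann–Kendall (MK) trend test, used as an indicator by Ahmadalipour et al. (2017) and others, can be sketched as follows. This minimal version assumes no tied values and no serial correlation, so it omits the tie and autocorrelation corrections that a careful application to climate series would usually need.

```python
import math


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test without tie or autocorrelation corrections.

    Returns (S, Z, p_value); a positive Z indicates an increasing trend.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p_value
```

For a strictly increasing series of ten values, S equals 45 (every pair increases) and the p-value is well below 0.05.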
Moise & Delage (2011) evaluated 23 CMIP3-GCMs for precipitation in the South Pacific Convergence Zone. The indicators chosen were a location metric and a shape metric. Some GCMs performed well on one metric but not on the other. However, GFDL-CM2.0, CCSM3 and PCM performed consistently on both metrics.
Errasti et al. (2011) evaluated 24 CMIP3-GCMs for the Iberian Peninsula. Variables considered were precipitation, temperature and mean sea level pressure, and the indicator used was SS. HadGEM1, BCCR-BCM2.0, GFDL-CM2.1, MPI-ECHAM5 and MIROC3.2 (hires) occupied the top positions. Barfus & Bernhofer (2015) evaluated 12 CMIP3-GCMs for tropospheric stability over the Arabian Peninsula. Indicators used were the cross totals index, SWEAT index, K-index, Showalter index, vertical totals index and total totals index. No GCM was found to outperform the others.
Su et al. (2013) analysed the performance of 24 CMIP5-GCMs over the eastern Tibetan Plateau for temperature and precipitation. The GCMs reasonably captured the climatological patterns and spatial variations of the observed temperature. Salunke et al. (2019) evaluated 28 CMIP5-GCMs for surface air temperature and precipitation over the Himalaya-Tibetan Plateau, using PC and RMSE as indicators. CCSM4, CESM1(CAM5), EC-Earth and IPSL-CM5A-MR best simulated the surface air temperature, whereas EC-Earth and MIROC5 best simulated the precipitation, with high SPC values and low RMSE. They suggested improving the resolution and parameterisation schemes. Jia et al. (2019) evaluated 33 CMIP5-GCMs for precipitation over the Tibetan Plateau. They used an improved rank score method which is the amalgamation of mean value, , SC, temporal correlation coefficient, MK test statistics, BS and SIS. CSIRO-Mk3.6.0, EC-Earth, MRI-CGCM3, CNRM-CM5 and CanESM2 were the preferred GCMs.
Perez et al. (2014) evaluated 26 CMIP3-GCMs and 42 CMIP5-GCMs for the North-east Atlantic region. Variables were precipitation, snow, storm surge and wave height. Indicators used were the scatter index (SI) and relative entropy (RE). Three CMIP3-GCMs, MIROC3.2 (hires), ECHAM5/MPI-OM and HadGEM2, and seven CMIP5-GCMs, CMCC-CM, MPI-ESM-P, HadGEM2-ES, HadGEM2-CC, HadGEM2-AO, ACCESS1.0 and EC-Earth, were found to be the best. Ashofteh et al. (2016) analysed seven CMIP5-GCMs for simulating runoff in the Aidoghmoush basin, East Azerbaijan. The performance indicators used were NSE, CC, RMSE and MAE. HadCM3 was the preferred GCM.
Reifen & Toumi (2009) evaluated 17 CMIP3-GCMs for projection of temperature. They recommended MMEs (of six to 16 GCMs), which had better predictive capability than any single GCM. Cai et al. (2009) evaluated the ability of 17 CMIP3-GCMs to simulate temperature and precipitation, with SS as the indicator. No single GCM was preferred for the whole world; some GCMs performed better in particular regions. Schaller et al. (2011) selected the five best CMIP3-GCMs out of 24 for a worldwide MME, which they used for projection of precipitation and temperature.
Macadam et al. (2010) evaluated 17 CMIP3-GCMs for temperature and temperature anomaly at the global scale and for the USA and Europe, with SS as the indicator. They used the turnover concept to identify the best and the weakest GCMs. The first-ranked GCMs for temperature globally, for the USA and for Europe were GISS-ER, ECHAM5 and CCSM3, respectively. A similar trend was observed for temperature anomaly. Watterson et al. (2014) explored precipitation, temperature and mean sea level pressure with the non-dimensional arcsin Mielke measure M as the indicator. Twenty-five CMIP5-GCMs and 24 CMIP3-GCMs were analysed for each continent and worldwide. Overall, the CMIP5 MME represented a modest improvement in skill over CMIP3 for global land (excluding Antarctica) and six continents. Mehran et al. (2014) evaluated 34 CMIP5-GCMs for several parts of the world against Global Precipitation Climatology Project (GPCP) data. CMIP5 simulations and GPCP patterns were in close agreement in many regions; however, replication was problematic over arid regions and certain subcontinental regions. Grose et al. (2014) evaluated 27 CMIP5-GCMs and 24 CMIP3-GCMs for the western tropical Pacific for projection of precipitation, temperature and mean sea level pressure. Indicators used to evaluate GCM performance were RMSE, SC and . CMIP3-GCMs that performed well were CSIRO-Mk3.5, ECHO-G, GFDL-CM2.0 and MRI-CGCM2.3.2, whereas CMIP5-GCMs that performed well were ACCESS1.0, CCSM4, CNRM-CM5 and NorESM1-M. They cautioned that the selection of the best GCMs should not be perceived as a guideline for weighting or sub-setting GCMs.
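The arcsin Mielke measure used by Watterson et al. (2014) can be computed as in the following sketch. It assumes the usual definition M = (2/π) arcsin[1 − mse/(V_X + V_Y + (G_X − G_Y)^2)], where V and G are the spatial variances and means of the model field X and the observed field Y; area weighting of grid cells is omitted here for simplicity, and the fields must not both be constant and equal.

```python
import math


def arcsin_mielke(model, observed):
    """Non-dimensional arcsin Mielke measure M for two equally weighted fields.

    M = 1 for a perfect match and M = -1 for a perfectly anti-matched field.
    Population (divide-by-n) variances are used throughout.
    """
    n = len(model)
    gx = sum(model) / n
    gy = sum(observed) / n
    vx = sum((x - gx) ** 2 for x in model) / n
    vy = sum((y - gy) ** 2 for y in observed) / n
    mse = sum((x - y) ** 2 for x, y in zip(model, observed)) / n
    argument = 1.0 - mse / (vx + vy + (gx - gy) ** 2)
    # Clamp to [-1, 1] to guard against round-off before taking asin.
    argument = max(-1.0, min(1.0, argument))
    return (2.0 / math.pi) * math.asin(argument)
```

Because the denominator bounds the mean square error, M stays between −1 and 1, which makes it comparable across variables with different units.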
McMahon et al. (2015) evaluated 22 CMIP3-GCMs for temperature and precipitation against global land-surface data obtained from the Climatic Research Unit. Indicators used were NSE, RMSE and R2. HadCM3, MIROC-m, MIUB, MPI and MRI were found to be the preferred GCMs. They also summarised insights from 15 related papers with various features; according to them, RMSE was the most preferred indicator. Baker & Taylor (2016) analysed 34 CMIP5-GCMs for top-of-atmosphere and surface radiative flux variance with 44 performance indicators, using Clouds and the Earth's Radiant Energy System (CERES) observations and the GISS surface temperature analysis. CESM, ACCESS and NorESM were the best performing GCMs.
MULTI-MODEL ENSEMBLE OF GCMS WITHOUT EXPLICIT EVALUATION
There are several studies in which multi-model ensembles were considered without explicit evaluation of GCMs. The authors of these studies suggested MMEs based on the performance of GCMs in previous studies, data availability, a holistic understanding of the case study and related analyses. Relevant case studies are as follows.
Tian et al. (2017) employed four CMIP5-GCMs, BNU-ESM, GISS-E2-R, MIROC5 and MPI-ESM-LR, for the Xiangjiang river basin in central China. These four GCMs were found to capture the ‘major features of distribution and variability of temperature and precipitation throughout China’. Wang et al. (2019) studied an ensemble of 29 CMIP5-GCMs for precipitation, maximum temperature and minimum temperature for the Xiangjiang watershed in China and the Manicouagan-5 watershed in Canada. They employed several weighting schemes, including reliability ensemble averaging (REA), upgraded REA and Bayesian model averaging, assigned weights to the GCM simulations and investigated the impact of the weights on the quantification of hydrological impacts.
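The idea behind REA weighting can be sketched in Python as below. This is a simplified illustration assuming the commonly cited reliability factors: a bias factor R_B = min(1, ε/|B|) and a convergence factor R_D = min(1, ε/|D|), where B is a GCM's present-day bias, D is the distance of its projected change from the REA mean, ε is a natural-variability scale, and R = (R_B^m R_D^n)^(1/(mn)). The REA mean is found iteratively, starting from the simple mean. The function name and the example numbers are hypothetical; upgraded REA and the schemes used in the cited studies may differ in detail.

```python
def rea_weights(biases, changes, epsilon, m=1.0, n=1.0, tol=1e-10, max_iter=100):
    """Simplified reliability ensemble averaging (REA) sketch.

    biases: present-day bias of each GCM; changes: projected change of each GCM;
    epsilon: natural-variability scale (> 0). Returns (normalised weights, REA change).
    """
    def factor(value):
        # Reliability factor: 1 when |value| <= epsilon, epsilon/|value| otherwise.
        return 1.0 if abs(value) <= epsilon else epsilon / abs(value)

    rb = [factor(b) for b in biases]
    rea_change = sum(changes) / len(changes)  # start from the simple ensemble mean
    r = rb
    for _ in range(max_iter):
        rd = [factor(c - rea_change) for c in changes]
        r = [(b ** m * d ** n) ** (1.0 / (m * n)) for b, d in zip(rb, rd)]
        new_change = sum(ri * ci for ri, ci in zip(r, changes)) / sum(r)
        converged = abs(new_change - rea_change) < tol
        rea_change = new_change
        if converged:
            break
    total = sum(r)
    return [ri / total for ri in r], rea_change
```

With hypothetical biases `[0.1, 0.2, 2.0]`, changes `[1.0, 1.2, 3.0]` and ε = 0.5, the strongly biased and outlying third GCM receives a much smaller weight, and the REA change (about 1.16) lies well below the simple mean of about 1.73.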
Bae et al. (2015) employed an MME of three CMIP3-GCMs, CGCM3_T47, CGCM2.3.2 and CM4, selected out of nine GCMs, for the Asian monsoon region. Das & Umamahesh (2017) employed REA, Bayesian analysis and the delta method to combine six CMIP5-GCMs for discharge in the Wainganga River basin, India. Abeysingha et al. (2018) used a hybrid-delta method to combine projections of 22 CMIP5-GCMs for precipitation and minimum, maximum and average temperature in the Gomti River basin, India. Saeed & Athar (2018) evaluated 22 CMIP3-GCMs for the projection of temperature and precipitation in Pakistan and suggested an MME of all 22 GCMs. Bisht et al. (2019) formed an MME of nine CMIP5-GCMs using Taylor diagram statistics for the projection of precipitation and temperature for different homogeneous monsoon regions of India. The MME was proposed based on the competence of GCMs in reproducing the climatic cycle of the different homogeneous regions, and it simulated the seasonal cycle of the regions with reasonable accuracy compared with India Meteorological Department (IMD) data. Vandana et al. (2019) used a hybrid-delta ensemble method to combine 16 CMIP3-GCMs for precipitation and temperature in the Brahmani River basin, India. All these researchers employed MMEs for impact and related studies using hydrological models. Mustafa et al. (2019) employed an MME of 22 CMIP5-GCMs for a drought-prone study area in Bangladesh.
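As a point of reference for the delta-based combinations mentioned above, the basic delta-change approach applied to an equally weighted ensemble can be sketched as follows. This is only the simple additive/multiplicative delta method; the hybrid-delta method additionally uses quantile-dependent change factors and is not reproduced here. The function name and the example values are hypothetical.

```python
def delta_change(observed, gcm_hist, gcm_future, multiplicative=False):
    """Perturb an observed series with an equally weighted ensemble change signal.

    gcm_hist / gcm_future: one list of values per GCM for the baseline and future
    periods. Additive deltas are typical for temperature; multiplicative ratios
    are the usual choice for precipitation.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    if multiplicative:
        factors = [mean(f) / mean(h) for h, f in zip(gcm_hist, gcm_future)]
        ensemble_factor = mean(factors)
        return [x * ensemble_factor for x in observed]
    deltas = [mean(f) - mean(h) for h, f in zip(gcm_hist, gcm_future)]
    ensemble_delta = mean(deltas)
    return [x + ensemble_delta for x in observed]
```

For instance, two hypothetical GCMs projecting warming of 1 and 3 units give an ensemble delta of 2, so an observed series `[10, 12]` becomes `[12.0, 14.0]`.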
Middle East (Iran and Iraq)
Osman et al. (2014) projected daily precipitation based on an ensemble of seven CMIP3-GCMs for three time periods for a case study of Central Iraq. They treated each GCM prediction as an equally possible evolution of climate. Zamani et al. (2017) developed an MME framework of 14 CMIP3-GCMs for precipitation and temperature for southwest Iran, using a mean observed temperature-precipitation (MOTP) approach. Saki et al. (2018) considered an MME of 14 CMIP5-GCMs for projecting mean annual precipitation, maximum temperature and minimum temperature for Isfahan province, central Iran. They noted that the selection of the 14 GCMs was mainly due to the availability of data for the chosen RCPs. Sayadi et al. (2019) examined the impact of climate change on maximum temperature, minimum temperature and rainfall using an MME of 15 CMIP3-GCMs for the Doroudzan catchment, northeast Fars province, Iran, for three time periods. Vaghefi et al. (2019) employed an MME of five CMIP5-GCMs, namely NorESM1-M, GFDL-ESM2M, MIROC, IPSL-CM5A-LR and HadGEM2-ES, for rainfall, maximum temperature, minimum temperature and occurrences of extreme temperatures with reference to flooding in Iran. Nourani et al. (2019) considered three CMIP5-GCMs, CanESM2, INM-CM4 and BNU-ESM, and their ensembles for projecting temperature and mean monthly precipitation for Tabriz and Ardabil in northwest Iran. These GCMs were chosen because of their successful application in the case study area.
Minville et al. (2010) used an ensemble of five CMIP3-GCMs for the Peribonka water resource system, Quebec, Canada. Most of these GCMs had been used earlier as part of the Atmospheric Model Inter-Comparison Project. Chen et al. (2017) employed five weighting methods (random weights, equal weights, REA, representation of the annual cycle (RAC) of precipitation and temperature, and upgraded REA) for an MME of 28 CMIP5-GCMs for the Manicouagan 5 watershed in the centre of the province of Quebec, Canada. Weighting of GCMs was found to have limited impact.
Schepen & Wang (2013) employed an MME of six GCMs for projecting Australian seasonal rainfall through Bayesian model averaging (BMA). Yan et al. (2018) considered an MME of seven CMIP5/PMIP3 (third phase of the Paleoclimate Modeling Intercomparison Project) GCMs, namely MRI-CGCM3, MIROC-ESM, MPI-ESM-P, GISS-E2-R, IPSL-CM5A-LR, CCSM4 and CNRM-CM5, for interpreting the Australian monsoon, as these GCMs performed better in simulating it. Al-Safi & Sarukkalige (2017) formulated an MME of eight CMIP5-GCMs for the Richmond River catchment, Australia. According to the authors, the MME ‘effectively represents the Australian future climate’. Wang et al. (2018) considered monthly rainfall and temperature for Australia with an ensemble of 33 CMIP5-GCMs. BMA, support vector machine (SVM), arithmetic ensemble mean (EM) and random forest (RF) were considered for ensembling, and RF and SVM were found to be the preferred MME approaches. They also classified GCMs into top, middle and bottom categories based on the Taylor skill score. Al-Safi & Sarukkalige (2019) considered an MME of eight CMIP5-GCMs for projecting rainfall and temperature in the Harvey, Beardy and Goulburn catchments, Australia.
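The Taylor skill score mentioned above can be sketched as follows. Taylor (2001) proposed more than one form of the score; this sketch assumes the form S = 4(1 + R) / [(σ̂ + 1/σ̂)^2 (1 + R0)], where R is the correlation between simulated and observed values, σ̂ is the ratio of simulated to observed standard deviation, and R0 is the maximum attainable correlation, taken here as 1 by assumption.

```python
import math


def taylor_skill_score(model, observed, r0=1.0):
    """Taylor skill score S = 4(1 + R) / ((sf + 1/sf)^2 (1 + R0)).

    R: Pearson correlation; sf: model-to-observed standard deviation ratio;
    R0: maximum attainable correlation (1 by default). S = 1 for a perfect match.
    """
    n = len(model)
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    sd_m = math.sqrt(sum((x - mean_m) ** 2 for x in model) / n)
    sd_o = math.sqrt(sum((y - mean_o) ** 2 for y in observed) / n)
    cov = sum((x - mean_m) * (y - mean_o) for x, y in zip(model, observed)) / n
    r = cov / (sd_m * sd_o)
    sf = sd_m / sd_o
    return 4.0 * (1.0 + r) / ((sf + 1.0 / sf) ** 2 * (1.0 + r0))
```

The score penalises both weak correlation and a mismatch in variability: a GCM that is perfectly correlated with the observations but has twice their standard deviation scores 0.64 rather than 1.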
Adhikari & Nejadhashemi (2016) employed an ensemble of six CMIP5-GCMs, CanESM2, CCSM4, CSIRO-Mk3.6, IPSL-CM5A-LR, MPI-ESM-LR and MRI-CGCM3, for Malawi in southeastern Africa. These GCMs were chosen for their capability to predict the seasonal migration of the Inter-Tropical Convergence Zone, which governs the precipitation in the Greater Horn of Africa. Akinsanola & Zhou (2019) proposed an MME of four CMIP5-GCMs, CanESM2, INM-CM4, IPSL-CM5A-MR and MIROC5, for the present state and future changes of the West African summer monsoon (WASM). Selection of these GCMs was based on ‘their remarkable performance in reproducing the rainfall characteristics over West Africa’. Results for the historical period indicated that the ensemble mean reasonably reproduced the characteristics of WASM rainfall.
Jackson et al. (2011) used an ensemble of 13 CMIP3-GCMs for a case study area 70 km west of London to analyse groundwater resources under uncertainty. Hagemann et al. (2013) employed an MME of three CMIP3-GCMs, ECHAM5, IPSL (LMDZ-4) and CNRM-CM3, for precipitation, temperature and other climate variables, together with eight hydrological models, using data from the WATCH project. The selection of GCMs was based on the available data. They demonstrated that the uncertainty arising from the choice of hydrological model was larger than that arising from the choice of GCM. Claudia et al. (2017) employed four CMIP5-GCMs for a case study of the Alto Sabor watershed, northeast Portugal. Saraiva et al. (2019) studied an ensemble of 21 scenario simulations driven by four CMIP5-GCMs, MPI-ESM-LR, EC-Earth, IPSL-CM5A-MR and HadGEM2-ES, for the Baltic Sea and attributed the uncertainty in projections to the GCMs. Niel et al. (2019) employed an MME of 24 CMIP5-GCMs with 93 simulation runs for the Grote Nete and Dijle catchments, Belgium, to assess flow extremes. Amraoui et al. (2019) employed seven CMIP3-GCMs, ARPV3, CCCMA-CGCM3, MRI-CGCM2.3.2, GFDL-CM2.0, GFDL-CM2.1, GISS and MPI-ECHAM5, for ensembling and projection of precipitation and other climate variables for the Somme River basin, northern France.
Schuster et al. (2012) used an MME of 14 CMIP3-GCMs for projection of precipitation for the state of Wisconsin, US. Acharya et al. (2012) used an MME of 16 CMIP3-GCMs for projection of temperature and precipitation for the North Platte River watershed in the states of Wyoming and Colorado, US. Daraio (2017) used an MME of 13 CMIP5-GCMs for a case study of the outer coastal plain of New Jersey, the Upper Maurice River and the Batsto River, US. Acharya (2017) employed 36 CMIP5-GCMs for a case study of the Flint River watershed in Northern Alabama, US. Sultana & Choi (2018) used seven CMIP5-GCMs, chosen based on their previous performance, for the American River basin, a snow-dominated alpine watershed in Northern California. Mani & Tsai (2017) combined 13 CMIP5-GCMs for south Arkansas and north Louisiana, US, using hierarchical Bayesian model averaging (HBMA), simple model averaging (SMA) and REA. HBMA performed marginally better than REA and SMA in simulating the historical mean discharge.
OBSERVATIONS AND DISCUSSION
As discussed earlier, several researchers have worked on performance indicators, ranking of GCMs and ensembling of GCMs from different aspects. Related observations and discussion are presented in the following sections.
Number and type of indicators
Observation: Most of these studies did not report the basis for selecting indicators. Smith & Chandler (2010) concluded that no universally agreed criterion existed for the assessment of GCMs.
Discussion: A fundamental question is whether so many indicators are necessary to evaluate GCMs. Four mutually independent perspectives can be considered as possible alternatives:
A single indicator suffices for evaluating the ability of GCMs to simulate observed data, on the assumption that all indicators are equally capable. The indicator can be SS, which is based on probability density functions, a much simpler indicator such as RMSE, or any other similar indicator.
Indicators can be selected category-wise (for example, one error measure, one correlation measure and one skill score). If required, a composite indicator can be formulated.
Weights of the chosen indicators can be computed using a rating method, the analytic hierarchy process (AHP) or their fuzzy extensions, and the highest-weighted indicator can then be used for evaluation.
Lastly, principal component analysis of the indicators can be explored for dimension reduction, to arrive at (say) two components that are sufficient for evaluating the GCMs.
We suggest that these perspectives be considered as a future research area.
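For the third perspective, indicator weights from AHP can be approximated with the normalised row geometric means of the pairwise comparison matrix, as in the minimal sketch below. The comparison values in the example are hypothetical, and a full AHP application would also check the consistency ratio of the matrix, which this sketch omits.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalised row geometric means.

    pairwise[i][j] states how much more important indicator i is than
    indicator j (Saaty's 1-9 scale), with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]
```

For a hypothetical comparison of SS, RMSE and CC given by `[[1, 3, 5], [1/3, 1, 3], [1/5, 1/3, 1]]`, the weights are roughly 0.64, 0.26 and 0.10, so SS would be the highest-weighted indicator.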
Handling uncertainty in indicators
Observation: Observed data and GCM-simulated data may be imprecise, leading to imprecision in the evaluation of GCMs. Interpolation, averaging procedures and approximations can also cause imprecision. These inherent uncertainties have been ignored by many researchers (Gu et al. 2015; Ahmed et al. 2018).
Discussion: Uncertainty can be represented either through probability or through fuzzy logic. The first approach is based on distribution functions and hypotheses, whereas the second is based on membership functions and natural language. Fuzzy logic is becoming prominent because of its simple mathematics and its seamless extension of classical logic. For example, crisp decision-making techniques can easily be extended to fuzzy ones by incorporating membership functions while keeping the underlying mathematics intact.
One remedy is to allocate uncertainty to each indicator in a fuzzy logic framework, in the form of a triangular membership function, as shown in Figure 1, or a trapezoidal, Gaussian or other membership function. In the triangular membership function, q represents the most likely value, whereas p and r represent the extreme values (lower and upper bounds). For example, suppose an evaluation indicator has a most likely value of 0.6 for a GCM and a deviation of (say) 10%, i.e., 0.06, is assumed. The indicator is then represented in the triangular membership framework as (0.54, 0.6, 0.66), corresponding to (p, q, r). If the researcher considers all simulated and observed values to be precise, the indicator is represented as (0.6, 0.6, 0.6). Raju & Nagesh Kumar (2015b) used the standard deviation as the uncertainty measure. However, the uncertainty allocation depends on the individual choice of the end user. Table 1 presents hypothetical data representing the uncertainty allocation of indicators that can be utilised while ranking GCMs. Here, GCM1, GCM2, …, GCM N can be any GCM of the CMIP3 or CMIP5 repository.
|[p, q, r]|[p, q, r]|[p, q, r]|
|[p, q, r]|[p, q, r]|[p, q, r]|
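The triangular representation (p, q, r) described above can be sketched with three small helpers: one that spreads a most likely value by a chosen fraction, one that evaluates the triangular membership function, and one that defuzzifies by the centroid. The function names are hypothetical, and centroid defuzzification is only one of several possible choices.

```python
def triangular_from_deviation(most_likely, fraction):
    """Build (p, q, r) by spreading the most likely value by +/- fraction of itself."""
    spread = abs(most_likely) * fraction
    return (most_likely - spread, most_likely, most_likely + spread)


def membership(x, tfn):
    """Degree of membership of x in the triangular fuzzy number (p, q, r)."""
    p, q, r = tfn
    if x == q:
        return 1.0  # the most likely value (also covers the crisp case p == q == r)
    if x <= p or x >= r:
        return 0.0
    if x < q:
        return (x - p) / (q - p)  # rising edge
    return (r - x) / (r - q)      # falling edge


def centroid(tfn):
    """Centroid defuzzification of a triangular fuzzy number: (p + q + r) / 3."""
    return sum(tfn) / 3.0
```

For the worked example in the text, `triangular_from_deviation(0.6, 0.10)` gives (0.54, 0.6, 0.66) up to floating-point round-off, and a value of 0.57 then has a membership of 0.5.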
The impact of these variations can be tested as part of a sensitivity analysis, if necessary. Mathematical expressions for NSE, CC and RMSE and the procedure for the relevant computations are presented in Raju & Nagesh Kumar (2014b).
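A minimal sketch of NSE, CC and RMSE using their standard textbook definitions is given below; we assume these match the expressions in Raju & Nagesh Kumar (2014b), which should be consulted for the authoritative forms.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum of squared errors / sum of squared deviations of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def correlation(observed, simulated):
    """Pearson correlation coefficient (CC)."""
    n = len(observed)
    mo = sum(observed) / n
    ms = sum(simulated) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(observed, simulated))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    ss = math.sqrt(sum((s - ms) ** 2 for s in simulated))
    return cov / (so * ss)


def rmse(observed, simulated):
    """Root mean square error, in the units of the variable."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)
```

The three indicators capture different aspects of performance: a simulation that is offset by a constant bias of 1 from observations `[1, 2, 3, 4]` has a perfect CC of 1 but an RMSE of 1 and an NSE of only 0.2.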
Ranking of GCMs
Jain et al. (2019) suggested selecting GCMs that work for all climate variables and all seasons. They cautioned that the outcome would depend on the chosen diagnostics and GCMs. The following common observations emerge from the reviewed research papers on ranking of GCMs.
No stated logic for choosing X GCMs out of Y
No consensus on the choice of GCM selection approach
No use of advanced ranking techniques to handle imprecision in data or the resulting imprecision in indicators.
Discussion: Simple crisp decision-making techniques handle a single value for each indicator. They may not suffice when the data are imprecise and uncertainty must be considered (Raju & Nagesh Kumar 2014a). Keeping this in view, (a) fuzzy logic-based decision-making techniques and (b) stochastic decision-making techniques have proved useful. Some of them are:
Fuzzy-based decision-making techniques
Similarity analysis – The GCM with a higher degree of resemblance to a benchmark GCM is treated as suitable (Chen 1994). The performance indicators are expressed as interval-valued fuzzy sets.
Fuzzy Technique for Order Preference by Similarity to an Ideal Solution (F-TOPSIS) – It is based on the philosophy that a suitable GCM should have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution. F-TOPSIS computes the positive separation measure from the positive ideal solution, the negative separation measure from the negative ideal solution and the closeness index; the higher the closeness index, the better the GCM. Here, the payoff matrix can follow a triangular, trapezoidal or any other appropriate format (Opricovic & Tzeng 2004; Behzadian et al. 2012).
Fuzzy VIseKriterijumska Optimizacija I Kompromisno Resenje (F-VIKOR) – It is based on the normalised fuzzy difference. F-VIKOR computes separation measures from the ideal and negative ideal solutions and combines them through a summation operator with the given strategy weight. The GCM with the lower value is preferred (Wu et al. 2016; Ploskas & Papathanasiou 2019).
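A compact sketch of one F-TOPSIS variant for triangular fuzzy payoffs is given below. It assumes linear normalisation (dividing by the largest upper bound for benefit indicators, and using the smallest lower bound over the value for cost indicators), crisp indicator weights, column-wise fuzzy ideal and anti-ideal solutions, and the vertex distance between triangular numbers. Other F-TOPSIS formulations in the cited literature differ in these choices, and the payoff values in the example are hypothetical.

```python
import math


def vertex_distance(a, b):
    """Vertex distance between two triangular fuzzy numbers (p, q, r)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def fuzzy_topsis(payoff, weights, benefit):
    """Closeness index of each GCM from a triangular fuzzy payoff matrix.

    payoff[i][j] is (p, q, r) for GCM i on indicator j; weights are crisp;
    benefit[j] is True when larger values are better (e.g. NSE) and False for
    cost indicators (e.g. RMSE, which must then be strictly positive).
    """
    n_gcm, n_ind = len(payoff), len(weights)
    weighted = [[None] * n_ind for _ in range(n_gcm)]
    for j in range(n_ind):
        column = [payoff[i][j] for i in range(n_gcm)]
        if benefit[j]:
            top = max(t[2] for t in column)
            normalised = [(p / top, q / top, r / top) for p, q, r in column]
        else:
            low = min(t[0] for t in column)
            normalised = [(low / r, low / q, low / p) for p, q, r in column]
        for i, t in enumerate(normalised):
            weighted[i][j] = tuple(weights[j] * v for v in t)

    # Column-wise fuzzy positive and negative ideal solutions.
    ideal = [max(weighted[i][j][2] for i in range(n_gcm)) for j in range(n_ind)]
    anti_ideal = [min(weighted[i][j][0] for i in range(n_gcm)) for j in range(n_ind)]

    closeness = []
    for i in range(n_gcm):
        d_plus = sum(vertex_distance(weighted[i][j], (ideal[j],) * 3) for j in range(n_ind))
        d_minus = sum(vertex_distance(weighted[i][j], (anti_ideal[j],) * 3) for j in range(n_ind))
        closeness.append(d_minus / (d_plus + d_minus))
    return closeness
```

A GCM that has both a higher fuzzy NSE and a lower fuzzy RMSE than another receives the larger closeness index, which is the ordering F-TOPSIS is designed to produce.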
Stochastic-based decision-making techniques
In this type of approach, the mean and standard deviation play major roles, along with other parameters. Notable techniques in this category are extended TOPSIS in a stochastic environment (Xiong & Huan 2010), stochastic VIKOR (Tavana et al. 2016) and Stochastic Preference Ranking Organization METHod for Enrichment of Evaluations (PROMETHEE) (Mareschal 1986; Goodwin et al. 2019). Celik et al. (2019) provided a comprehensive discussion of stochastic decision-making and probability distribution functions, along with relevant potential research areas. In our opinion, ranking of GCMs can also be performed within stochastic decision-making, which has not been explored by many researchers.
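The role of the mean and standard deviation in stochastic ranking can be illustrated with a simple Monte Carlo sketch: each GCM's score is drawn repeatedly from an assumed normal distribution, and the draws are ranked to estimate how often each GCM comes first and its mean rank. This is a generic illustration under the stated assumptions (independent, normally distributed, benefit-type scores). It is not an implementation of stochastic TOPSIS, stochastic VIKOR or PROMETHEE, and the GCM names and statistics are hypothetical.

```python
import random


def rank_probabilities(stats, n_draws=10000, seed=42):
    """Monte Carlo estimate of P(rank 1) and mean rank for each GCM.

    stats maps GCM name -> (mean, standard deviation) of a benefit-type score,
    assumed normally distributed and independent across GCMs.
    """
    rng = random.Random(seed)
    first_counts = dict.fromkeys(stats, 0)
    mean_rank = dict.fromkeys(stats, 0.0)
    for _ in range(n_draws):
        draw = {name: rng.gauss(mu, sd) for name, (mu, sd) in stats.items()}
        ordered = sorted(draw, key=draw.get, reverse=True)  # best score first
        first_counts[ordered[0]] += 1
        for position, name in enumerate(ordered, start=1):
            mean_rank[name] += position / n_draws
    return {name: (first_counts[name] / n_draws, mean_rank[name]) for name in stats}
```

A GCM with a slightly lower mean but a much larger standard deviation can occasionally rank first, which a crisp ranking based on means alone would never reveal.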
Ensembling of GCMs
Numerous researchers have suggested multi-model ensembling of GCMs. One group of researchers evaluated GCMs based on performance measures and ranking, whereas another group selected GCMs based on their expertise, holistic understanding of the case study and related results.
Observation: There is no established logic for deciding how many GCMs should be part of an ensemble.
Discussion: Limited research has been performed on this aspect. Some of the relevant and promising studies worth replicating are as follows.
Raju & Nagesh Kumar (2016) applied cluster analysis together with the cluster validation methods F-statistic and Davies–Bouldin index to identify the optimal ensemble. They evaluated 36 CMIP5-GCMs for minimum temperature, maximum temperature and the combination of maximum and minimum temperature over India, with SS as the indicator. The optimal clusters for the three scenarios comprised two, three and two GCMs, respectively, and the corresponding suggested ensembles were (ACCESS1.3, HadCM3), (HadCM3, IPSL-CM5A-LR, GFDL-ESM2M) and (MPI-ESM-MR, HadCM3).
Mendlik & Gobiet (2016) developed a subset of representative GCMs using principal component analysis and cluster analysis, and demonstrated their methodology using the ENSEMBLES project. Herger et al. (2018) analysed 81 CMIP5 simulations to identify an ensemble subset of size K. They proposed three approaches: (a) random ensemble, (b) performance ranking ensemble and (c) optimal ensemble. The optimal ensemble approach was found to be computationally efficient, with sufficient spread in projections compared with (a) and (b). They used RMSE for their computations. More details can be obtained from Herger et al. (2018).
Ahmed et al. (2020) employed machine learning algorithms, namely relevance vector machine (RVM), support vector machine (SVM), artificial neural network (ANN) and K-nearest neighbour (KNN), to establish MMEs for annual, monsoon and winter climate variables, and found that KNN- and RVM-based MMEs showed better skill. They also used top-ranked and bottom-ranked approaches to ascertain the robustness of the MMEs. Ahmed et al. (2020) provide extensive and informative details on machine learning algorithms that can serve as a guideline for MME of GCMs. Readers can refer to Ahmed et al. (2018, 2019, 2020) for various challenges and an extensive discussion of the logic of choosing GCMs from repositories.
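The idea of an optimal ensemble subset can be illustrated by brute force for a small ensemble: every subset of K GCMs is evaluated, and the one whose ensemble mean has the lowest RMSE against observations is kept. This sketch assumes that criterion and an exhaustive search; Herger et al. (2018) formulated the selection as an optimisation problem, which scales to large ensembles far better than enumeration. The GCM names and values in the example are hypothetical.

```python
import math
from itertools import combinations


def best_subset(simulations, observed, k):
    """Exhaustively find the k-GCM subset whose ensemble mean has the lowest RMSE.

    simulations maps GCM name -> list of values aligned with observed.
    Feasible only for small ensembles: the number of subsets grows combinatorially.
    """
    def rmse(sim):
        return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, observed)) / len(observed))

    best_names, best_error = None, math.inf
    for names in combinations(sorted(simulations), k):
        members = [simulations[name] for name in names]
        ensemble_mean = [sum(values) / k for values in zip(*members)]
        error = rmse(ensemble_mean)
        if error < best_error:
            best_names, best_error = names, error
    return best_names, best_error
```

The example in the test below shows why the best subset is not simply the top-ranked individual GCMs: two GCMs with opposite biases can cancel each other, so their pair outperforms the single best GCM.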
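A KNN-based MME can be sketched as follows. At each time step, the vector of values simulated by the GCMs is treated as a feature vector, and the ensemble prediction is the mean observed value of the K most similar training time steps. This minimal, unscaled Euclidean version is an assumption for illustration; Ahmed et al. (2020) should be consulted for the configuration they actually used.

```python
import math


def knn_ensemble(train_gcm, train_obs, query_gcm, k=3):
    """K-nearest-neighbour MME sketch.

    train_gcm: one vector per training time step, holding each GCM's simulated value;
    train_obs: observed values for those time steps; query_gcm: vectors to predict.
    Each prediction is the mean observation of the k closest training vectors.
    """
    predictions = []
    for x in query_gcm:
        distances = sorted(
            (math.dist(x, row), obs) for row, obs in zip(train_gcm, train_obs)
        )
        nearest = distances[:k]
        predictions.append(sum(obs for _, obs in nearest) / len(nearest))
    return predictions
```

In practice, the GCM outputs would usually be standardised before computing distances, and K would be chosen by cross-validation over the training period.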
Some data mining algorithms that facilitate clustering of GCMs are also briefly described below:
K-means cluster analysis groups GCMs into relatively homogeneous clusters, such that GCMs within a cluster are expected to be more similar to each other than to those in other clusters (Raju & Nagesh Kumar 2007, 2018). In brief, the K-means procedure is: (1) start with a chosen number of clusters K and randomly allot each GCM to a cluster; (2) calculate the group average of each cluster and the total error; (3) iteratively reclassify GCMs until a termination criterion is met; (4) estimate the total error for K clusters; (5) repeat steps 1–4 for different numbers of clusters; (6) analyse for the optimal K if required.
Fuzzy cluster analysis (FCA) is a classification algorithm in which each GCM is related to each cluster with some degree of membership. The computational procedure is similar to that of the K-means algorithm. The degree of membership of each GCM may vary from cluster to cluster between zero and one, with the constraint that the membership values of each GCM sum to one (Raju & Nagesh Kumar 2007).
Kohonen artificial neural networks (KANN) are a self-organising mapping technique comprising competitive layers that use a learning rule to cluster GCMs. Each neuron in the output layer is connected to all neurons in the input layer by a set of weights (Kohonen 1989; Raju & Nagesh Kumar 2007, 2014a).
ELECTRE-TRI assigns GCMs to predefined ordered clusters (Rogers et al. 2000; Raju et al. 2000). The limit between two consecutive clusters is demarcated by a profile. Clusters are mutually exclusive, which means that one GCM cannot be assigned to two different clusters. The construction and exploitation of an outranking relation is the basis for allotting GCMs to clusters.
For clustering algorithms, the number of clusters is an input, and it is challenging to estimate the optimum number of clusters manually for a set of GCMs. Cluster validation techniques therefore play a significant role in finding the optimal number of clusters. Detailed information about cluster validation techniques and their applications is available in Halkidi et al. (2001) and Hämäläinen et al. (2017), and details of validation indices such as Dunn's, Calinski–Harabasz, Davies–Bouldin, Silhouette and external validation indices are available in Halkidi et al. (2001), Dalton et al. (2009) and Hämäläinen et al. (2017). Most of these validation techniques work with inter-cluster and intra-cluster distances.
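The K-means steps above can be sketched in plain Python, with each GCM represented by a vector of indicator values. One assumption differs from step (1): this sketch initialises the centroids with K randomly chosen GCMs rather than by random allotment of GCMs to clusters, a common variant that leads to the same iterative reassignment.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means on GCM feature vectors (e.g. indicator values per GCM).

    Returns (labels, centroids, total_error), where total_error is the
    within-cluster sum of squared distances for this K.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each GCM to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the group average; keep the old one if a cluster empties.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
        if new_labels == labels:  # termination criterion: no reassignment
            break
        labels = new_labels
    total_error = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, total_error
```

Steps (5) and (6) amount to calling `kmeans` for several values of K and comparing the total errors or a cluster validation index.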
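A minimal fuzzy c-means sketch of FCA is given below, assuming the standard membership update u = 1 / Σ (d_i/d_l)^(2/(m−1)) with fuzzifier m = 2 and membership-weighted centroids. The initial centroids are supplied by the user here, which is an assumption made to keep the example deterministic.

```python
import math


def fuzzy_c_means(points, centroids, m=2.0, n_iter=100, tol=1e-9):
    """Fuzzy c-means starting from user-supplied initial centroids.

    Returns (memberships, centroids); memberships[j][i] is the degree to which
    GCM j belongs to cluster i, and each row sums to one.
    """
    centroids = [list(c) for c in centroids]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for p in points:
            # Floor distances to avoid division by zero when a GCM sits on a centroid.
            d = [max(math.dist(p, c), 1e-12) for c in centroids]
            memberships.append([1.0 / sum((di / dl) ** exponent for dl in d) for di in d])
        new_centroids = []
        for i in range(len(centroids)):
            w = [u[i] ** m for u in memberships]
            new_centroids.append(
                [sum(wj * p[a] for wj, p in zip(w, points)) / sum(w) for a in range(len(points[0]))]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return memberships, centroids
```

Unlike K-means, a GCM lying between two groups receives intermediate memberships in both clusters, which can itself be informative when deciding whether it should join an ensemble.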
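A minimal one-dimensional self-organising map can illustrate how a Kohonen competitive layer clusters GCMs: the winning output neuron and, with a shrinking Gaussian weight, its neighbours move towards each presented GCM vector. The number of neurons, learning-rate schedule and neighbourhood width below are illustrative assumptions, not settings from the cited studies.

```python
import math
import random


def train_som(points, n_neurons=2, epochs=100, learning_rate=0.5, sigma0=0.5, seed=0):
    """Minimal one-dimensional Kohonen self-organising map for clustering GCMs.

    Each output neuron holds a weight vector connected to all inputs.
    Returns the trained weights and the winning neuron index for each GCM.
    """
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(points, n_neurons)]
    for epoch in range(epochs):
        fraction = epoch / epochs
        rate = learning_rate * (1.0 - fraction)   # linearly decaying learning rate
        sigma = sigma0 * (1.0 - fraction) + 0.05  # shrinking neighbourhood width
        for p in rng.sample(points, len(points)):  # shuffled presentation order
            winner = min(range(n_neurons), key=lambda k: math.dist(p, weights[k]))
            for k in range(n_neurons):
                # Gaussian neighbourhood on the 1-D neuron grid (h = 1 for the winner).
                h = math.exp(-((k - winner) ** 2) / (2.0 * sigma ** 2))
                weights[k] = [w + rate * h * (x - w) for w, x in zip(weights[k], p)]
    labels = [min(range(n_neurons), key=lambda k: math.dist(p, weights[k])) for p in points]
    return weights, labels
```

After training, GCMs that map to the same winning neuron form a cluster; with more neurons than expected groups, neighbouring neurons tend to represent similar GCMs.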
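As an example of a validation index built from intra-cluster and inter-cluster distances, the Davies–Bouldin index can be computed as in the sketch below: for each cluster, the scatter s_i is the mean distance of its members to the centroid, and the index averages, over clusters, the largest ratio (s_i + s_j)/d(c_i, c_j). Lower values indicate better-separated clusters. At least two clusters with distinct centroids are assumed.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a partition (lower is better; needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroids[c] = [sum(v) / len(members) for v in zip(*members)]
        scatter[c] = sum(math.dist(p, centroids[c]) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)
```

To choose the number of clusters, the index can be computed for partitions obtained with different K (for example from K-means) and the K with the lowest value retained.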
To augment the earlier studies on the topic:
Researchers can use published literature to select suitable GCMs and ensembling for their study area.
Research institutions can take the lead in ranking GCMs region-wise in a robust manner (where this has not been done already or where outcomes of previous studies have not been confirmed). These GCMs can then be frozen to avoid repetitive efforts by researchers, and the time thus saved can be utilised for analysing and validating the projected outcomes.
One of the crucial aspects in the assessment of GCMs is comparison with observed data, which may have limitations such as accessibility to all researchers, authenticity, confidence in the data, acquisition procedures and permissions to use the data. We suggest having a central database, at least for each country, where data can be easily accessed. In our opinion, greater accessibility of data to users may provide better outcomes, which may give planners confidence zones for impact studies.
SUMMARY AND CONCLUSIONS
This state-of-the-art review paper has provided insights into mainly three aspects: performance indicators, ranking of GCMs and ensembling of GCMs. It discussed the role of performance indicators, various types of performance indicators, the necessity of GCM evaluation and relevant challenges, and the basis of MME. Future research directions include handling uncertainty in indicators and ranking of GCMs from both fuzzy and stochastic decision-making perspectives. In addition, various relevant MME-based approaches that provide optimum ensembling, as well as a number of techniques that facilitate ranking and ensembling, are presented.
Brief but relevant conclusions are as follows:
Most researchers opined that selection of suitable GCM(s) is necessary for impact studies, with the understanding that past skill assessment of GCM(s) might help improve projections of climate variables. Accordingly, related components such as performance measures and suitable decision-making technique(s) played a leading role in the studied papers.
Research papers studied here and elsewhere identified a number of sources of uncertainty which, in fact, affect the accuracy of projections. MME approaches proposed by various researchers have been found promising and replicable in reducing the uncertainties of projections. Subsets of GCMs or efficient GCMs can form an MME that serves the purpose.
It may be noted that the inferences provided in this review paper are based on interpretation from the reviewed research papers.
This work is supported by the Council of Scientific and Industrial Research, New Delhi, through the Project no. 22(0782)/19/EMR-II dated 24.7.19. The second author would like to acknowledge the funding support provided by Ministry of Earth Sciences, Government of India, through a project with reference number MoES/PAMC/H&C/41/2013-PC-II. The authors would like to thank Prof. Chris Perera, Editor-in-Chief, for encouraging them to write this article and also for his constructive criticism throughout the review process. Some views of esteemed researchers were quoted verbatim to convey their views without losing the meaning. Acknowledgements are due to the esteemed reviewers for providing valuable suggestions which helped us to think more critically while revising this paper.
The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/wcc.2020.128.