Performance evaluation and ranking of CMIP6 global climate models over Vietnam

This study comprehensively assesses the performance of 29 Coupled Model Intercomparison Project Phase 6 (CMIP6) Global Climate Models (GCMs) and their ensemble mean (ENS_MEAN) over Vietnam. The spatiotemporal variability of near-surface temperature and precipitation is evaluated for the 30-year historical period of 1985-2014. Results show that the models can reasonably reproduce the observed annual cycles and spatial distribution of temperature and precipitation, though their performance varies across the seven climatic sub-regions of Vietnam. Due to their coarse resolutions, the models produce warm biases over certain highland and mountainous areas, and many of them cannot reproduce the rainy season shift from summer in most sub-regions toward the end of the year in Central Vietnam. Finally, these results are summarized, and a performance score is assigned to each model, yielding a ranking. The top three models are EC-Earth3-Veg, EC-Earth3, and HadGEM3-GC31-MM. Although created by simply averaging all the models, ENS_MEAN can compete with the best GCMs and ranks 4th overall. Within the same model family, the higher-resolution experiment ranks higher than the lower-resolution one. The results of this study can provide a reference for the choice of appropriate CMIP6 GCMs in the upcoming downscaling activities in Vietnam.


INTRODUCTION
Climate change is taking place worldwide and is resulting in numerous impacts across the planet (Arias et al. 2021). To conduct climate change impact assessments and devise effective response strategies for a specific region, generating future climate projections for the region is a prerequisite.
Climate projections at the global scale have generally been based on outputs from global climate models (GCMs) under the Coupled Model Intercomparison Project (CMIP). The model results of the CMIP phase 6 (CMIP6) (Eyring et al. 2016) have significantly contributed to the recent Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) (IPCC 2021). While CMIP6 GCMs have demonstrated adequate performance in representing historical climate features across various regions of the world (e.g. Seneviratne & Hauser 2020; Srivastava et al. 2020; Xin et al. 2020; Hong et al. 2021), they still exhibit systematic and region-specific biases arising from different sources. For instance, biases in CMIP6 models can be attributed to their representation of sea surface temperature (Wang et al. 2021; Tong et al. 2022; Rajendran et al. 2022), atmospheric circulation (Richter & Tokinaga 2020; Wang et al. 2021), land-atmosphere interaction (Abdelmoaty et al. 2021; Li et al. 2021), cloud processes (Cesana & Del Genio 2021; Wang et al. 2021), and other factors. Moreover, a model that performs well in one region may not necessarily perform as well in another. Therefore, studies ranking the performance of individual CMIP6 models over specific regions have begun to emerge (Papalexiou et al. 2020; Anil et al. 2021; Desmet & Ngo-Duc 2022; Gebresellase et al. 2022). Notably, Desmet & Ngo-Duc (2022), hereinafter referred to as DN22, developed a novel method to rank CMIP6 models over Southeast Asia.
Vietnam is among the countries strongly affected by climate change and sea level rise (Dasgupta et al. 2007; Ministry of Natural Resources & Environment 2020). Various studies on climate change in Vietnam have been conducted in recent years. The downscaling of climate information from GCMs for Vietnam has been carried out using statistical (Ministry of Natural Resources & Environment 2009, 2012; Tran-Anh et al. 2022) and dynamical (Ministry of Natural Resources and Environment 2012, 2016, 2020; Ngo-Duc et al. 2014; Katzfey et al. 2016) approaches. Furthermore, the dynamical experiments that downscaled a number of CMIP5 GCMs under the Coordinated Regional Climate Downscaling Experiment - Southeast Asia (CORDEX-SEA) project (Tangang et al. 2020) have also been used to study future climate change characteristics in Vietnam (e.g. Trinh-Tuan et al. 2019; Tuyet et al. 2019; Nguyen-Thuy et al. 2021; Hoang-Cong et al. 2022; Vu Dinh et al. 2022). So far, the downscaled experiments for Vietnam have only been done with the model outputs of the CMIP phase 5 (CMIP5) and earlier. To our knowledge, some research groups are currently preparing to downscale the latest global outputs from CMIP6 GCMs for Vietnam.
To obtain detailed climate information at the local/regional scales, it is necessary to conduct downscaled experiments. However, this process, especially dynamical downscaling, requires significant computational resources. Therefore, choosing a suitable GCM for further downscaling is crucial to improve the accuracy of local-scale downscaled results and optimize computational resources. In this context, the present study focuses on analyzing and ranking the performance of 29 CMIP6 GCMs over inland Vietnam. The results are expected to provide helpful information for the community in choosing adequate GCMs in upcoming downscaling studies for Vietnam.

Study area
In this study, we perform the analysis only over inland Vietnam (Figure 1). Located in the eastern part of the Indochinese Peninsula, Vietnam has a long coastline and a complex terrain, three-quarters of which is mountainous. Nguyen & Nguyen (2004) classified Vietnam into seven climatic sub-regions based on temperature, rainfall, radiation, and sunshine hours. These sub-regions, namely the Northeast (hereafter denoted as R1), Northwest (R2), Red River Delta (R3), North Central (R4), South Central (R5), Central Highlands (R6), and South (R7), are widely accepted by the climate research community in Vietnam (e.g. Phan et al. 2009; Ngo-Duc 2014; Trinh-Tuan et al. 2019). While we analyze some aspects of the model data separately by sub-region, we also consider inland Vietnam as a whole for other features (see Section 'Model ranking methodology').

Data
The capability of 29 CMIP6 GCMs in representing climate features over Vietnam is evaluated through their historical experiment for the 30-year period of 1985-2014. Two variables, namely 2m-temperature (T2m) and precipitation (pr), are considered in this study. Corresponding monthly data are downloaded from the Earth System Grid Federation (ESGF) system. The models we focus on, along with their spatial resolution, are listed in Table 1. Two reference datasets are used (referred to as OBS) to assess the models' performance: the Global Precipitation Climatology Centre (GPCC) gauge-analysis product at the 0.5° × 0.5° resolution (Becker et al. 2013) for precipitation, and the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE, version AphroTemp_V1808) dataset at the 0.25° × 0.25° resolution (Yasutomi et al. 2011) for temperature. Both datasets provide land-only information and are derived from quality-controlled station data.
Since the resolution of the GCMs varies significantly, ranging from 0.5° × 0.5° for CNRM-CM6-1-HR to 3.75° × 2.24° for MCM-UA-1-0, their outputs are bilinearly interpolated to the 0.5° × 0.5° resolution before further analyses and rankings. Moreover, the ensemble mean of all models (referred to as ENS_MEAN) is computed for each variable at the same 0.5° × 0.5° resolution. The chosen resolution allows direct comparison with the GPCC dataset. The APHRODITE dataset is also upscaled to this grid for convenience.
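The regridding and averaging steps described above can be illustrated with a minimal sketch. It is not the processing chain used in the study: the function names (`bilinear`, `regrid`, `ensemble_mean`) and the toy grid are hypothetical, the grids are assumed regular with ascending coordinates, and target points are assumed to lie inside the source grid.

```python
from bisect import bisect_right

def bilinear(lats, lons, field, lat, lon):
    """Bilinearly interpolate field[i][j], given on ascending lats/lons, to (lat, lon)."""
    # Index of the grid cell containing the target point (clamped to the last cell).
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    # Fractional position of the target inside that cell.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    south = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty

def regrid(lats, lons, field, new_lats, new_lons):
    """Interpolate a coarse field onto a finer target grid."""
    return [[bilinear(lats, lons, field, la, lo) for lo in new_lons]
            for la in new_lats]

def ensemble_mean(fields):
    """Cell-by-cell arithmetic mean of several fields on the same grid."""
    n = len(fields)
    rows, cols = len(fields[0]), len(fields[0][0])
    return [[sum(f[i][j] for f in fields) / n for j in range(cols)]
            for i in range(rows)]

# Toy example: a 2 x 2 coarse temperature field refined to 3 x 3.
coarse = [[20.0, 22.0], [24.0, 26.0]]
fine = regrid([10.0, 12.0], [104.0, 106.0], coarse,
              [10.0, 11.0, 12.0], [104.0, 105.0, 106.0])
```

Once every model has been brought to the common grid in this way, `ensemble_mean` can be applied to the list of regridded fields to obtain an ENS_MEAN-like field.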

Model ranking methodology
We apply the method proposed by DN22 to rank each model and the ensemble mean based on their ability to reproduce the regional features of near-surface temperature and precipitation (note that we exclude the evaluation of 850-hPa wind, which DN22 performed, because the study area is too narrow for a meaningful monsoon circulation analysis). Readers are referred to DN22 for the original design and implications of the ranking method, which we summarize below.
For each model and variable, a 0-to-1 score is calculated based on the model's ability to represent the variable's observed temporal and spatial patterns. For the temporal aspect, we rely on the assessment of the climatological 12-month seasonal cycle, averaged over each climatic sub-region. We examine these cycles in relation to observations, i.e. GPCC for precipitation and APHRODITE for temperature. To evaluate the accuracy of the cycles, we use three metrics: the correlation coefficient (CC), the normalized standard deviation (NSTD), and the mean bias (MB). Spatial patterns are evaluated over inland Vietnam as a whole by focusing on two seasonal means: winter (December-January-February; DJF) and summer (June-July-August; JJA). This aspect of the assessment is based on the spatial versions of CC and NSTD.
The statistics listed above constitute the basis of the scoring process. Each value x calculated for CC, NSTD, or MB is then transformed to X following the transformation defined in DN22. X lies in the interval [−∞, 0], and the greater X, the better the performance. This allows the score S(X) to vary in [0, 1] and to increase with the model's reliability (Equation (2)), where the tolerance δ, the scale factor σ, and the base β all depend on the ensemble E of the 30 X results (i.e. 29 GCMs plus the ensemble mean) for a given variable, aspect (temporal/spatial), sub-region/season, and statistic. Several conditions governing the choice of δ and σ are discussed in DN22. We use the same expressions as originally proposed in DN22. This choice of δ implies that the best performance in E gets S = 1. β is selected last in [1, +∞], so that the standard deviation of the scores for E is maximized.
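As an illustration of the three statistics, the sketch below computes CC, NSTD, and MB for a 12-month cycle using their standard definitions. The mapping to X shown here, X = -|x - ideal| with ideal values of 1 for CC and NSTD and 0 for MB, is only an assumption that is consistent with X lying in [−∞, 0] and increasing with performance; it is not taken from DN22, and the score function S(X) is deliberately not reproduced. All function names and the toy data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def cc(model, obs):
    """Pearson correlation coefficient between two series."""
    mm, mo = mean(model), mean(obs)
    cov = sum((a - mm) * (b - mo) for a, b in zip(model, obs)) / len(obs)
    return cov / (std(model) * std(obs))

def nstd(model, obs):
    """Model standard deviation normalized by the observed one."""
    return std(model) / std(obs)

def mb(model, obs):
    """Mean bias (model minus observation)."""
    return mean(model) - mean(obs)

# ASSUMPTION: ideal values and transform chosen for illustration only.
IDEAL = {"CC": 1.0, "NSTD": 1.0, "MB": 0.0}

def to_X(stat, x):
    """Map a raw statistic to a non-positive value; 0 means a perfect match."""
    return -abs(x - IDEAL[stat])

# Toy monthly temperature cycle (°C) and a model that is uniformly 1 °C too warm.
obs = [18, 19, 22, 25, 28, 29, 29, 29, 28, 25, 22, 19]
model = [t + 1 for t in obs]
stats = {"CC": cc(model, obs), "NSTD": nstd(model, obs), "MB": mb(model, obs)}
X = {name: to_X(name, value) for name, value in stats.items()}
```

In this toy case the phase and amplitude are perfect, so only the MB entry of `X` departs from zero.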
At this stage, each statistical criterion has resulted in one 0-to-1 score per model. Global scores are ultimately obtained by successively averaging (arithmetic mean) the scores from the most to the least specific distinctions in the assessment, i.e. in the order of one score per statistic CC, NSTD, and MB; one score per sub-region (season); one score for the temporal (spatial) aspect; one score per variable; and eventually, one global model score. Note that these scoring results are only meaningful for ranking this specific model list, as the scoring formula depends on all outputs' characteristics and would give different scores to the same models in another ensemble.
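The successive averaging can be sketched as a recursive mean over a nested structure. The layout of the dictionary (variable, then aspect, then sub-region or season, then statistic) follows the order given above, while the function name `nested_mean`, the variable keys, and all score values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def nested_mean(node):
    """Average leaf scores level by level, from the most to the least specific."""
    if isinstance(node, dict):
        children = [nested_mean(child) for child in node.values()]
        return sum(children) / len(children)
    return float(node)

# Toy scores for one model (two sub-regions only, for brevity).
scores = {
    "temperature": {
        "temporal": {
            "R1": {"CC": 0.9, "NSTD": 0.6, "MB": 0.3},
            "R2": {"CC": 0.8, "NSTD": 0.4, "MB": 0.6},
        },
        "spatial": {
            "DJF": {"CC": 0.8, "NSTD": 0.4},
            "JJA": {"CC": 1.0, "NSTD": 0.6},
        },
    },
    "precipitation": {
        "temporal": {"R1": {"CC": 0.5, "NSTD": 0.5, "MB": 0.5}},
        "spatial": {"DJF": {"CC": 0.2, "NSTD": 0.4}},
    },
}

temperature_score = nested_mean(scores["temperature"])
global_score = nested_mean(scores)
```

Because each level is averaged before moving up, the temporal and spatial branches carry equal weight in a variable's score even though the temporal branch contains more individual criteria.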

Annual cycle
Figure 2 shows the 2m-temperature annual cycles over the seven sub-regions of Vietnam for the GCMs, ENS_MEAN, and OBS. In Figure 3, these temperature series' statistics are summarized using Taylor diagrams (Taylor 2001). Each symbol corresponds to one model's performance with regard to its NSTD (radial distance) and CC (polar angle) when compared with OBS (NSTD = 1; CC = 1). Additionally, the distance between a model's symbol and the OBS point is proportional to the centered root mean square difference (RMSD) between the two series: the closer to the OBS point, the better the performance.
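The geometric property of the Taylor diagram mentioned above follows from the law of cosines: with the angle defined by arccos(CC), the distance to the OBS point equals sqrt(NSTD² + 1 − 2·NSTD·CC), which is the centered RMSD divided by the observed standard deviation. The short check below verifies this numerically on a toy series; the helper names and data are hypothetical and population statistics are assumed.

```python
import math

def anomalies(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def std(values):
    a = anomalies(values)
    return math.sqrt(sum(x * x for x in a) / len(a))

def cc(model, obs):
    am, ao = anomalies(model), anomalies(obs)
    cov = sum(x * y for x, y in zip(am, ao)) / len(ao)
    return cov / (std(model) * std(obs))

def centered_rmsd(model, obs):
    """RMSD computed after removing each series' own mean."""
    am, ao = anomalies(model), anomalies(obs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(am, ao)) / len(ao))

def taylor_distance(nstd, corr):
    """Distance between a model point and the OBS point (NSTD = 1, CC = 1)."""
    return math.sqrt(max(nstd ** 2 + 1.0 - 2.0 * nstd * corr, 0.0))

obs = [1.0, 2.0, 3.0, 4.0]
model = [1.0, 3.0, 2.0, 5.0]
direct = centered_rmsd(model, obs) / std(obs)
from_diagram = taylor_distance(std(model) / std(obs), cc(model, obs))
```

Both `direct` and `from_diagram` describe the same quantity, which is why a single point on the diagram conveys CC, NSTD, and the centered RMSD at once. The mean bias is not represented on the diagram, since means are removed beforehand.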
In general, the models do a relatively good job of representing OBS across sub-regions, especially by capturing the annual temperature range, whether it is relatively large in the north (R1-R4; around 15 °C), or small in the south (R5-R7; 5-10 °C; Figure 2). However, the Taylor diagrams (Figure 3) highlight an ensemble-wide tendency to simulate a slightly too large annual range, as most GCMs have an NSTD over 1. The ensemble mean's cycle in Figure 2 allows us to identify the causes of this inflated variability: most GCMs underestimate autumn-winter temperatures (except in R6), and warmer summers are generally modeled in R1 and R6. Notably, a strong positive MB is computed for the ensemble in the latter sub-region, Central Highlands, which consists of a series of plateaus with heights of around 500-1,500 m. This warm bias arises because GCMs with coarse resolutions often represent lower-than-real topography over highland areas, and hence fail to fully reflect the real environmental lapse rate, i.e. the decrease of temperature with height, in their simulations.
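The order of magnitude of such a topography-induced bias can be estimated with simple arithmetic. The sketch below assumes a constant lapse rate of 6.5 °C per km (the standard-atmosphere value, not a value diagnosed in this study) and a purely hypothetical elevation mismatch; the function name `lapse_rate_bias` is illustrative.

```python
STANDARD_LAPSE_RATE = 6.5  # °C per km; ASSUMPTION: standard-atmosphere value

def lapse_rate_bias(true_elevation_m, model_elevation_m,
                    lapse_rate=STANDARD_LAPSE_RATE):
    """Warm bias (°C) expected when the model surface lies below the real one."""
    return lapse_rate * (true_elevation_m - model_elevation_m) / 1000.0

# Hypothetical plateau at 1,000 m represented at 600 m by a smoothed model orography.
bias = lapse_rate_bias(1000.0, 600.0)
```

Under these assumptions, a 400 m underestimation of surface height alone would account for a warm bias of about 2.6 °C.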
On another note, some individual results distinguish themselves, such as the northern sub-regions' cycles (R1-R4) of MCM-UA-1-0, which show the lowest CCs (∼0.7-0.8). Indeed, the corresponding curves in Figure 2 exhibit a significant overestimation in the first few months of the year, causing the annual peak to be reached 2 months earlier than observed. On the other hand, a few models show high CC (≥0.95) and close-to-1 NSTD, albeit not uniformly across sub-regions: EC-Earth3 produces satisfactory results in R1-R3; FIO-ESM-2-0 stands out in R4, as do GISS-E2-1-G in R5, NESM3 and EC-Earth3-Veg in R6, and ACCESS-CM2 in R7. Models with higher resolution, such as CNRM-CM6-1-HR, EC-Earth3, EC-Earth3-Veg, and HadGEM3-GC31-MM, seem to stand out in most sub-regions, except in R7, where the flat terrain does not particularly benefit from finer grids. Additionally, ENS_MEAN performs better than most models, especially in R5 and R7 (where it exhibits almost no bias; see Figure 2). This suggests that averaging the models can absorb models' outliers, thus improving the results. Overall, the best GCMs for reproducing the annual cycle for 2m-temperature are GISS-E2-1-G, FIO-ESM-2-0, and ACCESS-CM2.
Similarly, Figure 4 shows the annual precipitation cycles over the seven sub-regions of Vietnam for the GCMs and OBS, while Figure 5 gives a synthetic view of their statistics.
Over the sub-regions R1-R3, OBS values exhibit the summer monsoon's characteristics with its peak rainfall occurring in JJA (Figure 4). Most models show high correlation (CC > 0.9, Figure 5), but not all of them can reproduce the reference amplitude well. Several models overestimate precipitation in summer, e.g. INM-CM5-0 and MCM-UA-1-0 (Figure 4), leading to high NSTDs of up to 1.4 (Figure 5). In contrast, some outputs exhibit unreasonably low rainfall values, such as CAMS-CSM1-0 with NSTD less than 0.5. ENS_MEAN is among the best models in these regions, with CC > 0.95 and NSTD > 0.9.
In Central Vietnam (R4-R5), the rainy season gradually shifts towards the end of the year (Figure 4) because of the interaction between the winter monsoon and the nearby mountain range (Ngo-Duc 2014; Nguyen-Le et al. 2015). In contrast with the overall satisfactory representation of temperature analyzed above, the GCMs struggle to simulate the observed precipitation patterns in this area (particularly the rainy season shift), as many models have CCs lower than 0.7, or even down to 0.3 (Figure 5). For example, MCM-UA-1-0 still depicts a rainfall peak in May and June (Figure 4). A possible reason lies in its coarse spatial resolution (3.75° × 2.24°), which prevents it from capturing the fine-scale interaction between the monsoon and the local topography. On another note, most models underestimate the precipitation variability, with NSTD < 1, except for KACE-1-0-g, NESM3 and MCM-UA-1-0 over R4, and for FGOALS-g3 over R5 (Figure 5). The highest-performing models in R4 are CNRM-CM6-1-HR, FIO-ESM-2-0, and IPSL-CM6A-LR, with CC ∼ 0.95 and NSTD ∼ 0.7; and in R5, IPSL-CM6A-LR stands out with CC = 0.99 and NSTD ∼ 1. Ensemble averaging smooths the rainfall values from models that represent the interaction between the winter monsoon and local topography well; thus, ENS_MEAN does not outperform the above models over these sub-regions.
In R6-R7, the OBS values show the monsoon with the peak rainfall occurring in summer and maintaining high values for a longer period than in R1-R3 (Figure 4). The phase is generally well captured over these sub-regions, with CCs mostly greater than 0.8, except for INM-CM5-0 (Figure 5). However, simulated amplitudes are diverse. In R6, while some models strongly overestimate the amplitude (e.g. FGOALS-g3 with NSTD ∼ 1.8), others underestimate it (e.g. MPI-ESM1-2-HR with NSTD ∼ 0.6). Additionally, in R7, a strong tendency toward overestimated variability is observed: NSTDs are mostly over 1 (up to 1.8 with ACCESS-ESM1-5), and this is naturally the case for the ensemble mean as well. This tendency is linked to an ensemble-wide overestimation of rainfall in summer (Figure 4).
Overall, some models with good simulation results across all seven sub-regions (i.e. high CC and close-to-1 NSTD) particularly stand out: CESM2-WACCM, FIO-ESM-2-0, IPSL-CM6A-LR, and NorESM2-MM. On the other hand, some models are reliable in one sub-region but less so in another. For instance, HadGEM3-GC31-LL displays satisfactory results over R6 but exhibits much larger variability than OBS over R7 (NSTD > 1.6). Lastly, ENS_MEAN is among the best models in most of the sub-regions, except in R4 and R5 where many GCMs have difficulty representing the rainy season shift (Figure 4).

Spatial variability
The spatial analysis has been conducted identically for each model, so that final rankings could be drawn and the best GCM identified for both variables (Section 'Model ranking and discussions'). In this section, however, the models' spatial performance is presented with a focus on the best GCM, ENS_MEAN, and the ensemble spread.
Figure 6 shows the spatial variability of winter (DJF) and summer (JJA) temperatures given by OBS, the best GCM for temperature (HadGEM3-GC31-MM as depicted in Figure 8(a), denoted as GCM_BEST) and ENS_MEAN, along with the standard deviation of the 29 GCMs. During the DJF months, the observed temperature increases from approximately 14 °C in the north to around 28 °C in the south. The lower temperature in the north of Vietnam is attributed to the cold northeasterly wind during the winter monsoon, while in the south, the temperature is influenced by the equatorial climate. The DJF spatial temperature pattern is well represented by GCM_BEST and ENS_MEAN, with NSTDs of 1.08 and 1.09, and CCs of 0.98 and 0.97, respectively. Both GCM_BEST and ENS_MEAN underestimate temperature over Vietnam, particularly over the northern regions R1-R4, by around 2 °C, which has been previously highlighted with sub-regional means in Figure 2. The models cannot accurately capture the northwest-southeast stripe of low temperature between the sub-regions R1 and R2, which coincides with the location of Hoang Lien Son, the highest mountain range in Vietnam. The warm bias of the models over this mountain range, similar to what happens in R6 (Figure 2), is also due to their coarse spatial resolutions; hence the decrease in temperature due to the environmental lapse rate cannot be simulated effectively. The ensemble spread (Figure 6(d)), which can be used as an indicator of the similarity among models, indicates that temperature outputs are highly consistent in R5 and R7, where the ensemble spread is less than 1 °C, but more varied in other regions with more complex topography, such as Central Highlands and the western mountainous areas of Vietnam.
During the JJA months, the observed temperature is high (28-30 °C), except in sub-regions R1 and R6 (22-26 °C). Although GCM_BEST tends to overestimate temperature in certain sub-regions, such as R1, R3, and R6, by 1-3 °C, and still cannot accurately resolve the cold temperature features induced by the mountain ranges in the northwest and Central Highlands, its performance is generally good, with an NSTD of 0.82 and a CC of 0.84. The ensemble mean tends to smooth out the local temperature features, leading to an overall decrease of NSTD to 0.62, though its CC (0.82) is still comparable with that of GCM_BEST. The ensemble spread reaches higher values of 1.5-2 °C in Central Highlands (R6), suggesting high uncertainty among the GCMs in this sub-region whatever the season. In contrast, the spread in JJA differs by around 1 °C from that of DJF in northern Vietnam: there is a broader consensus among the models regarding summer temperature in sub-regions R1-R4.
Figure 7 is similar to Figure 6 but for precipitation. In DJF, the observed precipitation in Vietnam is typically less than 50 mm/month, except for the central coastal region of R4-R5, where monthly rainfall can reach as high as 160-200 mm (Figure 7(a)). GCM_BEST (i.e. EC-Earth3 depicted in Figure 8(b)) demonstrates good agreement with the observations, with NSTD and CC values of 0.92 and 0.93, respectively, though it slightly underestimates R4-R5 precipitation peaks (Figure 7(b)). The ensemble mean, on the other hand, cannot represent these maxima at all, as the rainfall appears to be spread over nearby sub-regions and even down to R7 (Figure 7(c)). This goes along with lower values of both NSTD (0.61) and CC (0.72). The lower performance of ENS_MEAN is further supported by the high ensemble standard deviation in southern Vietnam. In particular over R6 and R7, the uncertainty reaches 30-50 mm/month, which is of the same order of magnitude as the OBS values. In summer, the precipitation is much higher, with a minimum of 100 mm/month in the R4 and R5 sub-regions, which is the opposite of the winter spatial pattern. The maximum precipitation of 500 mm/month can be observed in R1, R2, and R6. GCM_BEST demonstrates a reliable large-scale distribution; however, it fails to reproduce local-scale patterns, with the local maxima in R1 and R2 being missed, while those of R6 are overestimated. This discrepancy results in a relatively low CC of 0.68. Furthermore, the ensemble mean performs even more poorly as it smooths out most of the spatial variability (NSTD = 0.54) (Figure 7). The ensemble spread shows high uncertainty everywhere, ranging from a minimum of 70 mm/month to 200 mm/month in R6.

Model ranking and discussions
The final ranking of the 29 CMIP6 GCMs and the ensemble mean for inland Vietnam is presented in Figure 8. The figure clearly shows the top two models, which are EC-Earth3-Veg and EC-Earth3, followed by HadGEM3-GC31-MM and ENS_MEAN (Figure 8(c)). It is worth noting that of the three best models, one excels in temperature (HadGEM3-GC31-MM), one in precipitation (EC-Earth3), and one is the second-best for both temperature and precipitation (EC-Earth3-Veg). The top ten models are generally among the best for both variables' rankings (Figure 8(a) and 8(b)). However, there are exceptions due to offsets between the two variable scores of a model. For instance, NorESM2-MM ranks 8th overall because of its top-three precipitation performance, compensating for its lower temperature score (ranking 21st). On the other hand, HadGEM3-GC31-MM ranks poorly for precipitation (11th) but catches up with a top-one temperature performance, so it eventually ranks 3rd. Among the top ten models, those ranking in the top ten for both variables are EC-Earth3-Veg, EC-Earth3, ENS_MEAN, CNRM-CM6-1-HR, and GISS-E2-1-G.
While ENS_MEAN was created by simply averaging all the models, it competes with the best GCMs and ranks 4th overall. Interestingly, its final rank is even higher than its component ranks (7th and 8th for temperature and precipitation, respectively). Yet, it does not outperform all individual models, which differs from the conclusion of Li et al. (2022). We also computed additional integrated models using their method, but these models did not outperform any of the available GCMs in Vietnam. Most of the time, the integrated models underestimated spatial variability and had a lower correlation, resulting in a lower score. This divergence could be due to the design of our ranking method and the fact that our study area is smaller, containing only 147 grid points, while Li et al. (2022) used more than 87,000 grid points over global land.
Figure 8 also reveals the temporal and spatial scores of each GCM. For temperature, the highest-ranked models seem to be supported by a balanced contribution of both aspects. However, from the 14th GCM onward, the temporal score appears to dominate. This is linked with the spatial/temporal difference in score variability observed for both variables with the shadings in Figure 8(a) and 8(b). In particular, EC-Earth3 and EC-Earth3-Veg stand out in simulating precipitation, owing to their substantially higher spatial performance, which accounts for more than two-thirds of their precipitation score and around 43% of their final one. These comments could lead to the conclusion that the models' spatial performance unfairly weighs more than the temporal aspect. Indeed, it is noteworthy that the temporal branch of the scoring process goes through more assessment criteria than the spatial branch (notably: seven sub-regions against two seasons), making it statistically less probable to score high after the multiple averages of the former branch. Yet, the selected statistical coefficients convey interrelated features of the GCMs' outputs, and more ranking criteria provide more opportunities to stand out. In that respect, when a scoring process results in a smaller range (i.e. here the temporal scoring), it means that the assessed features do not allow any model to differentiate itself, indicating a strong consensus among the ensemble members or compensations of good and bad performances throughout the evaluation. As a result, this scoring branch may account for less than the others. Furthermore, the precipitation score's contribution to the final rank fluctuates more than that of the temperature assessment (precipitation scores vary over a larger range), thus having more influence. But in this case, the number of ranking criteria is the same for both variables, and the imbalance necessarily comes from the GCMs themselves. Therefore, in line with Sections 'Annual cycle' and 'Spatial variability' where several challenges in modeling precipitation over Vietnam have been discussed, it can be concluded that the performance in modeling precipitation (in particular its spatial features) has been a key point for ranking this GCM ensemble.
EC-Earth3-Veg, EC-Earth3, HadGEM3-GC31-MM, CNRM-CM6-1-HR, CESM2, GISS-E2-1-G, and FIO-ESM-2-0 are all in the top ten of both the SEA (DN22) and Vietnam rankings. This points to several links between this study and the results from DN22 (in this paragraph, rank comparisons are made after adjusting for the positions of ENS_MEAN and HadGEM3-GC31-LL, which were not part of the ranking in DN22). In particular, DN22 conducted an analysis of the same models' performance in capturing the regional wind circulation, which should be considered to complete this study's ranking. As such, a satisfactory representation of the SEA general circulation supports the good Vietnam ranks of EC-Earth3-Veg, EC-Earth3, CNRM-CM6-1-HR, and CESM2, but not that of HadGEM3-GC31-MM, which showed below-average wind performance. However, reliable wind should not outweigh the other variables in decision making. MPI-ESM1-2-HR is a good example of a model that produced excellent wind (ranked 3rd for SEA in DN22) but ranks average in the end (11th for SEA and 18th for Vietnam). Furthermore, the top-six wind patterns of GFDL-ESM4 and FGOALS-f3-L have not been particularly helpful for scoring high over Vietnam, as one can notice a significant downgrade between their SEA and Vietnam final positions (7th to 19th and 4th to 24th, respectively). In addition to the fact that in the present study, they do not benefit from their wind score anymore, these rank changes also seem to originate from a substantial performance gap when looking at their annual cycles between the Maritime Continent (i.e. the near-equatorial region comprising Malaysia and Indonesia) and the Indochinese Peninsula (these models performing better over the former). Conversely, reverse performance asymmetries (i.e. better performance over Indochina than over the Maritime Continent) have been responsible for upgrading the ranks of, notably, CESM2-WACCM (17th to 8th) and NorESM2-MM (15th to 7th), whose general circulation performances were average. Lastly, if one is interested in using the outputs from IPSL-CM6A-LR and MRI-ESM2-0 in the region, their spatial performance over SEA should be taken into account, as precipitation and wind spatial patterns scored below the average. These GCMs have been upgraded from 20th and 19th over SEA to 11th and 10th over Vietnam, respectively.

CONCLUSIONS
In this study, we examined and ranked the performances of 29 CMIP6 GCMs and their ensemble mean in representing near-surface temperature and precipitation over the seven climatic sub-regions constituting the mainland of Vietnam.
In general, the models reasonably reproduce the observed annual cycles of temperature and precipitation, although their performance varies across sub-regions. Yet, the GCMs tend to overestimate the annual temperature variability, which is explained by an autumn-winter temperature underestimation and/or a warm bias in summer. The uncertainties between the GCMs are more pronounced for precipitation than for temperature, both in the annual cycle and in spatial variability. In complex topographical regions, the models face a challenge in simulating observed patterns due to their coarse resolutions, resulting in warm biases over certain highland and mountainous areas, and many of them are incapable of reproducing the rainy season shift from summer in most sub-regions towards the end of the year in Central Vietnam.
Following the methodology of DN22, though only for temperature and precipitation, a performance rank is assigned to each model. The top three models are EC-Earth3-Veg, EC-Earth3, and HadGEM3-GC31-MM. ENS_MEAN competes with the best GCMs and ranks 4th overall, which may be explained by the fact that averaging can absorb models' outliers, thus improving the results. The model resolution seems to affect the final ranking: in the same model family, the higher-resolution experiment ranks higher than the lower-resolution one.
Furthermore, seven out of the top ten models in Vietnam (excluding ENS_MEAN) are also among the top ten GCMs in Southeast Asia. These models are EC-Earth3-Veg, EC-Earth3, HadGEM3-GC31-MM, CNRM-CM6-1-HR, CESM2, GISS-E2-1-G, and FIO-ESM-2-0. This result should strengthen the confidence in the models on this list for simulating the regional climate, although each model produces local specificities that should be taken into account. In future work, it would be beneficial to examine the performance of the top-ranking GCMs both before and after applying a bias-correction technique. Such a study would provide insights into the degree to which bias correction improves model performance, as well as whether it affects the ranking of the models.
The Ministry of Natural Resources and Environment of Vietnam plans to update the national report on climate change and sea level rise scenarios by 2025 based on a new set of downscaling experiments from CMIP6 GCMs (P.T.T. Nga from the Vietnam Institute of Meteorology, Hydrology and Climate Change, personal communication, 28 September 2022). Since dynamical downscaling often requires large computational resources, it is not possible to downscale all CMIP6 GCMs for a limited area like Vietnam. The results of this study can help to identify the most appropriate GCMs to be considered in the upcoming downscaled experiments in Vietnam.

Figure 1 | Inland Vietnam and its seven climatic sub-regions. Grid cells with the size of 0.5° × 0.5° are displayed. The inset map in the upper right shows the location of Vietnam within Southeast Asia.

Figure 2 | Annual cycles of temperature over the seven sub-regions of Vietnam, represented by the GCMs and the reference dataset OBS.

Figure 3 | Taylor diagrams of the GCMs' temperature annual cycles over the seven sub-regions of Vietnam.

Figure 4 | Same as Figure 2 but for rainfall.

Figure 5 | Same as Figure 3 but for rainfall.

Figure 6 | Spatial distribution of DJF (upper panel) and JJA (lower panel) temperatures for (a,e) OBS, (b,f) the GCM that best represents temperature, (c,g) the ensemble mean, and (d,h) the standard deviation of all 29 models. Units in °C.

Figure 7 | Same as Figure 6 but for precipitation. Units in mm/month.

Figure 8 | Scores and ranks of 29 CMIP6 GCMs and their ensemble mean for (a) 2m-temperature, (b) precipitation, and (c) all. Light and dark yellow highlight the contribution of temporal and spatial scores for temperature, respectively, while two shades of cyan are used similarly for precipitation. The top three models for Vietnam are labeled with numbers and the next seven models are marked with an asterisk. Please refer to the online version of this paper to see this figure in colour: https://dx.doi.org/10.2166/wcc.2023.454.