Circulation type analysis of regional hydrology: the added value in using CMIP6 over CMIP5 simulations as exemplified from the MPI-ESM-LR model

This study addresses the applicability of general circulation models (GCMs) in studying the impact of climate change on hydrology. The statistical downscaling of precipitation based on circulation types (CTs) derived from the (fuzzy) obliquely rotated principal component analysis is suggested as a robust methodology for using climate models to research the impact of climate change on hydrology. The methodology allows an understanding of the mechanism of atmospheric circulation in the study region and of the physical relationship between atmospheric circulation and the regional hydrological cycle. The capability of climate simulations from the MPI-ESM GCM to reproduce the observed CTs in the target region is examined in light of the uncertainty of atmospheric GCMs when used for circulation typing. The results show that, generally, the analyzed GCM can reproduce the underlying physics of atmospheric circulation in the study region, represented by the CTs, together with their dominant periods, probability of occurrence, and annual frequency of occurrence, with modest biases. Generally, the Coupled Model Intercomparison Project 6 (CMIP6) simulation indicates some improvement for the CT-based analysis relative to its CMIP5 counterpart; however, this depends on the analyzed CT.


INTRODUCTION
The impact of climate change on the regional hydrological cycle is of key interest in climate change impact studies. There is a need for optimized prediction of changes in moisture circulation in response to greenhouse gas radiative heating, which has already started to cause extreme dry and wet episodes in different parts of the globe (IPCC 2013). In the regional context of southern Africa, climate change can be associated with an increase in the intensity of tropical cyclone activity in the south Indian Ocean (e.g., Fitchett 2018) and with changes in the atmospheric modes that govern the regional moisture and heat circulation (e.g., Ibebuchi 2021a, 2021c; Lu et al. 2007). The poor economic situation of this region makes it particularly vulnerable. Hence, this study focuses on an optimized approach for applying general circulation models (GCMs) and downscaling methods to the study of hydrological events, in addition to the physical relationship between extreme climatic conditions and extreme hydrological events.
GCMs are the tools used both to understand the climate and to study climate change (Abiodun et al. 2008; Paeth et al. 2011). Predictions of future changes in water distribution often rely on a physical/statistical relationship that is established in the historical climate and transferred to future climate change scenarios. Thus, a stationary relationship that is valid under all climatic conditions is necessary. The common application of GCMs in predicting the evolution of water resources is based on seasonal predictions (e.g., Phakula et al. 2018). However, this kind of study might not be robust, given that seasons can shift in response to warmer global temperatures (Thomson 2009), so that hydrological predictions based on seasonal transfer functions developed in the current climate might be inapplicable under future climate change scenarios. Moreover, given the transient and fuzzy nature of atmospheric processes, there are no stationary atmospheric signals confined to a given season; for example, even the so-called wet seasons can include periods dominated by drying signals. In contrast, applying the principal component analysis (PCA) in T-mode (i.e., the variables are time steps and the observations are grid points) to a climatic field that explains atmospheric circulation yields a time decomposition with an ample set of physically interpretable patterns of atmospheric circulation (Compagnucci et al. 2001; Ibebuchi 2021a, 2021b, 2021c). The time-scale decomposition associated with this technique, when applied meticulously (e.g., as explained in Ibebuchi (2021a)), can capture the three aspects of time-scale variability: trends (i.e., variations due to an anthropogenic influence), low-frequency variability, and high-frequency variability (Compagnucci et al. 2001).
Unlike forecasts based on seasons, atmospheric circulation patterns, classified here as circulation types (CTs), are stationary in the sense that the physical operational mechanism (and existence) of a CT does not change over time. However, the amplitude of the CT might change, which can in turn alter the relationship between the CT and regional hydrology (e.g., Wetterhall et al. 2012; Ibebuchi 2021a). Thus, a CT known from the historical analysis can be reproduced in the future climate together with its operational mechanism (e.g., Ibebuchi 2021a), so that the impact of radiative heating on the relationship between the CT and hydrological events can be examined. This kind of analysis also makes it possible to separate wetting and drying signals based on the CT, to understand the mechanisms associated with wet and dry episodes in the target region, and to analyze future climate change signals based on the relationship between the CT and hydrological events. The CT analysis can thus compensate for the lack of understanding of the mechanisms of atmospheric circulation and the hydrological cycle, as well as of the link between both.
To this end, the question is whether GCM simulations can reproduce the CTs, together with their physical and statistical properties, as observed in reanalysis data sets. Studies have shown that, for the historical period, GCMs can reproduce CTs as observed from reanalysis (e.g., Huth 2000; Sheridan & Lee 2010). However, reproducing the same CTs under future climate change scenarios is still uncertain and subject to uncertainties arising from both the method used and the choice of GCM (e.g., Huth et al. 2008). Given the diverse techniques for CT classification, it is difficult to conclude which CT arising from a given method is an actual synoptic condition and which is just a statistical artifact. Addressing this issue in the light of climatological classifications, Gong & Richman (1995) noted that failing to examine the consequence of applying hard clustering techniques to a fuzzy and overlapping climatic field can result in a lack of consistency in climatological classifications. Following the justification of the rotated S-mode PCA by Gong & Richman (1995) as a fuzzy precipitation regionalization tool that outperforms hard clustering techniques, Ibebuchi (2021a) showed that the T-mode analysis as used in this paper can be optimized and used in a fuzzy manner given the fuzzy nature of the PCA loadings. Moreover, Compagnucci et al. (2001) and the citations therein stated 'T-mode proved to be a useful tool for extracting and reproducing the circulation types, quantifying their frequency and showing the dominant weather periods in them'. Hence, the obliquely rotated T-mode PCA can be used to separate physically interpretable drying and wetting signals in the target region.
Here, it is applied to two GCM simulations [one from the Coupled Model Intercomparison Project 6 (CMIP6) and its counterpart from the CMIP5] with the aim of (i) reproducing the CTs as observed from the reanalysis data set; (ii) assessing the improvement in the CMIP6 model relative to the CMIP5; and (iii) investigating the aspects of the CTs where the GCMs need an overall improvement for such CT-based analysis of hydrological events in southern Africa and beyond.

DATA AND METHODOLOGY
The sea level pressure (SLP) and precipitation reanalysis data sets are obtained from the ERA5 (Hersbach et al. 2020) at a horizontal resolution of 0.25° in longitude and latitude. The SLP and precipitation data sets from the MPI-ESM-LR GCM are obtained for the CMIP5 and CMIP6 simulations (Eyring et al. 2016) for the historical analysis. The MPI-ESM GCM is selected among the CMIP models because of its capability to reproduce the CTs in the target region (e.g., Ibebuchi 2021a, 2021c). All data sets are obtained for the 1950-2005 period.
The methodology for the classification of the CTs and the justification for all the subjective decisions involved are described in detail in previous studies (Ibebuchi 2021a, 2021b, 2021c). The obliquely rotated T-mode PCA is used for the CT classification (Richman 1981). The standardized SLP data are related using the correlation matrix. Singular value decomposition is applied to the correlation matrix to obtain the PC scores, eigenvalues, and eigenvectors. The eigenvectors localize in time the patterns captured by the PC scores (Compagnucci & Richman 2008). The eigenvectors are weighted with the square root of their corresponding eigenvalues to obtain the PC loadings, which have a different magnitude (Richman & Lamb 1985). The number of components to retain is decided by a sensitivity analysis, in which components are added iteratively for as long as each new component uncovers a pattern that has not been delineated by the previous vectors. The retained components are rotated obliquely with Promax at a power of 2 (Richman 1981). For each retained component, a hyperplane threshold of ±0.2 (Richman & Gong 1999) is used to separate loadings within the zero interval (noise) from signal. Thus, each retained component forms two classes (loadings above +0.2 and loadings below -0.2). The mean SLP field of the days in each class is the CT. The approach is fuzzy both because the oblique rotation allows inter-correlation of the PC scores and because a day can have loadings beyond the ±0.2 threshold on more than one component. Thus, a day can be assigned to >1 CT, which logically implies that those CTs occurred on the day in question.
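The classification steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the function names (`tmode_loadings`, `varimax`, `promax`, `classify_cts`), the synthetic input, the fixed number of retained components, and the choice to start Promax from a raw (non-normalized) varimax solution are all assumptions made here for illustration.

```python
import numpy as np

def tmode_loadings(slp, n_keep):
    """Unrotated T-mode PC loadings.

    slp: array (n_grid, n_days); in T-mode each day (column) is a variable.
    Returns loadings (n_days, n_keep) and the retained eigenvalues.
    """
    corr = np.corrcoef(slp, rowvar=False)      # day-by-day correlation matrix
    # For a symmetric positive semi-definite matrix the SVD gives the
    # eigenvectors (columns of eigvecs) and eigenvalues (singular values).
    eigvecs, eigvals, _ = np.linalg.svd(corr)
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
    return loadings, eigvals[:n_keep]

def varimax(loadings, tol=1e-8, max_iter=500):
    """Orthogonal varimax rotation, used here only as the Promax starting point."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        grad = loadings.T @ (rotated**3 - rotated * (rotated**2).sum(axis=0) / n_rows)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        if s.sum() < objective * (1.0 + tol):
            break
        objective = s.sum()
    return loadings @ rotation

def promax(loadings, power=2):
    """Oblique Promax rotation: least-squares fit to a sharpened varimax target."""
    base = varimax(loadings)
    target = base * np.abs(base) ** (power - 1)   # keeps signs, shrinks small loadings
    coef = np.linalg.solve(base.T @ base, base.T @ target)
    scale = np.sqrt(np.diag(np.linalg.inv(coef.T @ coef)))
    return base @ (coef * scale)

def classify_cts(slp, rotated, threshold=0.2):
    """Two classes per component: loadings above +threshold and below -threshold."""
    member = np.hstack([rotated > threshold, rotated < -threshold])  # (n_days, 2k)
    ct_maps = np.full((member.shape[1], slp.shape[0]), np.nan)
    for j in range(member.shape[1]):
        if member[:, j].any():                    # empty classes stay NaN
            ct_maps[j] = slp[:, member[:, j]].mean(axis=1)
    return member, ct_maps

# Synthetic stand-in for an SLP field: 60 grid points, 40 days (assumption).
rng = np.random.default_rng(0)
slp = rng.normal(size=(60, 40))
unrotated, eigvals = tmode_loadings(slp, n_keep=3)
rotated = promax(unrotated, power=2)
member, ct_maps = classify_cts(slp, rotated)
print(member.sum(axis=0))   # number of days assigned to each class
```

Because `member` is built column by column, a day may be flagged in several columns at once, which is the fuzzy assignment described in the text. On real daily data the day-by-day correlation matrix is large, so memory use would need attention.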
The classification process is applied to the three SLP data sets (i.e., the ERA5 and the CMIP5 and CMIP6 simulations of the MPI-ESM GCM), and the resulting CTs from the two simulations are compared to the observed CTs from the ERA5, in light of the added value from the CMIP6 simulation, with respect to (i) the mean shape of the CTs; (ii) the probability of occurrence of the CTs; (iii) the annual cycle of the CTs; and (iv) the distribution of mean precipitation from the CTs designated to be associated with extreme conditions in the study region.

RESULTS AND DISCUSSION
The climate of Africa south of the equator is marked by regional heterogeneity. It comprises the equatorial climate of rainforests, the tropical savanna climate, the arid climate of warm deserts, the semiarid steppe climate, and the Mediterranean climate type at the southernmost tip of Africa. Vast regions in southern Africa receive rainfall during austral summer (December-February). Austral winter (June-August) is relatively drier. Figure 1 shows the classified CTs as observed from the ERA5 data and from the MPI-ESM GCM, exemplified by the CMIP6 simulation. The CTs are reproduced when the classification scheme is applied to the climate model, suggesting that the CTs reflect the underlying physics in the SLP data sets. From Table 1, the Pearson correlation coefficient (R) between the maps of the CTs from the CMIP5 and CMIP6 simulations and those from the ERA5 shows that, generally, the analyzed climate models captured the spatial variation of SLP under each CT (R > 0.9 in most cases). Under the CMIP6 simulation, there is a relative improvement in reproducing the spatial variation of the SLP field under each CT as observed from the ERA5 reanalysis. Also, the centers of action associated with each CT and their amplitude are faithfully reproduced by the climate model (Figure 1). Figure 2 shows the probability of occurrence of the CTs. It was calculated as the ratio of the number of days assigned to a CT to the total number of days in the analysis period, expressed as a percentage. The probabilities do not add up to 100% since the classification is fuzzy (i.e., >1 CT can occur in a day). Upon reproducing the CTs, the GCMs also accurately captured the structure in the probability of occurrence of the CTs. CT1, CT4, CT5, and to some extent CT7 have a high probability of occurrence and are interpreted as the dominant states of the atmosphere.
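The two diagnostics used in this paragraph, the pattern correlation between CT maps and the probability of occurrence, reduce to short array operations. The sketch below assumes a boolean membership matrix (days by CTs) and unweighted grid points; the function names and the toy inputs are illustrative and are not taken from the study.

```python
import numpy as np

def probability_of_occurrence(member):
    """Percentage of all days assigned to each CT.

    member: boolean array (n_days, n_cts); a day may belong to several CTs.
    """
    return 100.0 * member.sum(axis=0) / member.shape[0]

def pattern_correlation(map_a, map_b):
    """Pearson correlation between two CT maps (no area weighting: assumption)."""
    return np.corrcoef(np.ravel(map_a), np.ravel(map_b))[0, 1]

# Toy membership: 4 days, 3 CTs; days 1 and 3 belong to two CTs each.
member = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0]], dtype=bool)
prob = probability_of_occurrence(member)
print(prob, prob.sum())          # the sum exceeds 100 because classes overlap

# A map and a linearly rescaled copy of it are perfectly correlated.
reference_map = np.array([[1010.0, 1012.0], [1018.0, 1021.0]])
r = pattern_correlation(reference_map, 2.0 * reference_map + 3.0)
print(r)
```

The toy example makes the point stated in the text concrete: with overlapping membership, the per-CT percentages sum to more than 100%.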
CT1 is close to the overall climatology of atmospheric circulation in the region, and CT5 is close to the austral summer climatology of atmospheric circulation in the region (Figure 3). For the probability of occurrence of the CTs, the added value of the CMIP6 simulation over the CMIP5, if any, is slim, based on the mean absolute error (MAE). Previous analyses linking the classified CTs to hydrological extreme conditions in southern Africa and beyond indicated that CT12 and CT13 can be associated with wet events in the study domain, while CT6 can be associated with dry events, except for the regions with the Mediterranean type of climate (e.g., Ibebuchi 2021a, 2021b, 2021c). Other CTs can also be associated with dry and wet conditions, depending on the local climate of interest. CT13 brings widespread rainfall to southern Africa as a result of enhanced onshore moisture transport by the western branch of the Mascarene high; CT12 brings enhanced rainfall to the eastern regions of southern Africa as a result of enhanced convective/cyclonic activity in the southwest Indian Ocean. Under CT6, westerlies prevail in the study domain, so that onshore moisture transport by southeasterlies is weakened. Further analysis focuses on the CTs designated as close to the mean patterns of atmospheric circulation in the study region (i.e., CT1, CT4, and CT5) and the CTs that can be associated with widespread extreme hydrological conditions in the study region (i.e., CT6, CT12, and CT13). Figure 3 shows the annual cycle of the selected CTs from the ERA5 and as simulated by the GCMs. The annual cycle of a CT is calculated as the ratio of the number of days assigned to the CT in a specific month to the total number of days classified under the CT. It can be seen that, generally, the GCMs can reproduce the dominant period of the CTs.
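The annual cycle defined above is a normalized monthly histogram of the days assigned to a CT. A minimal sketch follows, assuming daily dates stored as NumPy `datetime64` values and a boolean membership vector for one CT; the function name `annual_cycle` and the toy membership (all December-February days of one year) are illustrative assumptions.

```python
import numpy as np

def annual_cycle(months, member):
    """Share of a CT's days that fall in each calendar month (length 12).

    months: integer array (n_days,), values 1..12
    member: boolean array (n_days,), True where the CT occurred
    """
    counts = np.bincount(months[member], minlength=13)[1:]  # drop unused bin 0
    total = counts.sum()
    return counts / total if total else np.zeros(12)

# One non-leap year of daily dates; month numbers derived from datetime64.
dates = np.arange("2001-01-01", "2002-01-01", dtype="datetime64[D]")
months = dates.astype("datetime64[M]").astype(int) % 12 + 1

# Toy CT that occurs on every December-February day (assumption).
member = np.isin(months, [12, 1, 2])
cycle = annual_cycle(months, member)
print(np.round(cycle, 3))
```

Because each cycle is normalized by the CT's own day count, cycles of frequent and rare CTs can be compared directly.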
Table 2 shows the MAE in the annual cycle of the CTs from the climate models as compared to the ERA5. The performance of the simulations depends on the CT in question, but overall, the CMIP6 simulation outperforms the CMIP5 simulation. Figure 4 shows the annual frequency of occurrence of the CTs for the 1950-2020 period. It was calculated as the count of days per year when the CT occurred; in other words, the count of days per year with loadings under each class beyond the hyperplane threshold. The GCMs faithfully capture the annual occurrence of the CTs, though with some biases that are specific to the CT in question. From Table 3, it is not clear whether there is an added value in the CMIP6 simulation for the annual occurrence of the selected CTs. The improvement appears to depend on the CT considered. However, the inclusion of the period before the satellite age (i.e., before 1979) in the ERA5 classification might also introduce some bias in the annual occurrence of the CTs, so that the ERA5 might not represent the true observed state. Furthermore, in a subsequent study, which is intended to be made in a multi-model context, the continuous wavelet spectral analysis will be incorporated in the time-frequency analysis of the data sets. The precipitation estimate from climate models is often biased relative to observations. The spatial distribution of precipitation under each CT, as obtained from reanalysis, is captured by the GCMs; however, the GCMs exhibit a wet bias (not shown). For example, under CT6, which is dry, the average precipitation in the study domain is significantly higher in the GCMs than in the ERA5, though there is a slight improvement in the CMIP6 simulation (Figure 5).
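The annual frequency of occurrence and the MAE used to compare the simulations with the ERA5 can be sketched as follows. The function names and the toy numbers are assumptions for illustration only; they are not values from Tables 2 or 3.

```python
import numpy as np

def annual_frequency(years, member):
    """Count of days per year on which a CT occurred.

    years:  integer array (n_days,), calendar year of each day
    member: boolean array (n_days,), True where the CT occurred
    """
    unique_years = np.unique(years)
    counts = np.array([np.count_nonzero(member[years == y]) for y in unique_years])
    return unique_years, counts

def mean_absolute_error(model, reference):
    """MAE between a simulated and a reference series of equal length."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.mean(np.abs(model - reference))

# Toy record: three days in 1950 and two days in 1951.
years = np.array([1950, 1950, 1950, 1951, 1951])
member = np.array([True, False, True, True, False])
unique_years, counts = annual_frequency(years, member)
print(unique_years, counts)

# Compare the toy counts with a hypothetical reference series.
mae = mean_absolute_error(counts, [3, 1])
print(mae)
```

The same `mean_absolute_error` helper applies to any pair of equal-length series, for example two 12-month annual cycles.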
To this end, the large-scale mechanism (the CTs) that governs the evolution of rainfall in the study region is captured by the GCMs. The CTs can be reproduced under future greenhouse warming using appropriate climate models (e.g., as done in Ibebuchi 2021a, 2021c). Thus, the CT analysis meets the need for a statistical transfer function that is established in the current climate and remains stationary under any climatic conditions and analysis period. The operational mechanism of the CT itself does not change, but (slight) displacements and the strengthening/weakening of the amplitude of the synoptic systems captured under a given CT, in response to radiative heating, can alter moisture circulation and convective processes, which will in turn influence the relationship between the CT and regional precipitation. Thus, the separation of wetting and drying signals by CT, and its further application to unpick future climate change signals, is the approach suggested in this work as a robust means of combining statistical techniques and physical understanding in the prediction of future rainfall changes.
Finally, a comparison, in a multi-model context, of the added value of the CMIP6 simulations over the CMIP5 for the CT-based analysis is necessary to at least establish the need for shifting completely to the updated version of the CMIP models. The CT-based analysis can be applied to probe the ability of climate models to simulate the underlying physics in a climate field that explains atmospheric circulation. A study by Bracegirdle et al. (2020) found improvements in circumpolar southern hemisphere extratropical atmospheric circulation in the CMIP6 compared to the CMIP5. The authors suggest a relative improvement in the physics of the climate models under the CMIP6 simulations. Similarly, other studies have reported improvements in the performance of the CMIP6 simulations compared to the CMIP5 (e.g., Fan et al. 2020).

CONCLUSION
The fuzzy obliquely rotated T-mode PCA is applied to SLP data sets from the ERA5 and from the CMIP5 and CMIP6 climate simulations of the MPI-ESM-LR GCM to classify CTs in Africa south of the equator. The climate simulations were able to faithfully reproduce the CTs, together with their probability of occurrence, dominant periods, and annual frequency of occurrence as observed from the ERA5. Overall, the CMIP6 simulation showed marked improvement in the spatial variation of the SLP field for the classified CTs and in the annual cycle of the CTs, though this depends on the CT in question. The results suggest that the CT analysis with the rotated T-mode PCA can be a good tool in the application of climate models to establish a physical link between large-scale atmospheric circulation patterns and regional precipitation, and also to predict future changes in water circulation under climate change.

CONFLICTS OF INTEREST
The author declares no conflict of interest.
All relevant data are available from an online repository or repositories.