A new downscaling approach and its performance with bias correction and spatial disaggregation as contrast

Bias correction and spatial disaggregation (BCSD) is widely used to couple general circulation models (GCMs) with hydrological models. However, BCSD has several disadvantages: only one GCM is selected, biases are corrected through quantile mapping (QM), and downscaling is done through interpolation. A combined approach of canonical correlation analysis filtering, multi-model ensemble, and extreme learning machine (ELM) regressions (CEE) is therefore proposed. The performance of CEE and BCSD was evaluated with the Manas River Basin as the study area. Results show that it is unreasonable to correct biases through QM, as it implies that the climate remains unchanged. The multi-model ensemble provides additional information, which is beneficial for regressions. CEE performs better than BCSD in temperature and precipitation rate downscaling. In CEE, the residual in temperature forecasting can be lower than 0.05 times the temperature range and that in precipitation rate forecasting can be lower than 0.33 times the precipitation rate range. The performance of CEE in temperature downscaling is better in plains than in mountainous areas, whereas for precipitation rate downscaling it is better in mountainous areas. The increasing rate of temperature in the basin is 0.0254 K/decade, 0.1837 K/decade, and 0.5039 K/decade, and that of precipitation rate is 0.0028 mm/(day × decade), 0.0036 mm/(day × decade), and 0.0022 mm/(day × decade) in RCP2.6, RCP4.5, and RCP8.5, respectively.


INTRODUCTION
General circulation models (GCMs) have become the most reliable and widely applicable way to assess climate change and project climate scenarios in inter-decadal studies.
The climate predictions in the five Intergovernmental Panel on Climate Change (IPCC) Assessment Reports are all based on GCMs. However, the spatial resolutions of the GCMs are generally about 2° × 2°, which is so coarse that the GCMs cannot be applied directly in regional-scale studies. Besides, the biases in GCMs should be corrected before their application in regional-scale studies. Because of the complexity of the general circulation, the difference between the climate simulated by GCMs and the actual climate cannot be eliminated entirely, so the average, variation, and tendency biases will remain in the near future. Likewise, the spatial resolutions of GCMs cannot yet be made fine enough because of the prohibitive amount of calculation and the relatively slow computing speed. Therefore, downscaling approaches, which transform GCM outputs from coarse grids to grids with the necessary resolutions (Maurer & Hidalgo ), will continue to play an important role in regional studies. Some downscaling approaches can remove the systematic biases as well.
Statistical downscaling approaches, which transform the coarse-scale outputs of GCMs to a much finer scale through trained transfer functions (Li et al. ), have been widely used in many studies (Vrac et al. ; Christensen & Lettenmaier ).
Is downscaling a single GCM better than downscaling a multi-model ensemble? Second, when the GCM mesh and the observed dataset mesh are unified, the spatial resolution of the GCM inevitably becomes either finer, which introduces interpolation error, or coarser, which loses information. Is it possible to avoid changes in GCM resolution? Third, the quantile-mapping (QM) approach is applied to correct the biases in BCSD. Is the QM correction reasonable? Finally, the spatial downscaling is achieved through interpolation in BCSD. However, the spatial resolutions of most GCMs are coarser than 1° × 1°, which is much coarser than the resolutions needed in regional studies. Besides, interpolation approaches generally perform worse than statistical regressions in mountainous areas. Is it reasonable to achieve the spatial downscaling through interpolation?
To overcome the disadvantages of BCSD, canonical cor-

Filtering GCMs in both CEE and BCSD
As there are 28 GCMs, it is necessary to filter them to obtain those with the best simulation capacities for temperature or precipitation. Canonical correlation analysis (CCA) is a statistical approach for evaluating the correlation between two matrixes (Hotelling ). In CCA, the two matrixes are first converted into several pairs of canonical variables that contain the main information of the two matrixes. The linear correlations between the canonical variables are then calculated to assess the relationship between the two matrixes. CCA has been widely applied in many fields, such as facial expression recognition (Zheng et al. ) and statistical regressions (Merola & Abraham ). Therefore, the CCA approach was selected to filter the 28 GCMs.
The mean monthly temperature and precipitation rate of the selected ERA-Interim points in the period from January 1979 to December 2015 were first extracted to constitute the ERA-Interim matrixes. The mean monthly near-surface air temperature and precipitation flux of the nearest 25
Finally, CCA filtering was carried out and the GCMs with the best simulation capacities were obtained.

Empirical Bayesian kriging interpolations in BCSD
To avoid changes in GCM spatial resolutions, the ERA-Interim dataset was interpolated into the regional dataset.
Then the ERA-Interim temperature and precipitation rate corresponding to the selected GCM points could be extracted from the regional dataset. As the spatial resolution of the ERA-Interim dataset is much finer than that of the GCMs, the interpolation error is relatively small.
Empirical Bayesian kriging (EBK) is a geostatistical interpolation approach which automates the estimation of parameters through a process of subsetting and simulations when building the kriging models. EBK calculates the interpolation error by estimating the underlying semivariogram. The interpolation results of EBK are generally better than those of other approaches (Cooper et al. ). EBK is easy to apply because it requires minimal interaction when building models and gives accurate standard errors of prediction, which is especially useful in studies that need interpolations in batches.
As the interpolations must be done month by month, there are up to 888 interpolations in total, which is impractical to perform one by one. Therefore, the EBK approach was selected to interpolate the ERA-Interim temperature and precipitation rate in batches. The interpolating region should contain all the selected GCM points, and the interpolation should be done in a projected coordinate system rather than a geographic coordinate system to reduce errors. Therefore, the interpolating region in this paper is 78°E, 37°N to 93.5°E, 52°N, which contains all the GCM points. The projected coordinate system is WGS_1984_UTM_Zone_45N, as the Manas River Basin is located in the 45th projection belt.
The interpolations were done in ArcGIS to obtain the best interpolation results, although a disadvantage is the lack of an index for evaluating them. Then, the ERA-Interim temperature and precipitation rate corresponding to the selected GCM mesh points were extracted from the regional dataset.
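The batch size and the projection zone quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes that the 888 interpolations are the 444 months from January 1979 to December 2015 for each of two variables, and that the basin lies near 86°E; that longitude and the function names are illustrative assumptions, not values taken from the paper.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months in an inclusive range."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def utm_zone(lon_deg_east):
    """Standard 6-degree UTM zone number for a longitude in [-180, 180)."""
    return int((lon_deg_east + 180.0) // 6) + 1


# Monthly fields for two variables: temperature and precipitation rate.
n_months = months_inclusive(1979, 1, 2015, 12)
n_variables = 2
n_interpolations = n_months * n_variables

# Assumed approximate longitude of the basin (not stated in the text).
basin_lon = 86.0
basin_zone = utm_zone(basin_lon)

# UTM zones touched by the western and eastern edges of the interpolating region.
region_zones = (utm_zone(78.0), utm_zone(93.5))

print(n_interpolations, basin_zone, region_zones)
```

Under these assumptions the interpolating region spans more than one UTM zone, while the basin itself falls in zone 45.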

The QM biases correction in BCSD
As shown in previous studies, the average, variation, and tendency biases in GCM outputs are obvious (Wood et al. ; Li et al. ; Bürger ). Therefore, it is necessary to remove these biases before their application. The average biases can be removed by adding constants to the GCM outputs.
QM is a post-processing approach which adjusts the distribution of the simulated data to match that of the observed data (Wood et al. ). QM is based on the assumption that the probability distribution of the simulated data in both the reference period and the forecast period is the same as that of the observed data. The probability distributions of the simulated temperature ($F_{T,S}$) and precipitation rate ($F_{p,S}$) can be calculated through Equations (1) and (2) with the help of the observed temperature ($F_{T,O}$) and precipitation rate ($F_{p,O}$) probability distributions, where $P_{o,max}$ is the maximum value of the observed precipitation rate, and $T_{o,min}$ and $T_{o,max}$ are the minimum and maximum values of the observed temperature, respectively.

ELM regressions downscaling in both CEE and BCSD
The spatial resolutions of most GCMs are coarser than 1° × 1°, which is too coarse for interpolation. Moreover, interpolation generally performs worse than statistical regression in mountainous areas. Therefore, the spatial downscaling was carried out through statistical regressions in this paper.
ELM is a single hidden layer feedforward neural network
where $n$ is the length of the time series, $y_{ob}$ is the observed temperature or precipitation rate, and $y_{si}$ is the simulated temperature or precipitation rate.
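A minimal sketch of an ELM regression may help fix ideas: the hidden-layer weights are drawn at random and never trained, and only the output weights are solved by least squares. The version below uses the standard library only, a sigmoid transfer function, and a small ridge term added for numerical stability; the ridge term, the class name, and the synthetic data are assumptions of this sketch, not details from the paper.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


class ELMRegressor:
    """Single hidden layer network with random, fixed hidden weights."""

    def __init__(self, n_hidden, ridge=1e-6, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activations of the hidden neurons, plus a constant bias unit.
        acts = []
        for weights, bias in zip(self.W, self.b):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            acts.append(1.0 / (1.0 + math.exp(-z)))
        return acts + [1.0]

    def fit(self, X, y):
        n_in = len(X[0])
        self.W = [[self.rng.uniform(-1, 1) for _ in range(n_in)]
                  for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        H = [self._hidden(x) for x in X]
        m = self.n_hidden + 1
        # Normal equations (H^T H + ridge * I) beta = H^T y for the output weights.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(m)] for i in range(m)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(m)]
        self.beta = solve_linear(A, rhs)
        return self

    def predict(self, X):
        return [sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x)))
                for x in X]


# Synthetic example: two predictors and a smooth nonlinear target.
X = [[i / 20.0, math.sin(i / 5.0)] for i in range(40)]
y = [2.0 * a + 0.5 * b * b for a, b in X]
model = ELMRegressor(n_hidden=10).fit(X, y)
pred = model.predict(X)
mae = sum(abs(p - t) for p, t in zip(pred, y)) / len(y)
print(round(mae, 4))
```

Because only the output weights are solved, training is a single linear solve, which is what makes it cheap to repeat the regression for many hidden-layer sizes and transfer functions.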

GCM filtering in BCSD and CEE
The proportions of the ERA-Interim temperature matrix and precipitation rate matrix variation explained by the corresponding GCMs matrixes were calculated and are listed in Table 3.
As shown in Table 3, the percentages of the ERA-Interim

The QM biases correction
The average annual temperature of the ERA-Interim points and the MPI-ESM-MR points before and after correction were calculated and are shown in Figure 2. The average annual precipitation rate of the ERA-Interim points and the Had-GEM-AO points before and after correction were calculated and are shown in Figure 3.
As shown in Figures 2 and 3, the average, variation, and tendency biases in GCM outputs were adjusted through QM. However, these biases can also be corrected through many regression approaches by adjusting the constants and weights. Also, the QM performance is not good enough. In Figure
It is not appropriate to downscale through interpolation in BCSD, as interpolation approaches cannot alter the average value of the time series. Therefore, the spatial downscaling was carried out through ELM regressions in BCSD.

ELM regressions performance in CEE and BCSD
After regression, the numbers of hidden neurons corresponding to the highest $R^2$ and lowest RMAR of the 100 regressions with the same transfer function were extracted (Table 4). The corresponding $R^2$ and RMAR were then obtained (Table 5).
As shown in Table 4, the numbers of hidden neurons corresponding to the highest $R^2$ or lowest RMAR in the calibration period are all 98, 99, or 100. This means that more hidden neurons generally produce better simulation results in the calibration period. However, too many hidden neurons lead to over-fitting, as indicated by the much lower numbers of hidden neurons that perform best in the verification period.
Therefore, the numbers of hidden neurons should be determined by the indexes in the verification period, and the indexes in the calibration period should not be considered, to avoid over-fitting. As shown in Table 4, different transfer functions correspond to different numbers of hidden neurons. Therefore, the numbers of hidden neurons should be tested one by one to obtain the best simulation.
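The selection rule described above can be sketched as follows. Two points are assumptions of this sketch: $R^2$ is taken to be the coefficient of determination, and RMAR is taken to be the mean absolute residual divided by the observed range, a reading consistent with the later statement that an RMAR of 0.05 means a residual of 0.05 times the range. The scores in the dictionary are hypothetical and are not taken from Table 4 or Table 5.

```python
def r_squared(obs, sim):
    """Coefficient of determination (assumed definition of R^2)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmar(obs, sim):
    """Mean absolute residual relative to the observed range (assumed definition)."""
    mean_abs = sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)
    return mean_abs / (max(obs) - min(obs))


def pick_hidden_neurons(scores):
    """scores maps n_hidden -> (calibration RMAR, verification RMAR).
    The choice uses the verification-period index only."""
    return min(scores, key=lambda n: scores[n][1])


# Hypothetical observed and simulated temperatures (K).
obs = [270.0, 275.0, 280.0, 285.0, 290.0]
sim = [271.0, 274.0, 281.0, 284.0, 290.0]
example_r2 = r_squared(obs, sim)
example_rmar = rmar(obs, sim)

# Hypothetical scores: calibration error keeps falling, verification error does not.
scores = {20: (0.050, 0.060), 60: (0.030, 0.045), 100: (0.010, 0.090)}
best_by_verification = pick_hidden_neurons(scores)
best_by_calibration = min(scores, key=lambda n: scores[n][0])

print(example_r2, example_rmar, best_by_verification, best_by_calibration)
```

With these made-up scores, choosing by the calibration index would pick the largest network, whereas the verification index picks a smaller one, which is the over-fitting pattern the text describes.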
As shown in Table 5, the performance of the sigmoidal transfer function is generally better than that of hardlim and sine. The RMAR values with the
In CEE, the RMAR values are all lower and the $R^2$ values are all higher than those in BCSD. Therefore, CEE performs better than BCSD. When the time series of the four-GCM ensemble is used as the independent variable matrix, the lower RMAR and higher $R^2$ before correction than after correction show that correcting biases through QM increased the downscaling errors. Therefore, it is not appropriate to correct biases through QM. Finally, when the time series after correction is used as the independent variable matrix, the generally lower RMAR and higher $R^2$ of the four-GCM ensemble compared with a single GCM show that the multi-model ensemble generally provides additional information, which is very beneficial for regressions.

The increasing rate of temperature and precipitation rate in Manas River Basin in CEE
To illustrate the increasing rates of temperature and precipitation rate in the Manas River Basin, the increasing rates at the ERA-Interim points were interpolated through EBK and are shown in Figures 4 and 5. As shown in Figure 4, the increasing rate of temperature in the Manas River Basin is 0.0254 K/decade, 0.1837 K/decade, and 0.5093 K/decade in RCP2.6, RCP4.5, and RCP8.5, respectively. This means that the mean temperature around 2100 may be 0.2-4.3 K higher than that around 2015. Also, the increasing rate of temperature is generally higher in plains than in mountainous areas. The reason may be that water resources in mountainous areas are much richer than those in plains. More water means a higher specific heat capacity, which can substantially slow down the temperature increase.
As shown in Figure 5, the increasing rate of precipitation in the Manas River Basin is 0.0028 mm/(day × decade), 0.0036 mm/(day × decade), and 0.0022 mm/(day × decade) in RCP2.6, RCP4.5, and RCP8.5, respectively. This means that the annual precipitation around 2100 may be 6.8-11.2 mm higher than that around 2015. The increasing rate of precipitation is generally higher in mountainous areas than in plains, which is mainly due to the high and steep terrain in mountainous areas.
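The projected changes quoted in this section follow from multiplying each rate by the 8.5 decades between 2015 and 2100 and, for precipitation, by the number of days in a year. The sketch below reproduces that arithmetic, assuming a linear trend and a 365-day year; the variable names are illustrative.

```python
DECADES = (2100 - 2015) / 10.0  # 8.5 decades
DAYS_PER_YEAR = 365             # assumed; leap days ignored

# Rates as quoted in the text.
temp_rates = {"RCP2.6": 0.0254, "RCP4.5": 0.1837, "RCP8.5": 0.5093}    # K/decade
precip_rates = {"RCP2.6": 0.0028, "RCP4.5": 0.0036, "RCP8.5": 0.0022}  # mm/(day x decade)

# Temperature change (K) and annual precipitation change (mm) by 2100.
temp_change = {k: r * DECADES for k, r in temp_rates.items()}
precip_change = {k: r * DECADES * DAYS_PER_YEAR for k, r in precip_rates.items()}

temp_range = (round(min(temp_change.values()), 1), round(max(temp_change.values()), 1))
precip_range = (round(min(precip_change.values()), 1), round(max(precip_change.values()), 1))

print(temp_range, precip_range)
```

Note that the largest precipitation change comes from RCP4.5 rather than RCP8.5, because RCP4.5 has the highest precipitation rate trend among the three scenarios.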

Rationality of the QM biases correction in BCSD
As mentioned above, the GCM temperature and precipi- erally increasing, meaning that the value in the forecast period is generally higher than that in the reference period. Therefore, the quantile corresponding to the precipitation rate or temperature in the forecast period is generally higher than that in the reference period. Higher quantile corresponds to higher value. The time series after correction in the forecast period is generally higher than the average value of the ERA-Interim time series. In other words, the time series after correction in the reference period is generally lower than the ERA-Interim time series.
The probability distribution of an increasing or decreasing time series cannot remain unchanged, because its average value is increasing or decreasing. An invariant probability distribution means that the status of the time series remains unchanged. For climate, the assumption of QM therefore implies that the climate status stays the same and that climate change does not exist. Therefore, it may be unreasonable to correct biases through QM.
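The effect described in this section can be reproduced on synthetic data. The sketch below assumes that the simulated distribution is estimated over the reference and forecast periods together, which is one reading of the QM assumption stated earlier, and that the simulated series has a steady upward trend; all numbers and names are illustrative.

```python
from bisect import bisect_right
from math import ceil
from statistics import mean

# Synthetic series: 20 reference steps of observations, 60 steps of simulation
# (the first 20 are the reference period, the remaining 40 the forecast period).
obs_ref = [10.0 + 0.1 * t for t in range(20)]
sim_all = [8.0 + 0.1 * t for t in range(60)]

sim_sorted = sorted(sim_all)
obs_sorted = sorted(obs_ref)


def correct(x):
    """Map x to the observed value at the same quantile of the pooled simulation."""
    rank = bisect_right(sim_sorted, x)
    idx = max(0, ceil(rank * len(obs_sorted) / len(sim_sorted)) - 1)
    return obs_sorted[idx]


corrected = [correct(x) for x in sim_all]
obs_mean = mean(obs_ref)
corrected_ref_mean = mean(corrected[:20])       # reference period after correction
corrected_forecast_mean = mean(corrected[20:])  # forecast period after correction

print(corrected_ref_mean, obs_mean, corrected_forecast_mean)
```

Under these assumptions the corrected reference-period mean falls below the observed mean while the corrected forecast-period mean lies above it, because the rising simulated values in the forecast period occupy the upper quantiles of the pooled distribution.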

Rationality of QM together with ELM regressions in BCSD
As mentioned above, spatial downscaling through interpolation is not appropriate in BCSD. However, downscaling through statistical regressions may have adverse effects, such as changes in tendency. To address this concern, the increasing rates of the mean temperature and precipitation rate for the selected ERA-Interim points in CEE and BCSD were calculated and are listed in Table 6.
The mean increasing rate with the time series of the four-GCM ensemble after correction as the independent variable matrix is also listed in Table 6.
The increasing rates of mean temperature and precipitation rate for the selected ERA-Interim points in BCSD are all higher than those in CEE in the period 2016-2100. This is not caused by the time scale, as situations in the period
The ELM regressions adjust the tendencies of the independent variable matrixes to approximate those of the ERA-Interim matrixes, and the trained formula is then applied in the forecast period. As shown in Figure 3, the increasing rate of the mean GCM precipitation rate after correction is closer to the increasing rate of the ERA-Interim points than that before correction. This means that the changes of tendency in the regressions after correction are smaller than those before correction. However, the decrease in the increasing rate of the mean precipitation rate in the reference period is much larger than that in the forecast period, as shown in Figure 3. Therefore, the increasing rate in BCSD is higher than that in CEE in the period 2016-2100, and the reason for temperature downscaling is similar.
As shown in Figure 7, the temperature RMAR in the study area is lower than 0.05, meaning that the residual in temperature forecasting is lower than 0.05 times the temperature range. According to Figure 9, the precipitation rate RMAR in the study area is lower than 0.33, meaning that the residual in precipitation rate forecasting is lower than 0.33 times the precipitation rate range. The CEE performance in temperature downscaling is therefore much better than that in precipitation rate downscaling, which is also demonstrated by the $R^2$ in Figures 6 and 8. This is due to the low simulation capacities of GCMs for precipitation rate.
For temperature, the $R^2$ in plains is generally higher than that in mountainous areas, and the RMAR is generally lower. This means that the CEE performance in temperature downscaling is better in plains than in mountainous areas. For precipitation rate, the performance is better in mountainous areas than in plains, because the $R^2$ is generally higher and the RMAR is generally lower in mountainous areas. The reason may be that the spatial and temporal variability of temperature is much higher in mountainous areas than in plains because of the complex terrain. However, the high and steep terrain in mountainous areas leads to rich and stable orographic rain, which is relatively easy to simulate. In contrast, the scarce precipitation in plains, most of which is artificial precipitation in transitional zones between oases and deserts, is very difficult to simulate. Therefore, the CEE performance in precipitation rate downscaling is worse in plains than in mountainous areas.
Finally, as the values of RMAR in temperature downscaling are all lower than 0.05 and the $R^2$ values are all higher than 0.95, the CEE performance in temperature downscaling is very good. For precipitation rate, the RMAR values are lower than 0.33 and the $R^2$ values are generally between 0.3 and 0.5, meaning that the CEE performance in precipitation rate downscaling is not good enough, but still acceptable.

Advantages and disadvantages of BCSD and CEE
Although there are only three steps in CEE (filtering GCMs through CCA, constituting the multi-model ensemble, and downscaling through ELM regressions), its performance is generally better than that of BCSD. Filtering GCMs guarantees

CONCLUSIONS
In this paper, the performance of CEE and BCSD was compared, with the Manas River Basin as the study area. The following conclusions can be drawn.
The QM approach can indeed adjust the variation and tendency biases, but this can also be done through statistical regressions. The average of the GCM time series after correction in the reference period is not equal to that of the observed time series, which illustrates that it is unreasonable to downscale through interpolation in BCSD. The QM approach and statistical regressions should not be applied together, as they both adjust the tendencies of the time series. The QM approach implies the assumption that the climate will remain unchanged; therefore, it may be unreasonable to correct the biases through QM.
It is reasonable to filter GCMs through CCA in both CEE and BCSD. The multi-model ensemble can provide additional information, which is very beneficial for regressions. Because of over-fitting, indexes in the verification period are more reliable than those in the calibration period. The sigmoidal function is a suitable transfer function in ELM regressions. The numbers of hidden neurons should be tested to obtain the best simulation. CEE performs better than BCSD in temperature and precipitation rate downscaling, although it requires enough GCMs for filtering and a long observed time series.
In CEE, the residual in temperature forecasting can be lower than 0.05 times the temperature range, and that in precipitation rate forecasting can be lower than 0.33 times the precipitation rate range. The CEE performance in temperature downscaling is very good, while in precipitation rate downscaling it is not good enough but still acceptable. The CEE performance in temperature downscaling is better in plains than in mountainous areas, while in precipitation rate downscaling it is better in mountainous areas.
In conclusion, the BCSD downscaling approach may not be reasonable because of the QM approach, and CEE performs better than BCSD in temperature and precipitation rate downscaling.