Abstract

Bias correction and spatial disaggregation (BCSD) is widely used in coupling general circulation models (GCMs) and hydrological models. However, BCSD has several disadvantages: only one GCM is selected, biases are corrected through quantile-mapping (QM), and downscaling is done through interpolation. To address these, a combined approach of canonical correlation analysis (CCA) filtering, multi-model ensemble, and extreme learning machine (ELM) regressions (CEE) was developed. The performance of CEE and BCSD was evaluated with the Manas River Basin as the study area. Results show that it is unreasonable to correct biases through QM, as it implies that the climate remains unchanged. The multi-model ensemble provides additional information, which is beneficial for regressions. CEE performs better than BCSD in temperature and precipitation rate downscaling. In CEE, the residual in temperature forecasting can be lower than 0.05 times the temperature range and that in precipitation rate can be 0.33 times the precipitation rate range. CEE downscales temperature better in plains than in mountainous areas, but downscales precipitation rate better in mountainous areas. The increasing rate of temperature in the basin is 0.0254 K/decade, 0.1837 K/decade, and 0.5039 K/decade, and that of precipitation rate is 0.0028 mm/(day × decade), 0.0036 mm/(day × decade), and 0.0022 mm/(day × decade) in RCP2.6, RCP4.5, and RCP8.5, respectively.

INTRODUCTION

General

General circulation models (GCMs) have become the most reliable and widely applicable way to assess climate change and forecast climate scenarios in inter-decadal studies. The climate predictions in the five Intergovernmental Panel on Climate Change (IPCC) Assessment Reports are all based on GCMs. However, the spatial resolutions of the GCMs are generally 2° × 2°, which is too coarse for the GCMs to be applied directly in regional-scale studies. In addition, the biases in GCMs should be corrected before their application in regional-scale studies. Due to the complexity of the general circulation, the difference between the simulated climate in GCMs and the observed climate cannot be eliminated completely, so the average, variation, and tendency biases will remain in the near future. Likewise, the spatial resolutions of GCMs cannot be made fine enough because of the prohibitive computational cost. Therefore, downscaling approaches, which transform GCM outputs from coarse grids to grids with the necessary resolutions (Maurer & Hidalgo 2008), will continue to play an important role in regional studies. Some downscaling approaches can remove the systematic biases as well.

Statistical downscaling approaches, which transform the coarse-scale outputs of GCMs to a much finer scale through trained transfer functions (Li et al. 2010), have been widely used in many studies (Vrac et al. 2007). In particular, the bias correction and spatial disaggregation (BCSD) method advanced by Wood & Maurer (Wood et al. 2002) is commonly used. BCSD first unifies the selected GCM outputs and the observed dataset onto the same grid by interpolation. Then the average, variation, and tendency biases in GCM outputs are removed by adjusting their average and probability distribution. Finally, the coarse spatial resolution of the GCM is transformed to the required resolution through interpolation. The downscaling error of BCSD can be limited to the allowed range of most hydrological models, such as the variable infiltration capacity model (Buerger et al. 2010; Werner & Cannon 2015). This generally low downscaling error has made BCSD more and more widely used, especially in coupling GCMs with hydrological models.

However, there are some disadvantages in BCSD which cannot be ignored. First, there are more than 40 GCMs, but only one GCM was downscaled in most previous studies. Then, which GCM should be downscaled? Many studies showed that the multi-model ensemble generally performed better than one single model (Gates 1999; Palmer et al. 2004; Christensen & Lettenmaier 2007). Is the performance of downscaling one certain GCM better than a multi-model ensemble? Second, the spatial resolution of GCM will inevitably be finer with interpolation error introduced or coarser with information lost when unifying the GCM mesh and the observed dataset mesh. Is it possible to avoid changes in GCM resolution? Third, the quantile-mapping (QM) approach was applied to correct the biases in BCSD. Is the QM correction reasonable? Finally, the spatial downscaling was achieved through interpolation in BCSD. However, the spatial resolutions of most GCMs are coarser than 1° × 1°, which is much coarser than the necessary resolutions in regional studies. Besides, the interpolation approaches generally perform worse than statistical regressions in mountainous areas. Is it reasonable to achieve the spatial downscaling through interpolation?

To overcome the disadvantages of BCSD, a downscaling approach combining canonical correlation analysis (CCA) filtering, multi-model ensemble, and extreme learning machine (ELM) regression (CEE) was established. CEE first filters all the GCMs through CCA. Then, the GCMs with the best simulation capacities are selected to constitute the multi-model ensemble. Finally, the spatial downscaling is done through ELM regressions between the multi-model ensemble and the observed dataset.

In this paper, the performance of CEE and BCSD was compared with the Manas River Basin as the study area. The advantages and disadvantages of CEE and BCSD were analyzed in detail. The climate scenarios in the Manas River Basin were forecast by downscaling GCMs, which can serve as a reference for similar studies.

The study area

The Manas River Basin (Figure 1) is located in the south of the Junggar Basin and north of the Tianshan Mountains, Xinjiang, China. There is 1,667 km² of farmland in the basin. The gross domestic product of the basin in 2013 amounted to 4.25 billion dollars. Due to climate change, floods have been more frequent since 1996 (Tang et al. 2012). The catastrophic flood caused by the persistent high temperature and precipitation in 1999 lasted 25 days and caused a loss of two hundred million dollars. However, the spatial and temporal distribution of water resources in the basin is very uneven. When the regions around the upstream of the river are suffering floods, some regions in the downstream are enduring a water shortage. In the northern part of the basin, the water shortage is very serious and the ecosystem is very fragile. Many Populus euphratica and saxaul plants have died in recent years. As the Manas River Basin is very important for Xinjiang, it is necessary to analyze its climate tendency.

Figure 1

The Manas River Basin and the selected ERA-Interim dataset points.

According to the elevation and landform, the basin can be divided into five regions from south to north: high mountains, medium mountains, low mountains and hills, alluvial plains, and deserts (Zhou 1999). The elevations, annual average temperature, annual precipitation, and main land cover of the five regions are listed in Table 1 (Zhou 1999). As shown in Table 1, the climate and landform in the basin are very complex and highly variable. The temperature varies greatly, with the lowest temperature being 230.35 K and the highest being 316.25 K (Zhou 1999). The basin is very arid due to the continental climate. The annual evaporation in the basin amounts to 1,500–2,000 mm (Zhou 1999), which is much higher than the annual precipitation. Such a complex and variable climate and landform make the basin well suited to verifying the performance of CEE and BCSD. Therefore, the Manas River Basin was selected as the study area and the mean monthly temperature and precipitation rate were selected for downscaling in the paper.

Table 1

The elevations, annual average temperature, annual precipitation, and main land cover of the regions in the Manas River Basin

Regions | Elevations | Annual average temperature | Annual precipitation | Main land cover
High mountains | >3,500 m | <0 °C | >500 mm | Glacier
Medium mountains | 1,500–3,500 m | 2 °C | 300–500 mm | Meadow
Low mountains and hills | 600–1,500 m | 5 °C | 200–300 mm | Grassland
Alluvial plains | 250–600 m | 6 °C | 100–200 mm | Farm
Deserts | <350 m | >6 °C | <100 mm | Desert

METHODS

Data

Due to the history and economic situation in the Manas River Basin, there is only one station monitoring temperature and precipitation in the mountainous areas, and it is located in a mountain pass. Therefore, it is unrealistic to describe the climate in the mountainous areas with the station's observed dataset. Thus, the ERA-Interim reanalysis dataset was selected as the substitute dataset. The period of the ERA-Interim dataset is from January 1979 to December 2015. Its temporal and spatial resolutions are 1 month and 0.125° × 0.125°, respectively. The selected ERA-Interim dataset points are shown in Figure 1. Table 2 lists the 28 GCMs whose simulation regions contain the Manas River Basin; their outputs were provided by the Deutsches Klimarechenzentrum (DKRZ). The near-surface air temperature and precipitation rate were selected for downscaling to evaluate the performance of CEE and BCSD.

Table 2

The 28 GCMs provided by DKRZ

Models | Spatial resolution (lon × lat) | Models | Spatial resolution (lon × lat)
BCC-CSM1-1 | 2.813° × 2.791° | GISS-E2-H | 2.5° × 2°
BCC-CSM1-1-m | 1.125° × 1.121° | GISS-E2-R | 2.5° × 2°
BNU-ESM | 2.813° × 2.791° | HadGEM2-AO | 1.875° × 1.25°
CanESM2 | 2.813° × 2.791° | HadGEM2-ES | 1.875° × 1.25°
CCSM4 | 1.25° × 0.942° | IPSL-CM5A-LR | 3.75° × 1.895°
CESM1-CAM5 | 1.25° × 0.942° | IPSL-CM5A-MR | 2.5° × 1.268°
CESM1-WACCM | 2.5° × 1.895° | MIROC5 | 1.406° × 1.401°
CNRM-CM5 | 1.406° × 1.401° | MIROC-ESM | 2.813° × 2.791°
CSIRO-Mk3-6-0 | 1.875° × 1.865° | MIROC-ESM-CHEM | 2.813° × 2.791°
FGOALS-g2 | 2.813° × 2.791° | MPI-ESM-LR | 1.875° × 1.865°
FIO-ESM | 2.813° × 2.791° | MPI-ESM-MR | 1.875° × 1.865°
GFDL-CM3 | 2.5° × 2° | MRI-CGCM3 | 1.125° × 1.121°
GFDL-ESM2G | 2.5° × 2.022° | NorESM1-M | 2.5° × 1.895°
GFDL-ESM2M | 2.5° × 2.022° | NorESM1-ME | 2.5° × 1.895°

Filtering GCMs in both CEE and BCSD

As there are 28 GCMs, it is necessary to filter them to obtain those with the best simulation capacities for temperature or precipitation. CCA is a statistical approach for evaluating the correlation between two matrixes (Hotelling 1992). In CCA, the two matrixes are first converted to several pairs of canonical variables, which contain the main information of the two matrixes. Then the linear correlations between the canonical variables are calculated to assess the relationship between the two matrixes. CCA has been widely applied in many fields, such as facial expression recognition (Zheng et al. 2006) and statistical regressions (Merola & Abraham 2001). Therefore, the CCA approach was selected to filter the 28 GCMs.

First, the mean monthly temperature and precipitation rate of the selected ERA-Interim points in the period from January 1979 to December 2015 were extracted to constitute the ERA-Interim matrixes. Next, the mean monthly near-surface air temperature and precipitation flux of the 25 points nearest to each ERA-Interim point were extracted from each GCM's outputs. Then the redundancies were removed and the corresponding GCM matrixes were constituted. Finally, CCA filtering was carried out and the GCMs with the best simulation capacities were obtained.
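The filtering step can be illustrated with a minimal sketch. It assumes that the "proportion of variation explained" is the CCA redundancy index, which may not be the exact statistic used in this study; the function name `cca_summary` and the synthetic matrices are hypothetical.

```python
import numpy as np


def cca_summary(X, Y):
    """Canonical correlations between X (n x p) and Y (n x q), plus the share
    of the variance in Y explained through X (redundancy index).
    Assumes n > p + q and full column rank in both matrices."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the two centred column spaces.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    _, r, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    r = np.clip(r, 0.0, 1.0)
    # Canonical variates of Y and their correlations (loadings) with each Y column.
    V = Qy @ Vt.T
    loadings = (Yc / np.linalg.norm(Yc, axis=0)).T @ V
    redundancy = float(np.sum(r**2 * np.mean(loadings**2, axis=0)))
    return r, redundancy


# Synthetic stand-ins: 444 months (1979-2015), 6 GCM series, 3 reanalysis series.
rng = np.random.default_rng(0)
gcm = rng.normal(size=(444, 6))
era = gcm[:, :3] + 0.3 * rng.normal(size=(444, 3))
r, redundancy = cca_summary(gcm, era)
print(np.round(r, 3), round(redundancy, 3))
```

Under this assumption, each GCM would be scored by the redundancy of the ERA-Interim matrix given that GCM's matrix, and the GCMs with the highest scores would be retained.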

Empirical Bayesian kriging interpolations in BCSD

To avoid changes in GCM spatial resolutions, the ERA-Interim dataset was interpolated into a regional dataset. Then the ERA-Interim temperature and precipitation rate corresponding to the selected GCM points could be extracted from the regional dataset. As the ERA-Interim spatial resolution is much finer than that of the GCMs, the interpolation error is relatively small.

Empirical Bayesian kriging (EBK) is a geostatistical interpolation approach that automates the estimation of kriging model parameters through a process of subsetting and simulation. EBK accounts for the error introduced by estimating the underlying semivariogram. The interpolating results of EBK are generally better than those of other approaches (Cooper et al. 2015). EBK is easy to apply because it requires minimal interaction in building models and gives more accurate standard errors of prediction, especially in studies needing interpolations in batches.

As the interpolations must be done month by month, there are up to 888 interpolations in total (37 years × 12 months × 2 variables), which is impractical to perform one by one. Therefore, the EBK approach was selected to interpolate the ERA-Interim temperature and precipitation rate in batches. The interpolating region should contain all the selected GCM points, and the interpolation should be done in a projected coordinate system rather than a geographic coordinate system to reduce errors. Therefore, the interpolating region in this paper extends from 78°E, 37°N to 93.5°E, 52°N, which contains all the GCM points. The projected coordinate system is WGS_1984_UTM_Zone_45N, as the Manas River Basin is located in UTM zone 45. The interpolations were done in ArcGIS to obtain the best interpolating results, although a disadvantage is the lack of an index for evaluating them. Then, the ERA-Interim temperature and precipitation rate corresponding to the selected GCM mesh points were extracted from the regional dataset.

The QM biases correction in BCSD

As shown in previous studies, the average, variation, and tendency biases in GCMs outputs are obvious (Wood et al. 2004; Li et al. 2010; Bürger 2013). Therefore, it is necessary to remove these biases before their application. The average biases can be removed by adding constants to the GCM temperature and precipitation rate time series in BCSD. The variation and tendency biases can be removed through QM in BCSD.

QM is a post-processing approach which adjusts the distribution of the simulated data to match the observed data (Wood et al. 2004). QM is based on the assumption that the probability distribution of the simulated data in both reference period and forecast period is the same as that of the observed data. The probability distribution of the simulated temperature (FT,S) and precipitation rates (Fp,S) can be calculated through Equations (1) and (2) with the help of the observed temperature (FT,O) and precipitation rate (Fp,O) probability distribution:  
formula
(1)
 
formula
(2)
where Po,max is the maximum value of the observed precipitation rate; To,min and To,max are the minimum and maximum values of the observed temperature, respectively.
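A minimal sketch of empirical quantile mapping may help to make the idea concrete. It is not a transcription of Equations (1) and (2): the function name `quantile_map` is hypothetical, and the rank-based plotting position is an assumption. Each simulated value is assigned its non-exceedance probability within the whole simulated series and is replaced by the observed value with the same probability, so the corrected series stays within the observed minimum and maximum.

```python
import numpy as np


def quantile_map(sim, obs):
    """Map every simulated value to the observed value at the same quantile."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Rank of each simulated value within the full simulated series (0 = smallest).
    # Tied values receive distinct ranks, which matters for many zero precipitation values.
    ranks = np.argsort(np.argsort(sim))
    # Empirical non-exceedance probability (assumed plotting position).
    probs = (ranks + 0.5) / sim.size
    # Observed value at the same probability.
    return np.quantile(obs, probs)


# Synthetic example: a biased simulated series and a short observed series.
obs = np.array([270.0, 272.0, 275.0, 279.0, 284.0])
sim = np.array([265.0, 280.0, 268.0, 277.0, 290.0, 271.0])
print(np.round(quantile_map(sim, obs), 2))
```

The mapping preserves the ordering of the simulated values but imposes the observed distribution on the entire simulated series, which is the assumption examined later in the paper.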

ELM regressions downscaling in both CEE and BCSD

The spatial resolutions of most GCMs are coarser than 1° × 1°, which is too coarse for interpolation. In addition, interpolation generally performs worse than statistical regression in mountainous areas. Therefore, the spatial downscaling was carried out through statistical regressions in the paper.

ELM is a single hidden layer feed forward neural network (SLFN) (Huang et al. 2004). Unlike many traditional popular gradient-based learning algorithms, the ELM algorithm randomly chooses the input weights and analytically determines the output weights of SLFNs. The ELM algorithm can provide better generalization performance at faster learning speed than most SLFNs. ELM has been rapidly and widely used in statistical regressions and classifications (Huang et al. 2012). Therefore, ELM regressions were selected to carry out the spatial downscaling in CEE and BCSD.

At each selected ERA-Interim point, an ELM regression was implemented with the ERA-Interim temperature or precipitation rate time series as the dependent variable matrix. In BCSD, the corresponding independent variable matrix was constituted by the corrected GCM time series of the 25 points nearest to the ERA-Interim point. This means that only 25 corrected time series could offer information, as only one GCM is used in temperature or precipitation rate downscaling. In CEE, by contrast, up to 100 uncorrected time series (25 points from each of four GCMs) could offer information, as the multi-model ensemble was constituted by the four selected GCMs in both temperature and precipitation rate downscaling. In addition, regressions with the 100 corrected time series of the ensemble as the independent variable matrix were carried out to evaluate the performance of QM in correcting tendency biases. The calibration and verification periods were set from January 1979 to December 2012 and from January 2013 to December 2015, respectively. Three transfer functions (sigmoidal, sine, and hardlim) were tested to select the best one. The number of hidden neurons was varied from 1 to 100 to obtain the best performance.
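The regression procedure can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the class name `ELMRegressor`, the uniform weight range, and the synthetic data are assumptions, only the sigmoidal transfer function is shown, and the predictors are assumed to be scaled to a comparable range beforehand.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer ELM with a sigmoidal transfer function."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer outputs for the fixed random weights W and biases b.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are drawn at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights are the least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


# Synthetic stand-in for predictor time series and a target series.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
X_cal, y_cal, X_ver, y_ver = X[:160], y[:160], X[160:], y[160:]

# Scan the number of hidden neurons and keep the one with the lowest
# mean absolute residual in the verification split.
scores = {}
for n_hidden in range(1, 31):
    model = ELMRegressor(n_hidden, seed=0).fit(X_cal, y_cal)
    scores[n_hidden] = float(np.mean(np.abs(model.predict(X_ver) - y_ver)))
best_n = min(scores, key=scores.get)
print(best_n, scores[best_n])
```

Because the output weights are obtained in one linear solve, scanning 1 to 100 hidden neurons at every point remains inexpensive, which is what makes a scan of this kind practical.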

Indexes for evaluating the BCSD and CEE performance

The coefficient of determination (R2) and relative mean absolute residual (RMAR) were selected to evaluate the performance of BCSD and CEE. The RMAR can be calculated through Equation (3). Unlike the root mean square error (RMSE), RMAR involves no squaring or square-root step, so it weights all residuals equally and describes the typical residual more directly:
formula
(3)
where n is the length of the time series, yob is the observed temperature or precipitation rate, ysi is the simulated temperature or precipitation rate.
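The two indexes can be sketched in code under stated assumptions. Equation (3) is not reproduced here, so the sketch assumes that RMAR is the mean absolute residual divided by the range of the observed series, in line with the abstract's description of residuals as a fraction of the temperature or precipitation rate range. R2 is assumed to be 1 minus the ratio of the residual to the total sum of squares. The function names are hypothetical.

```python
import numpy as np


def rmar(y_ob, y_si):
    """Mean absolute residual relative to the observed range (assumed definition)."""
    y_ob = np.asarray(y_ob, dtype=float)
    y_si = np.asarray(y_si, dtype=float)
    return float(np.mean(np.abs(y_ob - y_si)) / (y_ob.max() - y_ob.min()))


def r_squared(y_ob, y_si):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_ob = np.asarray(y_ob, dtype=float)
    y_si = np.asarray(y_si, dtype=float)
    ss_res = np.sum((y_ob - y_si) ** 2)
    ss_tot = np.sum((y_ob - y_ob.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_ob = [0.0, 1.0, 2.0, 3.0, 4.0]
y_si = [0.5, 1.0, 2.0, 3.0, 3.5]
print(rmar(y_ob, y_si), r_squared(y_ob, y_si))
```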

RESULTS

GCM filtering in BCSD and CEE

The proportions of the ERA-Interim temperature matrix and precipitation rate matrix variation explained by the corresponding GCMs matrixes were calculated and are listed in Table 3.

Table 3

The proportions of the ERA-Interim temperature matrix and precipitation rate matrix variation explained by the corresponding GCM matrixes

Models | Temperature | Precipitation rates | Models | Temperature | Precipitation rates
BCC-CSM1-1 | 0.966 | 0.430 | GISS-E2-H | 0.970 | 0.430
BCC-CSM1-1-m | 0.969 | 0.062 | GISS-E2-R | 0.970 | 0.410
BNU-ESM | 0.966 | 0.375 | HadGEM2-AO | 0.966 | 0.559
CanESM2 | 0.972 | 0.445 | HadGEM2-ES | 0.890 | 0.371
CCSM4 | 0.974 | 0.451 | IPSL-CM5A-LR | 0.965 | 0.389
CESM1-CAM5 | 0.974 | 0.378 | IPSL-CM5A-MR | 0.968 | 0.452
CESM1-WACCM | 0.970 | 0.435 | MIROC5 | 0.973 | 0.528
CNRM-CM5 | 0.974 | 0.422 | MIROC-ESM | 0.970 | 0.448
CSIRO-Mk3-6-0 | 0.974 | 0.443 | MIROC-ESM-CHEM | 0.962 | 0.482
FGOALS-g2 | 0.965 | 0.000 | MPI-ESM-LR | 0.974 | 0.423
FIO-ESM | 0.966 | 0.435 | MPI-ESM-MR | 0.977 | 0.409
GFDL-CM3 | 0.971 | 0.455 | MRI-CGCM3 | 0.965 | 0.501
GFDL-ESM2G | 0.969 | 0.336 | NorESM1-M | 0.970 | 0.478
GFDL-ESM2M | 0.970 | 0.471 | NorESM1-ME | 0.972 | 0.000

As shown in Table 3, the proportions of the ERA-Interim temperature matrix variation explained by the GCMs are generally higher than 0.96. This means that the simulation capacities of most GCMs for temperature are very good. However, the proportions of the ERA-Interim precipitation rate matrix variation explained by the GCMs are generally lower than 0.50, which means that the simulation capacities of most GCMs for precipitation are not good enough. This is mainly due to the large spatial and temporal variability of precipitation and the complex climate conditions in the Manas River Basin. The multi-model ensemble for temperature downscaling through CEE was constituted by the four best GCMs, namely, MPI-ESM-MR, CCSM4, CESM1-CAM5, and CNRM-CM5, and that for precipitation rate by HadGEM2-AO, MIROC5, MRI-CGCM3, and MIROC-ESM-CHEM. To guarantee the quality of BCSD, MPI-ESM-MR and HadGEM2-AO were selected for temperature and precipitation rate downscaling in BCSD, respectively.

The QM biases correction

The average annual temperature of the ERA-Interim points and the MPI-ESM-MR points before and after correction was calculated and is shown in Figure 2. The average annual precipitation rate of the ERA-Interim points and the HadGEM2-AO points before and after correction was calculated and is shown in Figure 3.

Figure 2

The average annual temperature of the ERA-Interim points and the MPI-ESM-MR points before (B) and after (A) correction.

Figure 3

The average annual precipitation rate of the ERA-Interim points and the HadGEM2-AO points before (B) and after (A) correction.

As shown in Figures 2 and 3, the average, variation, and tendency biases in GCM outputs were adjusted through QM. However, the average, variation, and tendency biases can also be corrected through many regression approaches by adjusting the constants and weights. Moreover, the QM performance is not good enough. In Figure 3, there is a sudden increase in the uncorrected GCM time series for RCP2.6 in 2006, but this error was not corrected through QM. The situation is similar in Figure 2 for RCP4.5 in 2006. In addition, the corrected GCM temperature and precipitation rate time series in the reference period 1979–2005 are all lower than the ERA-Interim time series. The biases in the temperature outputs of CCSM4, CESM1-CAM5, and CNRM-CM5, and in the precipitation rate outputs of MIROC5, MRI-CGCM3, and MIROC-ESM-CHEM were also corrected, and the situations were similar.

There is no need to downscale through interpolation in BCSD as the interpolation approaches cannot alter the average value of the time series. Therefore, the spatial downscaling was carried out through ELM regressions in BCSD.

ELM regressions performance in CEE and BCSD

After regression, the numbers of hidden neurons corresponding to the highest R2 and lowest RMAR of the 100 regressions with the same transfer function were extracted (Table 4). The corresponding R2 and RMAR were obtained (Table 5) afterwards.

Table 4

The numbers of hidden neurons corresponding to the highest R2 and lowest RMAR in calibration (C) and verification (V) periods of the 100 regressions with the same transfer function

Independent variable matrix | Indexes/factors/periods | Hardlim RCP2.6 | Hardlim RCP4.5 | Hardlim RCP8.5 | Sigmoidal RCP2.6 | Sigmoidal RCP4.5 | Sigmoidal RCP8.5 | Sine RCP2.6 | Sine RCP4.5 | Sine RCP8.5
4 GCMs ensemble/Before correction | RMAR/T/C | 98 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
4 GCMs ensemble/Before correction | RMAR/T/V | 62 | 64 | 87 | 20 | 28 | 31 | 34 | 33 | 27
4 GCMs ensemble/Before correction | R2/T/C | 98 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
4 GCMs ensemble/Before correction | R2/T/V | 62 | 64 | 64 | 24 | 24 | 33 | 39 | 31 | 95
4 GCMs ensemble/Before correction | RMAR/P/C | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 100 | 100
4 GCMs ensemble/Before correction | RMAR/P/V | 30 | 23 | 27 | 25 | 15 | 28 | 81 | 56 | 73
4 GCMs ensemble/Before correction | R2/P/C | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
4 GCMs ensemble/Before correction | R2/P/V | 93 | 97 | 95 | 97 | 99 | 99 | 98 | 55 | 97
1 GCM/After correction | RMAR/T/C | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100
1 GCM/After correction | RMAR/T/V | 63 76 75 20 13 19 13
1 GCM/After correction | R2/T/C | 100 | 100 | 98 | 100 | 100 | 100 | 100 | 100 | 100
1 GCM/After correction | R2/T/V | 63 74 57 20 18 19 17
1 GCM/After correction | RMAR/P/C | 99 | 99 | 100 | 99 | 100 | 99 | 100 | 100 | 100
1 GCM/After correction | RMAR/P/V | 34 | 22 | 23 | 43 | 21 | 34 | 70 | 47 | 79
1 GCM/After correction | R2/P/C | 99 | 99 | 100 | 99 | 100 | 100 | 100 | 100 | 100
1 GCM/After correction | R2/P/V | 97 94 98 98 89 97 93
4 GCMs ensemble/After correction | RMAR/T/C | 100 | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100
4 GCMs ensemble/After correction | RMAR/T/V | 61 | 71 | 86 | 19 | 44 | 49 | 33 | 33 | 63
4 GCMs ensemble/After correction | R2/T/C | 100 | 100 | 100 | 99 | 100 | 100 | 99 | 100 | 100
4 GCMs ensemble/After correction | R2/T/V | 61 | 71 | 86 | 19 | 29 | 40 | 33 | 31 | 53
4 GCMs ensemble/After correction | RMAR/P/C | 100 | 100 | 100 | 100 | 99 | 100 | 100 | 100 | 100
4 GCMs ensemble/After correction | RMAR/P/V | 24 | 21 | 29 | 30 | 23 | 27 | 95 | 95 | 81
4 GCMs ensemble/After correction | R2/P/C | 100 | 100 | 100 | 100 | 99 | 100 | 99 | 100 | 100
4 GCMs ensemble/After correction | R2/P/V | 92 | 88 | 97 | 99 | 93 | 94 | 17 | 58 | 77

Table 5

The highest R2 and lowest RMAR in the verification period of the 100 regressions with the same transfer function

Independent variable matrix | Indexes/factors | Hardlim RCP2.6 | Hardlim RCP4.5 | Hardlim RCP8.5 | Sigmoidal RCP2.6 | Sigmoidal RCP4.5 | Sigmoidal RCP8.5 | Sine RCP2.6 | Sine RCP4.5 | Sine RCP8.5
4 GCMs ensemble/Before correction (CEE) | RMAR/T | 0.054 | 0.052 | 0.051 | 0.040 | 0.040 | 0.038 | 0.043 | 0.041 | 0.038
4 GCMs ensemble/Before correction (CEE) | R2/T | 0.946 | 0.953 | 0.953 | 0.967 | 0.971 | 0.973 | 0.965 | 0.969 | 0.973
4 GCMs ensemble/Before correction (CEE) | RMAR/P | 0.185 | 0.179 | 0.186 | 0.172 | 0.169 | 0.174 | 0.223 | 0.234 | 0.239
4 GCMs ensemble/Before correction (CEE) | R2/P | 0.436 | 0.450 | 0.447 | 0.439 | 0.450 | 0.485 | 0.448 | 0.473 | 0.479
1 GCM/After correction (BCSD) | RMAR/T | 0.055 | 0.049 | 0.053 | 0.053 | 0.048 | 0.051 | 0.053 | 0.048 | 0.051
1 GCM/After correction (BCSD) | R2/T | 0.933 | 0.953 | 0.951 | 0.943 | 0.958 | 0.956 | 0.943 | 0.958 | 0.957
1 GCM/After correction (BCSD) | RMAR/P | 0.188 | 0.186 | 0.201 | 0.174 | 0.172 | 0.189 | 0.205 | 0.195 | 0.221
1 GCM/After correction (BCSD) | R2/P | 0.405 | 0.414 | 0.430 | 0.408 | 0.402 | 0.460 | 0.424 | 0.445 | 0.454
4 GCMs ensemble/After correction | RMAR/T | 0.052 | 0.051 | 0.054 | 0.040 | 0.042 | 0.039 | 0.043 | 0.042 | 0.040
4 GCMs ensemble/After correction | R2/T | 0.949 | 0.954 | 0.950 | 0.968 | 0.969 | 0.971 | 0.963 | 0.968 | 0.972
4 GCMs ensemble/After correction | RMAR/P | 0.186 | 0.184 | 0.193 | 0.175 | 0.173 | 0.182 | 0.259 | 0.255 | 0.279
4 GCMs ensemble/After correction | R2/P | 0.429 | 0.445 | 0.438 | 0.440 | 0.449 | 0.464 | 0.468 | 0.494 | 0.483

As shown in Table 4, the numbers of hidden neurons corresponding to the highest R2 or lowest RMAR in the calibration period are all 98, 99, or 100. This means that more hidden neurons generally produce better simulation results in the calibration period. However, too many hidden neurons lead to over-fitting, as indicated by the much lower optimal numbers of hidden neurons in the verification period. Therefore, to avoid over-fitting, the number of hidden neurons should be determined by the indexes in the verification period rather than those in the calibration period. As shown in Table 4, different transfer functions correspond to different numbers of hidden neurons. Therefore, the numbers of hidden neurons should be tested one by one to obtain the best simulation.

As shown in Table 5, the performance of sigmoidal is generally better than hardlim and sine. The RMAR values with the sigmoidal transfer function are almost all lower than those with the hardlim or sine transfer functions and the R2 values with the sigmoidal transfer function are generally higher. Therefore, sigmoidal is a suitable transfer function in ELM regressions.

With the sigmoidal transfer function, the RMAR values in CEE are all lower and the R2 values are all higher than those in BCSD. Therefore, CEE performs better than BCSD. When the time series of the four-GCM ensemble are used as the independent variable matrix, the generally lower RMAR and higher R2 before correction than after correction show that correcting biases through QM increased the downscaling errors. Therefore, it is not appropriate to correct biases through QM. Finally, when corrected time series are used as the independent variable matrix, the generally lower RMAR and higher R2 of the four-GCM ensemble compared with a single GCM show that the multi-model ensemble provides additional information, which is very beneficial for regressions.

The increasing rate of temperature and precipitation rate in Manas River Basin in CEE

To illustrate the increasing rates of temperature and precipitation rate in the Manas River Basin, the increasing rates at the ERA-Interim points were interpolated through EBK and are shown in Figures 4 and 5.

Figure 4

The increasing rate of temperature in the Manas River Basin.

Figure 5

The increasing rate of precipitation in the Manas River Basin.

As shown in Figure 4, the increasing rate of temperature in the Manas River Basin is 0.0254 K/decade, 0.1837 K/decade, and 0.5093 K/decade in RCP2.6, RCP4.5, and RCP8.5, respectively. This means that the mean temperature around 2100 may be 0.2–4.3 K higher than that around 2015. Also, the increasing rate of temperature is generally higher in plains than in mountainous areas. The reason may be that water resources in mountainous areas are much richer than those in plains. More water means a higher specific heat capacity, which can substantially slow down the temperature increase.

As shown in Figure 5, the increasing rate of precipitation in the Manas River Basin is 0.0028 mm/(day × decade), 0.0036 mm/(day × decade), and 0.0022 mm/(day × decade) in RCP2.6, RCP4.5, and RCP8.5, respectively. This means that the annual precipitation around 2100 may be 6.8–11.2 mm higher than that around 2015. The increasing rate of precipitation is generally higher in mountainous areas than in plains, which is mainly due to the high and steep terrain in mountainous areas.
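The projected changes for 2100 follow from the rates quoted in this section by simple arithmetic. The sketch below assumes that the linear trend holds over the 8.5 decades from 2015 to 2100 and that a year has 365 days; the variable names are hypothetical.

```python
# Decades between 2015 and 2100.
decades = (2100 - 2015) / 10

# Increasing rates quoted in this section.
temp_rates = {"RCP2.6": 0.0254, "RCP4.5": 0.1837, "RCP8.5": 0.5093}    # K/decade
precip_rates = {"RCP2.6": 0.0028, "RCP4.5": 0.0036, "RCP8.5": 0.0022}  # mm/(day x decade)

# Temperature change by 2100 (K): rate times number of decades.
temp_change = {k: r * decades for k, r in temp_rates.items()}

# Annual precipitation change by 2100 (mm): the daily rate change times 365 days.
precip_change = {k: r * decades * 365 for k, r in precip_rates.items()}

print({k: round(v, 1) for k, v in temp_change.items()})
print({k: round(v, 1) for k, v in precip_change.items()})
```

The smallest and largest values reproduce the stated ranges of 0.2–4.3 K and 6.8–11.2 mm.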

DISCUSSION

Rationality of the QM biases correction in BCSD

As mentioned above, the corrected GCM temperature and precipitation rate time series in the reference period 1979–2005 are lower than the ERA-Interim time series. This is due to the assumption of QM that the time series in both the reference period 1979–2005 and the forecast period 2006–2100 follow the same distribution as the observed time series. This means that the average of the corrected GCM time series over the period 1979–2100 equals the average of the ERA-Interim time series over the period 1979–2005. However, the GCM time series are generally increasing, meaning that values in the forecast period are generally higher than those in the reference period. Therefore, the quantiles corresponding to the precipitation rate or temperature in the forecast period are generally higher than those in the reference period, and a higher quantile corresponds to a higher value. The corrected time series in the forecast period is thus generally higher than the average value of the ERA-Interim time series and, because the overall average is fixed, the corrected time series in the reference period is generally lower than the ERA-Interim time series.

The probability distribution cannot remain unchanged for an increasing or decreasing time series, because its average value is changing. An invariant probability distribution means that the status of the time series remains unchanged. For climate, the assumption of QM implies that the climate status stays the same and that climate change is non-existent. Therefore, it may be unreasonable to correct biases through QM.
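The effect described above can be reproduced with synthetic data. The following is a minimal sketch, not the implementation used in this study: it assumes an empirical quantile mapping in which each GCM value is ranked within the whole period 1979–2100 and replaced by the observed value at the same quantile. All series, magnitudes, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1979, 2101)
ref = years <= 2005                      # reference period 1979-2005

# Synthetic "observed" temperatures (K) for the reference period only.
obs_ref = 280.0 + rng.normal(0.0, 0.5, ref.sum())
# Synthetic GCM series: cold-biased and warming by 0.03 K/year.
gcm = 278.0 + 0.03 * (years - 1979) + rng.normal(0.0, 0.5, years.size)

# Empirical quantile of every GCM value within the whole-period GCM series.
ranks = gcm.argsort().argsort()
q = (ranks + 0.5) / gcm.size
# Quantile mapping: take the observed value at the same quantile.
corrected = np.quantile(obs_ref, q)

print("observed mean, reference period :", obs_ref.mean())
print("corrected mean, whole period    :", corrected.mean())
print("corrected mean, reference period:", corrected[ref].mean())
print("corrected mean, forecast period :", corrected[~ref].mean())
```

In this sketch the corrected whole-period mean stays close to the observed mean, while the corrected reference-period values occupy the low quantiles and the forecast-period values the high ones, which is the behaviour discussed in this section.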

Rationality of QM together with ELM regressions in BCSD

As mentioned above, interpolation is not an appropriate way to perform spatial downscaling in BCSD. However, downscaling through statistical regressions may also have adverse effects, such as changes in tendency. To address this concern, the increasing rates of the mean temperature and precipitation rate for the selected ERA-Interim points in CEE and BCSD were calculated and are listed in Table 6. The mean increasing rate obtained with the corrected time series of the four-GCM ensemble as the independent variable matrix is also listed in Table 6.
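An increasing rate of this kind can be estimated in several ways. The following is a minimal sketch, assuming that the rate is the slope of an ordinary least-squares line fitted to annual means and rescaled to a decade; this section does not specify the estimator, and the function name `rate_per_decade` and the synthetic series are hypothetical.

```python
import numpy as np

def rate_per_decade(years, values):
    """Least-squares linear trend, converted from per-year to per-decade."""
    slope_per_year, _intercept = np.polyfit(years, values, deg=1)
    return 10.0 * slope_per_year

# Synthetic annual mean temperature (K) with a 0.05 K/year trend plus noise.
years = np.arange(2016, 2101)
rng = np.random.default_rng(1)
temperature = 275.0 + 0.05 * (years - 2016) + rng.normal(0.0, 0.3, years.size)

print(f"estimated rate: {rate_per_decade(years, temperature):.3f} K/decade")
```

The same function can be applied to a precipitation rate series in mm/day to obtain a rate in mm/(day × decade).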

Table 6

The increasing rate of mean temperature and precipitation rate for the selected ERA points

| Independent variable matrix | Period | P, RCP2.6 | P, RCP4.5 | P, RCP8.5 | T, RCP2.6 | T, RCP4.5 | T, RCP8.5 |
|---|---|---|---|---|---|---|---|
| 1 GCM/After correction (BCSD) | 1979–2100 | 0.007221 | 0.007608 | 0.006553 | 0.123605 | 0.239912 | 0.442924 |
| 1 GCM/After correction (BCSD) | 2016–2100 | 0.007189 | 0.006475 | 0.006075 | 0.033955 | 0.214050 | 0.533687 |
| 4 GCMs ensemble/Before correction (CEE) | 1979–2100 | 0.003179 | 0.004596 | 0.002586 | 0.111660 | 0.217187 | 0.427908 |
| 4 GCMs ensemble/Before correction (CEE) | 2016–2100 | 0.002900 | 0.003557 | 0.002155 | 0.025041 | 0.180602 | 0.502883 |
| 4 GCMs ensemble/After correction | 1979–2100 | 0.007039 | 0.007561 | 0.006454 | 0.124029 | 0.235166 | 0.475835 |
| 4 GCMs ensemble/After correction | 2016–2100 | 0.006808 | 0.006287 | 0.005989 | 0.031682 | 0.217376 | 0.581345 |

P: increasing rate of precipitation rate, in mm/(day × decade). T: increasing rate of temperature, in K/decade.

The increasing rates of mean temperature and precipitation rate for the selected ERA points in BCSD are all higher than those in CEE in the period 2016–2100. This is not caused by the time scale, as the situation in the period 1979–2100 is almost the same. It is not caused by the ensemble either, because the increasing rates differ very little when the independent variable matrix is the corrected time series of one GCM or that of the four-GCM ensemble. In contrast, the difference in increasing rate between the four-GCM ensemble before and after correction is almost as large as that between CEE and BCSD. Therefore, the difference is caused only by the QM correction.
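This attribution can be traced directly from the 2016–2100 rows of Table 6. The following is a minimal sketch; the array names and the labels "ensemble effect" and "QM effect" are introduced here for illustration and are not terms used elsewhere in the paper.

```python
import numpy as np

# Increasing rates for 2016-2100 from Table 6.
# Columns: P in RCP2.6, RCP4.5, RCP8.5 [mm/(day x decade)],
#          then T in RCP2.6, RCP4.5, RCP8.5 [K/decade].
bcsd = np.array([0.007189, 0.006475, 0.006075, 0.033955, 0.214050, 0.533687])
cee = np.array([0.002900, 0.003557, 0.002155, 0.025041, 0.180602, 0.502883])
ens_after = np.array([0.006808, 0.006287, 0.005989, 0.031682, 0.217376, 0.581345])

# One GCM versus four-GCM ensemble, both after QM correction.
ensemble_effect = np.abs(ens_after - bcsd)
# Same four-GCM ensemble, after versus before QM correction.
qm_effect = np.abs(ens_after - cee)

labels = ["P RCP2.6", "P RCP4.5", "P RCP8.5", "T RCP2.6", "T RCP4.5", "T RCP8.5"]
for name, e, q in zip(labels, ensemble_effect, qm_effect):
    print(f"{name}: ensemble effect {e:.6f}, QM effect {q:.6f}")
```

For every column the change associated with the QM correction exceeds the change associated with the ensemble, although the margin is narrower for temperature in RCP8.5 than for precipitation rate.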

The ELM regressions adjust the tendencies of the independent variable matrices so that they approximate those of the ERA-Interim matrices, and the trained formula is then applied in the forecast period. As shown in Figure 3, the increasing rate of the mean GCM precipitation rate after correction is closer to the increasing rate of the ERA points than that before correction. This means that the regressions change the tendencies less after correction than before correction. However, as Figure 3 also shows, the correction reduces the increasing rate of the mean precipitation rate much more in the reference period than in the forecast period. Therefore, the increasing rate in BCSD is higher than that in CEE in the period 2016–2100. The reason in temperature downscaling is similar.

These results show that the QM approach and statistical regressions should not be applied together, as they both adjust the tendencies of the time series. In other words, bias correction through QM should be skipped when downscaling through statistical regressions, as the biases can be corrected by the regressions themselves. Besides, it is unreasonable to downscale through interpolation in BCSD, as mentioned in the section ‘The QM biases correction’. Therefore, the disadvantage of correcting biases through QM in BCSD cannot be overcome.

The spatial distribution of RMAR and R2 in CEE

To analyze the CEE performance in different regions, the RMAR and R2 of the selected ERA-Interim points were interpolated into a regional dataset through EBK. The RMAR and R2 in the Manas River Basin are shown in Figures 6–9.

Figure 6

The spatial distribution of R2 in temperature downscaling through CEE.

Figure 7

The spatial distribution of RMAR in temperature downscaling through CEE.

Figure 8

The spatial distribution of R2 in precipitation rate downscaling through CEE.

Figure 9

The spatial distribution of RMAR in precipitation rate downscaling through CEE.

As shown in Figure 7, the temperature RMAR in the study area is lower than 0.05, meaning that the residual in temperature forecasting is lower than 0.05 times the temperature range. According to Figure 9, the precipitation rate RMAR in the study area is lower than 0.33, meaning that the residual in precipitation rate forecasting is lower than 0.33 times the precipitation rate range. The CEE performance in temperature downscaling is therefore much better than that in precipitation rate downscaling, which is also demonstrated by the R2 in Figures 6 and 8. This is due to the low capacity of GCMs to simulate the precipitation rate.
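The two indexes can be illustrated on a small example. The following is a minimal sketch, assuming that RMAR is the mean absolute residual divided by the range of the observed series (consistent with the interpretation above) and that R2 is the coefficient of determination; the function names and the sample values are hypothetical.

```python
import numpy as np

def rmar(observed, simulated):
    """Mean absolute residual relative to the observed range (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return np.mean(np.abs(simulated - observed)) / (observed.max() - observed.min())

def r_squared(observed, simulated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    ss_res = np.sum((observed - simulated) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

obs = [0.0, 2.0, 4.0, 6.0, 10.0]
sim = [1.0, 2.0, 3.0, 6.0, 10.0]
print(rmar(obs, sim), r_squared(obs, sim))
```

With these definitions, an RMAR of 0.05 means that the typical residual is 5% of the spread between the lowest and highest observed values.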

For temperature, the R2 in plains is generally higher than that in mountainous areas and the RMAR is generally lower. This means that the CEE performance in temperature downscaling is better in plains than in mountainous areas. The reason may be that the spatial and temporal variability of temperature is much higher in mountainous areas than in plains due to the complex terrain. For the precipitation rate, the performance is better in mountainous areas than in plains, because the R2 is generally higher and the RMAR is generally lower in mountainous areas. The high and steep terrain in mountainous areas leads to rich and stable orographic rain, which is relatively easy to simulate. In contrast, the scarce precipitation in plains, most of which is artificial precipitation in transitional zones between oases and deserts, is very difficult to simulate. Therefore, the CEE performance in precipitation rate downscaling is worse in plains than in mountainous areas.

Finally, as the RMAR values in temperature downscaling are all lower than 0.05 and the R2 values are all higher than 0.95, the CEE performance in temperature downscaling is very good. For the precipitation rate, the RMAR values are lower than 0.33 and the R2 values are generally between 0.3 and 0.5, meaning that the CEE performance in precipitation rate downscaling is not as good, but still acceptable.

Advantages and disadvantages of BCSD and CEE

Some disadvantages in BCSD can be overcome. First, CCA can be applied to select the GCM in BCSD. Second, changes in the spatial resolutions of GCMs can be avoided by interpolating the observed dataset into an areal dataset and then extracting the values corresponding to the GCM points. However, the results may not be good enough in places where there are few weather stations. Third, downscaling through statistical regressions may perform better than interpolation in mountainous areas. However, the time series of the observed dataset should be long enough. The disadvantage of correcting biases through QM in BCSD cannot be overcome due to the assumption of QM; however, when downscaling through statistical regressions, the biases can be corrected by the regressions themselves.

Although there are only three steps in CEE (filtering GCMs through CCA, constituting the multi-model ensemble, and downscaling through ELM regressions), its performance is generally better than that of BCSD. Filtering GCMs guarantees the quality of the selected GCMs. The multi-model ensemble provides additional information for the regressions, which is an important advantage of CEE, and the ELM regressions generally provide better regression results. Together, the three steps lead to the good performance of CEE. They are equally important and indispensable. However, CEE also has disadvantages: it needs enough GCMs for filtering and a long observed time series.
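The ELM regression step can be outlined in a few lines. The following is a minimal sketch of a basic ELM with a sigmoid transfer function, in which the hidden-layer weights are random and only the output weights are solved by least squares; it is not the configuration used in this study, and the synthetic predictors, the 30 hidden neurons, and the names `elm_fit` and `elm_predict` are assumptions for illustration.

```python
import numpy as np

def elm_fit(X, y, n_hidden, rng):
    """Train a basic ELM: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (not trained)
    b = rng.normal(size=n_hidden)                 # random hidden biases (not trained)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                  # Moore-Penrose solution for output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

def r2(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Synthetic stand-in: four predictors (e.g. four GCM series) and one target series.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.2]) + rng.normal(0.0, 0.1, 300)

# Calibration and verification split.
X_cal, y_cal, X_ver, y_ver = X[:200], y[:200], X[200:], y[200:]
model = elm_fit(X_cal, y_cal, n_hidden=30, rng=rng)
r2_cal = r2(y_cal, elm_predict(X_cal, *model))
r2_ver = r2(y_ver, elm_predict(X_ver, *model))
print(f"R2 calibration: {r2_cal:.3f}, R2 verification: {r2_ver:.3f}")
```

Repeating the fit for several values of `n_hidden` and comparing the verification index is one simple way to choose the number of hidden neurons.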

CONCLUSIONS

In this paper, the performance of CEE and BCSD was compared, with the Manas River Basin as the study area. The following conclusions can be drawn.

The QM approach can indeed adjust the variation and tendency biases, but this can also be done through statistical regressions. The average of the corrected GCM time series in the reference period is not equal to that of the observed time series, which illustrates that it is unreasonable to downscale through interpolation in BCSD. The QM approach and statistical regressions should not be applied together, as they both adjust the tendencies of the time series. The QM approach implies the assumption that the climate will remain unchanged; therefore, it may be unreasonable to correct the biases through QM.

It is reasonable to filter GCMs through CCA in both CEE and BCSD. The multi-model ensemble can provide additional information, which is very beneficial for regressions. Indexes in the verification period are more reliable than those in the calibration period due to over-fitting. The sigmoid function is a suitable transfer function in ELM regressions. The number of hidden neurons should be tested to obtain the best simulation. CEE performs better than BCSD in temperature and precipitation rate downscaling, but it requires enough GCMs for filtering and a long observed time series.

In CEE, the residual in temperature forecasting can be lower than 0.05 times the temperature range, and that in precipitation rate forecasting can be lower than 0.33 times the precipitation rate range. The CEE performance in temperature downscaling is very good, while in precipitation rate downscaling it is not as good but still acceptable. The CEE performance in temperature downscaling is better in plains than in mountainous areas, while in precipitation rate downscaling it is better in mountainous areas.

The increasing rate of temperature in the Manas River Basin is 0.0254 K/decade, 0.1837 K/decade, and 0.5039 K/decade, and that of precipitation rate is 0.0028 mm/(day × decade), 0.0036 mm/(day × decade), and 0.0022 mm/(day × decade) in RCP2.6, RCP4.5, and RCP8.5, respectively.

In conclusion, the BCSD downscaling approach may not be reasonable because of the QM approach, and CEE performs better than BCSD in temperature and precipitation rate downscaling.

ACKNOWLEDGEMENTS

Thanks to the DKRZ for supplying the GCMs5 datasets and to the ECMWF for supplying the ERA-Interim dataset, which made the study possible. The study was supported by the National Science Foundation of China (U1203282, 51469028) and the Foundation of State Key Laboratory of Hydraulic Engineering Simulation and Safety (HESS-1405).
