Abstract

Long-term precipitation data play an important role in climate impact studies, but observations for a given catchment are often very limited. To significantly expand the sample size for extreme rainfall analysis, we considered ERA-20c, a century-long daily reanalysis precipitation product provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Preliminary studies have already indicated that ERA-20c can reproduce the mean reasonably well, but rainfall intensity is underestimated while wet-day frequency is overestimated. Thus, we first adopted a relatively simple approach to adjust the wet-day frequency by imposing an optimal threshold. Moreover, we introduced a quantile mapping approach based on a composite distribution consisting of a generalized Pareto distribution for the upper tail (e.g. above the 95th or 99th percentile) and a gamma distribution for the interior part of the distribution. The proposed composite distributions provide a significant reduction of the biases in the extremes over the conventional method. We also suggested a method for interpolating the bias correction parameters to ungauged catchments. A comparison of the corrected precipitation using spatially interpolated parameters shows that the proposed modelling scheme, particularly with the 99th percentile, can reliably reduce the systematic bias.

INTRODUCTION

Recent studies have documented that long-term climate change has impacted a wide range of fields such as agriculture, environment, health, economy and water resources (Vörösmarty et al. 2000; Patz et al. 2005; Nelson et al. 2009; IPCC 2014). A long-term change in climate variables such as precipitation and temperature can affect the growth of crops, ecosystems, human diseases, and water-related hazards. Of these impacts, water-related hazards are closely linked to changes in rainfall intensity, which are of primary concern to water resource managers.

To systematically assess water resources and water-related hazards, it is necessary to collect reliable long-term climate data. Locally recorded data have played an important role, and they have been treated as accurate values in the modelling process. However, it has been widely acknowledged that the use of observed climate data is limited by a lack of spatial and temporal coverage, and long-term climate data are not readily available in many countries around the world. A primary strength of reanalysis data is that, compared with observations, they provide spatially finer scale climate data over a longer period, and a few of them cover the whole 20th century. For example, the National Oceanic and Atmospheric Administration (NOAA) has produced the 20th century reanalysis (20cR) spanning from 1850 to 2014, and the European Centre for Medium-Range Weather Forecasts (ECMWF) has also released century-long datasets such as the ECMWF 20th century atmospheric model ensemble (ERA-20cm) and the ECMWF 20th century assimilation of surface observations only (ERA-20c), which cover the years from 1900 to 2010 (Compo et al. 2011; Hersbach et al. 2015; Poli et al. 2016). All of them provide global daily or sub-daily precipitation data, but they differ in assimilation technique and spatio-temporal resolution. The products from the ECMWF (such as ERA-20c and ERA-20cm) are based on the Integrated Forecasting System version Cy38r1 with 0.125° spatial resolution, and they are more relevant to regional-scale studies in South Korea because of their higher spatial resolution. The difference between ERA-20c and ERA-20cm is that the former assimilates pressure and wind observations, whereas the latter does not consider them in the modelling process (Hersbach et al. 2015; Donat et al. 2016; Poli et al. 2016). Therefore, ERA-20cm is limited in reproducing the actual synoptic situation (Hersbach et al. 2015; Gao et al. 2016). On the other hand, NOAA-20cR was processed by an Ensemble Kalman Filter technique (Compo et al. 2011), but its spatial resolution (i.e. 1.875 × 1.9°) is much coarser than that of the other century-long reanalysis data. For these reasons, this study selected the ERA-20c daily precipitation data with 0.125 × 0.125° spatial resolution as an alternative to observed precipitation over South Korea.

Although substantial improvements have been made in the modelling process, previous studies have shown that reanalysis datasets still have systematic errors that vary in space and time (Bosilovich et al. 2008; Ma et al. 2009; Bao & Zhang 2013; Gao et al. 2016; Kim & Han 2018). It is also clear that century-long reanalysis data may misrepresent long-term climatic trends or synoptic-scale variability, especially for the first half of the twentieth century, and that differences exist in temporal variability (Brands et al. 2012; Krueger et al. 2013; Poli et al. 2013; Befort et al. 2016; Donat et al. 2016). Nevertheless, few studies have addressed bias correction of long-term daily reanalysis precipitation data for hydrologic applications. Most existing studies have compared different reanalysis datasets rather than examining bias correction techniques (Befort et al. 2016; Donat et al. 2016; Poli et al. 2016). Thus, to better understand the biases and their roles in hydrologic applications, this study focuses on exploring bias correction methods, especially for extreme value analysis associated with the sampling error in rainfall frequency analysis, in an area with a spatio-temporally sparse observation network.

The underlying concepts of bias correction approaches range from a simple delta change (or mean bias correction) to quantile mapping (QM) and multivariate approaches based on copulas (Teutschbein & Seibert 2012; Haerter et al. 2015; Mao et al. 2015; Vrac & Friederichs 2015; Gao et al. 2016; Maraun 2016; Nyunt et al. 2016; Frank et al. 2018; Macias et al. 2018). For instance, Frank et al. (2018) applied a scaling approach based on orthogonal distance regressions for bias correction of a European reanalysis dataset. Macias et al. (2018) employed a simple bias-correction approach using a linear transfer function between the cumulative distribution functions (CDFs) of the modelled and observed atmospheric variables. Vrac & Friederichs (2015) proposed a multivariate bias correction scheme based on a copula concept. Although each method has its own merits and limitations, previous studies have shown that bias correction methods are generally capable of reducing systematic errors in numerical model outputs and that, among them, QM performs better than other approaches, especially for precipitation (Teutschbein & Seibert 2012; Themeßl et al. 2012; Fang et al. 2015; Maraun & Widmann 2018). The QM method, also referred to as “distribution mapping” or “probability mapping”, rectifies the cumulative distribution of the modelled data against that of the observed data by employing a transfer function.

However, there are two main drawbacks to the QM approach based on a gamma distribution (gQM). First, it has been acknowledged that gQM often fails to reproduce extreme rainfalls, which are mainly described by the upper tail of the distribution (Wilks 1999; Vrac & Naveau 2007; Hundecha et al. 2009; Volosciuk et al. 2017). In other words, the gQM approach may misrepresent the upper tail of the distribution, which, in turn, can lead to underestimation of the design rainfalls. One may intuitively consider heavy-tailed distributions such as the extreme value distributions (e.g. the Gumbel distribution, the generalized extreme value distribution and the Weibull distribution). However, using a heavy-tailed distribution for the bias correction may result in overestimation of daily rainfall in the lower tail of the distribution. In this context, a composite distribution including the mixture distribution (such as the Pareto mixture distribution) has been applied to the quantile mapping approach, especially for the correction of climate change scenarios (Gutjahr & Heinemann 2013; Smith et al. 2014; Nyunt et al. 2016; Volosciuk et al. 2017). Comparatively little attention has been given to the bias correction of century-long reanalyses such as ERA-20c. This study therefore aims to introduce a quantile mapping approach based on a composite distribution consisting of a generalized Pareto distribution (GPD) for the upper tail (e.g. above the 95th or 99th percentile) and a gamma distribution for the interior part of the distribution.

The conventional QM method is also limited in that it cannot be applied directly to the ungauged basin, where a one-to-one mapping between the observed and the modelled data does not exist. More specifically, only a transfer function of a set of grid points for the paired precipitation data can be obtained. Thus, an alternative method for the synthesis of unpaired data needs to be established. The general approaches to the interpolation of in-situ data for the quantile mapping are the inverse distance weighting (IDW) and the kriging method, and the interpolated values can then be used to obtain the transfer function for the ungauged basin. For example, Gutjahr & Heinemann (2013) applied the IDW method to produce spatially continuous estimates of the daily precipitation for the spatial bias correction. However, the systematic error in the process of the spatial interpolation of daily rainfall can be propagated through to the parameter estimation in the quantile mapping approach. Thus, a primary question in the statistical bias correction analysis is whether the QM method can reliably improve ERA-20c daily precipitation, especially for extreme values, over 100 years when including the ungauged sites.

From this background, this study mainly focuses on exploring the following questions:

  1. What are the characteristics of the uncertainty associated with the ERA-20c daily precipitation data in South Korea? Do the reanalysis data well describe the statistical properties in terms of the extreme as well as the mean values?

  2. How well does the traditional QM approach perform on the reanalysis data? Can a bias correction based on a combined distribution reduce the systematic error more effectively than a bias correction based on a single distribution (gQM)?

  3. How can we effectively extend the combined distribution approach to the spatial bias correction for ungauged catchments? Can the proposed scheme facilitate a reconstruction of long-term precipitation, especially for the estimation of annual maximum series (AMS) of daily precipitation?

To address these questions, we investigated the bias correction in three phases. First, we attempted to understand the statistical behaviour of the ERA-20c data and further analyze the biases and errors in the reanalysis mean and extreme precipitation. Second, the QM approach was explored by using a combined Gamma-Pareto distribution in the bias correction method to better represent the upper tail of the distribution for 48 stations for the baseline period 1973–2010. The data corrected with the proposed approach were then compared with the observations. Finally, we proposed a spatial bias correction approach based on parameter contour maps (IM-PCM), which consists of three steps and is intended for ungauged catchments. The reanalysis data and observed precipitation are summarized in the next section, and the theoretical background for the proposed bias correction approach follows. In the Results and discussion section, the proposed model is applied to the daily rainfall data for the baseline period, and a retrospective analysis of the data is then conducted for the estimation of AMS rainfalls. Finally, concluding remarks are provided in the last section.

STUDY AREA AND DATA

Study area and local gauged data

South Korea is located in the northeast part of Asia, and lies between latitudes 33–39°N and longitudes 125–132°E, including all the islands. The total area is approximately 100,032 km², and its annual average rainfall is about 1,277 mm. There are hundreds of local weather stations in South Korea. However, most of them were installed after 1970, and only a few stations provide daily precipitation records longer than 40 years. The observed daily precipitation sequences were obtained and compiled from the Korea Meteorological Administration (KMA). The location of the study area and the local gauging stations used in this study are illustrated in Figure 1, and the details of the stations are summarized in Table 1.

Table 1

The local rainfall stations used in this study

Station no. Name Latitude (°N) Longitude (°E) Elevation (m a.s.l.) Annual rainfall (mm)^a
St. 1 Sokcho 38.2508 128.5644 19.5 1,374.6 
St. 2 Daegwallyeong 37.6769 128.7181 774.0 1,736.4 
St. 3 Chuncheon 37.9025 127.7356 79.1 1,304.9 
St. 4 Gangneung 37.7514 128.8908 27.4 1,436.6 
St. 5 Seoul 37.5714 126.9656 11.1 1,386.8 
St. 6 Incheon 37.4775 126.6247 69.6 1,183.0 
St. 7 Wonju 37.3375 127.9464 150.0 1,318.6 
St. 8 Suwon 37.2700 126.9875 38.3 1,274.9 
St. 9 Chungju 36.9700 127.9525 116.5 1,202.0 
St. 10 Seosan 36.7736 126.4958 30.3 1,254.9 
St. 11 Cheongju 36.6361 127.4428 58.6 1,229.7 
St. 12 Daejeon 36.3689 127.3742 70.3 1,353.0 
St. 13 Chupungyeong 36.2197 127.9944 246.1 1,171.5 
St. 14 Andong 36.5728 128.7072 141.5 1,017.3 
St. 15 Pohang 36.0325 129.3794 3.7 1,145.4 
St. 16 Gunsan 36.0019 126.7631 24.6 1,210.8 
St. 17 Daegu 35.8850 128.6189 65.5 1,047.0 
St. 18 Jeonju 35.8214 127.1547 54.8 1,291.6 
St. 19 Ulsan 35.5600 129.3200 36.0 1,265.5 
St. 20 Gwangju 35.1728 126.8914 73.8 1,387.9 
St. 21 Busan 35.1044 129.0319 71.0 1,500.2 
St. 22 Mokpo 34.8167 126.3811 39.4 1,139.4 
St. 23 Yeosu 34.7392 127.7406 66.0 1,420.1 
St. 24 Jinju 35.1636 128.0400 31.6 1,504.8 
St. 25 Yangpyeong 37.4886 127.4944 49.4 1,359.6 
St. 26 Icheon 37.2639 127.4842 79.4 1,330.9 
St. 27 Inje 38.0600 128.1669 201.6 1,167.8 
St. 28 Hongcheon 37.6833 127.8803 142.3 1,353.2 
St. 29 Jecheon 37.1592 128.1942 265.0 1,345.8 
St. 30 Boeun 36.4875 127.7339 176.4 1,275.0 
St. 31 Cheonan 36.7794 127.1211 24.0 1,229.4 
St. 32 Boryeong 36.3269 126.5572 16.9 1,219.6 
St. 33 Buyeo 36.2722 126.9206 12.7 1,323.3 
St. 34 Geumsan 36.1056 127.4817 171.7 1,277.1 
St. 35 Buan 35.7294 126.7164 13.4 1,249.8 
St. 36 Imsil 35.6122 127.2853 249.3 1,340.2 
St. 37 Jeongeup 35.5631 126.8658 46.0 1,317.1 
St. 38 Namwon 35.4053 127.3328 91.7 1,351.0 
St. 39 Jangheung 34.6886 126.9194 46.4 1,493.7 
St. 40 Haenam 34.5533 126.5689 14.4 1,322.4 
St. 41 Goheung 34.6181 127.2756 54.5 1,459.2 
St. 42 Yeongju 36.8717 128.5167 212.2 1,268.1 
St. 43 Mungyeong 36.6272 128.1486 172.0 1,241.5 
St. 44 Uiseong 36.3558 128.6883 83.2 1,016.5 
St. 45 Gumi 36.1306 128.3206 50.3 1,051.1 
St. 46 Yeongcheon 35.9772 128.9514 95.0 1,039.3 
St. 47 Geochang 35.6711 127.9108 222.4 1,298.9 
St. 48 Sancheong 35.4128 127.8789 0.8 1,512.7 

^a Annual mean precipitation estimated from 1973 to 2010.

Figure 1

A map showing the study area, local gauging stations and grid points of ERA-20c. The grey shading on the map indicates elevations.

ERA-20c daily precipitation

As mentioned in the Introduction, we explored the ERA-20c daily precipitation, which is one of the longest reanalysis datasets and covers the whole 20th century (i.e. 1900–2010) (Donat et al. 2016; Poli et al. 2016). Among the many reanalysis products, ERA-Interim has been widely adopted in the field of hydrometeorology (Simmons et al. 2014; de Leeuw et al. 2015; Betts & Beljaars 2017), but it only covers the data-rich period from 1979 to the present (Dee et al. 2011). In this research, we focused on the ERA-20c data at its highest resolution, 0.125 × 0.125° (approximately 13.8 × 11.2 km), which consists of 603 grid points (http://apps.ecmwf.int/datasets/). Grid points over the sea were excluded from this study. The ERA-20c grid points are illustrated in Figure 1.

It is crucial to understand the features of the model biases to improve the modelled reanalysis data. Some of the general features of ERA-20c daily precipitation over South Korea are examined in terms of the mean and the extreme values. For the mean precipitation, we compared the intra-seasonal variability within the annual cycle by exploring the monthly means and the 10-day running means of the observed and ERA-20c precipitation (as shown in Figure 2), averaged over all 48 stations during the baseline period (1973–2010). The model performance was evaluated by both the Nash–Sutcliffe efficiency (NSE) and root-mean-square error (RMSE), which are described in the Methodology section. The results confirmed that ERA-20c can reproduce the mean values quite well during the dry season. However, there is a significant difference between modelled and observed precipitation during the summer (i.e. July–September), which may lead to an underestimation of extreme rainfall.

Figure 2

A comparison of the mean values of ERA-20c daily precipitation on an annual basis. (a) Monthly mean comparison between the observed (Obs) and ERA-20c, and (b) observed 38-year (1973–2010) mean of daily precipitation (yellow bar) and its 10-day running mean (black solid line) along with 10-day running mean estimated from ERA-20c (blue dotted line) for all 48 stations. Please refer to the online version of this paper to see this figure in color: http://dx.doi:10.2166/nh.2019.127.

In terms of the extreme rainfall episodes, the 50 top events were extracted for the baseline period, and an underestimation of extremes in the ERA-20c was clearly identified, as illustrated in Figure 3. The deviations are generally large across this upper-tail sample, with –1.088 for NSE and 76.69 mm for RMSE (Figure 3(a)). However, the deviations are quite systematic, which is favourable for bias correction. The relationships between the 50 top extreme rainfalls showed that the discrepancies were largely attributed to differences in rainfall during the summer season, as noted in Figure 2. The overall relationships are similar across stations, as shown in Appendix A (available with the online version of this paper), and the comparisons for stations 4, 16, 28 and 40 are shown as representative examples in Figure 3(b). The biases in extreme values are generally proportional to the amount of rainfall, and the biases are likely to be higher in the upper tail of the distribution than in its middle part.

Figure 3

Evaluation of bias associated with the 50 top extreme rainfall events. (a) Scatter plot of the extremes between the observed and ERA-20c over the entire region of interest and (b) comparison of the deviation corresponding to the rank for stations 4, 16, 28 and 40 for the baseline period 1973–2010.

In summary, the ERA-20c precipitation data are capable of reliably reproducing the mean values with 0.968 for NSE and 15.59 mm for RMSE, while the extreme values in the 50 top records are consistently underestimated with –1.088 for NSE and 76.69 mm for RMSE. The results obtained here could indicate that although the ERA-20c modelling process adequately represents the mean climate of the historical period, heavy rainfalls in the summer season can be significantly underestimated due to the fact that intensive rainfall events driven by convective storms may not be effectively resolved by the current climate modelling approach and spatial resolution. On the other hand, as shown in Figure 4, ERA-20c exhibits a much higher frequency of wet-days (>0 mm/day), varying from 11.75 to 26.64 days per month, than that of the observed precipitation (6.07–14.5 days) for all months in South Korea. More generally, the over-pronounced frequency of light precipitation by climate models is a well-known problem, and it may partially cause the underestimation of the extremes. In these contexts, a two-stage bias correction approach to daily precipitation is typically adopted to first adjust the overestimated wet-day frequency and then rectify the biases associated with both the mean and extreme values.

Figure 4

Monthly wet-day frequency for the observed (black solid line) and ERA-20c (blue dotted line) for all 48 stations for the baseline period (1973–2010). Please refer to the online version of this paper to see this figure in color: http://dx.doi:10.2166/nh.2019.127.

METHODOLOGY

As illustrated in the previous section, two deficiencies in the ERA-20c became evident: the overestimation of the wet-day frequency and underestimation of the extreme values. To correct the biases, we adopted a two-stage bias correction scheme that consists of the wet-day frequency correction scheme and the composite distribution based QM approach. The proposed methods and their assumptions used in this study are provided in this section.

Wet-day frequency correction scheme

It is well known that the wet-day frequencies of the simulated precipitation data from climate models are typically inflated due to the generation of small precipitation amounts near 0.1 mm/day (Piani et al. 2010; Kim et al. 2015b; Nyunt et al. 2016). For this reason, a cut-off threshold (TH) approach has been commonly applied to adjust the wet-day frequency in the bias correction for daily precipitation using different criteria (Schmidli et al. 2006; Piani et al. 2010; Themeßl et al. 2012; Kim et al. 2015a, 2015b; Rabiei & Haberlandt 2015; Nyunt et al. 2016; Volosciuk et al. 2017). For example, Piani et al. (2010) and Volosciuk et al. (2017) applied 0.1 mm/day as the threshold, whereas the wet-day frequency of simulated precipitation was set equal to that of the observed precipitation (Kim et al. 2015a, 2015b; Nyunt et al. 2016). Rabiei & Haberlandt (2015) compared five different thresholds (0, 0.02, 0.05, 0.07 and 0.1 mm/h) for spatial bias correction of hourly radar data and concluded that the threshold of 0.05 mm/h performed the best among the five in terms of the reduction of biases.

In our study, a set of predetermined thresholds was used to adjust the wet-day frequency of the modelled daily precipitation from ERA-20c; the thresholds were taken from previous studies (Piani et al. 2010; Kim et al. 2015a, 2015b; Rabiei & Haberlandt 2015; Volosciuk et al. 2017). We considered four different thresholds to identify an optimal threshold (TH) for the ERA-20c: (TH1) >0 mm/day, (TH2) >0.1 mm/day, (TH3) >1 mm/day, and (TH4) a threshold chosen so that the frequency of wet days equals the observed value, which varied from 0 to 4.66 mm/day, as shown in Figure 5. Changes in the wet-day frequency can affect the overall performance of the bias correction through the QM approach, because the transfer function between the simulated and observed precipitation is established on the basis of non-zero precipitation. The optimum threshold was therefore evaluated through an experiment with gQM on the paired daily rainfall series at each station. It should be noted that ERA-20c daily rainfalls below the threshold were set to zero. The threshold selected among the four was then applied in the next steps.
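The threshold step can be illustrated with a minimal sketch. The function names are hypothetical, and two details are assumptions rather than statements from the text: an observed wet day is taken to be any day above 0 mm, and values exactly equal to the cut-off are set to zero. The TH4-style cut-off is taken as the modelled value ranked just below the observed number of wet days.

```python
def matching_threshold(obs, mod, wet_obs=0.0):
    """TH4-style cut-off: the value of `mod` above which the number of
    modelled wet days equals the number of observed wet days."""
    n_wet_obs = sum(1 for x in obs if x > wet_obs)
    ranked = sorted(mod, reverse=True)
    if n_wet_obs >= len(ranked):
        return 0.0  # the model is not wetter than the observations
    # Only values strictly above the (n_wet_obs + 1)-th largest stay wet.
    # Ties at the cut-off can leave slightly fewer wet days than observed.
    return max(ranked[n_wet_obs], 0.0)


def apply_threshold(mod, th):
    """Set modelled daily totals at or below the cut-off to zero."""
    return [x if x > th else 0.0 for x in mod]


# Hypothetical ten-day series (mm/day): 3 observed wet days, 9 modelled.
obs = [0, 0, 5, 0, 12, 0, 0, 3, 0, 0]
mod = [0.2, 0.1, 4.0, 0.3, 9.0, 0.0, 0.5, 2.5, 0.1, 0.05]
th4 = matching_threshold(obs, mod)
corrected = apply_threshold(mod, th4)
```

Fixed cut-offs such as TH2 or TH3 would simply call `apply_threshold(mod, 0.1)` or `apply_threshold(mod, 1.0)`.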

Figure 5

Monthly distribution of cut-off thresholds for TH4 over all stations.

Statistical bias correction model: QM with a composite distribution

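As stated in the Introduction, a gamma distribution with two parameters has been commonly used in bias correction of daily precipitation. The gamma distribution and its transfer function for the QM can be expressed as follows:
$$ f(x;\alpha,\beta) = \frac{x^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{x}{\beta}\right), \quad x > 0,\ \alpha > 0,\ \beta > 0 \tag{1} $$
$$ x_{cor} = F^{-1}\left(F\left(x_{mod};\alpha_{mod},\beta_{mod}\right);\alpha_{obs},\beta_{obs}\right) \tag{2} $$
where $x_{cor}$ and $x_{mod}$ are the corrected data and the uncorrected (or modelled) data in the baseline period. $F$ is a gamma CDF and $F^{-1}$ is its inverse function, while $\alpha$ and $\beta$ are the shape and scale parameters of the gamma distribution, respectively; the subscripts $mod$ and $obs$ denote parameters fitted to the modelled and observed data. To account for the seasonality, it is common to fit a separate bias correction model for each month, independently of the others (Kim et al. 2015b).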
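The gamma-based transfer function in Equation (2) can be sketched as follows, assuming NumPy and SciPy are available. The function names are hypothetical. The sketch assumes that the gamma distribution is fitted to wet-day amounts only, with the location fixed at zero, and that dry days are passed through unchanged. In practice one such model would be fitted per month.

```python
import numpy as np
from scipy import stats


def fit_gamma(wet):
    """Maximum-likelihood gamma fit with location fixed at zero.
    Returns (shape, scale)."""
    shape, _, scale = stats.gamma.fit(np.asarray(wet, dtype=float), floc=0)
    return shape, scale


def gqm(x_mod, par_mod, par_obs):
    """Map modelled wet-day amounts onto the observed gamma distribution:
    x_cor = F_obs^{-1}(F_mod(x_mod))."""
    x_mod = np.asarray(x_mod, dtype=float)
    p = stats.gamma.cdf(x_mod, par_mod[0], scale=par_mod[1])
    p = np.clip(p, 0.0, 1.0 - 1e-12)  # keep the inverse CDF finite
    corrected = stats.gamma.ppf(p, par_obs[0], scale=par_obs[1])
    return np.where(x_mod > 0, corrected, 0.0)  # dry days stay dry
```

A typical call would fit `par_mod = fit_gamma(era_wet)` and `par_obs = fit_gamma(obs_wet)` on the baseline period and then apply `gqm` to the thresholded ERA-20c series, where `era_wet` and `obs_wet` are placeholder arrays of wet-day amounts.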

To effectively reduce the bias in the extreme rainfall for ERA-20c, we propose a QM approach based on a composite distribution that comprises different types of distributions. More specifically, an extreme value distribution is used for the upper tail of the distribution, while a gamma distribution is applied to the interior part of the distribution. For extremes, the 95th or 99th percentiles have been applied as an upper threshold in numerous studies because the distribution of excesses over a high threshold is asymptotically approximated by a generalized Pareto distribution (GPD) (Manton et al. 2001; Wilson & Toumi 2005; Acero et al. 2011; Gutjahr & Heinemann 2013; Chan et al. 2015; Nyunt et al. 2016). In this study, we apply both the 95th and 99th percentiles as the upper thresholds.

The GPD has been widely applied to the peak-over-threshold (POT) series for the selection of the best-fit distribution for the extreme rainfalls (Vrac & Naveau 2007; Hundecha et al. 2009; Gutjahr & Heinemann 2013; Nyunt et al. 2016; Volosciuk et al. 2017), although there have been a considerable number of studies using other extreme value distributions including the generalized extreme value (GEV), Weibull (WEI), Gumbel (GUM), and Log-normal (LOGN). To ensure the suitability of the GPD, we first evaluated six different distributions, GPD, GEV, GUM, WEI, LOGN and gamma, for the extremes in both the observed and ERA-20c over the 95th and 99th percentiles using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The model with the lowest AIC and BIC is preferred as the best-fit distribution. For a given threshold, the GPD was selected as the best-fit distribution for the extremes as shown in Table 2. The numbers in Table 2 indicate the number of stations which belong to a certain distribution.
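The selection rule can be sketched with the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters, n the sample size and ln L the maximized log-likelihood. The helper names and the log-likelihood values below are hypothetical and serve only to show the comparison.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * loglik


def best_fit(candidates, n):
    """candidates maps a distribution name to (log-likelihood, n_parameters).
    Returns the names with the lowest AIC and the lowest BIC."""
    by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
    by_bic = min(candidates, key=lambda name: bic(*candidates[name], n))
    return by_aic, by_bic


# Hypothetical log-likelihoods for the excesses at a single station.
candidates = {"GPD": (-410.0, 2), "GEV": (-409.5, 3), "GUM": (-420.0, 2)}
selected = best_fit(candidates, n=100)
```

Counting, over all stations, how often each distribution is selected would give the kind of tally reported in Table 2.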

Table 2

The selected distributions among six distributions based on AIC and BIC values for the extremes from observed and ERA-20c daily precipitation over the 95th and 99th percentiles for all 48 stations

Percentile Data GPD GEV LOGN WBL GUM GAM
95th Observation 47
95th ERA-20c 48
99th Observation 47
99th ERA-20c 47
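As previously mentioned, the GPD is separately applied to the extreme values defined by the 95th and 99th percentile thresholds at each station as a transfer function, whereas the gamma distribution is applied to the interior part of the distribution, as illustrated in Equation (3) (Gutjahr & Heinemann 2013):
$$ x_{cor} = \begin{cases} F^{-1}_{obs,gamma}\left(F_{mod,gamma}\left(x_{mod}\right)\right), & x_{mod} < u \\ F^{-1}_{obs,GPD}\left(F_{mod,GPD}\left(x_{mod}\right)\right), & x_{mod} \ge u \end{cases} \tag{3} $$
Here, $F_{mod,gamma}$ and $F_{mod,GPD}$ are the CDFs of the ERA-20c model for gamma and GPD. Similarly, $F^{-1}_{obs,gamma}$ and $F^{-1}_{obs,GPD}$ are the inverse (or quantile) functions of the CDFs of the observations for gamma and GPD, respectively. The heavy-tailed distribution for POTs is defined as follows for a GPD with a high upper threshold ($u$) (Coles 2001; Gutjahr & Heinemann 2013):
$$ F(x;u,\tilde{\sigma},\xi) = \begin{cases} 1 - \left[1 + \dfrac{\xi\,(x-u)}{\tilde{\sigma}}\right]^{-1/\xi}, & \xi \neq 0 \\ 1 - \exp\left(-\dfrac{x-u}{\tilde{\sigma}}\right), & \xi = 0 \end{cases} \quad x > u \tag{4} $$
Here, $\tilde{\sigma}$ is the reparametrized scale parameter, and $\xi$ is the shape parameter. In this study, the thresholds ($u$, the 95th or 99th percentile) for observed and modelled precipitation were derived at each station.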

In this approach, the four parameters to be estimated are the shape ($\alpha$) and scale ($\beta$) parameters of the gamma distribution, and the shape ($\xi$) and scale ($\tilde{\sigma}$) parameters of the GPD, while the upper thresholds are assumed to be known for the given 95th or 99th percentile. The parameters of the gamma distribution are estimated on a monthly basis, whereas the parameters of the GPD are estimated using all POTs from all months at each station. The maximum likelihood method is used to estimate all the parameters. Hereafter, the proposed method with a composite distribution of gamma and GPD is referred to as gpQM, and the gpQM with the 95th and 99th percentile upper thresholds is abbreviated as gpQM95 and gpQM99, respectively. For comparison, the conventional bias correction gQM was also applied and evaluated in terms of the accuracy of both the extreme and the mean values.
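A minimal sketch of the composite gamma-GPD mapping is given below, assuming NumPy and SciPy. It is an illustration under stated assumptions, not the authors' implementation: the function names and dictionary layout are hypothetical, the percentile threshold is taken over wet days, the gamma distribution is fitted to all wet days, the GPD is fitted to excesses over the threshold with its location fixed at zero, and the monthly stratification of the gamma parameters is omitted for brevity.

```python
import numpy as np
from scipy import stats


def fit_composite(wet, q=0.95):
    """Fit a gamma distribution to the wet days and a GPD to the excesses
    over the q-quantile threshold u (maximum likelihood in both cases)."""
    wet = np.asarray(wet, dtype=float)
    u = np.quantile(wet, q)
    a, _, b = stats.gamma.fit(wet, floc=0)
    xi, _, sigma = stats.genpareto.fit(wet[wet > u] - u, floc=0)
    return {"u": u, "gamma": (a, b), "gpd": (xi, sigma)}


def gpqm(x_mod, mod, obs):
    """Composite quantile mapping: gamma below the model threshold,
    GPD on the excesses at or above it."""
    x_mod = np.asarray(x_mod, dtype=float)
    out = np.zeros_like(x_mod)  # dry days stay dry
    body = (x_mod > 0) & (x_mod < mod["u"])
    tail = x_mod >= mod["u"]

    p = stats.gamma.cdf(x_mod[body], mod["gamma"][0], scale=mod["gamma"][1])
    out[body] = stats.gamma.ppf(p, obs["gamma"][0], scale=obs["gamma"][1])

    p = stats.genpareto.cdf(x_mod[tail] - mod["u"],
                            mod["gpd"][0], scale=mod["gpd"][1])
    p = np.clip(p, 0.0, 1.0 - 1e-12)  # keep the inverse CDF finite
    out[tail] = obs["u"] + stats.genpareto.ppf(
        p, obs["gpd"][0], scale=obs["gpd"][1])
    return out
```

With this construction the two branches are not forced to join continuously at the threshold, so a small jump can appear there. Fitting the gamma part only to values below the threshold is an alternative design that the sketch does not explore.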

Spatial interpolation by parameter contour maps

In the gpQM approach, a pair of observed and modelled data is required to estimate the six parameters (TH, $\alpha$, $\beta$, $\xi$, $\tilde{\sigma}$ and $u$). However, because the number of available weather stations is limited, the transfer function for the QM cannot be established for all grid points, and the existing methods can only be applied over gauged catchments. To overcome this limitation, we introduce an interpolation method based on parameter contour maps (IM-PCM), which consists of three steps, as summarized in Figure 6. First, for gpQM95 and gpQM99, the six parameters (TH, $\alpha$, $\beta$, $\xi$, $\tilde{\sigma}$ and $u$) are estimated for each station, as described in the previous sections. Second, a contour map of each parameter is constructed using a two-dimensional linear interpolation technique, as shown in Figure 7. Finally, a set of parameters for the gpQM is taken from the maps to construct the transfer function for all grid points. The cut-off threshold (TH) is the first interpolated variable. The maps of the shape ($\alpha$) and scale ($\beta$) parameters of the gamma distribution are then generated on a monthly basis, while the maps of the shape ($\xi$), scale ($\tilde{\sigma}$) and upper threshold ($u$) parameters of the GPD are created by using all POTs on an annual basis. For the gQM, a similar process is used to produce the three parameter (TH, $\alpha$ and $\beta$) maps for the transfer function.
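The map-building step can be sketched with `scipy.interpolate.griddata`, which performs piecewise-linear interpolation on a triangulation of the station locations. The function name is hypothetical, and the nearest-station fallback for grid points outside the convex hull of the stations is an assumption added here, since the text does not say how such points are handled.

```python
import numpy as np
from scipy.interpolate import griddata


def interpolate_parameter(station_lonlat, station_values, grid_lonlat):
    """Interpolate one bias-correction parameter from the stations to the
    grid points (2-D linear, nearest-station fallback outside the hull)."""
    pts = np.asarray(station_lonlat, dtype=float)   # shape (n_stations, 2)
    vals = np.asarray(station_values, dtype=float)  # shape (n_stations,)
    grid = np.asarray(grid_lonlat, dtype=float)     # shape (n_grid, 2)
    linear = griddata(pts, vals, grid, method="linear")    # NaN outside hull
    nearest = griddata(pts, vals, grid, method="nearest")
    return np.where(np.isnan(linear), nearest, linear)
```

The function would be called once per parameter map: the gamma shape and scale for each month, and the GPD shape, scale and upper threshold on an annual basis. The values interpolated at a grid point then define that point's transfer function.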

Figure 6

A flowchart of the proposed quantile mapping approaches (gpQM95/gpQM99 and gQM) based on the parameter contour maps (IM-PCM).

Figure 7

Parameter contour maps for the gpQM99 approach. (a) Maps of the shape and scale parameters of the gamma distribution in August, (b) maps of the shape and scale parameters of the GPD, (c) map of the frequency of wet-days corresponding to the cut-off threshold (TH) in August, and (d) maps of the upper threshold (u) for the GPD. Here, the GPD is applied to all POTs on an annual basis.

Evaluation criteria

In this study, we evaluate the bias-corrected ERA-20c in terms of both the extreme and the mean values. For the extremes, we compared the rainfalls for a given 99th threshold between three different QM approaches including gQM, gpQM95 and gpQM99. In addition, the annual maximum series (AMS) for all stations were extracted and compared to that of the corrected ERA-20c. For the mean values, both the monthly mean and 10-day running means between the observed and ERA-20c precipitation were compared in the context of the intra-seasonal variability. Moreover, we used the RMSE and NSE, which are well known goodness-of-fit measures for model evaluation in the field of hydrology (Legates & McCabe Jr 1999). These are provided in Equations (5) and (6): 
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)^2} \tag{5}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - M_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \tag{6}$$

Here, $O_i$ is the ith observation, $\bar{O}$ is the mean of the observations, $M_i$ is the ith modelled value, and n is the number of observations. The RMSE represents the square root of the second sample moment of the residuals (or deviations) between observed and modelled values (Liu et al. 2014; Mohanty et al. 2015). A value of zero indicates a perfect fit. Compared with the mean absolute error (MAE), the RMSE is more appropriate for representing model performance when the errors follow a Gaussian distribution (Chai & Draxler 2014). For NSE, the closer the value is to 1, the better the model efficiency, and a model with an NSE over 0.5 is considered to be of sufficient quality (Wȩglarczyk 1998; Gupta et al. 2009; Mohanty et al. 2015).
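For readers who wish to reproduce the two measures, a minimal sketch using the standard definitions of RMSE and NSE is given below. The function and argument names are illustrative only.

```python
import math

def rmse(observed, modelled):
    """Root-mean-square error between paired observed and modelled values."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(observed, modelled)) / n)

def nse(observed, modelled):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance
```

A model that always predicts the observed mean gives an NSE of exactly zero, which is why negative values (as for some thresholds in this study) indicate performance worse than the observed mean.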

The performance of the proposed interpolation method was evaluated by a leave-one-out procedure within a cross validation framework. To be more specific, a set of parameters is estimated from the daily precipitation observations at 47 of the 48 stations, and the estimated parameters are then used to build contour maps as shown in Figure 7. The set of parameters at the grid point corresponding to the excluded station is taken from the maps, and the proposed bias correction approaches are then applied. The procedure is repeated so that each station is excluded once. Again, note that the model performance for the extreme and mean values was evaluated with regard to RMSE and NSE as described above.
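The leave-one-out loop itself can be sketched independently of the interpolator. In the illustration below, inverse-distance weighting is used purely as a simple stand-in for the contour-map interpolation described in this paper; it is not the method applied in the study, and all names are hypothetical.

```python
import math

def idw(target, points, values, power=2.0):
    """Inverse-distance-weighted estimate at target (stand-in interpolator)."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den

def leave_one_out(points, values, interpolate=idw):
    """Estimate each station's parameter from all remaining stations."""
    estimates = []
    for i in range(len(points)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        estimates.append(interpolate(points[i], other_points, other_values))
    return estimates
```

Comparing `estimates` with the station-fitted `values` then gives a direct measure of the interpolation error for each parameter; any other interpolator with the same call signature can be passed through the `interpolate` argument.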

RESULTS AND DISCUSSION

Evaluation for the lower threshold

This study examined four different thresholds (TH1, TH2, TH3, and TH4) for adjusting the wet-day frequency of ERA-20c daily precipitation through an experiment with the gQM approach, in terms of both the mean and extreme values. As an overall evaluation of the bias-corrected precipitation, we investigated the intra-seasonal variability within the annual cycle by comparing the monthly means and the 10-day running means. Here, all the values were averaged over all 48 stations during the baseline period (1973–2010), as illustrated in Figure 8. We found that the threshold TH4 yielded the best results among the four in terms of the reduction of biases, as summarized in Figure 8(a) and Table 3. Note again that TH4 is the case where the frequency of wet days of ERA-20c is set to that of the observed precipitation. In contrast, the other thresholds (TH1, TH2 and TH3) showed a significant overestimation, whereas the uncorrected ERA-20c showed a relatively small bias. Our results offer insight into how improper thresholds for the wet-day frequency may affect bias correction results, leading to a significant overestimation of daily rainfall. Such discrepancies may arise from the significantly different thresholds used to adjust the wet-day frequency. As illustrated in the previous section, the lower thresholds for TH4 varied over the range 0–4.66 mm, while the thresholds assumed in TH1, TH2 and TH3 are much lower than those obtained for TH4, especially for the summer season (July–September). Indeed, the similar results seen in the 10-day moving mean suggest that our findings may be generalizable to cut-off thresholds in different locations and seasons, as shown in Figure 8(b) and Table 3. We also found that the bias associated with the cut-off thresholds varied significantly within a specific season, especially in the summer. The biases for TH1 and TH2 range from 2.21 to 10.49 and from 1.92 to 10.09 during the summer, respectively, while those for TH3 and TH4 range from 0.16 to 6.27 and from –1.06 to 2.97, respectively.
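The frequency-adjustment idea behind TH4 can be illustrated with a short sketch: the cut-off is chosen so that the modelled series has the same fraction of wet days as the observed series. This is a minimal illustration under assumptions, not the study's code; in particular, the definition of an observed wet day (`obs_wet_min`) and the function names are hypothetical, and in practice the threshold would be derived separately for each month and station.

```python
def frequency_matching_threshold(obs, mod, obs_wet_min=0.0):
    """Cut-off on the modelled series that reproduces the observed wet-day fraction."""
    n_wet_obs = sum(1 for v in obs if v > obs_wet_min)
    # Number of modelled days that should remain wet after the cut-off.
    k = round(n_wet_obs * len(mod) / len(obs))
    ranked = sorted(mod, reverse=True)
    if k >= len(ranked):
        return 0.0
    return ranked[k]      # values strictly above this are kept as wet days

def apply_cutoff(mod, threshold):
    """Set modelled values at or below the cut-off to zero (dry day)."""
    return [v if v > threshold else 0.0 for v in mod]
```

With tied modelled values the resulting wet-day count can differ slightly from the target, which is one reason this should be read as a sketch rather than a complete procedure.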

Table 3

Comparisons of root-mean-square-error (RMSE) and Nash–Sutcliffe efficiency (NSE) between the observed and the corrected ERA-20c for different thresholds (TH1 (>0 mm/day), TH2 (>0.1 mm/day), TH3 (>1 mm/day) and TH4 (frequency adjustment)) and the uncorrected ERA-20c precipitation

| Data | Measure | TH1 | TH2 | TH3 | TH4 | ERA-20c |
|---|---|---|---|---|---|---|
| Monthly mean (mm/month) | RMSE (mm) | 119.24 | 110.50 | 42.57 | 4.77 | 15.59 |
| Monthly mean (mm/month) | NSE | −0.899 | −0.631 | 0.758 | 0.997 | 0.968 |
| 10-day running mean (mm/day) | RMSE (mm) | 4.03 | 3.74 | 1.49 | 0.51 | 0.56 |
| 10-day running mean (mm/day) | NSE | −0.886 | −0.622 | 0.744 | 0.970 | 0.963 |

Figure 8

A comparison of mean rainfall between the observation and the corrected ERA-20c with different thresholds (TH1 (>0 mm/day), TH2 (>0.1 mm/day), TH3 (>1 mm/day) and TH4 (frequency adjustment)) and the uncorrected ERA-20c (RAW) on the annual basis. All values are averaged over all 48 stations from 1973 to 2010. (a) Monthly mean comparison between different thresholds and (b) observed 38-year (1973–2010) mean of daily precipitation (yellow bar) and its 10-day running mean (black solid line), along with a set of 10-day running means estimated from bias corrected ERA-20c daily precipitations using four different thresholds for all 48 stations. Please refer to the online version of this paper to see this figure in color: http://dx.doi:10.2166/nh.2019.127.

To evaluate the extreme rainfalls associated with different thresholds, we extracted rainfall events exceeding the 99th percentile threshold and compared the four different thresholds for all stations. As illustrated in Figure 9, the uncorrected ERA-20c shows a systematic and significant underestimation of extremes, and the bias correction improves the representation of extreme values regardless of the threshold. Specifically, TH4 performs the best with 0.755 for NSE and 27.33 mm for RMSE, followed by TH3, TH2 and TH1. The differences in error may be largely attributed to the number of data retained under each threshold for a given time series: a lower threshold retains a relatively large number of data, while a higher threshold reduces the number of available data. Given these results, TH4 could be the most reliable cut-off threshold for the ERA-20c under the gQM approach. Nevertheless, there remains considerable potential for improving extremes, especially over 300 mm/day. Thus, we further explore the bias correction approach for the upper tail of the distribution.
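Extracting the events above a percentile threshold can be sketched as below. The percentile here is computed by linear interpolation between order statistics over whatever series is passed in; whether the study computed it over all days or wet days only, and with which percentile convention, is not stated in this section, so both are assumptions of the sketch.

```python
def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in [0, 100])."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def peaks_over_threshold(values, q=99.0):
    """Return the percentile threshold and the values that exceed it."""
    u = percentile(values, q)
    return u, [v for v in values if v > u]
```

The returned exceedances from the observed and corrected series are the quantities one would then compare with RMSE and NSE.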

Figure 9

Scatter plots between the observed and the modelled extreme rainfalls associated with different thresholds over the 99th percentile for all 48 stations. RAW indicates the uncorrected ERA-20c and the others represent the results from the corrected ERA-20c by gQM with different thresholds (TH1 (>0 mm/day), TH2 (>0.1 mm/day), TH3 (>1 mm/day) and TH4 (frequency adjustment)).

Bias correction based on a composite gamma-GPD distribution

This study applies a composite (or piecewise) distribution based QM approach which consists of gamma distribution and GPD, for a given set of thresholds. Here, after adopting TH4 as a lower threshold, the 95th or 99th quantiles have been considered as an upper threshold for the correction of extremes (gpQM95 and gpQM99). The composite distribution approach was evaluated by comparing the obtained extreme rainfalls from modelled ERA-20c with the ones observed for the baseline, as shown in Figure 10. In comparison with the extreme daily rainfalls over the 99th percentile, the GPD based bias correction schemes (i.e. gpQM99 and gpQM95) demonstrate better performance in terms of reproducing the extremes than gQM (Figure 10(a)). gpQM99 shows the best performance in terms of NSE with an efficiency of 0.906, and a good agreement was achieved with 0.879 in gpQM95, whereas the gQM was 0.755. For RMSE, gpQM99 (i.e. 16.92 mm) and gpQM95 (i.e. 19.16 mm) showed a significant reduction of the errors by 38.1 and 29.9% relative to gQM (27.33 mm). Moreover, a comparison of the AMS rainfall also confirmed that gpQM99 and gpQM95 were capable of reproducing rainfall characteristics observed in the AMS more effectively than gQM. Specifically, gpQM99 showed the best performance with 0.912 for NSE and 18.80 mm for RMSE, whereas gpQM95 was 0.892 for NSE and 20.77 mm for RMSE. The results obtained in this study suggest that the gpQM approach is more appropriate to reduce the systematic errors in estimating extreme rainfalls than gQM.
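The relative reductions quoted above follow directly from the reported RMSE values, as this short check shows. The helper name is illustrative.

```python
def percent_reduction(baseline, improved):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline - improved) / baseline

rmse_gqm, rmse_gpqm95, rmse_gpqm99 = 27.33, 19.16, 16.92  # mm, from the text

print(round(percent_reduction(rmse_gqm, rmse_gpqm99), 1))  # 38.1
print(round(percent_reduction(rmse_gqm, rmse_gpqm95), 1))  # 29.9
```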

Figure 10

Scatter plots for (a) the extreme rainfalls over the 99th percentile and (b) annual maximum series (AMS) extracted from the observed and the bias corrected ERA-20c daily precipitation over all 48 stations.

Apart from evaluating the models in the extreme cases, it is important to ensure that the proposed bias correction model with the GPD can reproduce the mean values as well. Again, we evaluate both the monthly mean and 10-day moving mean of the corrected daily precipitation, as shown in Figure 11 and Table 4. For the monthly mean, gQM and gpQM99 give the best performance (Figure 11(a)), with the highest NSE of 0.997 for both methods and the lowest RMSE of 4.77 and 5.12 mm/month, respectively (Table 4). For gpQM95, the NSE is close to one (0.988), but the RMSE, at 9.41 mm/month, is nearly twice those of gQM and gpQM99. In terms of the 10-day moving mean, all QM approaches work almost equally well, although gpQM99 offers the best performance (Table 4). More generally, the gpQM99 approach can effectively correct the biases associated with the upper tails of the distribution without a loss in the efficiency of the bias correction process.

Table 4

A comparison of the mean values between the observed and modelled data (i.e. the corrected ERA-20c by gQM, gpQM95 and gpQM99, and the uncorrected ERA-20c)

| Data | Measure | gQM | gpQM95 | gpQM99 | ERA-20c |
|---|---|---|---|---|---|
| Monthly mean (mm/month) | RMSE (mm) | 4.77 | 9.41 | 5.12 | 15.59 |
| Monthly mean (mm/month) | NSE | 0.997 | 0.988 | 0.997 | 0.968 |
| 10-day running mean (mm/day) | RMSE (mm) | 0.507 | 0.545 | 0.497 | 0.563 |
| 10-day running mean (mm/day) | NSE | 0.970 | 0.966 | 0.971 | 0.963 |

Figure 11

A comparison of mean rainfall between the observation and the corrected ERA-20c with different QM approaches. (a) Monthly mean comparison between different QMs and (b) observed 38-year (1973–2010) mean of daily precipitation (yellow bar) and its 10-day running mean (black solid line), along with a set of 10-day running means estimated from bias corrected ERA-20c daily precipitations using three different QM approaches for all 48 stations. Please refer to the online version of this paper to see this figure in color: http://dx.doi:10.2166/nh.2019.127.

It should be noted that the bias still remains large in the summer season as seen in the 10-day moving mean. The difference was mainly attributed to the discrepancies in the seasonal or monthly distribution of the heavy rainfall events between the observed and modelled data (Nyunt et al. 2016). In other words, there is a clear difference in the monthly number of extreme events over the 95th or 99th thresholds between the observed and ERA-20c (Figure 12), and this is considered to be the main source of the bias in terms of extremes in the intra-seasonal band. The results obtained in these experiments imply that the upper thresholds could be different (or updated) for each month to better represent the intra-seasonal change. On the other hand, estimation of different thresholds on the monthly basis could lead to unreliable estimates of extreme values due to insufficient data for estimating the GPD parameters.
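The monthly comparison of heavy-rainfall frequency described above amounts to counting threshold exceedances per calendar month. A minimal sketch, assuming the daily series is available as `(month, value)` pairs (a hypothetical layout), is:

```python
from collections import Counter

def monthly_exceedance_counts(records, threshold):
    """Count values above threshold for each calendar month (January first).

    records: iterable of (month, value) pairs with month in 1..12.
    """
    counts = Counter(month for month, value in records if value > threshold)
    return [counts.get(month, 0) for month in range(1, 13)]
```

Running this on the observed and the ERA-20c series, each with its own 95th or 99th percentile threshold, gives the two monthly profiles whose mismatch is discussed in the text.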

Figure 12

Monthly mean frequency of the heavy rainfalls over the 95th and 99th percentile from the observed (Obs) and ERA-20c daily precipitation. Here, the mean frequency is averaged over 48 stations from 1973 to 2010.

Spatial interpolation on bias correction parameters

The proposed IM-PCM approach is validated by leave-one-out cross validation. In this study, we estimated a set of parameters from the observed daily precipitation, and the estimated parameters were then used to build contour maps. For extreme values of the interpolated daily precipitation, POTs exceeding the 99th percentile and AMS were first constructed and compared between the three QM approaches (gQM, gpQM95 and gpQM99). Note again that all results were obtained from the cross-validation procedure, which considers different possible samples. As illustrated in Figure 13(a), the extremes corrected using a set of parameters interpolated by IM-PCM showed good agreement with the observed values for the three QMs. Among them, gpQM95 and gpQM99 gave the best performance for the given POTs (Figure 13(a)) with an NSE of 0.781, compared with 0.714 for gQM. Similar results were obtained for the RMSE. Moreover, the proposed gpQM99 approach using the interpolated parameters was capable of reproducing the AMS with 26.35 mm for RMSE and 0.827 for NSE (Figure 13(b)). However, it should be noted that the bias increases, which is largely attributable to the parameter interpolation process. For example, the RMSE of the AMS using gpQM99 with IM-PCM increased from 18.80 to 26.35 mm when compared with the pointwise bias correction shown in Figure 10(b). A similar increase (from 20.77 to 26.30 mm) was also observed for gpQM95. Nevertheless, the RMSE for the AMS corrected by IM-PCM with gpQM99, 26.35 mm, is still smaller than that of the pointwise bias correction from gQM, 28.07 mm.

Figure 13

Scatter plots for (a) the extreme rainfalls over the 99th percentile and (b) annual maximum series (AMS) extracted from the observed and the bias corrected ERA-20c daily precipitation over all 48 stations. All the results presented here are obtained by leave-one-out cross validation.

In terms of the mean precipitation, the monthly mean and 10-day moving average of bias corrected rainfall using a set of parameters obtained from IM-PCM were evaluated (Figure 14 and Table 5). Although all three QM approaches yielded slightly different estimates, overall favorable performance was obtained for the monthly mean with a model efficiency over 0.98 for NSE. Among the options, gQM and gpQM99 performed the best and showed the lowest RMSE (Figure 14(a) and Table 5). Figure 14(b) shows a similar result for the 10-day moving average with an efficiency over 0.96 for NSE.

Table 5

A comparison of the mean values between the observed and the modelled precipitation for three different approaches by using a set of parameters interpolated from IM-PCM within the leave-one-out cross validation framework

| Data | Measure | gQM | gpQM95 | gpQM99 | ERA-20c |
|---|---|---|---|---|---|
| Monthly mean (mm/month) | RMSE (mm) | 4.14 | 10.31 | 5.27 | 15.59 |
| Monthly mean (mm/month) | NSE | 0.998 | 0.986 | 0.996 | 0.968 |
| 10-day running mean (mm/day) | RMSE (mm) | 0.502 | 0.562 | 0.498 | 0.563 |
| 10-day running mean (mm/day) | NSE | 0.971 | 0.963 | 0.971 | 0.963 |

Figure 14

A comparison of cross validation results for the mean rainfall between the observation and the corrected ERA-20c with different QM approaches. (a) Monthly mean comparison between different QMs and (b) observed 38-year (1973–2010) mean of daily precipitation (yellow bar) and its 10-day running mean (black solid line), along with a set of 10-day running means estimated from bias corrected ERA-20c daily precipitations using three different QM approaches for all 48 stations. All the results presented here are obtained by leave-one-out cross validation. Please refer to the online version of this paper to see this figure in color: http://dx.doi:10.2166/nh.2019.127.

For a more specific analysis at each weather station in the context of cross validation, we generated maps showing the spatial errors in both the AMS rainfalls and the mean. The AMS errors were evaluated by RMSE and NSE in Figure 15. For the mean, we additionally evaluated the IM-PCM method by estimating the relative error between the observed and modelled precipitation in Figure 16. As shown in the figures, for the AMS rainfalls, gpQM95 and gpQM99 generally perform well except for a few stations. Most stations showed NSE over 0.8 and RMSE less than 30 mm. For the mean daily rainfall, the relative errors are generally below 10%. Given these results, the proposed gpQM approaches, especially gpQM99, with IM-PCM can effectively rectify the spatial-temporal bias of the ERA-20c model data without a loss in efficiency for the mean values. The interpolated parameters of the transfer function, including the wet-day frequency, cover South Korea in its entirety, and the corrected rainfall at each grid can be interpreted as an approximation of the observed rainfall over that grid. The accuracy of the interpolated parameters (or rainfall estimates) is largely affected by potential bias associated with spatial interpolation and inadequate sampling of rain gauges. We acknowledge that the potential bias in the interpolated rainfall estimates can be attributed to a limited number of rain gauges and the systematic bias in the rainfall scenarios.

Figure 15

Cross validation results of the IM-PCM for the annual maximum series rainfalls of the bias corrected data by QM approaches (gQM, gpQM95 and gpQM99) over 48 grid points. (a) Nash–Sutcliffe efficiency (NSE) and (b) root-mean-square-error (RMSE).

Figure 16

Relative error of the bias-corrected mean rainfalls by QM approaches (gQM, gpQM95 and gpQM99) in 48 grid points compared with the corresponding in-situ measurements.

It is well known that precipitation is strongly influenced by the topography in mountainous areas, so numerous studies have used elevation as an exogenous factor for rainfall interpolation (Goovaerts 2000; Lloyd 2005; Adhikary et al. 2017). We therefore explored the relationship between the elevation and the parameters for all 48 stations. As summarized in Table 6, the Pearson correlation r-values were not statistically significant, indicating a weak dependence between the elevation and the parameters. The results imply that the elevation may not be important for the interpolation of the parameters. In summary, the proposed interpolation scheme for the QM approach provided bias corrected long-term precipitation data, especially for ungauged catchments. In addition, the proposed approach was easy to use and may help to reduce bias associated with the interpolation of daily precipitation. Moreover, this approach can be further used to obtain a century-long daily precipitation series over the Korean peninsula, which could be useful in terms of reducing uncertainty in the parameter estimation of rainfall frequency analysis.
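The correlation check between station elevation and a fitted parameter uses the ordinary Pearson coefficient, which can be computed as in the sketch below. The function name and the input layout (two equal-length sequences, one value per station) are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson_r(elevations, parameter_values)` once per parameter and month would produce one entry of a correlation table of the kind reported here.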

Table 6

Pearson correlation coefficients (r) between elevations and parameters for gQM, gpQM95 and gpQM99 for all 48 stations

| Type | Para. (gamma) | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Para. (GPD) | r (GPD) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gQM | | −0.40 | −0.14 | 0.06 | 0.18 | 0.07 | 0.16 | 0.22 | 0.06 | 0.15 | 0.00 | −0.06 | −0.14 | | − |
| gpQM95 | | −0.37 | −0.13 | 0.05 | 0.17 | 0.09 | 0.19 | 0.26 | 0.09 | 0.15 | 0.12 | −0.13 | −0.18 | | −0.01 |
| gpQM99 | | −0.40 | −0.14 | 0.06 | 0.18 | 0.07 | 0.16 | 0.24 | 0.08 | 0.14 | 0.03 | −0.08 | −0.14 | | −0.05 |
| gQM | | 0.09 | −0.15 | −0.25 | −0.22 | −0.14 | −0.20 | −0.11 | −0.11 | −0.02 | 0.17 | −0.02 | −0.11 | | − |
| gpQM95 | | 0.02 | −0.16 | −0.22 | −0.23 | −0.20 | −0.25 | −0.18 | −0.14 | −0.08 | −0.10 | 0.02 | −0.08 | | −0.05 |
| gpQM99 | | 0.09 | −0.14 | −0.25 | −0.23 | −0.17 | −0.21 | −0.13 | −0.16 | −0.03 | 0.09 | −0.03 | −0.11 | | −0.01 |

The monthly columns give r for the gamma distribution parameter, and the last column gives r for the GPD parameter. The first three rows and the last three rows correspond to two different parameters.

The bias correction methods developed in this study both statistically improved the quality of the data and could extend daily precipitation over the 20th century in South Korea. More specifically, this study further utilizes the derived transfer function for the baseline period 1973–2010 to provide the daily precipitation for the period 1900–2010 under the stationary assumption. Finally, we explored changes in the mean and extreme using the gpQM99 approach for three different periods, 1900–1972, 1973–2010 and 1900–2010, in the context of a retrospective analysis. As shown in Figure 17(a), the evaluation results for the monthly mean show a very noticeable and sudden increase in the recent period, especially for the summer season (July–September), while no significant changes were observed for the dry season (October–April). Figure 17(b) shows boxplots representing a distribution of the AMS for the three periods. The distribution of the AMS derived from the gpQM99 approach for the period 1973–2010 was almost identical to that of the observed, which indicates that the proposed gpQM99 was capable of reproducing the extremes of daily precipitations. As expected from the changes in summer rainfall, the distribution of the AMS for the recent period 1973–2010 is much wider than that of the period 1900–1972 (i.e. gpQM99-1), especially for the upper tail of the distribution. This may lead to an increase in design rainfalls for a specific return period. On the other hand, the distribution of the AMS for the entire period 1900–2010 is quite similar to that of the observed in terms of median AMS, while its range is relatively narrower than the recent period.
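The period comparison of the AMS can be sketched as two small steps: take the maximum daily value in each year, then select the years belonging to each period. The `(year, value)` record layout and the function names are assumptions for illustration.

```python
def annual_maxima(records):
    """Annual maximum series from (year, daily_value) pairs, as {year: maximum}."""
    ams = {}
    for year, value in records:
        if year not in ams or value > ams[year]:
            ams[year] = value
    return ams

def select_period(ams, first_year, last_year):
    """Annual maxima for the years in [first_year, last_year], in year order."""
    return [v for y, v in sorted(ams.items()) if first_year <= y <= last_year]
```

For the three periods considered here, one would call `select_period(ams, 1900, 1972)`, `select_period(ams, 1973, 2010)` and `select_period(ams, 1900, 2010)` on the AMS of the corrected series.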

Figure 17

A retrospective analysis for a comparison between the observed precipitation (1973–2010) and the corrected ERA-20c by gpQM99 with three different periods: 1900–1972 (gpQM99-1), 1973–2010 (gpQM99-2) and 1900–2010 (gpQM99-3). (a) Monthly mean rainfalls and (b) box plot of the annual maximum series (AMS) rainfalls.

CONCLUSIONS

The main objective of this study was to explore the century-long reanalysis data, ERA-20c, especially for daily precipitation over South Korea in the context of bias correction. We first investigated the utility of the ERA-20c data as proxy data over South Korea for hydrological applications and further examined several issues concerning the aspects of the bias correction that influence the use of modelled data in practice. In general, we found that there is fairly good agreement between the observed and the ERA-20c reanalysis data for the baseline period 1973–2010. However, the results obtained here have shown that the ERA-20c precipitation data still have their own systematic biases, particularly in the frequency of wet-days and the extreme upper tail of the distribution. More specifically, an overestimated frequency of wet-days and a considerable underestimation of daily precipitation have been identified in the ERA-20c over South Korea. Given these results, we proposed a two-stage bias correction approach to daily precipitation, which comprises two distinct parts: a model for adjusting the overestimated wet-day frequency and a model for reducing the biases associated with extreme values. To adjust the wet-day frequency, we explored four different thresholds through an experiment with the QM approach. In terms of extremes, a composite gamma-GPD distribution based QM approach was introduced. Finally, we proposed an IM-PCM approach as an alternative for constructing the transfer function for ungauged basins. The key findings obtained in this analysis are summarized as follows:

  1. Our findings are consistent with the notion that the mean daily precipitation is reproduced well by the reanalysis. Our study also confirms that the mean and annual cycle of daily precipitation as observed over South Korea is well simulated by the ERA-20c reanalysis. However, considerable underestimation of the daily maximum precipitation was consistently seen in the ERA-20c, especially during the summer season. The results presented here illustrate that the heavy rainfalls in the summer season could be significantly underestimated by the current climate modelling system, although the reanalysis system adequately reproduces the mean climate of the historical period. Another issue with respect to the evaluation of ERA-20c daily precipitation is related to the much higher frequency of wet-days than that of the observed, which may in turn influence the underestimation of the extremes.

  2. In this study, a two-stage bias correction approach to the ERA-20c precipitation was proposed to adjust the overestimated wet-day frequency and the biases associated with the upper tail of the distribution. In terms of the wet-day frequency, we examined four different types of thresholds (i.e. TH1, TH2, TH3 and TH4) to identify an optimal threshold. TH4 is the case where the frequency of wet-days of ERA-20c is set to that of the observed, and it produces the best results among the four. Moreover, TH4 is allowed to have different thresholds for each month, unlike the other three approaches (i.e. TH1, TH2 and TH3) in which a fixed value was assumed over all the months for all the stations. Our results offer insights into how inappropriate thresholds for the wet-day frequency may significantly influence the bias correction results. To better represent the bias in the extreme rainfall, we proposed a composite distribution based QM approach, which consists of the gamma distribution and GPD for the two thresholds (i.e. the 95th and 99th percentiles). Given the efficiency gains, this study suggests that the gpQM approach is more appropriate to reduce the systematic errors in estimating extreme rainfalls than gQM. To be more specific, the gpQM99 approach can effectively reduce the biases in the upper tails of the distribution without a loss of efficiency in the overall bias correction process. However, a large bias still exists in the summer season. The bias in extreme rainfall that remains after correction with gpQM99 suggests that the ERA-20c data may be insufficient in terms of reflecting the specific regional patterns associated with extreme rainfall over South Korea.

  3. We explored an alternative to obtain the transfer function of the QM approach for the ungauged catchments in the context of the cross-validation process. From this perspective, we have proposed an interpolation method based on parameter contour maps (IM-PCM), which is based on the interpolation of the six parameters over the entire region of interest. The corrected daily precipitation series using an interpolated set of parameters by the IM-PCM showed good agreement with the observed precipitation, and in particular the proposed gpQM99 with the IM-PCM performs the best in terms of reducing the spatial-temporal bias of the ERA-20c model data without a loss of efficiency. We finally utilized the derived transfer function for the baseline period 1973–2010 to extend the daily precipitation for the period 1900–2010 under the stationary assumption, and we examined the changes in daily precipitation for three different periods, 1900–1972, 1973–2010 and 1900–2010, as a retrospective analysis. We found that a very noticeable and sudden increase in the recent period was observed during the summer season (July–September).

The findings of this study help to fill the knowledge gaps regarding the bias correction of the century-long reanalysis, ERA-20c, and to clarify the key characteristics of daily precipitation over South Korea. Further, the results obtained here can provide a useful perspective on the bias correction of modelled data from reanalysis and regional climate modelling systems for regional-scale analysis with a limited network of rainfall stations. The impact of climate change on water resources will be explored further using the extended daily precipitation data for the period 1900–2010. Although the study was carried out in South Korea, the methodology has the potential to be applied in other parts of the world. We hope this paper will stimulate the hydrometeorological community to explore the issues raised by long-term reanalysis data in other countries under different climate and geographical conditions.

ACKNOWLEDGEMENTS

The first author is funded by the Government of South Korea for carrying out his doctoral studies at the University of Bristol. We are grateful for the relevant data provided by KMA and ECMWF. The second author is supported by a grant (17AWMP-B121100-02) from the Advanced Water Management Research Program (AWMP) funded by the Ministry of Land, Infrastructure and Transport of the Korean Government. The abbreviations and symbols used in this study are listed in Appendix B (available with the online version of this paper).

REFERENCES

Acero, F. J., García, J. A. & Gallego, M. C. 2011 Peaks-over-threshold study of trends in extreme rainfall over the Iberian Peninsula. J. Clim. 24 (4), 1089–1105.
Adhikary, S. K., Muttil, N. & Yilmaz, A. G. 2017 Cokriging for enhanced spatial interpolation of rainfall in two Australian catchments. Hydrol. Process. 31 (12), 2143–2161.
Befort, D. J., Wild, S., Kruschke, T., Ulbrich, U. & Leckebusch, G. C. 2016 Different long-term trends of extra-tropical cyclones and windstorms in ERA-20C and NOAA-20CR reanalyses. Atmos. Sci. Lett. 17 (11), 586–595.
Betts, A. K. & Beljaars, A. C. M. 2017 Analysis of near-surface biases in ERA-Interim over the Canadian Prairies. J. Adv. Model. Earth Syst. 9 (5), 2158–2173.
Bosilovich, M. G., Chen, J., Robertson, F. R. & Adler, R. F. 2008 Evaluation of global precipitation in reanalyses. J. Appl. Meteorol. Climatol. 47 (9), 2279–2299.
Brands, S., Gutiérrez, J. M., Herrera, S. & Cofiño, A. S. 2012 On the use of reanalysis data for downscaling. J. Clim. 25 (7), 2517–2526.
Chan, S. C., Kendon, E. J., Roberts, N. M., Fowler, H. J. & Blenkinsop, S. 2015 Downturn in scaling of UK extreme rainfall with temperature for future hottest days. Nat. Geosci. 9 (1), 24–28.
Coles, S. G. 2001 An Introduction to Statistical Modeling of Extreme Values. Springer, London.
Compo, G. P., Whitaker, J. S., Sardeshmukh, P. D., Matsui, N., Allan, R. J., Yin, X., Gleason, B. E., Vose, R. S., Rutledge, G. & Bessemoulin, P. 2011 The twentieth century reanalysis project. Q. J. R. Meteorol. Soc. 137 (654), 1–28.
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G. & Bauer, P. 2011 The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 137 (656), 553–597.
de Leeuw, J., Methven, J. & Blackburn, M. 2015 Evaluation of ERA-Interim reanalysis precipitation products using England and Wales observations. Q. J. R. Meteorol. Soc. 141 (688), 798–806.
Donat, M. G., Alexander, L. V., Herold, N. & Dittus, A. J. 2016 Temperature and precipitation extremes in century-long gridded observations, reanalyses, and atmospheric model simulations. J. Geophys. Res. Atmos. 121 (19), 11,174–11,189.
Frank, C. W., Wahl, S., Keller, J. D., Pospichal, B., Hense, A. & Crewell, S. 2018 Bias correction of a novel European reanalysis data set for solar energy applications. Solar Energy 164 (December 2017), 12–24.
Gao, L., Bernhardt, M., Schulz, K., Chen, X. W., Chen, Y. & Liu, M. B. 2016 A first evaluation of ERA-20CM over China. Month. Weather Rev. 144 (1), 45–57.
Haerter, J. O., Eggert, B., Moseley, C., Piani, C. & Berg, P. 2015 Statistical precipitation bias correction of gridded model data using point measurements. Geophys. Res. Lett. 42 (6), 1919–1929.
Hersbach, H., Peubey, C., Simmons, A., Berrisford, P., Poli, P. & Dee, D. 2015 ERA-20CM: a twentieth-century atmospheric model ensemble. Q. J. R. Meteorol. Soc. 141 (691), 2350–2375.
Hundecha, Y., Pahlow, M. & Schumann, A. 2009 Modeling of daily precipitation at multiple locations using a mixture of distributions to characterize the extremes. Water Resour. Res. 45 (W12412), 1–15.
IPCC 2014 Climate Change 2014 – Impacts, Adaptation and Vulnerability: Regional Aspects. Cambridge University Press, Cambridge.
Kim, D.-I. & Han, D. 2018 Comparative study on long term climate data sources over South Korea. J. Water Clim. Change (in press).
Macias, D., Garcia-Gorriz, E., Dosio, A., Stips, A. & Keuler, K. 2018 Obtaining the correct sea surface temperature: bias correction of regional climate model data for the Mediterranean Sea. Clim. Dyn. 51 (3), 1095–1117.
Manton, M. J., Haylock, M. R., Hennessy, K. J., Nicholls, N., Chambers, L. E., Collins, D. A., Daw, G., Finet, A., Gunawan, D., Inape, K., Kestin, T. S., Lefale, P., Leyu, C. H., Lwin, T., Maitrepierre, L., Ouprasitwong, N., Page, C. M., Pahalad, J., Plummer, N., Salinger, M. J., Suppiah, R., Tran, V. L., Trewin, B., Tibig, I. & Yee, D. 2001 Trends in extreme daily rainfall and temperature in southeast Asia and the South Pacific: 1961–1998. Int. J. Climatol. 21, 269–284.
Maraun, D. & Widmann, M. 2018 Statistical Downscaling and Bias Correction for Climate Research. Cambridge University Press, Cambridge.
Mohanty, S., Jha, M. K., Raul, S. K., Panda, R. K. & Sudheer, K. P. 2015 Using artificial neural network approach for simultaneous forecasting of weekly groundwater levels at multiple sites. Water Resour. Manage. 29 (15), 5521–5532.
Nelson, G. C., Rosegrant, M. W., Koo, J., Robertson, R., Sulser, T., Zhu, T., Ringler, C., Msangi, S., Palazzo, A. & Batka, M. 2009 Climate Change: Impact on Agriculture and Costs of Adaptation. International Food Policy Research Institute, Washington, DC.
Nyunt, C. T., Koike, T. & Yamamoto, A. 2016 Statistical bias correction for climate change impact on the basin scale precipitation in Sri Lanka, Philippines, Japan and Tunisia. Hydrol. Earth Syst. Sci. Discuss. DOI: 10.5194/hess-2016-14.
Patz, J. A., Campbell-Lendrum, D., Holloway, T. & Foley, J. A. 2005 Impact of regional climate change on human health. Nature 438 (7066), 310–317.
Piani, C., Haerter, J. O. & Coppola, E. 2010 Statistical bias correction for daily precipitation in regional climate models over Europe. Theor. Appl. Climatol. 99 (1), 187–192.
Poli, P., Hersbach, H., Tan, D., Dee, D., Thépaut, J.-N., Simmons, A., Peubey, C., Laloyaux, P., Komori, T., Berrisford, P. & Dragani, R. 2013 The data assimilation system and initial performance evaluation of the ECMWF pilot reanalysis of the 20th-century assimilating surface observations only (ERA-20C). European Centre for Medium Range Weather Forecasts. ERA Rep. Series 14, 62.
Poli, P., Hersbach, H., Dee, D. P., Berrisford, P., Simmons, A. J., Vitart, F., Laloyaux, P., Tan, D. G. H., Peubey, C. & Thépaut, J.-N. 2016 ERA-20C: an atmospheric reanalysis of the twentieth century. J. Clim. 29 (11), 4083–4097.
Simmons, A. J., Poli, P., Dee, D. P., Berrisford, P., Hersbach, H., Kobayashi, S. & Peubey, C. 2014 Estimating low-frequency variability and trends in atmospheric temperature using ERA-Interim. Q. J. R. Meteorol. Soc. 140 (679), 329–353.
Volosciuk, C., Maraun, D., Vrac, M. & Widmann, M. 2017 A combined statistical bias correction and stochastic downscaling method for precipitation. Hydrol. Earth Syst. Sci. 21 (3), 1693–1719.
Vörösmarty, C. J., Green, P., Salisbury, J. & Lammers, R. B. 2000 Global water resources: vulnerability from climate change and population growth. Science 289 (5477), 284–288.
Vrac, M. & Naveau, P. 2007 Stochastic downscaling of precipitation: from dry events to heavy rainfalls. Water Resour. Res. 43 (W07402), 1–13.
Wilson, P. S. & Toumi, R. 2005 A fundamental probability distribution for heavy rainfall. Geophys. Res. Lett. 32 (14), 1–4.

Supplementary data