Abstract
Long-term precipitation data play an important role in climate impact studies, but observations for a given catchment are often very limited. To significantly expand the sample size for extreme rainfall analysis, we considered ERA-20c, a century-long daily reanalysis precipitation product provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Preliminary studies have indicated that ERA-20c reproduces the mean reasonably well, but underestimates rainfall intensity and overestimates wet-day frequency. We therefore first adopted a relatively simple approach that adjusts the frequency of wet days by imposing an optimal threshold. We then introduced a quantile mapping approach based on a composite distribution: a generalized Pareto distribution for the upper tail (e.g. above the 95th or 99th percentile) and a gamma distribution for the interior part of the distribution. The proposed composite distributions provide a significant reduction of the biases in the extremes compared with the conventional method. We also suggested an interpolation method for the set of bias correction parameters in ungauged catchments. A comparison of the corrected precipitation using spatially interpolated parameters shows that the proposed modelling scheme, particularly with the 99th percentile, can reliably reduce the systematic bias.
INTRODUCTION
Recent studies have documented that long-term climate change has impacted a wide range of fields such as agriculture, environment, health, economy and water resources (Vörösmarty et al. 2000; Patz et al. 2005; Nelson et al. 2009; IPCC 2014). A long-term change in climate variables such as precipitation and temperature can affect the growth of crops, ecosystems, human diseases, and water-related hazards. Of these impacts, water related hazards are closely linked to changes in rainfall intensity, which are of primary concern to water resource managers.
To systematically assess water resources and water-related hazards, it is necessary to collect reliable long-term climate data. Locally recorded data have played an important role, and they have been treated as accurate values in the modelling process. However, it has been widely acknowledged that the use of observed climate data is limited by a lack of spatial and temporal coverage, and long-term climate data are not readily available in many countries around the world. A primary strength of reanalysis data is that, compared with observations, they provide spatially finer scale climate data over a longer period, and a few of them cover the whole 20th century. For example, the National Oceanic and Atmospheric Administration (NOAA) has produced the 20th century reanalysis (20cR) spanning from 1850 to 2014, and the European Centre for Medium-Range Weather Forecasts (ECMWF) has also released century-long datasets such as the ECMWF 20th century atmospheric model ensemble (ERA-20CM) and the ECMWF 20th century assimilation of surface observations only (ERA-20c), which cover the years from 1900 to 2010 (Compo et al. 2011; Hersbach et al. 2015; Poli et al. 2016). All of them provide global precipitation data at daily or sub-daily scales, but they differ in assimilation technique and spatial-temporal resolution. The ECMWF products (ERA-20c and ERA-20CM) are based on the Integrated Forecasting System version Cy38r1 with 0.125° spatial resolution, which makes them more relevant to regional-scale studies in South Korea. The difference between ERA-20c and ERA-20CM is that the former assimilates pressure and wind observations, whereas the latter does not consider them in the modelling process (Hersbach et al. 2015; Donat et al. 2016; Poli et al. 2016). Therefore, ERA-20CM is limited in reproducing the actual synoptic situation (Hersbach et al. 2015; Gao et al. 2016).
On the other hand, NOAA-20cR was processed by an Ensemble Kalman Filter technique (Compo et al. 2011), but its spatial resolution (i.e. 1.875 × 1.9°) is much coarser than that of the other century-long reanalysis data. For these reasons, this study selected the ERA-20c daily precipitation data with 0.125 × 0.125° spatial resolution as an alternative to observed precipitation over South Korea.
Although substantial improvements have been made in the modelling process, previous studies have shown that reanalysis datasets still have their own systematic errors, which vary in space and time (Bosilovich et al. 2008; Ma et al. 2009; Bao & Zhang 2013; Gao et al. 2016; Kim & Han 2018). It is also clear that century-long reanalysis data may misrepresent long-term climatic trends or synoptic-scale variability, especially for the first half of the twentieth century, and that differences in temporal variability exist (Brands et al. 2012; Krueger et al. 2013; Poli et al. 2013; Befort et al. 2016; Donat et al. 2016). Nevertheless, there are limited studies on bias correction for long-term daily reanalysis precipitation data in hydrologic applications. Most existing studies have compared different reanalysis datasets rather than addressing bias correction techniques (Befort et al. 2016; Donat et al. 2016; Poli et al. 2016). Thus, to better understand the biases and their roles in hydrologic applications, this study focuses on exploring bias correction methods, especially for extreme value analysis associated with the sampling error in rainfall frequency analysis, in an area with a spatio-temporally sparse observation network.
The underlying concepts for the bias correction approach vary from a simple delta change (or mean bias correction) to quantile mapping (QM) or a multivariate approach based on a copula technique (Teutschbein & Seibert 2012; Haerter et al. 2015; Mao et al. 2015; Vrac & Friederichs 2015; Gao et al. 2016; Maraun 2016; Nyunt et al. 2016; Frank et al. 2018; Macias et al. 2018). For instance, Frank et al. (2018) applied a scaling approach based on orthogonal distance regressions for bias correction of a European reanalysis dataset. Macias et al. (2018) employed a simple bias correction approach using a linear transfer function between the cumulative distribution functions (CDFs) of the modelled and observed atmospheric variables. Vrac & Friederichs (2015) proposed a multivariate bias correction scheme based on a copula concept. Although each method has its own merits and limitations, previous studies have shown that bias correction methods are generally capable of reducing systematic errors in numerical model outputs and that, among them, QM performs better than other approaches, especially for precipitation (Teutschbein & Seibert 2012; Themeßl et al. 2012; Fang et al. 2015; Maraun & Widmann 2018). The QM method, also referred to as “distribution mapping” or “probability mapping”, rectifies the cumulative distribution of the modelled data against that of the observed data by employing a transfer function.
However, there are two main drawbacks to the QM approach based on a gamma distribution (gQM). First, it has been acknowledged that gQM often fails to reproduce extreme rainfalls, which are mainly described by the upper tail of the distribution (Wilks 1999; Vrac & Naveau 2007; Hundecha et al. 2009; Volosciuk et al. 2017). In other words, the gQM approach may misrepresent the upper tail of the distribution, which, in turn, can lead to underestimation of the design rainfalls. One may intuitively consider heavy-tailed distributions such as the extreme value distributions (e.g. Gumbel distribution, generalized extreme value distribution and Weibull distribution). However, a heavy-tailed distribution for the bias correction may result in overestimation of daily rainfall in the lower tail of the distribution. For this reason, a composite distribution including the mixture distribution (such as the Pareto mixture distribution) has been applied to the quantile mapping approach, especially for the correction of climate change scenarios (Gutjahr & Heinemann 2013; Smith et al. 2014; Nyunt et al. 2016; Volosciuk et al. 2017). Comparatively little attention has been given to the bias correction of century-long reanalyses such as ERA-20c. This study therefore aims to introduce a quantile mapping approach based on a composite distribution of a generalized Pareto distribution (GPD) for the upper tail (e.g. above the 95th or 99th percentile) and a gamma distribution for the interior part of the distribution.
The conventional QM method is also limited in that it cannot be applied directly to the ungauged basin, where a one-to-one mapping between the observed and the modelled data does not exist. More specifically, only a transfer function of a set of grid points for the paired precipitation data can be obtained. Thus, an alternative method for the synthesis of unpaired data needs to be established. The general approaches to the interpolation of in-situ data for the quantile mapping are the inverse distance weighting (IDW) and the kriging method, and the interpolated values can then be used to obtain the transfer function for the ungauged basin. For example, Gutjahr & Heinemann (2013) applied the IDW method to produce spatially continuous estimates of the daily precipitation for the spatial bias correction. However, the systematic error in the process of the spatial interpolation of daily rainfall can be propagated through to the parameter estimation in the quantile mapping approach. Thus, a primary question in the statistical bias correction analysis is whether the QM method can reliably improve ERA-20c daily precipitation, especially for extreme values, over 100 years when including the ungauged sites.
From this background, this study mainly focuses on exploring the following questions:
What are the characteristics of the uncertainty associated with the ERA-20c daily precipitation data in South Korea? Do the reanalysis data well describe the statistical properties in terms of the extreme as well as the mean values?
How well does the traditional QM approach perform on the reanalysis data? Can a bias correction based on a combined distribution reduce the systematic error more effectively than a bias correction based on a single distribution (gQM)?
How can we effectively extend the combined distribution approach to the spatial bias correction for ungauged catchments? Can the proposed scheme facilitate a reconstruction of long-term precipitation, especially for the estimation of annual maximum series (AMS) of daily precipitation?
To address these questions, we investigated the bias correction in three phases. First, we attempted to understand the statistical behaviour of the ERA-20c data and to analyze the biases and errors in the reanalysis mean and extreme precipitation. Second, the QM approach was explored by using a combined Gamma-Pareto distribution in the bias correction method to better represent the upper tail of the distribution for 48 stations for the baseline period 1973–2010. The corrected data from the proposed approach were then compared with the observations. Finally, we proposed a spatial bias correction approach for ungauged catchments based on parameter contour maps (IM-PCM), which consists of three steps. The reanalysis data and observed precipitation are summarized in the next section, and the theoretical background for the proposed bias correction approach follows. The proposed model was applied to the daily rainfall data for the baseline period, and a retrospective analysis of the data was then conducted for the estimation of AMS rainfalls in the Results and discussion section. Finally, concluding remarks are provided in the last section.
STUDY AREA AND DATA
Study area and local gauged data
South Korea is located in the northeast part of Asia, and lies between latitudes 33–39°N and longitudes 125–132°E, including all the islands. The total area is approximately 100,032 km², and its annual average rainfall is about 1,277 mm. In South Korea, there are hundreds of local weather stations available. However, most of them were installed after 1970, and only a few stations provide long-term daily precipitation records for more than 40 years. The observed daily precipitation sequences were obtained and compiled from the Korea Meteorological Administration (KMA). The location of the study area and the local gauging stations used in this study are illustrated in Figure 1, and the details for the stations are summarized in Table 1.
Station no. | Name | Latitude (°N) | Longitude (°E) | Elevation (m a.s.l.) | Annual rainfall (mm) a |
---|---|---|---|---|---|
St. 1 | Sokcho | 38.2508 | 128.5644 | 19.5 | 1,374.6 |
St. 2 | Daegwallyeong | 37.6769 | 128.7181 | 774.0 | 1,736.4 |
St. 3 | Chuncheon | 37.9025 | 127.7356 | 79.1 | 1,304.9 |
St. 4 | Gangneung | 37.7514 | 128.8908 | 27.4 | 1,436.6 |
St. 5 | Seoul | 37.5714 | 126.9656 | 11.1 | 1,386.8 |
St. 6 | Incheon | 37.4775 | 126.6247 | 69.6 | 1,183.0 |
St. 7 | Wonju | 37.3375 | 127.9464 | 150.0 | 1,318.6 |
St. 8 | Suwon | 37.2700 | 126.9875 | 38.3 | 1,274.9 |
St. 9 | Chungju | 36.9700 | 127.9525 | 116.5 | 1,202.0 |
St. 10 | Seosan | 36.7736 | 126.4958 | 30.3 | 1,254.9 |
St. 11 | Cheongju | 36.6361 | 127.4428 | 58.6 | 1,229.7 |
St. 12 | Daejeon | 36.3689 | 127.3742 | 70.3 | 1,353.0 |
St. 13 | Chupungyeong | 36.2197 | 127.9944 | 246.1 | 1,171.5 |
St. 14 | Andong | 36.5728 | 128.7072 | 141.5 | 1,017.3 |
St. 15 | Pohang | 36.0325 | 129.3794 | 3.7 | 1,145.4 |
St. 16 | Gunsan | 36.0019 | 126.7631 | 24.6 | 1,210.8 |
St. 17 | Daegu | 35.8850 | 128.6189 | 65.5 | 1,047.0 |
St. 18 | Jeonju | 35.8214 | 127.1547 | 54.8 | 1,291.6 |
St. 19 | Ulsan | 35.5600 | 129.3200 | 36.0 | 1,265.5 |
St. 20 | Gwangju | 35.1728 | 126.8914 | 73.8 | 1,387.9 |
St. 21 | Busan | 35.1044 | 129.0319 | 71.0 | 1,500.2 |
St. 22 | Mokpo | 34.8167 | 126.3811 | 39.4 | 1,139.4 |
St. 23 | Yeosu | 34.7392 | 127.7406 | 66.0 | 1,420.1 |
St. 24 | Jinju | 35.1636 | 128.0400 | 31.6 | 1,504.8 |
St. 25 | Yangpyeong | 37.4886 | 127.4944 | 49.4 | 1,359.6 |
St. 26 | Icheon | 37.2639 | 127.4842 | 79.4 | 1,330.9 |
St. 27 | Inje | 38.0600 | 128.1669 | 201.6 | 1,167.8 |
St. 28 | Hongcheon | 37.6833 | 127.8803 | 142.3 | 1,353.2 |
St. 29 | Jecheon | 37.1592 | 128.1942 | 265.0 | 1,345.8 |
St. 30 | Boeun | 36.4875 | 127.7339 | 176.4 | 1,275.0 |
St. 31 | Cheonan | 36.7794 | 127.1211 | 24.0 | 1,229.4 |
St. 32 | Boryeong | 36.3269 | 126.5572 | 16.9 | 1,219.6 |
St. 33 | Buyeo | 36.2722 | 126.9206 | 12.7 | 1,323.3 |
St. 34 | Geumsan | 36.1056 | 127.4817 | 171.7 | 1,277.1 |
St. 35 | Buan | 35.7294 | 126.7164 | 13.4 | 1,249.8 |
St. 36 | Imsil | 35.6122 | 127.2853 | 249.3 | 1,340.2 |
St. 37 | Jeongeup | 35.5631 | 126.8658 | 46.0 | 1,317.1 |
St. 38 | Namwon | 35.4053 | 127.3328 | 91.7 | 1,351.0 |
St. 39 | Jangheung | 34.6886 | 126.9194 | 46.4 | 1,493.7 |
St. 40 | Haenam | 34.5533 | 126.5689 | 14.4 | 1,322.4 |
St. 41 | Goheung | 34.6181 | 127.2756 | 54.5 | 1,459.2 |
St. 42 | Yeongju | 36.8717 | 128.5167 | 212.2 | 1,268.1 |
St. 43 | Mungyeong | 36.6272 | 128.1486 | 172.0 | 1,241.5 |
St. 44 | Uiseong | 36.3558 | 128.6883 | 83.2 | 1,016.5 |
St. 45 | Gumi | 36.1306 | 128.3206 | 50.3 | 1,051.1 |
St. 46 | Yeongcheon | 35.9772 | 128.9514 | 95.0 | 1,039.3 |
St. 47 | Geochang | 35.6711 | 127.9108 | 222.4 | 1,298.9 |
St. 48 | Sancheong | 35.4128 | 127.8789 | 0.8 | 1,512.7 |
a Annual mean precipitation estimated from 1973 to 2010.
ERA-20c daily precipitation
As mentioned in the Introduction, we explored the ERA-20c daily precipitation, which is one of the longest reanalysis datasets and covers the whole 20th century (i.e. 1900–2010) (Donat et al. 2016; Poli et al. 2016). Among reanalysis products, ERA-Interim has been widely adopted in the field of hydrometeorology (Simmons et al. 2014; de Leeuw et al. 2015; Betts & Beljaars 2017), but it only covers the data-rich period from 1979 to the present (Dee et al. 2011). In this research, we focused on the ERA-20c data at its highest resolution, 0.125 × 0.125° (approximately 13.8 × 11.2 km), which consists of 603 grid points (http://apps.ecmwf.int/datasets/). Grid points over the sea were excluded from this study. The specific grid points for ERA-20c are illustrated in Figure 1.
It is crucial to understand the features of the model biases to improve the modelled reanalysis data. Some of the general features of ERA-20c daily precipitation over South Korea are examined in terms of the mean and the extreme values. For the mean precipitation, we compared the intra-seasonal variability within the annual cycle by exploring the monthly means and the 10-day running means of the observed and ERA-20c precipitation (as shown in Figure 2), averaged over all 48 stations during the baseline period (1973–2010). The model performance was evaluated by both the Nash–Sutcliffe efficiency (NSE) and the root-mean-square error (RMSE), which are described in the Methodology section. The results confirmed that ERA-20c can reproduce the mean values quite well during the dry season. However, there is a significant difference between modelled and observed precipitation during the summer (i.e. July–September), which may lead to an underestimation of extreme rainfall.
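The two scores can be sketched with their standard textbook definitions. This is a minimal illustration and not the authors' code; the function names `rmse` and `nse` are chosen here for convenience.

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of the observed mean, and negative values are worse than it."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    residual = np.sum((sim - obs) ** 2)
    variance = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / variance)
```

A negative NSE, such as the value reported for the extremes, means that the simulated series predicts the observations less well than the observed mean would.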
In terms of the extreme rainfall episodes, the top 50 events were extracted for the baseline period, and an underestimation of extremes in the ERA-20c was clearly identified, as illustrated in Figure 3. The deviations are generally large and grow toward the upper tail of the distribution, with –1.088 for NSE and 76.69 mm for RMSE (Figure 3(a)). At the same time, the deviations are quite systematic, which makes them amenable to bias correction. The relationships between the top 50 extreme rainfalls showed that the discrepancies were largely attributable to differences in rainfall during the summer season, as noted in Figure 2. The overall relationships are similar across stations, as shown in Appendix A (available with the online version of this paper), and the comparisons for stations 4, 16, 28 and 40 are illustrated as representative examples in Figure 3(b). The biases in extreme values are generally proportional to the amount of rainfall, and they are likely to be higher in the upper tail than in the middle of the distribution.
In summary, the ERA-20c precipitation data reliably reproduce the mean values, with 0.968 for NSE and 15.59 mm for RMSE, while the extreme values in the top 50 records are consistently underestimated, with –1.088 for NSE and 76.69 mm for RMSE. These results could indicate that, although the ERA-20c modelling process adequately represents the mean climate of the historical period, heavy rainfalls in the summer season can be significantly underestimated because intensive rainfall events driven by convective storms may not be effectively resolved by the current climate modelling approach and spatial resolution. In addition, as shown in Figure 4, ERA-20c exhibits a much higher frequency of wet days (>0 mm/day), varying from 11.75 to 26.64 days per month, than the observed precipitation (6.07–14.5 days) for all months in South Korea. More generally, the over-pronounced frequency of light precipitation in climate models is a well-known problem, and it may partially cause the underestimation of the extremes. For this reason, a two-stage bias correction approach to daily precipitation is typically adopted to first adjust the overestimated wet-day frequency and then rectify the biases associated with both the mean and extreme values.
METHODOLOGY
As illustrated in the previous section, two deficiencies in the ERA-20c became evident: the overestimation of the wet-day frequency and underestimation of the extreme values. To correct the biases, we adopted a two-stage bias correction scheme that consists of the wet-day frequency correction scheme and the composite distribution based QM approach. The proposed methods and their assumptions used in this study are provided in this section.
Wet-day frequency correction scheme
It is well known that the wet-day frequencies of simulated precipitation from climate models are typically inflated by the generation of small precipitation amounts near 0.1 mm/day (Piani et al. 2010; Kim et al. 2015b; Nyunt et al. 2016). For this reason, a cut-off threshold (TH) approach has been commonly applied, with different criteria, to adjust the wet-day frequency in the bias correction of daily precipitation (Schmidli et al. 2006; Piani et al. 2010; Themeßl et al. 2012; Kim et al. 2015a, 2015b; Rabiei & Haberlandt 2015; Nyunt et al. 2016; Volosciuk et al. 2017). For example, Piani et al. (2010) and Volosciuk et al. (2017) applied 0.1 mm/day as the threshold, whereas other studies set the wet-day frequency of the simulated precipitation equal to that of the observed precipitation (Kim et al. 2015a, 2015b; Nyunt et al. 2016). Rabiei & Haberlandt (2015) compared five different thresholds (0, 0.02, 0.05, 0.07 and 0.1 mm/h) for spatial bias correction of hourly radar data and concluded that the threshold of 0.05 mm/h performed the best among the five in terms of the reduction of biases.
In our study, a set of predetermined thresholds taken from previous studies (Piani et al. 2010; Kim et al. 2015a, 2015b; Rabiei & Haberlandt 2015; Volosciuk et al. 2017) was used to adjust the wet-day frequency of the modelled daily precipitation from ERA-20c. We considered four different thresholds to identify an optimal threshold (TH) for the ERA-20c: (TH1) >0 mm/day, (TH2) >0.1 mm/day, (TH3) >1 mm/day, and (TH4) a threshold chosen so that the frequency of wet days equals the observed value, which varied from 0 to 4.66 mm, as shown in Figure 5. Changes in the wet-day frequency can affect the overall performance of the bias correction through the QM approach, because the transfer function between the simulated and observed precipitation is established on the basis of non-zero precipitation. The optimum threshold was therefore evaluated through an experiment with gQM on the paired daily rainfall series at each station. Note that ERA-20c daily rainfalls below the threshold were set to zero. The threshold selected among the four was then applied in the subsequent steps.
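The two kinds of cut-off can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the observed and simulated series are assumed to cover the same days, and ties at the cut-off value are ignored.

```python
import numpy as np

def apply_threshold(sim, th):
    """Set simulated daily totals that do not exceed th (mm/day) to zero."""
    sim = np.asarray(sim, dtype=float)
    return np.where(sim > th, sim, 0.0)

def frequency_matching_threshold(obs, sim):
    """TH4-style cut-off: the simulated value above which the number of
    simulated wet days equals the number of observed wet days."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    n_wet_obs = int(np.sum(obs > 0.0))
    n_wet_sim = int(np.sum(sim > 0.0))
    if n_wet_sim <= n_wet_obs:
        return 0.0  # the model is not too wet, so nothing is removed
    descending = np.sort(sim)[::-1]
    # Values strictly greater than the (n_wet_obs + 1)-th largest are kept.
    return float(descending[n_wet_obs])
```

For TH1 to TH3, `apply_threshold` would be called with 0, 0.1 and 1 mm/day. For TH4, the cut-off from `frequency_matching_threshold` would be computed first, here for a whole series, although it could equally be computed per month.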
Statistical bias correction model: QM with a composite distribution
To effectively reduce the bias in the extreme rainfall for ERA-20c, we propose a QM approach based on a composite distribution, which comprises different types of distributions. More specifically, an extreme value distribution is used for the upper tail of the distribution, while a gamma distribution is applied for the interior part of the distribution. For extremes, the 95th or 99th percentiles have been applied as an upper threshold in numerous studies because the distribution of excesses over a high threshold is asymptotically approximated by a generalized Pareto distribution (GPD) (Manton et al. 2001; Wilson & Toumi 2005; Acero et al. 2011; Gutjahr & Heinemann 2013; Chan et al. 2015; Nyunt et al. 2016). In this study, we apply both the 95th and 99th percentiles as the upper thresholds.
The GPD has been widely applied to the peak-over-threshold (POT) series as the best-fit distribution for extreme rainfalls (Vrac & Naveau 2007; Hundecha et al. 2009; Gutjahr & Heinemann 2013; Nyunt et al. 2016; Volosciuk et al. 2017), although a considerable number of studies have used other distributions, including the generalized extreme value (GEV), Weibull (WBL), Gumbel (GUM), and log-normal (LOGN). To ensure the suitability of the GPD, we first evaluated six different distributions, GPD, GEV, GUM, WBL, LOGN and gamma (GAM), for the extremes over the 95th and 99th percentiles in both the observed and ERA-20c data, using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The model with the lowest AIC and BIC is preferred as the best-fit distribution. For a given threshold, the GPD was selected as the best-fit distribution for the extremes, as shown in Table 2. The numbers in Table 2 indicate the number of stations for which each distribution was the best fit.
Percentile | Data | GPD | GEV | LOGN | WBL | GUM | GAM |
---|---|---|---|---|---|---|---|
95th | Observation | 47 | 1 | 0 | 0 | 0 | 0 |
95th | ERA-20c | 48 | 0 | 0 | 0 | 0 | 0 |
99th | Observation | 47 | 1 | 0 | 0 | 0 | 0 |
99th | ERA-20c | 47 | 1 | 0 | 0 | 0 | 0 |
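A distribution comparison of this kind can be sketched with SciPy. This is a minimal illustration and not the authors' procedure. It assumes that the candidates are fitted to excesses over the percentile threshold, that the location is pinned to zero where the family is supported on positive values, and that only free parameters are counted. The names `CANDIDATES` and `rank_tail_models` are hypothetical.

```python
import numpy as np
from scipy import stats

# Candidate families and the parameters held fixed during fitting (assumption).
CANDIDATES = {
    "GPD": (stats.genpareto, {"floc": 0.0}),
    "GEV": (stats.genextreme, {}),
    "GUM": (stats.gumbel_r, {}),
    "WBL": (stats.weibull_min, {"floc": 0.0}),
    "LOGN": (stats.lognorm, {"floc": 0.0}),
    "GAM": (stats.gamma, {"floc": 0.0}),
}

def rank_tail_models(wet_values, percentile=95.0):
    """Fit each candidate to the excesses over the given percentile by
    maximum likelihood and return its AIC and BIC (lower is better)."""
    x = np.asarray(wet_values, dtype=float)
    u = np.percentile(x, percentile)
    excess = x[x > u] - u
    n = excess.size
    scores = {}
    for name, (dist, fixed) in CANDIDATES.items():
        params = dist.fit(excess, **fixed)
        loglik = float(np.sum(dist.logpdf(excess, *params)))
        k = len(params) - len(fixed)  # number of free parameters
        scores[name] = {
            "AIC": 2.0 * k - 2.0 * loglik,
            "BIC": k * np.log(n) - 2.0 * loglik,
        }
    return scores
```

Applying `min(scores, key=lambda name: scores[name]["AIC"])` to each station and counting the winners would give a tally in the form of Table 2.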
In this approach, four parameters are estimated: the shape and scale parameters of the gamma distribution, and the shape and scale parameters of the GPD. The upper thresholds are assumed to be known for the given 95th or 99th percentile. The parameters of the gamma distribution are estimated on a monthly basis, whereas the parameters of the GPD are estimated using the entire POT series for all months at each station. The maximum likelihood method is used to estimate all the parameters. Hereafter, the proposed method with a composite distribution of gamma and GPD is referred to as gpQM, and the gpQM with the 95th and 99th percentile upper thresholds is abbreviated as gpQM95 and gpQM99, respectively. For comparison, the conventional bias correction gQM was also applied and compared in terms of the accuracy of both the extreme and the mean values.
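One possible reading of the composite mapping is sketched below. It is an illustration under several assumptions and not the authors' implementation: the gamma body is fitted to wet-day values at or below the upper threshold u, the GPD is fitted to excesses above u, each piece is mapped separately, continuity at u is not enforced, and the monthly split of the gamma parameters is omitted. The names `fit_composite` and `gpqm` are hypothetical.

```python
import numpy as np
from scipy import stats

def fit_composite(wet, percentile=95.0):
    """Maximum likelihood fit of a gamma body and a GPD tail to wet-day values."""
    wet = np.asarray(wet, dtype=float)
    u = float(np.percentile(wet, percentile))
    shape_g, _, scale_g = stats.gamma.fit(wet[wet <= u], floc=0.0)
    shape_p, _, scale_p = stats.genpareto.fit(wet[wet > u] - u, floc=0.0)
    return {"u": u, "gamma": (shape_g, scale_g), "gpd": (shape_p, scale_p)}

def gpqm(x, mod, obs):
    """Map modelled values onto the observed distribution, piece by piece.
    `mod` and `obs` are the dictionaries returned by fit_composite."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)  # dry days stay dry
    body = (x > 0.0) & (x <= mod["u"])
    tail = x > mod["u"]
    top = 1.0 - 1e-12  # keeps the inverse CDF finite
    # Interior part: gamma CDF of the model, inverse gamma CDF of the observations.
    p = stats.gamma.cdf(x[body], mod["gamma"][0], scale=mod["gamma"][1])
    out[body] = stats.gamma.ppf(np.clip(p, 0.0, top),
                                obs["gamma"][0], scale=obs["gamma"][1])
    # Upper tail: the same idea applied to excesses over each threshold.
    p = stats.genpareto.cdf(x[tail] - mod["u"],
                            mod["gpd"][0], scale=mod["gpd"][1])
    out[tail] = obs["u"] + stats.genpareto.ppf(np.clip(p, 0.0, top),
                                               obs["gpd"][0], scale=obs["gpd"][1])
    return out
```

In this sketch, passing `percentile=95.0` or `percentile=99.0` to `fit_composite` corresponds to gpQM95 or gpQM99. Dropping the tail branch and fitting the gamma distribution to all wet days would give a gQM-style correction.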
Spatial interpolation by parameter contour maps
In the gpQM approach, a pair of observed and modelled data is required to estimate the six parameters (TH, the gamma shape and scale, the GPD shape and scale, and the upper threshold u). However, because there are a limited number of available weather stations, the transfer function for the QM cannot be established for all grid points, and the existing methods can therefore only be applied over gauged catchments. To overcome this limitation, we introduce an interpolation method based on parameter contour maps (IM-PCM), which consists of three steps as summarized in Figure 6. For gpQM95 and gpQM99, the six parameters were first estimated for each station as noted in the previous sections. Second, a contour map for each parameter was constructed using a two-dimensional linear interpolation technique, as shown in Figure 7. Finally, a set of parameters for the gpQM was taken from the maps to construct the transfer function for all grid points. The cut-off threshold (TH) is the first interpolated variable. The maps of the shape and scale parameters of the gamma distribution were then generated on a monthly basis, while the maps of the shape, scale and upper threshold (u) parameters of the GPD were created by using the entire POT series on an annual basis. For the gQM, a similar process was used to produce three parameter maps (TH and the gamma shape and scale) for the transfer function.
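The map-building step can be sketched with SciPy's `griddata`. This is a minimal illustration and not the authors' code: the function name `interpolate_parameter` is hypothetical, and the nearest-station fallback for grid points outside the convex hull of the stations is an assumption of this sketch, since the text does not say how such points are handled.

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_parameter(station_lonlat, station_values, grid_lonlat):
    """Estimate one bias correction parameter at grid points by
    two-dimensional linear interpolation of the station estimates."""
    station_lonlat = np.asarray(station_lonlat, dtype=float)  # shape (n, 2)
    station_values = np.asarray(station_values, dtype=float)  # shape (n,)
    grid_lonlat = np.asarray(grid_lonlat, dtype=float)        # shape (m, 2)
    est = griddata(station_lonlat, station_values, grid_lonlat, method="linear")
    # Linear interpolation is undefined outside the convex hull of the
    # stations; those points fall back to the nearest station (assumption).
    outside = np.isnan(est)
    if outside.any():
        est[outside] = griddata(station_lonlat, station_values,
                                grid_lonlat[outside], method="nearest")
    return est
```

Calling the function once per parameter (and once per month for the gamma parameters) would yield, for every ERA-20c grid point, the full parameter set needed to build its transfer function.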
Evaluation criteria
The performance of the proposed interpolation method was evaluated by a leave-one-out procedure within a cross-validation framework. More specifically, a set of parameters was estimated from the observed daily precipitation at 47 out of the 48 stations, and the estimated parameters were used to build contour maps as shown in Figure 7. The set of parameters at the grid point corresponding to the excluded station was taken from the maps, and the proposed bias correction approaches were then applied. Again, note that the model performance for the extreme and mean values was evaluated with regard to RMSE and NSE as described above.
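The leave-one-out loop can be sketched for a single parameter as follows. This is a minimal illustration and not the authors' code: the function name `leave_one_out` is hypothetical, it covers only the parameter prediction at the excluded station, and the nearest-station fallback for a station lying outside the convex hull of the others is an assumption of this sketch.

```python
import numpy as np
from scipy.interpolate import griddata

def leave_one_out(station_lonlat, station_values):
    """Predict the parameter at each station from all other stations."""
    xy = np.asarray(station_lonlat, dtype=float)  # shape (n, 2)
    v = np.asarray(station_values, dtype=float)   # shape (n,)
    n = v.size
    pred = np.full(n, np.nan)
    for i in range(n):
        keep = np.arange(n) != i   # drop station i
        target = xy[i:i + 1]       # shape (1, 2)
        pred[i] = griddata(xy[keep], v[keep], target, method="linear")[0]
        if np.isnan(pred[i]):
            # Station i lies outside the hull of the others (assumption:
            # use the nearest remaining station instead).
            pred[i] = griddata(xy[keep], v[keep], target, method="nearest")[0]
    return pred
```

In the full procedure, the predicted parameter set would then drive the bias correction at the excluded station, and the corrected series would be scored against that station's observations with RMSE and NSE.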
RESULTS AND DISCUSSION
Evaluation for the lower threshold
This study examined four different thresholds (TH1, TH2, TH3, and TH4) for adjusting the wet-day frequency of ERA-20c daily precipitation through an experiment with the gQM approach, in terms of both the mean and extreme values. As an overall evaluation of the bias corrected precipitation, we investigated the intra-seasonal variability within the annual cycle by comparing the monthly means and the 10-day running means. Here, all the values were averaged over all 48 stations during the baseline period (1973–2010), as illustrated in Figure 8. We found that the threshold TH4 yielded the best results among the four in terms of the reduction of biases, as summarized in Figure 8(a) and Table 3. Again note that TH4 is the case where the frequency of wet days of ERA-20c is set to that of the observed precipitation. In contrast, the other thresholds (TH1, TH2 and TH3) showed a significant overestimation, larger even than the relatively small bias of the uncorrected ERA-20c. Our results offer insight into how improper thresholds for the wet-day frequency may affect bias correction results, leading to a significant overestimation of daily rainfall. Such discrepancies may arise from the significantly different thresholds used to adjust the wet-day frequency. As illustrated in the previous section, the lower thresholds for TH4 varied over the range 0–4.66 mm, while the thresholds assumed in TH1, TH2 and TH3 are much lower than those obtained for TH4, especially for the summer season (July–September). Indeed, the similar results seen in the 10-day moving mean suggest that our findings may be generalizable to cut-off thresholds in different locations and seasons, as shown in Figure 8(b) and Table 3. We also found that the bias associated with the cut-off thresholds varied significantly within a specific season, especially in the summer. The biases for TH1 and TH2 range from 2.21 to 10.49 and from 1.92 to 10.09 during the summer, respectively, while those for TH3 and TH4 vary from 0.16 to 6.27 and from −1.06 to 2.97, respectively.
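A frequency-matching cut-off of the TH4 type can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the observed and modelled series are assumed to cover the same days (for example, all days of one calendar month in the baseline period), and an observed wet day is assumed to be any day with rainfall above `obs_wet_min`.

```python
def wet_day_threshold(obs, mod, obs_wet_min=0.0):
    """Cut-off such that the number of modelled days strictly above it
    equals the observed wet-day count (ties in mod can make it smaller)."""
    n_wet = sum(1 for v in obs if v > obs_wet_min)
    ranked = sorted(mod, reverse=True)
    if n_wet >= len(ranked):
        return 0.0  # model is not too wet: keep every positive value
    return ranked[n_wet]

def apply_threshold(mod, threshold):
    """Set modelled days at or below the cut-off to zero (dry)."""
    return [v if v > threshold else 0.0 for v in mod]

# Illustrative values only, not data from the study.
obs = [0, 0, 5, 0, 12, 0, 0, 3, 0, 0]
mod = [0.2, 0.1, 4.0, 0.5, 9.0, 1.2, 0.0, 2.5, 0.3, 0.8]
th = wet_day_threshold(obs, mod)
corrected = apply_threshold(mod, th)
```

Computing the cut-off separately for each month and station is what allows it to differ across the year, in contrast to a single fixed value.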
Data | Measures | TH1 | TH2 | TH3 | TH4 | ERA-20c |
---|---|---|---|---|---|---|
Monthly mean (mm/month) | RMSE (mm) | 119.24 | 110.50 | 42.57 | 4.77 | 15.59 |
Monthly mean (mm/month) | NSE | −0.899 | −0.631 | 0.758 | 0.997 | 0.968 |
10-day running mean (mm/day) | RMSE (mm) | 4.03 | 3.74 | 1.49 | 0.51 | 0.56 |
10-day running mean (mm/day) | NSE | −0.886 | −0.622 | 0.744 | 0.970 | 0.963 |
For the evaluation of the extreme rainfalls associated with the different thresholds, we extracted rainfall events exceeding the 99th percentile and compared the four thresholds for all stations. As illustrated in Figure 9, the uncorrected ERA-20c systematically and significantly underestimates the extremes, while the bias correction improves the representation of extreme values regardless of the threshold. Specifically, TH4 performs the best with 0.755 for NSE and 27.33 mm for RMSE, followed by TH3, TH2 and TH1. The differences in error may be largely attributed to the number of data retained under each threshold for a given time series: a lower threshold retains a relatively large number of data, while a higher threshold reduces the number of available data. Given these results, TH4 appears to be the most reliable cut-off threshold for the ERA-20c under the gQM approach. Nevertheless, there remains considerable potential for improving the extremes, especially over 300 mm/day. Thus, we further explore the bias correction approach for the upper tail of the distribution.
Bias correction based on a composite gamma-GPD distribution
This study applies a QM approach based on a composite (or piecewise) distribution, which combines a gamma distribution and a GPD for a given set of thresholds. Here, after adopting TH4 as the lower threshold, the 95th or 99th quantile was considered as the upper threshold for the correction of extremes (gpQM95 and gpQM99). The composite distribution approach was evaluated by comparing the extreme rainfalls obtained from the corrected ERA-20c with those observed for the baseline period, as shown in Figure 10. For the extreme daily rainfalls above the 99th percentile, the GPD based bias correction schemes (i.e. gpQM99 and gpQM95) reproduce the extremes better than gQM (Figure 10(a)). gpQM99 shows the best performance with an NSE of 0.906, and gpQM95 also achieves good agreement with 0.879, whereas gQM reaches 0.755. For RMSE, gpQM99 (16.92 mm) and gpQM95 (19.16 mm) reduce the errors by 38.1 and 29.9%, respectively, relative to gQM (27.33 mm). Moreover, a comparison of the AMS rainfall also confirmed that gpQM99 and gpQM95 reproduce the rainfall characteristics observed in the AMS more effectively than gQM. Specifically, gpQM99 showed the best performance with 0.912 for NSE and 18.80 mm for RMSE, whereas gpQM95 gave 0.892 for NSE and 20.77 mm for RMSE. These results suggest that the gpQM approach is more appropriate than gQM for reducing the systematic errors in estimating extreme rainfalls.
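The tail step of such a composite mapping can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names and all parameter values are hypothetical, the GPD is parameterized by threshold `u`, scale `sigma` and shape `xi`, and the gamma-based mapping for values below the upper threshold is omitted. A modelled value above the model threshold is converted to a non-exceedance probability with the model GPD and then mapped back through the inverse of the observed GPD.

```python
import math

def gpd_cdf(x, u, sigma, xi):
    """CDF of a generalized Pareto distribution for exceedances above threshold u."""
    y = (x - u) / sigma
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:              # exponential limit of the GPD
        return 1.0 - math.exp(-y)
    base = 1.0 + xi * y
    if base <= 0.0:                  # beyond the upper bound when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def gpd_ppf(p, u, sigma, xi):
    """Quantile function (inverse CDF) of the same distribution, for 0 <= p < 1."""
    if abs(xi) < 1e-12:
        return u - sigma * math.log(1.0 - p)
    return u + sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

def map_tail(x, mod_params, obs_params):
    """Quantile-map a modelled exceedance onto the observed tail.
    Each params tuple is (u, sigma, xi)."""
    p = gpd_cdf(x, *mod_params)
    p = min(p, 1.0 - 1e-12)          # keep the inverse finite at the boundary
    return gpd_ppf(p, *obs_params)

# Hypothetical parameters: the modelled tail is lighter than the observed one.
corrected = map_tail(60.0, (50.0, 10.0, 0.0), (80.0, 20.0, 0.0))
```

With these illustrative parameters a modelled 60 mm day is stretched to about 100 mm, which is the kind of upward adjustment needed when the reanalysis underestimates extremes.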
Apart from evaluating the models in the extreme cases, it is important to ensure that the proposed bias correction model with the GPD can reproduce the mean values as well. Again, we evaluate both the monthly mean and the 10-day moving mean of the corrected daily precipitation, as shown in Figure 11 and Table 4. For the monthly mean, gQM and gpQM99 give the best performance (Figure 11(a)), with the highest NSE of 0.997 for both methods and the lowest RMSEs of 4.77 and 5.12 mm/month, respectively (Table 4). For gpQM95, the NSE (0.988) is also close to one, but the RMSE, at 9.41 mm/month, is nearly twice those of gQM and gpQM99. In terms of the 10-day moving mean, all QM approaches work almost equally well, although gpQM99 offers the best performance (Table 4). More generally, the gpQM99 approach can effectively correct the biases associated with the upper tail of the distribution without a loss in the efficiency of the bias correction process.
Data | Measures | gQM | gpQM95 | gpQM99 | ERA-20c |
---|---|---|---|---|---|
Monthly mean (mm/month) | RMSE (mm) | 4.77 | 9.41 | 5.12 | 15.59 |
Monthly mean (mm/month) | NSE | 0.997 | 0.988 | 0.997 | 0.968 |
10-day running mean (mm/day) | RMSE (mm) | 0.507 | 0.545 | 0.497 | 0.563 |
10-day running mean (mm/day) | NSE | 0.970 | 0.966 | 0.971 | 0.963 |
It should be noted that the bias remains large in the summer season, as seen in the 10-day moving mean. The difference was mainly attributed to discrepancies in the seasonal or monthly distribution of heavy rainfall events between the observed and modelled data (Nyunt et al. 2016). In other words, there is a clear difference in the monthly number of extreme events above the 95th or 99th percentile thresholds between the observations and ERA-20c (Figure 12), and this is considered to be the main source of the bias in the extremes in the intra-seasonal band. The results of these experiments imply that the upper thresholds could be set differently (or updated) for each month to better represent the intra-seasonal change. On the other hand, estimating different thresholds on a monthly basis could lead to unreliable estimates of extreme values because of insufficient data for estimating the GPD parameters.
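The monthly comparison of exceedance counts described above can be sketched as follows. This is an illustrative helper only: the function names are hypothetical, the percentile uses simple linear interpolation (an assumption, since the estimator is not specified here), and the sample values are invented.

```python
from collections import Counter

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def monthly_exceedances(records, q=99.0):
    """records: iterable of (month, rainfall_mm).
    Returns a 12-element list with the number of days above the q-th percentile per month."""
    records = list(records)
    threshold = percentile([r for _, r in records], q)
    counts = Counter(m for m, r in records if r > threshold)
    return [counts.get(m, 0) for m in range(1, 13)]

# Invented sample: heavy events concentrated in July and August.
sample = [(1, 1.0), (7, 50.0), (7, 80.0), (8, 60.0), (2, 2.0),
          (3, 3.0), (4, 4.0), (5, 5.0), (6, 6.0), (9, 7.0)]
counts = monthly_exceedances(sample, q=70.0)
```

Running the same function on the observed and the ERA-20c series and differencing the two lists would expose months in which the reanalysis places too few or too many heavy events.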
Spatial interpolation on bias correction parameters
The proposed IM-PCM approach was validated by leave-one-out cross validation. We estimated a set of parameters from the observed daily precipitation, and the estimated parameters were then used to build contour maps. For the extreme values of the interpolated daily precipitation, POTs exceeding the 99th percentile and the AMS were first constructed and compared among the three QM approaches (gQM, gpQM95 and gpQM99). Note again that all results were obtained from the cross-validation procedure, which considers the different possible samples. As illustrated in Figure 13(a), the extremes corrected with the parameter sets interpolated by IM-PCM showed good agreement with the observed values for the three QMs. Among them, gpQM95 and gpQM99 gave the best performance for the given POTs (Figure 13(a)) with an NSE of 0.781, compared with 0.714 for gQM. Similar results were obtained for the RMSE. Moreover, the proposed gpQM99 approach using the interpolated parameters was capable of reproducing the AMS with 26.35 mm for RMSE and 0.827 for NSE (Figure 13(b)). However, it should be noted that the bias increases, which is largely attributable to the parameter interpolation process. For example, the RMSE of the AMS using gpQM99 with IM-PCM increased from 18.80 to 26.35 mm when compared with the pointwise bias correction shown in Figure 10(b). A similar increase (from 20.77 to 26.30 mm) was also observed for gpQM95. Nevertheless, the RMSE of the AMS corrected by IM-PCM with gpQM99, 26.35 mm, is still smaller than that of the pointwise bias correction with gQM, 28.07 mm.
In terms of the mean precipitation, the monthly mean and 10-day moving average of bias corrected rainfall using a set of parameters obtained from IM-PCM were evaluated (Figure 14 and Table 5). Although all three QM approaches yielded slightly different estimates, overall favorable performance was obtained for the monthly mean with a model efficiency over 0.98 for NSE. Among the options, gQM and gpQM99 performed the best and showed the lowest RMSE (Figure 14(a) and Table 5). Figure 14(b) shows a similar result for the 10-day moving average with an efficiency over 0.96 for NSE.
Data | Measures | gQM | gpQM95 | gpQM99 | ERA-20c |
---|---|---|---|---|---|
Monthly mean (mm/month) | RMSE (mm) | 4.14 | 10.31 | 5.27 | 15.59 |
Monthly mean (mm/month) | NSE | 0.998 | 0.986 | 0.996 | 0.968 |
10-day running mean (mm/day) | RMSE (mm) | 0.502 | 0.562 | 0.498 | 0.563 |
10-day running mean (mm/day) | NSE | 0.971 | 0.963 | 0.971 | 0.963 |
For a more specific analysis at each weather station in the context of cross validation, we generated maps showing the spatial errors in both the AMS rainfalls and the mean. The AMS errors were evaluated by RMSE and NSE in Figure 15. For the mean, we additionally evaluated the IM-PCM method by estimating the relative error between the observed and modelled precipitation in Figure 16. As shown in the figures, for the AMS rainfalls, gpQM95 and gpQM99 generally perform well except at a few stations. Most stations showed an NSE over 0.8 and an RMSE less than 30 mm. For the mean daily rainfall, the relative errors are generally below 10%. Given these results, the proposed gpQM approaches with IM-PCM, especially gpQM99, can effectively rectify the spatial-temporal bias of the ERA-20c model data without a loss in efficiency for the mean values. The interpolated parameters of the transfer function, including the wet-day frequency, cover South Korea in its entirety, and the values at each grid can be interpreted as an approximation of the observed rainfall over that grid. The accuracy of the interpolated parameters (or rainfall estimates) is largely affected by the potential bias associated with spatial interpolation and by inadequate sampling of rain gauges. We acknowledge that the potential bias in the interpolated rainfall estimates can be attributed to the limited number of rain gauges and the systematic bias in the rainfall scenarios.
It is well known that precipitation in mountainous areas is strongly influenced by topography, so numerous studies have used elevation as an exogenous factor for rainfall interpolation (Goovaerts 2000; Lloyd 2005; Adhikary et al. 2017). We therefore explored the relationship between the elevation and the parameters for all 48 stations. As summarized in Table 6, the Pearson correlation r-values were not statistically significant, indicating a weak dependence between the elevation and the parameters. The results imply that the elevation may not be important for the interpolation of the parameters. In summary, the proposed interpolation scheme for the QM approach provided bias corrected long-term precipitation data, especially for ungauged catchments. In addition, the proposed approach is easy to use and may help to reduce the bias associated with the interpolation of daily precipitation. Moreover, this approach can be further used to obtain a century-long daily precipitation series over the Korean peninsula, which could be useful for reducing uncertainty in the parameter estimation of rainfall frequency analysis.
Type | Gamma para. | Gamma r: Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | GPD para. | GPD r |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gQM | | −0.40 | −0.14 | 0.06 | 0.18 | 0.07 | 0.16 | 0.22 | 0.06 | 0.15 | 0.00 | −0.06 | −0.14 | | − |
gpQM95 | | −0.37 | −0.13 | 0.05 | 0.17 | 0.09 | 0.19 | 0.26 | 0.09 | 0.15 | 0.12 | −0.13 | −0.18 | | −0.01 |
gpQM99 | | −0.40 | −0.14 | 0.06 | 0.18 | 0.07 | 0.16 | 0.24 | 0.08 | 0.14 | 0.03 | −0.08 | −0.14 | | −0.05 |
gQM | | 0.09 | −0.15 | −0.25 | −0.22 | −0.14 | −0.20 | −0.11 | −0.11 | −0.02 | 0.17 | −0.02 | −0.11 | | − |
gpQM95 | | 0.02 | −0.16 | −0.22 | −0.23 | −0.20 | −0.25 | −0.18 | −0.14 | −0.08 | −0.10 | 0.02 | −0.08 | | −0.05 |
gpQM99 | | 0.09 | −0.14 | −0.25 | −0.23 | −0.17 | −0.21 | −0.13 | −0.16 | −0.03 | 0.09 | −0.03 | −0.11 | | −0.01 |
The bias correction methods developed in this study both statistically improve the quality of the data and allow the daily precipitation to be extended over the 20th century in South Korea. More specifically, this study applies the transfer function derived for the baseline period 1973–2010 to provide the daily precipitation for the period 1900–2010 under the assumption of stationarity. Finally, we explored changes in the mean and extremes using the gpQM99 approach for three different periods, 1900–1972, 1973–2010 and 1900–2010, in the context of a retrospective analysis. As shown in Figure 17(a), the monthly mean shows a very noticeable and sudden increase in the recent period, especially for the summer season (July–September), while no significant changes were observed for the dry season (October–April). Figure 17(b) shows boxplots representing the distribution of the AMS for the three periods. The distribution of the AMS derived from the gpQM99 approach for the period 1973–2010 was almost identical to that of the observations, which indicates that the proposed gpQM99 was capable of reproducing the extremes of daily precipitation. As expected from the changes in summer rainfall, the distribution of the AMS for the recent period 1973–2010 is much wider than that of the period 1900–1972 (i.e. gpQM99-1), especially for the upper tail of the distribution. This may lead to an increase in design rainfalls for a specific return period. On the other hand, the distribution of the AMS for the entire period 1900–2010 is quite similar to that of the observations in terms of the median AMS, while its range is narrower than that of the recent period.
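The period-wise comparison of the AMS can be sketched as follows. This is a minimal illustration with hypothetical function names and invented rainfall values; it only shows how annual maxima could be extracted for sub-periods and summarized before plotting.

```python
from statistics import median

def annual_maxima(daily, start_year, end_year):
    """daily: iterable of (year, rainfall_mm).
    Returns {year: maximum daily rainfall} for years within [start_year, end_year]."""
    ams = {}
    for year, rain in daily:
        if start_year <= year <= end_year:
            ams[year] = max(rain, ams.get(year, 0.0))
    return ams

def summarize(ams):
    """Minimum, median and maximum of an annual maximum series."""
    values = sorted(ams.values())
    return {"min": values[0], "median": median(values), "max": values[-1]}

# Invented values for illustration only.
daily = [(1950, 10.0), (1950, 85.0), (1951, 40.0),
         (1980, 120.0), (1980, 30.0), (1981, 200.0)]
early = summarize(annual_maxima(daily, 1900, 1972))
recent = summarize(annual_maxima(daily, 1973, 2010))
```

Comparing such summaries (or full boxplots) between the early and recent periods is one simple way to inspect whether the upper tail of the AMS has widened.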
CONCLUSIONS
The main objective of this study was to explore the century-long reanalysis data, ERA-20c, especially for daily precipitation over South Korea in the context of bias correction. We first investigated the utility of the ERA-20c data as proxy data over South Korea for hydrological applications and further examined several aspects of bias correction that influence the use of modelled data in practice. In general, we found fairly good agreement between the observed and the ERA reanalysis data for the baseline period 1973–2010. However, the results have also shown that the ERA-20c precipitation data have their own systematic biases, particularly in the frequency of wet-days and the extreme upper tail of the distribution. More specifically, an over-pronounced frequency of wet-days and a considerable underestimation of daily precipitation have been identified in the ERA-20c over South Korea. Given these results, we proposed a two-stage bias correction approach to daily precipitation, comprising a model for adjusting the overestimated wet-day frequency and a model for reducing the biases associated with extreme values. To adjust the wet-day frequency, we explored four different thresholds through an experiment with the QM approach. In terms of extremes, a QM approach based on a composite gamma-GPD distribution was introduced. Finally, we proposed an IM-PCM approach as an alternative for constructing the transfer function in ungauged basins. The key findings obtained in this analysis are summarized as follows:
1. Our findings are consistent with the notion that the mean daily precipitation is reproduced well by the reanalysis. Our study also confirms that the mean and annual cycle of daily precipitation as observed over South Korea are well simulated by the ERA-20c reanalysis. However, considerable underestimation of the daily maximum precipitation was consistently seen in the ERA-20c, especially during the summer season. The results presented here illustrate that the heavy rainfalls in the summer season could be significantly underestimated by the current climate modelling system, although the reanalysis system adequately reproduces the mean climate of the historical period. Another issue with respect to the evaluation of ERA-20c daily precipitation is the much higher frequency of wet-days than that of the observations, which may in turn influence the underestimation of the extremes.
2. In this study, a two-stage bias correction approach to the ERA-20c precipitation was proposed to adjust the overestimated wet-day frequency and the biases associated with the upper tail of the distribution. In terms of the wet-day frequency, we examined four different types of thresholds (i.e. TH1, TH2, TH3 and TH4) to identify an optimal threshold. TH4, the case where the frequency of wet-days of ERA-20c is set to that of the observations, produces the best results among the four. Moreover, TH4 is allowed to have different thresholds for each month, unlike the other three approaches (i.e. TH1, TH2 and TH3) in which a fixed value was assumed over all the months for all the stations. Our results offer insights into how inappropriate thresholds for the wet-day frequency may significantly influence the bias correction results. To better represent the bias in the extreme rainfall, we proposed a composite distribution based QM approach, which consists of the gamma distribution and the GPD for two upper thresholds (i.e. the 95th and 99th percentiles). Given the efficiency gains, this study suggests that the gpQM approach is more appropriate than gQM for reducing the systematic errors in estimating extreme rainfalls. To be more specific, the gpQM99 approach can effectively reduce the biases in the upper tail of the distribution without a loss of efficiency in the overall bias correction process. However, a large bias still exists in the summer season; the bias in extreme rainfall that remains after the gpQM99 correction suggests that the ERA-20c data may be insufficient in terms of reflecting the specific regional patterns associated with extreme rainfall over South Korea.
3. We explored an alternative way to obtain the transfer function of the QM approach for ungauged catchments in the context of the cross-validation process. From this perspective, we have proposed an interpolation method based on parameter contour maps (IM-PCM), which interpolates the five parameters over the entire region of interest. The daily precipitation series corrected with the parameter sets interpolated by the IM-PCM showed good agreement with the observed precipitation, and in particular the proposed gpQM99 with the IM-PCM performs the best in terms of reducing the spatial-temporal bias of the ERA-20c model data without a loss of efficiency. We finally utilized the transfer function derived for the baseline period 1973–2010 to extend the daily precipitation to the period 1900–2010 under the assumption of stationarity, and we examined the changes in daily precipitation for three different periods, 1900–1972, 1973–2010 and 1900–2010, as a retrospective analysis. We found a very noticeable and sudden increase in the recent period during the summer season (July–September).
The findings of this study help to fill knowledge gaps about the bias correction of the century-long reanalysis, ERA-20c, as well as about the key characteristics of daily precipitation over South Korea. Further, the results obtained here can provide a useful perspective on the bias correction of modelled data from reanalysis and regional climate modelling systems for regional-scale analysis with a limited network of rainfall stations. The impact of climate change on water resources using the extended daily precipitation data for the period 1900–2010 will be explored further. Although the study has been carried out in South Korea, the methodology has the potential to be applied in other parts of the world. We hope this paper will stimulate the hydrometeorological community to explore the issues raised by long-term reanalysis data in other countries under different climate and geographical conditions.
ACKNOWLEDGEMENTS
The first author is funded by the Government of South Korea for carrying out his doctoral studies at the University of Bristol. We are grateful for the relevant data provided by KMA and ECMWF. The second author is supported by a grant (17AWMP-B121100-02) from the Advanced Water Management Research Program (AWMP) funded by the Ministry of Land, Infrastructure and Transport of the Korean Government. The abbreviations and symbols used in this study are listed in Appendix B (available with the online version of this paper).