Changes in extreme rainfall and its implications for design rainfall using a Bayesian quantile regression approach

This study aims to explore possible distributional changes in annual daily maximum rainfalls (ADMRs) over South Korea using a Bayesian multiple non-crossing quantile regression model. The distributional changes in the ADMRs are grouped into nine categories, focusing on changes in the location and scale parameters of the probability distribution. We identified seven categories for a distributional change in the selected stations. Most of the stations (28 of 50) are classified as Category III, which is characterized by an upward trend with an increase in variance in the distribution. Moreover, stations with a downward trend with a decrease in the variance pattern (Category VII) are mainly distributed on the southern Korean coast. On the other hand, Category I stations are mostly located in eastern Korea and primarily show a statistically significant upward trend with a decrease in variance. Moreover, this study explored changes in design rainfall estimates for different categories in terms of distributional changes. For Categories I, II, III, and VI, a noticeable increase in design rainfall was observed, while Categories IV, V, and VII showed no evidence of association with risk of increased extreme rainfall.


GRAPHICAL ABSTRACT INTRODUCTION
It has been reported that precipitation changes associated with a climate change can differ spatially and temporally for a region due to the geographic distribution of rain, which is largely driven by climate and topography (Schmidli Nonstationarity in precipitation extremes has been a major concern for decades in the currently adopted stationary modeling procedure. Changes in rainfall distribution affect current operations and future design criteria for water resource systems; thus, it might be necessary to explore rainfall trends. In this context, statistical modeling of extreme rainfall trends (or nonstationarity) has been extensively conducted. In particular, an increasing number of studies have been carried out to explore rainfall trends (Matti et al. ; Alexander et al. ; Clarke et al. ; regression and the MK test to identify the linear trends in rainfall for South Korea and used a point rainfall frequency analysis based on the generalized extreme value (GEV) distribution to explore changes in the magnitude of 2-day maximum rainfall. They concluded that, although the existing studies revealed significant trends in rainfall over South Korea, several of the methods were insufficient to detect substantial distribution changes in the extreme since the existing approaches aim to identify possible monotonic trends in any given time series.
For these reasons, quantile regression was developed to better identify distinct changes in time-series data at any percentile values (Koenker & Bassett Jr ). A traditional ordinary regression method is designed to estimate the conditional expectation (mean) of the response variable given certain values of the independent variables. In contrast, a quantile regression method allows flexibility to estimate either the conditional median or any quantiles of the response variable (Sankarasubramanian & Lall ). Quantile regression has been applied to explore changes in annual

).
More recently, a technique was applied to explore possible distributional changes of rainfall characteristics using quantile regression (Shiau & Huang ). Although quantile regression has been successfully applied in hydrological applications, the systematic estimation of parameter uncertainty has not been properly addressed. In this context, this study applies a Bayesian parameter estimation approach to a quantile regression model to better explore the distributional changes in extreme rainfall for given years. More specifically, Bayesian inference for the quantile regression is used to categorize the distributional changes in extreme rainfall into certain classes and then further utilized to provide a nonstationary estimation of the design rainfall corresponding to the classes for a better understanding of changes in design rainfall estimates in a changing climate.
However, estimated conditional quantile functions often fail to be a monotonically increasing function of quantile, leading to quantile crossing problems. To alleviate crossing issues in quantile functions, this study develops a Bayesian multiple non-crossing quantile regression (BMN-QR) approach within a fully Bayesian framework, which is applied to multiple rainfall stations in South Korea.
The theoretical background for the BMN-QR and the rainfall data used in this study is discussed in the following section. The BMN-QR is then applied to the annual daily maximum rainfalls (ADMRs) over the period of 1973-2015, and a comprehensive discussion of the results is provided in the following section. Finally, a summary and conclusion from the study are offered.

Study area and data
South Korea tends to have a humid continental and subtropical climate. The country's terrain is mostly mountainous, with lowlands located in the western and southeastern regions. Due to the mountainous terrain, the country experiences uneven rainfall distribution. Annual precipitation ranges from 1,000 to 1,800 mm in the southern part of South Korea and from 1,100 to 1,400 mm in the central part. More than half of the annual precipitation happens during the monsoon season when a stationary front forms across the Korean Peninsula for about a month in summer (Kwon et al. b This study obtained daily precipitation data of 93 stations, subjected to a certain level of quality assurance by the Korea Meteorological Administration (KMA). As seen in Figure 1, rainfall stations uniformly cover South Korea, including Jeju Island. We considered length and the proportion of missing values for quality assessment criteria. The periods of record vary by station, but for this study, we select stations with less than 5% missing value and with continued observed data of 43 years from 1973 to 2015.

Bayesian multiple non-crossing quantile regression
For the analysis of distributional changes in ADMRs, we propose the following quantile regression model. Let y i,p denote the ADMRs for year i and p denote the quantile level p ∈ (0, 1), then the generic structure of the model is given by: where x 0 i is a vector of regressors, β p is a vector of regression coefficients for the pth conditional quantile Q ϵ i,p (p), ϵ i,p is an unspecified quantile-specific error term where no distribution is specified other than the constraint Q ϵ i,p (p) ¼ 0 and estimated by minimizing the asymmetrically weighted absolute deviations through linear programming. Here, the prime ( 0 ) denotes the transpose of the matrix x. As discussed in Koenker & Bassett (), no distribution is specified for ϵ i,p . The only existing condition is Q ϵ i,p (p) ¼ 0, with the estimates obtained by minimizing the error according to: where ρ p ( Á ) is the loss function given by the following equation (Friederichs & Hense ): In general, the above estimates for the quantile regression model are solved separately through linear programming for each of the q quantiles, p 1 < . . . < p q . In this setting, the estimated conditional quantile functions for a set of predictors often fail to be a monotonically increasing function of the quantile q, which ultimately leads to a quantile crossing problem (Koenker & Regression ). In particular, the crossing intensifies in the upper quantile and in the presence of a larger number of predictors. In order to minimize the crossing problem, we develop a multiple non-crossing quantile regression model in which all quantile functions are simultaneously estimated under the non-crossing constraints within a fully Bayesian framework, which is subject to: The use of Bayesian inference in generalized linear and additive models is standard due to its ability to estimate the posterior distribution of parameters even in more complex models, which allows us to explicitly account for parameter uncertainty in time-series analysis (Yu & Moyeed ). For these reasons, this study adopted a more flexible Bayesian modeling approach. A Bayesian inference commonly requires a likelihood; thus, a typical approach to Bayesian quantile regression is to assume an asymmetric Laplace distribution (ALD) for ϵ (Yu & Moyeed ), which produces minimization of the loss function (Equation (3)) equivalent to the maximization of a likelihood function of independently distributed ALD (Yue & Rue ). The ALD is characterized by a set of parameters such as location (μ), precision (δ 2 ), and skewness (0 < p < 1) by letting μ ¼ 0 to ensure Q ϵ (p) ¼ 0. The density ϵ is given by: where I( Á ) denotes the indicator function.
According to features of ALD (Yue & Rue ), ϵ can be written as a linear combination of two random variables: where W is an exponentially distributed random variable with the rate parameter δ 2 , and Z has a standard normal distribution N(0,1). Posterior estimates can be subsequently obtained using Bayesian updates conditioned on the exponential random variable W. Using the above stochastic representation with Equation (1), the proposed quantile regression model can be rewritten as: where w i,p and z i,p are subject-specific values of W and Z, respectively, and η i,p is defined in Equation (1).

Bayesian inference for parameter estimation
To enable the inclusion and improved consideration of the uncertainties in the proposed quantile regression model parameters, we adopted a Bayesian inference framework.
Following Bayes' rule, the joint posterior for the set of parameters θ given the observed data y and the prior distribution of the parameter π(θ) can be written as: where π(yjθ) is the likelihood function.
The prior distribution of the parameter π(θ) should reflect our knowledge of the data before observing it. For the parameters of the proposed model, we do not have any prior knowledge of their values other than that regression parameters β p can be any real number, and the ALD precision parameter δ 2 p should be greater than zero. Hence, we assume a conjugate normal prior to regression coefficients β p and a gamma prior to ALD precision parameter δ 2 p . The posterior inference for the desired quantiles, p j , simultaneously proceeds via data augmentation by introducing observation specific latent weights, w i,p j , as specified in Equation (7). Furthermore, let η i,p j ¼ x 0 i β p j , where for brevity β p j and x i include the regression coefficients and regression functions, as seen in Equation (1). Therefore, given p j and w ¼ (w i,p j ), y i,p j are independent and identically distributed random variables with normal distribution: Therefore, the joint posterior for all model parameters is given by: Analytical integration of the joint posterior distribution, Equation (10), for all model parameters θ(β p , δ 2 p , w p ) is not tractable, and this study adopts the widely used Markov   Therefore, this study further explored the evidence of nonstationarity in the ADMRs by the proposed Bayesian quantile regression in the following section.
Distributional changes in the ADMRs and their relations to design rainfall Here, we aimed to identify the distributional changes in the desired quantiles using the entire posterior distributions   The two-tier diagnostic approach is introduced to detect the changes in ADMRs over South Korea, as illustrated in Figure 4. From the tier-1 criteria using the two-sample t-test and the F-test, we identified seven categories (i.e., Categories I, II, III, IV, V, VI, and VII) of a distributional change in the selected stations, as summarized in Table 3.  Accurate design rainfall estimation is vital for the effective performance of water resource management. Design rainfall is defined as the rainfall intensity for a given duration and return period. The primary objective of frequency analysis is to estimate the exceedance probability (or non-exceedance probability) of rainfall intensity for each duration. For instance, a 100-year return period corresponds to an extreme rainfall event that an exceedance probability of 0.01 (or 1% chance) that the extreme event will exceed

Categories I and III showed an increasing trend in the
where y is the ADMRs and k, σ, and μ are the shape, scale, and location parameters, respectively.  In other words, the estimated conditional distributions on the upper quantiles may be susceptible to single peaks (or outliers). In other words, uncertainty in trends can be affected by the location and magnitude of the peak, as illustrated in Figure 6. Furthermore, we experiment with different locations and magnitudes of the peak to explore changes in the uncertainty, as shown in Supplementary material, Figure S1. The obtained results are within our expectations with respect to the association between locations and magnitudes of the peak. The results demonstrated that uncertainty in trends is substantially narrower when maxima of precipitation extremes occur at the beginning or end of the period. Moreover, uncertainty in trends becomes more substantial as the magnitude of single peak increases at the same location. To consider this issue, we consider the entire posterior distributions for classifying possible categories associated with distributional changes in the ADMRs.

Discussion of nonstationarity in the ADMRs
Assessing distributional changes in the ADMRs over the various quantiles contributes to identifying the trends involved in different aspects of water resource management. Moreover, information on the distributional changes for a specific quantile supports estimating design rainfalls due to the fact that appropriate design rainfalls required for various water users differ with regard to quantity and frequency.
Specifically, understanding the lower tails of the rainfall distribution function is more beneficial for irrigated agriculture, hydro-power generation, and water supply users. In contrast, the information obtained from the middle part of the rainfall distribution is useful for urban drainage design practice.
As shown in Figure 6 In contrast, other stations (Cheongju and Yeongdeok) exhibited a broader uncertainty. This study further utilized the uncertainty to assess changes in design rainfalls for a given specific category of nonstationarity.
The return period is an estimate of the likelihood that an extreme event of any given magnitude occurs in any given year. It is a statistical measure typically based on historical data denoting the average recurrence interval over an extended period of time and is usually used for risk analysis.
In this sense, estimating the return period of an extreme rainfall event using BMN-QR suggests the collective behavior of the ADMR quantiles in a changing climate.   (Figure 7(d)).
Therefore, the design rainfall estimates for 2015 are likely to be similar to those obtained in the stationary or less than those for the first year.
For Gwangju station (Category V), the same design rainfalls between for the first year (1973)  (28 of 50) were classified as Category III, which is characterized by an upward-divergent pattern in the distribution. Several of the stations classified as Category VII were distributed in the southern part of South Korea, in which there was a downward-convergent pattern over the period. However, the differences between the distributions of ADMRs over the quantiles were found not to be statistically significant, meaning that the differences in most of the stations classified as Category VII could be attributed to a random sample error rather than a systematic regime change.
3. It was confirmed that the BMN-QR-based nonstationary frequency analysis model provided a method for understanding the key attributes of the dynamic distributional changes in the ADMR in a changing climate. In order to focus more clearly on nonstationarity and its direct impact on design rainfall estimates, this study explored changes in design rainfall estimates for different nonstationarity categories. For Categories I, II, III, and VI, a noticeable increase in design rainfalls was observed. In contrast, Categories IV, V, and VII showed no evidence of association with the risk of increased extreme rainfalls.
4. It should be noted that the categorization process was still adapted in an ad hoc manner according to the distribution change in the ADMRs, although the two-tier significance test was considered. In this regard, future work will focus on developing a more systematic approach that explicitly combines a likelihood function of the categorization process within a hierarchical Bayesian modeling framework. Moreover, the climate interpretation and diagnosis of the distribution changes in extreme rainfalls over South Korea will be further explored in future work.
5. It should be noted that the categorization process was still adapted in an ad hoc manner according to the distribution change in the ADMRs although the two-tier significance test was considered. In this regard, future work will focus on developing a more systematic approach that explicitly combines a likelihood function of the categorization process within a hierarchical Bayesian modeling framework. Recent advances in data science can play a significant role in understanding key climatic dynamics, which can be involved in the classification of pairs of regions that show mutually interconnected climate patterns. Thus, the well-identified clusters may subsequently help us better identify and discover distinct patterns (or clusters) in climate, and insights from the classification can also assist in refining existing models to better reproduce hydrologic processes. Moreover, the climate interpretation and diagnosis of the distribution changes in extreme rainfalls over South Korea will be further explored in future work.