Abstract

Monthly precipitation modeling is important in various applications, e.g. streamflow forecasts and water resources management. This paper develops an operational precipitation forecasting scheme, using Bayesian Non-homogeneous Hidden Markov Chain (NHMM) model and teleconnection index. Although the Hidden Markov Chain model has been investigated before in similar studies, the NHMM algorithm employed in this study allows modeling both non-stationary transition probabilities and emission matrix. Climatic teleconnection that affect precipitation is used to drive changes in transition probabilities of different states in the Markov model. The proposed framework is illustrated for multiple-station precipitation analysis in NingXiang County, a southern inland area in China with a high population density. A simulation model is constructed to examine the model's capacity in capturing variabilities and temporal-spatial characteristics exhibiting in monthly precipitation data during 1961–2013. Results indicate that the proposed NHMM model captures the precipitation characteristics at different stations well. Spearman correlation between conditional mean of simulated ensembles and observed data is 0.87–0.9, with few variations at distinct stations. The proposed framework has general applications and can be applied to simulate and generate stochastic monthly precipitation. Further application of the method is also described in the paper.

INTRODUCTION

Unknown future precipitation is a key factor to consider in relevant decision-making processes. It also serves as an input to hydrologic modeling for streamflow forecasts. Hence, precipitation forecasts at multiple time scales are useful for water resources planning and management and hazard management. For instance, flooding managers typically pay close attention to daily and sub-daily rainfall forecasts to mitigate possible damage caused by storm surge and flash flooding. Water managers would use weekly and up to seasonal rainfall forecasts to guide operations to satisfy water delivery for customers. Water resources planners would then examine historical rainfall patterns and any changes in statistical characteristics of rainfall pattern in planning water infrastructures for future years. The utility of precipitation forecasts has been examined and demonstrated in various studies. Brown et al. (1986) examined the economic value of seasonal-precipitation forecasts for wheat farmers in the northern Great Plains to decide each spring whether to plant a crop or to let their land lie fallow. Schneider & Garbrecht (2003) developed a measure to evaluate the usefulness of seasonal precipitation forecasts for agricultural applications. Usefulness was defined as the ability of the forecasts to predict conditions significantly different from climatological norms, assuming that producers would be more likely to use a forecast if it predicts a larger departure from normal conditions. Meza et al. (2008) provided a review of economic value of seasonal climate forecasts, including precipitation forecasts, for agriculture applications.

Precipitation modeling and forecasts are of great importance to forecasting other important hydrologic attributes, e.g. streamflow. Cuo et al. (2011) reviewed approaches used for quantitative precipitation forecasts and discussed their use in short- to medium-range streamflow forecasts. Habets et al. (2004) examined the utility of quantitative forecast precipitation for the prediction of daily streamflow, using two numerical weather prediction models used in France, ARPEGE and ALADIN. The authors found that such precipitation forecasts can be of interest to forecast the progress of long-duration floods for their study area. Clark & Hay (2004) down-scaled precipitation forecasts from NCEP reanalysis project to station locations and then used them as input for streamflow prediction at four illustrative basins. Collischonn et al. (2007) developed a forecasting scheme of reservoir inflow for hydroelectric reservoirs in Brazil, based on precipitation forecasts. Precipitation forecasts from general circulation models (GCMs) have been demonstrated as useful in improving streamflow forecasting (Sankarasubramanian et al. 2008; Block et al. 2009; Sankarasubramanian et al. 2009; Wang et al. 2013; Wang & Fu 2014; Tsai et al. 2015; Crochemore et al. 2017). Courdent et al. (2018) developed a scheme to distinguish high and low flow domains in urban drainage systems 2 days ahead using numerical weather predictions.

Efforts to improve quantitative precipitation forecasts can be categorized into three general approaches. The first one is to improve numerical modeling of physical processes that drives the amount of future precipitation. Application of data assimilation techniques falls into this category as those techniques typically aim to improve initial conditions for precipitation forecasts in the future. The second one applies statistical methods that aim to capture statistic features of the precipitation pattern. This ranges from traditional time series applications to the recent development of machine learning algorithms. The third is a mixture of the previous two, where statistic methods are used to post-process outputs from numerical models. Various model output statistical approaches have been investigated to improve quantitative precipitation forecasts at different time scales.

Among all the statistical approaches, Hidden Markov Models (HMM) have gained much attention in the research community (Zucchini & Guttorp 1991; MacDonald & Zucchini 1997; Bellone et al. 2000; Greene et al. 2011; Zucchini et al. 2016) as it owns the capacity of capturing the probabilistic transition between different hidden ‘states’ of rainfall events. The hidden states, as indicated by its name, are not explicitly modeled but are implicitly considered for the changes in precipitation amount. Once an HMM model is trained, homogeneous HMM assumes that the probabilities of jumping from one state to another stays the same and the probability density function (PDF) of rainfall amount for a certain state is unchanged. Compared to homogeneous HMM, non-homogeneous HMM (NHMM) allows changes in transition probabilities or PDF for each state due to other driving forces (Hughes et al. 1999; Kim et al. 2008). Instead of using non-homogeneous transition probabilities, Bracken et al. (2014) used an explicit state model that is regressed on climate index. Emission distribution is assumed to be stationary. In addition, parameters of the Hidden Markov Chain are not estimated based on the Bayesian method. Very recently, Bayesian NHMM was developed to capture parameter uncertainties of the Markov model (Robert et al. 2000; Paroli & Spezia 2008).

Although NHMM has been applied to precipitation simulation, Bayesian NHMM is a new development and very few previous studies have examined its application in precipitation simulation. The importance of monthly precipitation simulation in many applications, e.g. water resources management, cannot be overstated. This serves as the primary motivation of this study. This paper focuses on monthly rainfall modeling at multiple stations incorporating climate index, using a Bayesian NHMM algorithm.

The objectives of this study are to: (1) develop a NMHH model for multi-stations in the study area and estimate the model parameter using the Bayesian method; and (2) examine the model's capacity in capturing characteristics associated with monthly precipitation.

This paper is arranged as follows. Background and motivation is provided in this section, followed by a description of the methodology. Study area and rainfall data as well as climate index are described in the subsequent section. The results and discussion are then provided, and finally the conclusions.

METHODOLOGY

Bayesian Non-homogenous Hidden Markov Chain

NHMM has general application and it is described in the context of rainfall modeling in this section. Consider a multivariate discrete time series of rainfall observation yt,s, which is rainfall at time step t at a spatial location s. The observed yt,s is assumed to be a stochastic function of a finite-state Markov process z with a component zt associated with each time step. The hidden state space consists of N possible values, denoting each of the N possible states. For a given time step t, state zt is assumed to be conditioned on only one of its previous time steps next to it, namely t − 1. In essence, the finite-state Markov process has a short memory of one time step and anything beyond one time step is not remembered by the process. Rainfall observation, yt,s, is often described as a stochastic process that follows a certain probability distribution. There are a few terminologies that are used in describing the hidden Markov process. Transition probability Pi,j is described as the chance of transitioning from one state to another Markov state and the transition probability matrix, P, describes transition among all different states. For each possible state there is a probability density function that determines the rainfall amount. The PDF associated with each state are components of the emission matrix. The transition matrix and emission matrix, hence, are the two most important elements in estimating HMM models.

Compared to HMM, NHMM models introduce non-homogeneity by allowing varying components in the transition matrix or emission matrix, depending on other relevant variables. As shown in Figure 1, hidden state zt is determined jointly by the state of the previous time step zt−1 and another variable. In precipitation modeling, for instance, it could be large scale atmospheric teleconnections that leads to such non-stationary transition probabilities. Similarly, rainfall at time step t at a spatial location s, yt,s, does not only depend on the discrete state but also another variable. Two types of exogenous variables W and X are included in the modeling scheme. X is a variable that could affect emission distribution and W could influence transition probabilities. This allows the NHMM model to incorporate any factors that could affect stochastic distribution of the modeling quantity, precipitation in this case. For instance, precipitation seasonality could be easily incorporated. Different approaches have been investigated to estimate parameters, represented in circles in Figure 1 (Rajagopalan et al. 1996; Hughes et al. 1999; Meligkotsidou & Dellaportas 2011; Heaps et al. 2015; Zucchini et al. 2016). Holsclaw et al. (2017) developed a Bayesian approach to estimate all the parameters and quantify corresponding uncertainties. All parameters are sampled via effective Gibbs sampling algorithm; hence, no tuning is needed. The W variable coefficients are sampled through an ordered Multinomial probit (Albert & Chib 1998). The X variable coefficients are sampled through an unordered Multinomial logit model Polya-Gamma formulation (Polson et al. 2013). The hidden states Zt are sampled through a blocked Gibbs sampler. Details of Gibbs sampling for all the parameters are omitted due to space limitation. Interested readers are referred to Holsclaw et al. (2017), where the application of NHMM to simulate daily rainfall data is demonstrated.

Figure 1

The schematic of non-homogeneous Markov Chain model. Current state Zt is dependent on state at previous time step Zt–1 and external variables Xt. Probability density function of attributes yt,s is dependent on its state, as well as any external variables at time t and locations s.

Figure 1

The schematic of non-homogeneous Markov Chain model. Current state Zt is dependent on state at previous time step Zt–1 and external variables Xt. Probability density function of attributes yt,s is dependent on its state, as well as any external variables at time t and locations s.

Precipitation modeling using climate index forecasts

To test the applicability of NHMM model for monthly precipitation simulation, a simulation model is set to fit historical precipitation for both spatial and temporal characteristics. In doing so, seasonality is first removed before fitting any NHMM model. Normal distribution is assumed for emission distribution, which is based on the assumption that normal kernel estimation can be used to mimic any probability distribution functions. Transition probabilities are modeled as non-stationary components of the NHMM model, which is dependent on variation of the Nino 3.4 index.

Model evaluation

Several metrics, including spatial and temporal correlations, are used to evaluate the performance of the simulation model (a description of the metrics is omitted since they are often used in the hydrology community). In addition, root mean absolute error (MSPE) and Nash–Sutcliffe efficiency (NSE) are used to evaluate deterministic forecasts deriving from simulation ensemble for the period 1961–2013. NSE is often used to evaluate how effective the forecasts are, compared to long-term mean values of the quantity of interests. It is calculated based on Equation (1).

The NSE is a normalized statistic that determines the relative magnitude of the residual variance (‘noise’) compared to the measured data variance (‘information’) (Nash & Sutcliffe 1970). NSE indicates how well the plot of observed versus simulated data fits the 1:1 line. NSE is computed as shown in Equation (1):  
formula
(1)
where is the th observation for the constituent being evaluated, is the th simulated value for the constituent being evaluated, is the mean of observed data for the constituent being evaluated, and n is the total number of observations.
In addition, ensemble simulations of NHMM for each month can be easily converted to probabilistic forecasts by counting the number of ensemble members falling into three categories, e.g. below normal (BN), normal (N) and above normal (AN), of the observation data corresponding to each calendar month. Rank Probability Skill (RPS) is used to evaluate the tercile forecasts of the NHMM forecasts. RPS summarizes the sum of square of errors in the cumulative probabilities of the given categorical forecast and the observed category, which has an assigned value of 1. The lower the value of RPS, the smaller the error is between accumulative probabilities and observed category. When RPS is 0, it denotes perfect forecasts. For example, 100% of the probability mass resides in the AN category and the observation is AN. For three-category forecasts used in this study, the upper boundary of RPS is 2 when none of the probability mass resides in the observation category. In calculating the RPS for the reference model (climatology), the same amount of probability mass is assigned for each of the three categories. A long-term average of RPS over the retrospective period is calculated for one, two and three month-ahead monthly precipitation forecasts. The calculation of RPS is defined in Equation (2):  
formula
(2)
where icat is the category number (for instance, 1 for below normal, 2 for near normal, 3 for above normal), ncat is the total number of categories (3 in a tercile-based system), Pcumfct is the cumulative forecast probability up to category icat, and Pcumobs is the comparable term for the cumulative observation ‘probability’.
The Rank Probability Skill Score (RPSS), defined in Equation (3), is employed to compare the improvements using NHMM forecasts with regard to the reference model, which is climatology in this study. For the reference model, it is assumed that the probability of below normal, normal, and AN precipitation is almost equal. Since the 34th percentile of monthly precipitation is used as upper boundary of below normal category and the 67th percentile is used as the lower boundary of AN category, the probability of below normal, normal and AN precipitation is 34, 33, and 33% respectively.  
formula
(3)
where RPS is the average RPS of the proposed model over the retrospective period; and RPS0 is the average RPS of the reference model (climatology) for the retrospective period. When RPSS is positive, NMME forecasts outperform climatology. The RPSS value represents how much of the RPS has been reduced (RPSS > 0) or increased (RPSS < 0) compared to the reference model (climatology). RPSS is calculated for each of the three forecasting leads.

STUDY AREA AND DATA

The proposed methodology is illustrated by monthly precipitation simulation and operational forecasts for NingXiang county (NXC), a southern populous area in China. The county has a total area of 2,905 square kilometers (∼1.121 square miles) and it is part of Weishui basin, which contributes to Xiangjiang River. It has a population of over 1.4 million. Precipitation in the study area shows strong seasonality and inter-annual variabilities. Average annual rainfall for NCX is about 1,360–1,750 mm with a rainy season in boreal summer (May–July). Nearly 43% of rainfall over a year is received during the summer rainy season. Variations in monthly precipitation amount is important for local water resources planning and management, food production and human safety, among which summer flooding is one of the primary concerns. For instance, during the 1998 Yangtze flood, rainfall at the main stations in NXC was between 540 and 650 mm. In the summer of 2017, a regional storm at Xiangjiang river basin caused detrimental loss of life and property due to intensive rainfall in a few days. Therefore, forecasting precipitation amount in 1–3 months is of great importance.

Monthly precipitation data at 14 rainfall stations, from 1961 to 2013, were obtained, as shown in Figure 2. The stations are scattered in the region of NXC and their names are provided in Table 1. Monthly time series of rainfall over 636 months are available at each station. Preliminary analyses using one station clearly shows intra-annual seasonality with a summer peak. It is also found that de-seasonalized monthly rainfall follows a normal distribution after lognormal pretreatment. Lilliefors tests were applied to de-seasonalized monthly rainfall time series at all stations. They follow normal distribution at 5% significance level and Figure 3(b) shows a histogram of the de-seasonalized time series for illustrative purposes.

Table 1

Mean of emission distribution for the three states across all the 14 stations

Station ID Station name State I (mm) State II (mm) State III (mm) 
Shibazi 70.0 134.3 214.9 
Ningxiang 69.7 129.8 203.5 
Huangcai 71.9 127.6 193.9 
Daolin 62.8 121.6 196.9 
Xiangzikou 71.8 129.0 199.7 
Weishan 82.6 148.3 224.5 
Qingshanqiao 64.4 126.4 199.4 
Liushahe 63.6 124.4 199.4 
Waniao 65.1 126.8 205.8 
10 Shangfupu 67.1 127.9 203.8 
11 Bashitou 65.2 127.0 210.9 
12 Shiluoshan 64.5 126.8 207.5 
13 Tanmuqiao 69.4 130.6 209.1 
14 Donghutang 67.0 128.8 206.6 
Station ID Station name State I (mm) State II (mm) State III (mm) 
Shibazi 70.0 134.3 214.9 
Ningxiang 69.7 129.8 203.5 
Huangcai 71.9 127.6 193.9 
Daolin 62.8 121.6 196.9 
Xiangzikou 71.8 129.0 199.7 
Weishan 82.6 148.3 224.5 
Qingshanqiao 64.4 126.4 199.4 
Liushahe 63.6 124.4 199.4 
Waniao 65.1 126.8 205.8 
10 Shangfupu 67.1 127.9 203.8 
11 Bashitou 65.2 127.0 210.9 
12 Shiluoshan 64.5 126.8 207.5 
13 Tanmuqiao 69.4 130.6 209.1 
14 Donghutang 67.0 128.8 206.6 
Figure 2

The 14 rainfall stations in the study area, Ningxiang County (NXC), located in southern inland China.

Figure 2

The 14 rainfall stations in the study area, Ningxiang County (NXC), located in southern inland China.

Figure 3

(a) Histogram of monthly precipitation between 1961 and 2013 for rainfall station Shibazi at NXC region. (b) Histogram of de-seasonalized data of monthly precipitation between 1961 and 2013 for rainfall station Shibazi at NXC region.

Figure 3

(a) Histogram of monthly precipitation between 1961 and 2013 for rainfall station Shibazi at NXC region. (b) Histogram of de-seasonalized data of monthly precipitation between 1961 and 2013 for rainfall station Shibazi at NXC region.

El Niño-Southern Oscillation (ENSO) phenomena, characterized by inter-annual sea surface temperature (SST) variations in the eastern-to-central equatorial Pacific, is identified as the most dominant global climatic teleconnections that affects climate variability at the sub-decadal scale (Rasmusson & Carpenter 1982; Rasmusson & Wallace 1983). Peel et al. (2002) investigated the relationship between annual precipitation variability and ENSO on a global scale and found that the annual precipitation variability is higher in the ENSO-influenced continent. It has been employed to improve seasonal streamflow and precipitation forecasts (Quan et al. 2006; Devineni & Sankarasubramanian 2010). In this study, the Nino-3.4 index reflects the SST anomaly over the region bounded by 120 W–170 W and 5S–5N is used. Monthly Nino 3.4 and retrospective precipitation forecasts for the study area were obtained from the International Research Institute for Climate and Society (http://portal.iri.columbia.edu) for the period 1961–2013.

RESULTS AND DISCUSSION

Monthly precipitation for all the 14 stations in NXC region presents strong seasonality, as shown in Figure 4. The late boreal spring and summer season (April–June) is of the highest precipitation within the range of 200–250 mm each month, on average, over 53 years. The boreal winter months of November–January are the dry season with long-term average precipitation of about 50 mm. Besides evident seasonality, strong inter-annual variability is also observed for monthly precipitation. The coefficient of variation for each calendar month for each gage station was calculated and is presented in Figure 4(b). Not surprisingly, the coefficient of variation for the wet season is smaller compared to the dry season, where August and September are of the largest coefficient of variation. The coefficient of variation for the winter season is about 0.6–0.8 and it is 0.3–0.5 for the boreal spring season. This denotes that the inter-annual variability of precipitation for those months is large. Since surface water storage reservoirs in NXC region are within-year reservoirs that largely depend on inflow in summer and fall seasons to fill it back to full. Hence, below normal precipitation in the dry fall could often lead to below normal reservoir storage. A following dry winter season could pose challenges for meeting regional water demand. Therefore, improved prediction of monthly precipitation is informative in implementing drought mitigation strategies. Intra-annual and inter-annual variabilities are properties related to temporal characteristics of monthly precipitation in the NXC region.

Figure 4

(a) Average monthly precipitation for each rainfall stations in NXC region. Intra-annual seasonality across all the 14 stations are nearly the same due to the short distance between each other. (b) Inter-annual variability of monthly precipitation quantified by coefficient of variation, for each calendar month. Boreal summer months July–September are of the highest coefficient of variation.

Figure 4

(a) Average monthly precipitation for each rainfall stations in NXC region. Intra-annual seasonality across all the 14 stations are nearly the same due to the short distance between each other. (b) Inter-annual variability of monthly precipitation quantified by coefficient of variation, for each calendar month. Boreal summer months July–September are of the highest coefficient of variation.

Spatial correlation of rainfall

Spatial characteristics of regional precipitation are primarily reflected in spatial correlation. Monthly precipitation data for each station are pulled first and spatial correlation among the 14 rainfall stations is then calculated. The results are presented in Figure 5, all of which are statistically significant at a confidence level of 0.95. Additional analysis of spatial correlation for each calendar month shows similar results (not shown). This indicates that there is little spatial inhomogeneity in rainfall across the region at monthly time scale and high consistency in dry or wet states among all the stations. Such spatial characteristics should be well recognized and can be used as one criteria in evaluating the proposed modeling framework.

Figure 5

Spatial spearman correlation calculated for each pair of rainfall stations in the study area, using monthly precipitation data in the period 1961–2013. All of these are statistically significant at a confidence level of 0.95.

Figure 5

Spatial spearman correlation calculated for each pair of rainfall stations in the study area, using monthly precipitation data in the period 1961–2013. All of these are statistically significant at a confidence level of 0.95.

Nino 3.4 is the averaged SST calculated from 5S–5N and 170–120 W, which is used as an ocean-atmospheric teleconnection index of ENSO phenomena. Figure 6(a) shows raw Nino 3.4 data for the period 1961–2013, which is found to have a low frequency oscillation of 2–7 years. Statistically significant correlation between Nino 3.4 and precipitation is observed for all the stations. For instance, Figure 6(b) is a scatterplot of Nino 3.4 and monthly precipitation at station ID 1. Correlation between the two attributes is 0.34, which is statistically significant at a confidence level of 0.95.

Figure 6

(a) Time series of teleconnection index Nino 3.4 for the study period. (b) Scatterplot of Nino 3.4 index and monthly precipitation at rainfall station 1.

Figure 6

(a) Time series of teleconnection index Nino 3.4 for the study period. (b) Scatterplot of Nino 3.4 index and monthly precipitation at rainfall station 1.

NHMM model fitting

Before fitting the NHMM model for monthly precipitation over 53 years at all the 14 stations, the time series at each station is first de-seasonalized and a natural logarithm is then applied to the de-seasonalized data. The de-seasonalizing process first determines the trend component using a moving average and removes it from the time series of monthly precipitation. Then, the seasonal component is computed for each calendar month by averaging over all periods. The seasonal component is then centered. Finally, the error component is determined by a removing trend and seasonal component from the original time series of monthly precipitation (Kendall & Stuart 1983).

In fitting the NHMM model, three states implicitly denoting below normal, normal and AN conditions across the region are used. The 34th percentile of monthly precipitation is used as the upper boundary of below normal category, while the 67th percentile is used as the lower boundary of the AN category. Preliminary analysis found that de-seasonalized data follows normal distribution (Figure 3(b)). Hence, normal distribution is assumed for the emission distributions for each state. In Bayesian sampling, 500 iterations of Markov Chain Monte Carlo (MCMC) sampler are used to sample from the posterior distribution of all parameters of interest. To examine if the estimated NHMM model can capture the statistical properties of the historical data well, an ensemble of 100 simulations, each using estimated states of monthly precipitation and emission distribution, are generated. As stated in the methodology section, the proposed model uses all 53 years' data and the primary objective is to evaluate if the model captures all the statistics related to regional monthly precipitation.

Figure 7 depicts transition probabilities among all three states of the NHMM model. Sub-plots across each row correspond to probabilities transiting from one to the other states. The X axis of all the sub-plots denotes time in months. Compared to HMM, transition probabilities vary in time depending on external variables, e.g. Nino3.4 in this case. For instance, the probability of transiting from state I to state II is from 0.38 to 0.52, as shown in the middle of the first row in Figure 7. There is one state of precipitation condition associated with all the 14 stations in the region. The most probable state for one month is defined as the dominant state for all MCMC iterations, while each iteration may not infer the same state over time. The most probable sequence from all iterations indicates the relative frequency of state 1 is 269 out of 636 months (42.3%) and it is 35.4 and 22.3% for states 2 and 3 respectively.

Figure 7

Transition probabilities for each pair of the three states used in NHMM model of monthly precipitation.

Figure 7

Transition probabilities for each pair of the three states used in NHMM model of monthly precipitation.

Emission distribution is a key component of the NMME model, which defines the probability density functions for each state of monthly precipitation. Table 1 lists the mean of the normal distribution associated with each state of NMME model for all the 14 rainfall stations. Hence, three different normal distributions can represent low, normal and high precipitation across all the months. Low precipitation is associated with state 1 and high precipitation is associated with state 3.

Model evaluation

To evaluate the performance of the NHMM model for historical data, simulated precipitation is compared to observation for each rainfall station. The first is to examine if intra-annual seasonality is preserved. For each member of the 200 simulation ensembles, average precipitation is obtained for each calendar month. Using ensemble members, a time series comprised of 200 mean values are available for one calendar month. The 95th prediction interval defined as the interval between the 2.5th and 97.5th percentile of the time series is then calculated. This procedure is repeated for all the twelve calendar months, station by station. A simple verification is to examine if average precipitation calculated from observed data falls into the 95th prediction interval. Figure 8 shows such evaluation for four rainfall stations and results for the rest are similar but are not shown here. It is found that monthly mean precipitation from the observed data is well bounded by the prediction intervals for all the stations.

Figure 8

The 95% prediction interval of monthly mean precipitation calculated from 200 ensemble members for different stations.

Figure 8

The 95% prediction interval of monthly mean precipitation calculated from 200 ensemble members for different stations.

To examine if inter-annual variabilities exhibiting in monthly precipitation is captured by the fitted NHMM model, the median of ensembles at each month across the 200 ensemble members are used as a deterministic simulation. Spearman correlation between ensemble median and observed time series is within the range of 0.87–0.90 for all rainfall stations. Figure 9(a) shows scatterplots between observation and ensemble median for one station as an illustrative example. Another aspect worth examining is spatial correlation among all the rainfall stations for simulation data. Analyses show that such spatial characteristics are well preserved by ensemble median, where the Spearman correlation is greater than 0.9 for each pair of rainfall gages. Figure 9(b) displays a scatterplot of median ensemble for two stations for illustrative purposes.

Figure 9

(a) Scatter-plot of observation and conditional mean of simulation ensembles for station 1; (b) scatter-plot of ensemble mean between station 1 and station 2 with a high correlation of 0.95.

Figure 9

(a) Scatter-plot of observation and conditional mean of simulation ensembles for station 1; (b) scatter-plot of ensemble mean between station 1 and station 2 with a high correlation of 0.95.

Evaluation metrics discussed in the methodology section are used to evaluate the performance of the simulation model. Table 2 lists evaluation results for each calendar month. For each evaluation metric, results from all the 14 stations are provided as a range. For instance, correlation between NHMM model simulation and observation of January ranges from 0.83 to 0.91 among the 14 stations.

Table 2

Evaluation of monthly precipitation simulation for historical years 1961–2013

  Correlation
 
MSE (mm2)
 
RMSE (mm)
 
NSC
 
Min Max Min Max Min Max Min Max 
Jan 0.83 0.91 251.15 546.28 15.85 23.37 0.60 0.79 
Feb 0.80 0.91 373.35 849.11 19.32 29.14 0.54 0.79 
Mar 0.76 0.90 646.69 1,447.62 25.43 38.05 0.56 0.78 
Apr 0.82 0.90 923.33 1,765.79 30.39 42.02 0.55 0.77 
May 0.72 0.87 1,587.02 3,736.58 39.84 61.13 0.52 0.74 
Jun 0.63 0.85 3,762.47 6,042.59 61.34 77.73 0.39 0.64 
Jul 0.73 0.90 1,235.98 4,848.85 35.16 69.63 0.52 0.80 
Aug 0.61 0.82 2,675.47 7,147.89 51.72 84.55 0.38 0.65 
Sep 0.81 0.88 625.47 1,138.28 25.01 33.74 0.48 0.77 
Oct 0.76 0.92 521.17 1,249.45 22.83 35.35 0.57 0.84 
Nov 0.87 0.94 342.35 765.37 18.50 27.67 0.74 0.87 
Dec 0.70 0.86 351.87 766.35 18.76 27.68 0.34 0.59 
  Correlation
 
MSE (mm2)
 
RMSE (mm)
 
NSC
 
Min Max Min Max Min Max Min Max 
Jan 0.83 0.91 251.15 546.28 15.85 23.37 0.60 0.79 
Feb 0.80 0.91 373.35 849.11 19.32 29.14 0.54 0.79 
Mar 0.76 0.90 646.69 1,447.62 25.43 38.05 0.56 0.78 
Apr 0.82 0.90 923.33 1,765.79 30.39 42.02 0.55 0.77 
May 0.72 0.87 1,587.02 3,736.58 39.84 61.13 0.52 0.74 
Jun 0.63 0.85 3,762.47 6,042.59 61.34 77.73 0.39 0.64 
Jul 0.73 0.90 1,235.98 4,848.85 35.16 69.63 0.52 0.80 
Aug 0.61 0.82 2,675.47 7,147.89 51.72 84.55 0.38 0.65 
Sep 0.81 0.88 625.47 1,138.28 25.01 33.74 0.48 0.77 
Oct 0.76 0.92 521.17 1,249.45 22.83 35.35 0.57 0.84 
Nov 0.87 0.94 342.35 765.37 18.50 27.67 0.74 0.87 
Dec 0.70 0.86 351.87 766.35 18.76 27.68 0.34 0.59 

RPS is calculated for the simulation period across all the 14 rainfall stations. It was then compared to RPS of climatology at each station, resulting in a RPSS. Figure 10 presents boxplots of RPSS across all the 14 rainfall stations in the study area. It is found that median RPSS is greater than 0.5 for all stations, denoting that the RPS calculated from the NHMM model is at least 50% less than that of climatology (Equation (3)).

Figure 10

Boxplots of rank probability skill score of ensemble simulation of monthly precipitation across all the 14 stations in the study area.

Figure 10

Boxplots of rank probability skill score of ensemble simulation of monthly precipitation across all the 14 stations in the study area.

CONCLUSIONS

Monthly precipitation simulation and forecasts are useful in many applications, e.g. streamflow forecasts and water resources management. This study examines the use of Bayesian Non-homogeneous Markov Chain model in modeling monthly precipitation data, incorporating climate index and climate model forecasted gridded precipitation. One major advantage of the NHMM is to allow variations in transition probabilities or emission probabilities depending on external factors. The proposed approach was demonstrated using precipitation data over 1961–2003 for 14 stations in NXC, a southern county in China. The conditional mean of 200 ensemble members at each time period can be treated as deterministic simulation, while probabilistic outcome can be derived from ensemble members. The results indicate that the NHMM model captures the characteristics of precipitation data well, e.g. inter-annual and intra-annual variabilities, spatial correlation matrix. The 95% prediction intervals of the long-term average precipitation capture the long-term mean of monthly precipitation. The spatial correlation of monthly precipitation among different rainfall stations are also preserved. Both deterministic and probabilistic results for the simulation time period 1961–2013 were evaluated by multiple metrics, including mean square error, root mean square error and rank probability score. The results for different seasons reveal that simulation performance is better for the non-rainy season than for the rainy season. The RPS values are compared to the RPS of climatology calculated from each calendar month, resulting in RPSS. The RPSS is greater than 0.5 for half of the simulation period across all the stations, denoting a significant improvement of probabilistic precipitation. The framework demonstrated in this study is useful in generating stochastic rainfall in historical years and has general applications to other areas. Further extension of this study will be investigated later to examine its application in real-time precipitation forecasts and short-term water resources management.

ACKNOWLEDGEMENTS

This research was partially supported by the National Natural Science Foundation of China (51809020 and 51839002), Natural Science Foundation of Hunan Province, China (2016JJ3011 and 2018JJ3535). The authors appreciate the two anonymous reviewers whose comments have substantially improved the manuscript.

REFERENCES

REFERENCES
Albert
J. H.
&
Chib
S.
1998
Bayesian analysis of binary and polychotomous response data
.
J. Am. Stat. Assoc.
88
,
669
679
.
Block
P.
,
Filho
A. S.
,
Sun
L.
&
Kwon
H.
2009
A streamflow forecasting framework using multiple climate and hydrological models
.
J. Am. Water Resour. Assoc.
45
(
4
),
828
843
.
Bracken
C.
,
Rajagopalan
B.
&
Zagona
E.
2014
A hidden Markov model combined with climate indices for multidecadal streamflow simulation
.
Water Resour. Res.
50
,
7836
7846
.
Brown
B. G.
,
Katz
R. W.
&
Murphy
A. H.
1986
On the economic value of seasonal-precipitation forecasts: the fallowing/planting problem
.
Bull. Am. Meteorol. Soc.
67
(
7
),
833
841
.
Clark
P. M.
&
Hay
L. E.
2004
Use of medium-range numerical weather prediction model output to produce forecasts of streamflow
.
J. Hydrometeorol.
5
(
1
),
1525
7541
.
Collischonn
W.
,
Tucci
C. E. M.
,
Clarke
R. T.
,
Chou
S. C.
,
Guilhon
L. G.
,
Cataldi
M.
&
Allasia
D.
2007
Medium-range reservoir inflow predictions based on quantitative precipitation forecasts
.
J. Hydrol.
344
(
1–2
),
112
122
.
Crochemore
L.
,
Ramos
M.-H.
,
Pappenberger
F.
&
Perrin
C.
2017
Seasonal streamflow forecasting by conditioning climatology with precipitation indices
.
Hydrol. Earth Syst. Sci.
21
,
1573
1591
.
Heaps
S. E.
,
Boys
J. R.
&
Farrow
M.
2015
Bayesian modelling of rainfall data by using non-homogeneous hidden Markov models and latent Gaussian variables
.
J. R. Stat. Soc. Ser. C. Appl. Stat.
64
,
543
568
.
Holsclaw
T.
,
Greene
A. M.
,
Robertson
A. W.
&
Smyth
P.
2017
Bayesian nonhomogeneous Markov models via Pólya-Gamma data augmentation with applications to rainfall modeling
.
Ann. Appl. Stat.
11
(
1
),
393
426
.
Hughes
J. P.
,
Guttorp
P.
&
Charles
P. S.
1999
A non-homogeneous hidden Markov model for precipitation occurrence
.
J. R. Stat. Soc. Ser. C. Appl. Stat.
48
,
15
30
.
Kendall
M.
&
Stuart
A.
1983
The Advanced Theory of Statistics
,
Vol. 3
.
Griffin
,
London
, pp.
410
414
.
MacDonald
I. L.
&
Zucchini
W.
1997
Hidden Markov and Other Models for Discrete-Valued Time Series. Monographs on Statistics and Applied Probability
.
Chapman & Hall
,
London
.
Meligkotsidou
L.
&
Dellaportas
P.
2011
Forecasting with non-homogeneous hidden Markov models
.
Stat. Comput.
21
,
439
449
.
Peel
M. C.
,
McMahon
T. A.
&
Finlayson
B. L.
2002
Variability of annual precipitation and its relation to El Niño-Southern oscillation
.
J. Clim.
15
(
3
),
545
551
.
Polson
N. G.
,
Scott
J. G.
&
Windle
J.
2013
Bayesian inference for logistic models using Poly-Gamma latent variables
.
J. Am. Stat. Assoc.
108
,
1339
1349
.
Quan
X.
,
Hoerling
M.
,
Whitaker
J.
,
Bates
G.
&
Xu
T.
2006
Diagnosing sources of US seasonal forecast skill
.
J. Clim.
19
,
3279
3293
.
Rajagopalan
B.
,
Lall
U.
&
Tarboton
G. D.
1996
Nonhomogeneous Markov Model for daily precipitation
.
J. Hydrol. Eng.
1
(
1
),
33
40
.
Rasmusson
E. M.
&
Wallace
J. M.
1983
Meteorological aspects of the El Niño/Southern oscillation
.
Science
222
,
1195
1202
.
Robert
C. P.
,
Ryden
T.
&
Titterington
D. M.
2000
Bayesian inference in hidden Markov models through reversible jump Markov chain Monte Carlo method
.
J. R. Stat. Soc. Ser. B. Stat. Methodol.
62
,
57
75
.
Wang
H.
,
Sankarasubramanian
A.
&
Ranjithan
R. S.
2013
Integration of climate and weather information for improving 15-day-ahead accumulated precipitation forecasts
.
J. Hydrometeorol.
14
(
1
),
186
202
.
Zucchini
W.
&
Guttorp
P.
1991
A hidden Markov model for space-time precipitation
.
Water Resour. Res.
27
,
1917
1923
.
Zucchini
W.
,
MacDonald
I.
&
Langrock
R.
2016
Hidden Markov Models for Time Series, An Introduction Using R
.
Chapman & Hall
,
Boca Raton, FL
.