## Abstract

This paper presents a new Bayesian probabilistic forecast (BPF) model to improve the efficiency and reliability of normal distribution transformation and to describe the uncertainties of medium-range inflow forecasts with a 10-day forecast horizon. In this model, the inflow data are transformed twice to reach a standard normal distribution. The Box–Cox (BC) model is first used to quickly transform the inflow data to a normal distribution, and the transformed data are then converted to a standard normal distribution by the meta-Gaussian (MG) model. Based on the transformed inflows in the standard normal distribution, the prior and likelihood density functions of the BPF are established. The newly developed model is tested on China's Huanren hydropower reservoir and is compared with BPFs that use the MG and BC models separately. Comparative results show that the new BPF model exhibits significantly improved data transformation efficiency and forecast accuracy.

## INTRODUCTION

Significant hydrologic uncertainties in the simulated and forecasted processes limit the practical application of forecast information. Many methods have been proposed to analyze the uncertainties of hydrological data, among which Bayesian hydrological probability forecasting, based on Bayesian theory, is the most widely applied (Krzysztofowicz 1999; Biondi & Luca 2013; Liu *et al.* 2016; Han & Coulibaly 2017). The Bayesian probabilistic forecast (BPF) model is currently the most efficient model for addressing the uncertainties of the hydrologic process (Han & Coulibaly 2017). In the BPF model, the hydrological process data are generally assumed to follow a linear-normal distribution. The uncertainties of the forecasting inflows mainly come from the randomness of the runoff and from the hydrology model (Krzysztofowicz 1999; Krzysztofowicz & Kelly 2000). The uncertainties of the inflow process and of the hydrology model can be described by the prior density and likelihood density functions, respectively. Based on these two functions, the posterior probability density function of the forecasting inflows can be generated by Bayes' theorem (Krzysztofowicz & Kelly 2000; Krzysztofowicz & Maranzano 2004).

The BPF model gives the uncertainty distributions of the forecasting values based on statistical historical information. It can also correct the forecasting information and thus provides better information (Han & Coulibaly 2017; Liang *et al.* 2017; Ma *et al.* 2018). The BPF model adopts a linearized description of the observed and forecasting inflows and then obtains an analytical solution for the posterior density function (Krzysztofowicz & Maranzano 2004; Krzysztofowicz 2014). Since Krzysztofowicz introduced a Bayesian forecasting system in 1999, it has gained popularity worldwide, and various improved models have been developed to increase its efficiency (Han & Coulibaly 2017; Barbetta *et al.* 2018). Reggiani & Weerts (2008) used a Bayesian forecasting system to obtain posterior probability distributions for making effective decisions under inflow uncertainties during flood periods. Biondi *et al.* (2010) applied the hydrologic uncertainty processor (HUP) proposed by Krzysztofowicz in 1999 to a small semi-arid watershed in southern Italy. Ma *et al.* (2013) established a Bayesian statistical forecasting theory for middle- and long-term runoff forecasting at a hydropower reservoir, and the precision of the forecasting inflows was improved significantly. Liu *et al.* (2015) applied the quantile water inflows of the Bayesian forecasting system to dynamic control of the flood limiting water level in the Three Gorges Reservoir and evaluated the impacts of the input uncertainty on benefits, dam safety, and downstream effects during the flood season. Barbetta *et al.* (2016) constructed a model conditional processor based on the Bayesian forecasting approach for real-time flood forecasting and warning systems. Ahmadisharaf *et al.* (2018) proposed a probabilistic framework to evaluate the impact of the uncertainties on six hydrograph attributes and analyzed the sensitivities of the attributes.

The Bayesian forecasting system can improve the accuracy of the forecasting information; however, the non-linear and non-normal characteristics of hydrological processes make modeling more difficult. Much effort has been made to solve this problem. Liu *et al.* (2016) investigated and compared the capability of three updating procedures, namely, the autoregressive (AR) model, the recursive least-squares model, and the HUP, in real-time flood forecasting. Liu *et al.* (2017) proposed a copula-based HUP model in which the prior and likelihood density functions were explicitly expressed and the posterior density function was generated by Monte Carlo sampling. Liang *et al.* (2017) presented a hybrid model called SVR-HUP, which combines support vector regression (SVR) and the HUP to predict long-term runoff and quantify the uncertainties. Xie *et al.* (2019) proposed a Bayesian probabilistic forecasting framework to characterize the uncertainties of wind forecasts in wind power generation operation, in which an infinite Markov switching AR model was developed to describe the non-linear data. In the above studies, the non-linear data of the hydrological processes are generally characterized by complex non-linear mathematical models to bring the residual errors closer to the normal distribution.

Bayesian probabilistic forecasting models that incorporate non-linear mathematical models are too complicated for real-time operation, and their theories are difficult for operators to understand. In addition to the non-linear model, the accuracy of the BPF model is also affected by the transformation model (Biondi & Luca 2013; Chen *et al.* 2015). A transformation model with high efficiency and stability can reduce the complexity of the BPF model and make it run faster. Currently, two types of transformation methods are used to transform the hydrologic processes, i.e., the meta-Gaussian (MG) model and the Box–Cox (BC) transformation model. Krzysztofowicz (1999) transformed the observed and forecasted data into Gaussian (normal) space using an MG model, i.e., the normal quantile transform (NQT). Since then, the MG model has been widely used to transform non-normal data into normal distribution space for the Bayesian forecasting model (Biondi & Luca 2013; Liu *et al.* 2015; Hao *et al.* 2016; Barbetta *et al.* 2018). Weerts *et al.* (2011) proposed a quantile regression model to estimate the predictive uncertainty, and the MG model was used to transform the rainfall-runoff residuals. Chen *et al.* (2013) constructed a normal-linear BPF and an MG BPF to convert deterministic forecasts into probabilistic forecasts, and the performances demonstrated that the MG BPF has better applicability. In the MG model, the marginal distribution function of the data needs to be determined before transforming (Krzysztofowicz 1999; Krzysztofowicz & Maranzano 2004; Tanessong *et al.* 2017). However, in real-time operation, the marginal distribution function is difficult to identify from the set of candidate distributions. Since Box and Cox proposed the BC model in 1964, it has been widely used for non-normal data transformation because only one parameter needs to be determined, especially in hydrological Bayesian model averaging (BMA).
In the BMA model, hydrological data, e.g., streamflow, precipitation, and evaporation, are commonly transformed to ensure that the converted data are approximately Gaussian (Zhang *et al.* 2011; Madadgar & Moradkhani 2014; Chen *et al.* 2015). Madadgar & Moradkhani (2014) constructed a BMA model in which the BC transformation was used to transform the inflows. Qu *et al.* (2017) used the BC transformation to convert the runoff forecasts of the Fu River basin in China so that they satisfy the assumption of a normal distribution. Zhong *et al.* (2018) developed a Bayesian averaging model in which the BC transformation was performed on both the observed inflows and the ensemble forecasting members. Ma *et al.* (2018) proposed a general framework for blending multiple satellite precipitation data using the dynamic BMA, with the data transformed by the BC model.

The primary purpose of this paper is to study the efficiency and stability of normal distribution transformation models when numerical weather forecast information is used to forecast inflows up to 10 days ahead. In the study, the MG model and the BC model were first constructed as normal distribution transformation models. Then, a new model, denoted the BC–MG model, was proposed by combining the BC and MG models. Based on these three transformation models, the observed and forecasted inflows of the Huanren Reservoir in China were transformed, and the probabilistic inflow forecasts with a 10-day horizon were simulated. Finally, the efficiency and stability of the probabilistic inflow forecasts were evaluated and compared.

## METHODS

### Transformation model

#### MG model

In the MG model, the inflows are mapped to normal quantiles by the NQT:

$$W_k = Q^{-1}\left(\Gamma(H_k)\right), \quad k = 1, 2, \ldots, K_1$$

$$X_k = Q^{-1}\left(\Lambda(S_k)\right), \quad k = 1, 2, \ldots, K_2$$

where $H_k$ and $S_k$ represent the observed and forecasted inflows, respectively; $K_1$ and $K_2$ represent the number of observed and forecasted inflows, respectively; $Q$ represents the standard normal distribution function; $Q^{-1}$ is its inverse function; $W_k$ and $X_k$ represent the normal quantiles of $H_k$ and $S_k$, respectively; and $\Gamma$ and $\Lambda$ represent the marginal distribution functions of $H_k$ and $S_k$, respectively.

#### BC model

The MG model requires the marginal distribution function to be determined first. In real-time operation, it is difficult to identify the marginal distribution function and calibrate its parameters. The BC model, by contrast, can transform non-normally distributed data to a normal distribution with only one parameter to calibrate.
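To make the dependence of the MG model on a marginal distribution concrete, the following minimal sketch applies a normal quantile transform to a short inflow series. It is an illustration under stated assumptions, not the procedure used in this study: an empirical distribution with the Weibull plotting position stands in for the fitted parametric marginal distribution, and the function name and sample values are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map each value to a standard normal quantile via an empirical CDF.

    Assumption: the marginal distribution is approximated by the Weibull
    plotting position rank / (n + 1); ties are not treated specially.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # plays the role of Q in the text
    quantiles = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = rank / (n + 1)  # non-exceedance probability of this value
        quantiles[idx] = std_normal.inv_cdf(p)  # Q^{-1}(p)
    return quantiles


# Hypothetical inflows (m^3/s), strongly right-skewed.
inflows = [120.0, 35.0, 410.0, 78.0, 960.0, 55.0, 210.0]
w = normal_quantile_transform(inflows)
```

The transformed values keep the ordering of the inflows, and the sample median maps to zero. In an operational setting the empirical step would be replaced by a marginal distribution selected from candidate families, which is the identification step that is hard to carry out in real time.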

In the BPF model, the observed and the forecasted inflows need to be transformed to the normal distribution space, and then, the distribution functions of the prior and likelihood of the Bayesian theory are established, respectively. In the BPF model, the prior and likelihood density functions must be established in the same space. This study uses a uniform parameter of the BC to transform the observed and the forecasted inflows at the same time period to ensure that the inflows are transformed to the same space. The uniform parameter is generated by averaging the parameters, which are estimated by the observed and the forecasted inflows, separately.

The BC model consists of input data, a transformation parameter, and output data:

$$y_j = \begin{cases} \dfrac{(x_j + a)^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(x_j + a), & \lambda = 0 \end{cases}$$

where $x_j$ represents the input data sequence, $\lambda$ represents the transformation parameter, and $y_j$ is the output data sequence. Here, $x_j$ represents the flow information, and its value must be greater than zero. If the sequence contains a number that is not greater than zero, the whole sequence needs to be shifted by a constant $a$ so that $(x_j + a) > 0$. The $\lambda$ can be estimated by the maximum likelihood model.

In the maximum likelihood estimation of $\lambda$, the likelihood function is written in terms of the transformed values $y_j$ and contains the parameters $\beta$ and $\sigma^2$, which can be obtained by fixing $\lambda$, as shown in Equation (4).
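The estimation of the transformation parameter, and the averaging that yields the uniform parameter, can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: $\beta$ is treated as a constant mean (an intercept-only model), the maximum is located by a simple grid search, and the two data series are synthetic. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def box_cox(x, lam, shift=0.0):
    """Box-Cox transform of one value; requires x + shift > 0."""
    z = x + shift
    if z <= 0:
        raise ValueError("x + shift must be positive")
    return math.log(z) if abs(lam) < 1e-12 else (z ** lam - 1.0) / lam


def profile_log_likelihood(data, lam, shift=0.0):
    """Log-likelihood of lam with the mean and variance fixed at their
    maximum likelihood values for that lam (additive constants dropped)."""
    n = len(data)
    y = [box_cox(x, lam, shift) for x in data]
    beta = sum(y) / n                              # mean for fixed lam
    sigma2 = sum((v - beta) ** 2 for v in y) / n   # variance for fixed lam
    log_jacobian = (lam - 1.0) * sum(math.log(x + shift) for x in data)
    return -0.5 * n * math.log(sigma2) + log_jacobian


def estimate_lambda(data, lo=-2.0, hi=2.0, steps=401, shift=0.0):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam, shift))


# Synthetic series built from standard normal quantiles z:
# exp(z) is log-normal, and (5 + z)^2 becomes normal under a square root.
z = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
observed = [math.exp(v) for v in z]
forecasted = [(5.0 + v) ** 2 for v in z]

lam_obs = estimate_lambda(observed)
lam_fc = estimate_lambda(forecasted)
lam_uniform = 0.5 * (lam_obs + lam_fc)  # averaged, shared parameter
```

Because the two synthetic series call for different parameters, the averaged value is optimal for neither of them, which is the compromise that the uniform parameter introduces.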

#### BC–MG model

In the BPF model, the observed and forecasted inflows need to be transformed separately. When the BC model with the uniform parameter is used, both inflows are transformed into the same distribution space. However, because the two inflows obey different marginal distributions, a shared parameter introduces new uncertainties into the BPF model. When different parameters are used, the observed and forecasted inflows are transformed into different distribution spaces, and using inflows from different spaces as inputs to the BPF model produces unreasonable results.

To solve the problems mentioned above, the BC–MG model is proposed to define the marginal distribution function in real-time operation as quickly as possible. In the BC–MG model, the BC model is first used to transform the observed and forecasted inflows into their own normal distribution spaces. The parameters of the normal distribution functions for the transformed observed and forecasted inflows can then be calibrated directly. Next, the MG model transforms the inflows again, from the normal distribution to the standard normal distribution, which is straightforward because the marginal distribution is now normal. If the inflows cannot be transformed to a normal distribution with a BC parameter within [−0.8, 0.8], the MG model needs to test different candidate distributions for the marginal distribution function. Based on the observed and forecasted inflows in the standard normal distribution space, the parameters of the prior and likelihood density functions are determined.
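The two-step transformation can be sketched as below. This is a minimal illustration, assuming the BC step succeeds so that the marginal distribution of the transformed data is normal; in that case the MG step reduces to standardization with the fitted mean and standard deviation. The function names, the inflow values, and the two BC parameters are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def box_cox(x, lam, shift=0.0):
    """Box-Cox transform of one value; requires x + shift > 0."""
    z = x + shift
    return math.log(z) if abs(lam) < 1e-12 else (z ** lam - 1.0) / lam


def bc_mg_transform(data, lam, shift=0.0):
    """Step 1: Box-Cox with the series' own parameter.
    Step 2: MG with a fitted normal marginal, i.e. Q^{-1}(F(y)), which
    equals (y - mean) / stdev when F is a normal distribution."""
    y = [box_cox(x, lam, shift) for x in data]
    marginal = NormalDist(fmean(y), pstdev(y))  # calibrated directly from y
    standard = [(v - marginal.mean) / marginal.stdev for v in y]
    return standard, marginal


# Hypothetical inflows (m^3/s) and hypothetical BC parameters.
observed = [35.0, 55.0, 78.0, 120.0, 210.0, 410.0, 960.0]
forecasted = [40.0, 60.0, 90.0, 110.0, 250.0, 380.0, 900.0]

w, marginal_obs = bc_mg_transform(observed, lam=0.0)
x, marginal_fc = bc_mg_transform(forecasted, lam=0.2)
```

Each series keeps its own BC parameter, yet both end up with zero mean and unit variance, so the prior and likelihood densities can be fitted in one common space.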

#### BPF model

Due to the randomness of the inflow processes and the uncertainty of the forecast information, the efficiency of the forecasting inflows in practical applications is reduced. The BPF model is well suited to coupling the randomness of the inflows with the uncertainty of the forecasting inflows and to providing more inflow information for practical applications, which improves the application efficiency and reliability of the forecasting inflows. The BPF model was first established by Krzysztofowicz (1999) and applied to short-range flood forecasting. This study establishes the BPF model for medium-range inflow forecasting with a 10-day forecast horizon. The prior and likelihood density functions of the BPF are established based on the transformed observed and forecasted inflows.

$H_0$ represents the observed flow at the current time period, and $H_k$ ($k = 1, 2, \ldots, K$) represents the observed flows in the future, where $K$ is the forecast horizon. Similarly, $S_k$ ($k = 1, 2, \ldots, K$) denotes the deterministic forecasting inflows with a forecast horizon of $k$. The symbols $h_0$, $h_k$, and $s_k$ represent the actual values of $H_0$, $H_k$, and $S_k$, respectively. When the observed inflow is $H_0 = h_0$ at the current time period and the forecasting inflow with a forecast horizon of $k$ is $S_k = s_k$, the posterior density function of the actual inflow at forecast horizon $k$ is represented as follows:

$$\phi_k(h_k \mid s_k, h_0) = \frac{f_k(s_k \mid h_k, h_0)\, g_k(h_k \mid h_0)}{\int f_k(s_k \mid h_k, h_0)\, g_k(h_k \mid h_0)\, \mathrm{d}h_k}$$

where $\phi_k$ represents the posterior density function, $f_k$ represents the likelihood density function, and $g_k$ represents the prior density function.
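In the standard normal space, a normal-linear prior and likelihood give the posterior in closed form. The sketch below shows this conjugate update under assumptions that go beyond what is stated here: the prior is taken as $W_k = c\,W_0 + \Xi$ with $\Xi \sim N(0, t^2)$, the likelihood as $X_k = a\,W_k + d\,W_0 + b + \Theta$ with $\Theta \sim N(0, \sigma^2)$, and all parameter values are hypothetical rather than calibrated.

```python
from statistics import NormalDist


def posterior_normal_linear(x_k, w_0, c, t, a, d, b, sigma):
    """Posterior of W_k given X_k = x_k and W_0 = w_0.

    Assumed prior:      W_k = c * W_0 + Xi,                Xi ~ N(0, t^2)
    Assumed likelihood: X_k = a * W_k + d * W_0 + b + Theta, Theta ~ N(0, sigma^2)
    """
    denom = a * a * t * t + sigma * sigma
    A = a * t * t / denom                             # weight on the forecast
    B = -a * b * t * t / denom                        # bias correction
    D = (c * sigma * sigma - a * d * t * t) / denom   # weight on current flow
    T = (t * t * sigma * sigma / denom) ** 0.5        # posterior std deviation
    return NormalDist(A * x_k + D * w_0 + B, T)


# Hypothetical parameters for one time period and forecast horizon.
post = posterior_normal_linear(x_k=1.2, w_0=0.4, c=0.8, t=0.6,
                               a=0.9, d=0.1, b=0.0, sigma=0.5)
median_w = post.inv_cdf(0.5)                      # 50% quantile in normal space
band_w = (post.inv_cdf(0.05), post.inv_cdf(0.95))  # 5-95% interval
```

When the forecast carries no information (`a = 0`), the function returns the prior unchanged; otherwise the posterior standard deviation is smaller than the prior one, which is the correction effect described above.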

#### Verification metrics

In this study, the performances of the BPF models are evaluated using both single values and confidence intervals. The single value is evaluated by the root mean square error (RMSE). The interval performances are evaluated using the average relative interval length (ARIL), the percentage of observations bracketed by the confidence interval (PCI), and the percentage of observations bracketed by the unit confidence interval (PUCI) (Li *et al.* 2011).

In these metrics, the observed inflow at time $t$ is compared with the forecasting value at the same time over the length of the entire simulation period; $n$ is the number of the observed inflows that occur in the forecast intervals; and $W$ represents the number of the transformed data.
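A short sketch of three of these metrics is given below. The formulas are assumptions based on the forms in which RMSE, ARIL, and PCI are commonly written (interval width relative to the observation for ARIL, share of bracketed observations for PCI) and may differ in detail from the equations of this study. The function names and the sample values are hypothetical.

```python
import math


def rmse(observed, forecast):
    """Root mean square error between observations and single-value forecasts."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)


def aril(observed, lower, upper):
    """Average relative interval length: mean of (upper - lower) / observed."""
    n = len(observed)
    return sum((u - l) / o for o, l, u in zip(observed, lower, upper)) / n


def pci(observed, lower, upper):
    """Percentage of observations that fall inside their forecast interval."""
    inside = sum(1 for o, l, u in zip(observed, lower, upper) if l <= o <= u)
    return 100.0 * inside / len(observed)


# Hypothetical inflows (m^3/s): observations, 50% quantiles, interval bounds.
obs = [100.0, 200.0, 400.0, 800.0]
q50 = [110.0, 190.0, 400.0, 760.0]
low = [80.0, 150.0, 420.0, 600.0]
high = [120.0, 250.0, 500.0, 1000.0]

print(round(rmse(obs, q50), 2))  # 21.21
print(aril(obs, low, high))      # close to 0.4
print(pci(obs, low, high))       # 75.0
```

A narrow interval lowers ARIL but can also lower PCI, which is why the two indicators are read together.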

## CASE STUDY

### Huanren hydropower reservoir

This study takes the Huanren hydropower reservoir as an example. It is located in the middle and lower reaches of the Hunjiang River in northeast China, as shown in Figure 1. The basin area is approximately 10,400 km², and the annual average precipitation is about 860 mm. Precipitation varies significantly between the dry and wet seasons, and about 75% of the precipitation occurs in the wet season (from May to September).

### Forecasted and observed data

In this study, the observed precipitation and inflow data of the Huanren Reservoir from 1967 to 2010 are used to calibrate and verify the forecasting model. The medium-term numerical weather prediction data of the Global Forecast System (GFS) from 2001 to 2010 are used to forecast the inflows with the hydrology model. The GFS is developed by the U.S. National Centers for Environmental Prediction. The hydrology model was constructed for the study basin by combining the multiple linear regression model and the Xin'anjiang model (Xu *et al.* 2013). The observed inflows from 1968 to 2007 and the deterministic forecasted inflows from 2001 to 2007 were used to calibrate the model parameters, and the observed and forecasted data from 2008 to 2010 were used to verify the models.

Based on the transformation models, the observed and forecasted data were first transformed by the MG, BC, and BC–MG models, respectively. Then, the BPF–MG, BPF–BC, and BPF–BC–MG models were constructed, with the transformed data used to calibrate the model parameters. In real-time operation, the inflows $h_0$, $h_k$, and $s_k$ for each time period were used as input data by the three models to make probability forecasts.

## RESULTS AND DISCUSSION

### Data preprocessing

In this study, the inflow data in a year were divided into 36 time periods, and the inflows for each day in each period with 10-day time horizons were used to calibrate the parameters. In the BPF model, the data of $h_k$ and $s_k$ should be transformed into the same distribution space before they are used for forecasting and simulation. However, in the BPF–BC model, the data of $h_k$ and $s_k$ may have different transformation parameters. In this study, the parameters in the BPF–BC at the same time period were averaged, and the average parameter was used to transform the data into the same distribution space.

### Transformation performances

In the BPF model, the transformation of the forecasted inflow data is relatively easy because the distribution of the forecast residual is closer to the normal distribution. The observed inflow data, however, are difficult to transform. To compare the transformation ability of the MG, BC, and BC–MG models on the observed inflows, this study evaluates the efficiency of the models with the AREQ indicator, as shown in Figure 2. The AREQ values in Figure 2 indicate that the BC–MG model is significantly more efficient than the MG and BC models for horizons from 1 day to 10 days.

Because of the large number of time periods and time horizons, only the observed inflow transformation results for 12 time periods are listed, as shown in Figure 3; these are transformed by the MG and BC–MG models with a 1-day time horizon. In Figure 3, the BC–MG model is more efficient, and the quantiles of the observed inflows lie closer to the 45-degree diagonal. For the MG model, by contrast, the transformed extreme-value flows deviate obviously from the 45-degree diagonal.
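The comparison against the 45-degree diagonal can be reproduced numerically with a quantile-quantile check such as the sketch below. The deviation measure used here is a hypothetical stand-in and is not the AREQ indicator of this study; the plotting position and the function names are likewise assumptions.

```python
from statistics import NormalDist


def qq_pairs(transformed):
    """Pair theoretical standard normal quantiles with the sorted sample."""
    n = len(transformed)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(transformed)))


def mean_abs_deviation_from_diagonal(transformed):
    """Average vertical distance of the Q-Q points from the 45-degree line."""
    pairs = qq_pairs(transformed)
    return sum(abs(sample - theory) for theory, sample in pairs) / len(pairs)


# A sample lying exactly on the diagonal, and the same sample shifted by 0.5.
ideal = [NormalDist().inv_cdf((i - 0.5) / 9) for i in range(1, 10)]
shifted = [v + 0.5 for v in ideal]

dev_ideal = mean_abs_deviation_from_diagonal(ideal)
dev_shifted = mean_abs_deviation_from_diagonal(shifted)
```

A transformation that handles extreme flows poorly shows up as large distances at the two ends of the sorted sample, even when the central quantiles sit close to the diagonal.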

### Simulation performances

Based on the BPF–MG, BPF–BC, and BPF–BC–MG models, the daily probabilistic forecasting inflows are simulated during the calibration and validation periods. The 50% quantile of the posterior density function, which lies in the middle of the posterior probability, is taken as the single-value probabilistic forecasting inflow. Moreover, the 5–95%, 15–85%, and 25–75% confidence intervals of the posterior probability density function are evaluated as forecasting inflow intervals. The daily single-value forecasting inflows and 5–95% intervals with 1-day, 3-day, and 7-day forecast horizons from 1 July to 31 August 2008 are forecasted using the three models, as shown in Figure 4.
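The step from a posterior distribution to inflow quantiles can be sketched as follows. The sketch assumes the posterior is normal in the standard normal space and that the observed inflows were transformed by a BC step followed by standardization with a fitted normal marginal, so each quantile is mapped back by reversing both steps. The posterior, the marginal parameters, and the BC parameter below are hypothetical values.

```python
import math
from statistics import NormalDist


def inverse_box_cox(y, lam, shift=0.0):
    """Invert the Box-Cox transform; needs lam * y + 1 > 0 when lam != 0."""
    if abs(lam) < 1e-12:
        return math.exp(y) - shift
    return (lam * y + 1.0) ** (1.0 / lam) - shift


def posterior_quantiles(posterior, marginal, lam, levels=(0.05, 0.5, 0.95)):
    """Quantiles of the posterior, mapped back to the original inflow space."""
    result = {}
    for p in levels:
        w = posterior.inv_cdf(p)                   # standard normal space
        y = marginal.mean + marginal.stdev * w     # undo the standardization
        result[p] = inverse_box_cox(y, lam)        # undo the BC step
    return result


# Hypothetical posterior and transformation parameters for one day.
posterior = NormalDist(0.2, 0.5)
marginal = NormalDist(5.0, 1.0)
summary = posterior_quantiles(posterior, marginal, lam=0.0)
single_value = summary[0.5]                 # 50% quantile forecast
interval = (summary[0.05], summary[0.95])   # 5-95% interval
```

Because both inverse steps are monotonic, quantiles computed in the normal space map directly to the same quantiles of the inflow, so no sampling is needed to obtain the intervals.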

In Figure 4, the deviations between the observed inflows and the 50% quantile inflows do not differ obviously among the three BPF models. However, the intervals differ significantly across models and forecast horizons. When the forecasting precipitation with a 1-day forecast horizon is used, the accuracy of the forecasting inflows is high for all three models; thus, the deviations at the 1-day forecast horizon are small and the intervals are narrow. As the forecast horizon extends, the intervals widen and the accuracy of the forecasting inflows diminishes, which indicates that the uncertainty of the forecasting inflows increases with the forecast horizon. Among the three BPF models, the intervals of the BPF–BC–MG are narrower than the others, and the BPF–BC model performs worst. In the BPF–BC model, the BC model transforms the inflows with the uniform parameter. When the observed and forecasted inflows do not obey the same marginal distribution, the transformation efficiency of both data series is affected by this shared parameter, and this is the main reason that the intervals of the BPF–BC are more diffuse than the others.

### Performance evaluation for single values

The RMSE values for different forecast horizons during calibration and verification were evaluated, as shown in Figure 5. The results show that the RMSE values increase as the forecast horizon extends in both calibration and verification, which means that the uncertainties of the forecasted inflows increase with the forecast horizon.

The RMSE values of the three BPF models are lower than those of the deterministic forecasts from GFS over the entire forecast horizon. The deterministic forecasts from GFS are labeled as the GFS model in Figure 5. The results in Figure 5 demonstrate that the three BPF models have correction capability by combining the prior and likelihood densities, especially for forecast horizons from 4 days to 7 days. In general, the 50% quantile values of the BPF–BC–MG have higher accuracy than the others. For forecast horizons from 1 day to 3 days, the differences between the GFS model and the three BPF models are smaller than at the other forecast horizons. This suggests that the linear relationship between the forecasted and observed data is relatively consistent from 1 day to 3 days, so that the likelihood density has little effect on the posterior density in the BPF and the 50% quantile values for 1 day to 3 days are close to the observed inflows.

### Performance evaluation for intervals

The ARIL values show how the uncertainties of the inflows vary as the forecast horizon extends, as shown in Figure 6. For the 5–95%, 15–85%, and 25–75% intervals alike, the results demonstrate that the posterior density functions become more diffuse as the forecast horizon extends. However, the ARIL indicator alone is not enough to distinguish the forecast efficiency of the three BPF models.

The PCI represents the percentage of observed inflows that fall within the forecasting intervals, as shown in Figure 7. Larger PCI values indicate better performance of the forecasting intervals. The PCI values in Figure 7 gradually decrease as the interval narrows. However, the PCI values of the BPF–BC–MG model remain at a high level, which indicates that this model is more efficient than the others.

The ARIL and PCI reflect different characteristics of the intervals. An interval with a lower ARIL may bracket fewer observed inflows and therefore have a lower PCI. Thus, the PUCI is used to evaluate the performances by combining the values of the PCI and ARIL. The higher the PUCI, the better the performance of the confidence intervals.

According to Equation (21), the PUCI values of the three models during calibration and verification are evaluated, as shown in Figure 8. As the forecast horizon extends, the PUCI values of the forecast intervals in the three models gradually decrease. The BPF–BC–MG model has greater PUCI values than the BPF–MG and BPF–BC models in both calibration and verification. As the forecast interval narrows, the gap in PUCI values among the three models becomes obvious, and the efficiency of the BPF–BC–MG model is fully demonstrated. Meanwhile, the PUCI values of the 25–75% interval are the highest, which indicates that this interval is narrower and brackets the inflow data more efficiently. The results illustrate that the forecasting intervals of the BPF–BC–MG model are more efficient than those of the BPF–MG and BPF–BC models.

## CONCLUSIONS

This paper investigates the efficiency and stability of the BPF model to describe the uncertainties of forecasting inflows by using different methodologies to transform non-normal distribution data into normal distribution space. Three models, i.e., BPF–MG, BPF–BC, and BPF–BC–MG models, have been developed to describe the uncertainties of the observed and forecasted inflows, respectively. The models are tested on the Huanren hydropower reservoir in China. The findings obtained are summarized below.

The MG model is the most popular transformation model in the BPF model. In real-time operation, the marginal distribution functions for each time period should be identified. However, these functions are difficult to identify, and this is the main barrier to applying the BPF model in real-time operation.

The BC model can transform non-normally distributed data into normal distribution space without a marginal distribution function. However, in the BPF model, the optimal transformation parameters of the observed and forecasted inflows may differ. Thus, the average value of the parameters is used to transform the two sets of inflows into the same normal distribution space, and the efficiency and stability of the BPF–BC are affected by data transformed in this way.

In the BPF–BC–MG model, the observed and forecasted inflows are transformed into different normal distribution spaces by using the BC model. Then, the transformed inflows are transformed again into standard normal distribution space by using the MG model. In real-time operation, the density distribution functions of the prior and likelihood are determined quickly, and the probabilistic forecasts with high accuracy are obtained.

In this study, the medium-term precipitation forecasts of the GFS are used to forecast the inflows. The uncertainties of the inflows increase as the forecast horizon extends. The results demonstrate that the BPF model is a useful methodology to describe the uncertainties of the observed and forecasted inflows.

## ACKNOWLEDGEMENTS

This research is supported by the National Natural Science Foundation of China (Grant Nos. 51609025, 91547111, and 51709108), the Open Fund Approval (SKHL1713, 2017), Chongqing Technology Innovation and Application Demonstration Project (cstc2018jscx-msybX0274 and cstc2016shmszx30002), and the Hun River Cascade Hydropower Reservoirs Development Company, Ltd.