## Abstract

In this study, we present a post-processing technique designed to assess conditional predictive uncertainty for quantitative precipitation forecasts (QPFs) that extends to the multivariate predictor case. The theoretical foundation of the mixed binary-continuous representation of the precipitation process for a single predictor is coupled with the univariate version of the model conditional processor (MCP), which allows ensemble forecasts to be considered while maintaining a parsimonious approach. The experimental set-up is based on a weather vigilance zone (WVZ) of the national warning system in southern Italy and the corresponding QPFs issued by the Italian Civil Protection Department. Various aspects of the quality of the probabilistic forecast from the uncertainty processor were evaluated. The results indicate that, for the investigated area, post-processed forecasts improve accuracy and reliability, tend to correct bias and are generally less under-dispersive than raw forecasts. Furthermore, we explored the possibility of issuing warnings employing the full predictive distribution and moving to the use of probabilistic thresholds, here identified through a receiver operating characteristic (ROC) analysis. Considering the probability of exceeding a critical rainfall value allowed successful discrimination between events and non-events for critical precipitation occurrences and proved to be a valuable approach for decision-makers and information providers.

## HIGHLIGHTS

A post-processor to assess predictive uncertainty that extends to the multivariate rainfall predictors case is presented.

The mixed binary-continuous precipitation process is coupled with the UMCP that allows considering ensemble forecasts.

Post-processed forecasts improved accuracy and reliability.

Moving to probabilistic forecasts and probabilistic thresholds enhanced the skill statistics for correctly forecasting critical events.

## INTRODUCTION

The process of providing accurate and timely forecasts of impending events in a systematic way is crucial for an efficient and effective implementation of early warning systems (EWSs) aimed at reducing the impact of natural hazards like floods and landslides (UNDP 2018). Raw forecasts are often single deterministic or ensemble forecasts that lack a complete characterization of the uncertainty needed by rational decision-makers and by information providers (Demeritt *et al.* 2007; Cloke & Pappenberger 2009). Statistical post-processors have been proposed to remove systematic errors and to improve the description of the predictive probability distribution, often by transforming a non-probabilistic forecast into a probabilistic one and increasing forecast reliability. Both goals are in many cases pursued by assessing the predictive uncertainty, defined as the probability distribution of the true future state of the predictand conditional on appropriate predictors (Krzysztofowicz 1999). The value added by statistical post-processing of forecasts has been demonstrated over a variety of forecasting applications, ranging from river stages and floods (e.g., Krzysztofowicz 2002; Todini 2008; Reggiani & Weerts 2009; Madadgar *et al.* 2014; Hemri *et al.* 2015) to weather-climate data (Hamill *et al.* 2004; Wilks & Hamill 2007; Verkade *et al.* 2013; Reggiani *et al.* 2021). It is worth noting that post-processing meteorological forecasts implies dealing with the inherent uncertainty of numerical weather prediction (NWP) models, given the imperfect knowledge of initial conditions and the nonlinearity of the processes, as well as accounting for the heteroscedasticity of the forecast error and the discrepancy between spatial and temporal scales (Reggiani & Boyko 2019). Li *et al.* (2017) reviewed a variety of post-processing approaches in meteorological forecasting that have been proposed and tested over the past years, as follows:

- empirical methods such as quantile mapping (QM) (Panofsky & Brier 1968; Piani *et al.* 2010) and the analogue method (Hamill & Whitaker 2006), also corresponding to the early approaches, which can be largely regarded as data-driven and thus provide no process insights (Koutsoyiannis *et al.* 2008);
- regression-based approaches like logistic regression (Wilks 2009), quantile regression (Bremnes 2004; Weerts *et al.* 2011) and model output statistics (MOS) (Glahn & Lowry 1972; Wilks 2005);
- methodologies that provide the predictive probability of the variable of interest conditional on a single forecast, e.g., the Bayesian processor of output (Krzysztofowicz & Maranzano 2006), the ensemble pre-processor (EPP; Schaake *et al.* 2007) and the Bayesian joint probability (Wang *et al.* 2009; Robertson *et al.* 2013);
- multivariate post-processing methods that combine outputs from multiple models or ensemble weather forecasts to achieve better forecast skill like, for example, the Bayesian processor of ensemble (Krzysztofowicz 2008), the ensemble model output statistics (EMOS; Gneiting *et al.* 2005), Bayesian model averaging (BMA; Raftery *et al.* 2005; Sloughter *et al.* 2007) and the model conditional processor (MCP; Todini 2008; Reggiani & Boyko 2019).

Post-processors specifically aiming at providing a probabilistic quantitative precipitation forecast (PQPF) pose an extra challenge in comparison to other hydrometeorological processes given the intermittent character of the variable at short time scales. The analysis of intermittency and the modelling of the occurrence process of rainfall have attracted the interest of many studies; most of the post-processors developed for rainfall forecasts employ a widely used representation for the precipitation process as a mixed discrete/continuous distribution (e.g., Bell 1987; Koutsoyiannis 2006; Bárdossy & Pegram 2009; Serinaldi 2009; Tsoukalas *et al.* 2019).

Describing the precipitation process as a mixture of binary and continuous random variables rests on a distribution consisting of two components: the probability of a dichotomous variable such as precipitation occurrence (PoP) and the distribution of the continuous process representing the amount of precipitation (DoA). If the random variable *X* describes the precipitation depth, with realization *x*, the mixed binary-continuous process assigns a probability mass of (1−PoP) to the event *X* = 0 and spreads the remaining probability mass, i.e., the probability of precipitation occurrence PoP = P(*X* > 0), over the complement of its support, that is the interval (0, ∞) for the precipitation process (Kelly & Krzysztofowicz 2000). Usually, precipitation event occurrence is estimated as climatological relative frequency from a sample of observations, while the continuous process is modelled with a suitable parametric distribution function. Exponential, Gamma and Weibull distributions are applied in the vast majority of the existing precipitation models; nevertheless, with the focus on the statistical features of the extremes of rainfall process at small time scale (e.g., daily or hourly), many studies based on very long time series of observations have concluded that light-tailed distributions often lead to an underestimation of high precipitation amounts and that heavy-tailed distributions (e.g., Pareto type distributions, generalized beta of the second kind – GB2) are generally in better agreement with observed precipitation extremes (Mielke & Johnson 1974; Todini & Di Bacco 1997; Evin *et al.* 2018; Koutsoyiannis 2021).
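The mixed binary-continuous CDF described above can be sketched numerically. A minimal illustration in Python, where the PoP and the Gamma parameters of the amount distribution are purely illustrative values, not fitted to the study data:

```python
from scipy.stats import gamma

def mixed_cdf(x, pop, shape, scale):
    """CDF of a mixed binary-continuous precipitation process: probability
    mass (1 - pop) at x = 0, remaining mass pop spread over (0, inf) by a
    Gamma distribution for the amount."""
    if x < 0:
        return 0.0
    # P(X <= x) = (1 - PoP) + PoP * G(x), with G the Gamma CDF of the amount
    return (1.0 - pop) + pop * gamma.cdf(x, a=shape, scale=scale)

# illustrative values: PoP = 0.6, amount ~ Gamma(shape = 0.8, scale = 12 mm)
p_dry = mixed_cdf(0.0, 0.6, 0.8, 12.0)   # probability of no precipitation
```

Evaluating at `x = 0` returns the probability mass of the dry event, 1 − PoP, while for growing `x` the CDF increases towards 1 through the Gamma component.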

Recently, Reggiani & Boyko (2019) extended the use of the MCP to multivariate mixed binary-continuous variables, specifically daily precipitation, by transforming the intermittent random process into a continuous one: the intermittent multivariate precipitation process is considered as censored and turned into a continuous one by replacing censored data via imputation using a Gibbs sampler. The sample is augmented into the negative range, recovering the true, a priori unknown variance-covariance structure of the full uncensored sample. The MCP was also applied to the precipitation process in Massari *et al.* (2019) to provide its probability distribution conditional on satellite precipitation estimates and a soil moisture-based rainfall product. Nevertheless, in that study the MCP was applied only to cases where both predictand and predictors are non-zero during calibration and where predictors are non-zero during validation. To account for the intermittency issue, in addition to the MCP, a categorical case integration method based on the detection capability of the predictors was considered.

In the present study, the univariate version of the MCP (UMCP), introduced in Biondi & Todini (2018) for probabilistic flow forecasting, is adapted to the post-processing of ensemble rainfall forecasts. Here, we coupled the theoretical foundation of the mixed binary-continuous precipitation process representation for a single predictor (Herr & Krzysztofowicz 2005) with the UMCP, which allows ensemble forecasts to be considered. By summarizing the ensemble forecast with its mean, the post-processor provides the predictive uncertainty about a predictand given ensemble forecasts in a univariate model framework and, at the same time, properly takes into account the information from the ensemble spread. The principal advantage of the proposed approach is that the extension of the single predictor case to the multivariate one does not entail increasing the number of mutually conditional PoP combinations and conditional/marginal distributions. Moreover, the method accommodates the nonstationary behaviour of the precipitation process, the non-Gaussian form of the probability distributions of the random variables and the nonlinear and heteroscedastic dependence structure between the forecast and the predictand, and does not require invoking the exchangeability of the ensemble members as other post-processors do. A weather vigilance zone (WVZ) of the Italian national alert system coordinated by the Civil Protection Department was considered as a case study to test the performance of the approach as well as the potential implications for operational forecasting of critical rainfall that may trigger floods, landslides or other damaging events. The post-processor is applied to rainfall forecasts issued daily by the department for civil protection purposes in the form of a range of quantitative precipitation forecasts (QPFs).
In the current operational setting, the forecasts are subjective in that they are made judgmentally by expert forecasters based on information from multiple sources, including output from numerical weather prediction (NWP) models. The aim of this paper is to assess the ability of the UMCP to be integrated with mixed binary-continuous distribution models, in order to obtain a parameter-parsimonious and computationally efficient processor applicable in operational settings, and to provide a PQPF that can be useful to decision-makers and information providers. Furthermore, we explored the possibility of issuing warnings by comparing the resulting PQPF to a critical probability threshold identified through a receiver operating characteristic (ROC) analysis.

The paper is organized as follows. Section 2 presents the case study and the available data. Section 3 details the assumptions for the mixed binary-continuous precipitation process representation and the univariate version of the MCP. Results are presented in Section 4. Section 5 summarizes implications and potential advantages of the proposed approach.

## STUDY AREA AND DATA

The Civil Protection Department coordinates the Italian national warning system, which involves the regions through a network of functional centers for collecting and sharing meteorological, hydrogeological and hydraulic data, as well as for real-time analysis and monitoring, impact-based forecasting and dissemination of warning messages. The forecasting phase mainly relies on the combined use of rainfall thresholds and QPFs. The National Weather Vigilance Bulletin, issued by the department, is an informative tool for the national civil defence system that highlights relevant adverse weather phenomena expected over the next 24 hours, and the tendency for the following days. In the operational setting for impact assessment, the department issues forecasts in the form of a range of QPFs over WVZs covering the national territory.

WVZs delineate wide areas defined according to geographic, physical and climatic boundaries, often extending beyond regional administrative limits. An area of about 5,000 km^{2} located in central-north-eastern Calabria (southern Italy), corresponding to WVZ #36 (numbering valid until December 2016) and depicted in Figure 1, was selected as the case study.

The issued QPFs are quantitative estimates of precipitation, elaborated judgmentally by expert forecasters based on information from multiple sources, including output from numerical weather prediction models. QPFs represent the best forecast information available at the time they are issued and constitute the main technical-scientific contribution supporting daily critical assessments at the regional level, particularly where local meteorological services are not available. The reference meteorological model is the ECMWF deterministic model in its 00 UTC and/or 12 UTC run. The forecasters also evaluate the output from the high-resolution models of the COSMO operational chain of ARPA ER and other available global and local models. This study considers QPFs elaborated daily by the Italian Civil Protection Department for the period 2007–2015, which refer to the maximum value of cumulated point rainfall expected over the investigated area for different durations, namely 3, 6 and 12 hours. For a given duration, the limits of a range for the maximum cumulated rainfall over the WVZ, corresponding to given probabilities of exceedance, are provided. Probability of exceedance values are available until November 2013. The forecasts can refer to phenomena expected up to midnight of the day of issuance or to the following day; the tendency for the day after can also be provided. In this study, only forecasts related to the day of issuance are considered.

Furthermore, it is worth noting that:

weather forecasts for civil protection purposes are specifically devoted to highlighting potentially harmful situations for people or property; they are often the result of subjective evaluations and do not strictly correspond to the outputs of NWP models;

time series of QPFs for the various precipitation durations are characterized by different sample sizes, as daily bulletins do not always include evaluations for every duration;

the lack of continuity in the QPF time series for the different durations may not guarantee the presence of valuable data, and high precipitation values could be missed (particularly relevant in the case of extreme events);

available QPFs assume strictly positive integer values with an increment Δ of 5 mm.

The local warning systems are mostly based on the comparison of the QPFs with a set of rainfall thresholds that identify critical precipitation values for landslide and/or flood occurrence and support the issuance of warnings for the corresponding alert zone at three severity levels (ordinary, moderate and high).

Observed rainfall at 20-min time resolution for the same period (2007–2015) is drawn from the rain gauge network covering the WVZ, consisting of 39 stations (data available at https://www.cfd.calabria.it/). To compare the coarser operational QPFs with observed data, information from each rain gauge has been used to obtain daily maximum cumulated rainfall values over the investigated WVZ for different durations (specifically 3, 6 and 12 hours, corresponding to the durations of the available QPFs).
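The aggregation from the 20-min gauge records to daily maxima of cumulated rainfall for each duration can be sketched as follows; the rainfall series here is synthetic, standing in for a single gauge of the network:

```python
import numpy as np
import pandas as pd

# synthetic 20-min rainfall record for one gauge (mm per 20 min); in the
# operational dataset these values come from the 39 gauges covering the WVZ
idx = pd.date_range("2015-11-01", periods=2 * 72, freq="20min")  # two full days
rng = np.random.default_rng(0)
rain = pd.Series(rng.gamma(0.3, 2.0, size=idx.size), index=idx)

def daily_max_cumulated(series, duration_hours):
    """Daily maximum of the rainfall cumulated over a moving window of the
    given duration: rolling sum over the 20-min series, then max per day."""
    steps = int(duration_hours * 60 / 20)   # window length in 20-min steps
    running = series.rolling(steps).sum()   # cumulated depth over the window
    return running.resample("D").max()

# durations matching the available QPFs
maxima = {d: daily_max_cumulated(rain, d) for d in (3, 6, 12)}
```

Since rainfall depths are non-negative, the daily maximum over a longer window is never smaller than that over a shorter one ending at the same time.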

Table 1 summarizes the seasonality and sample size of the QPF and observed rainfall datasets. In Table 1, the sample size of the observed maxima is the same for all investigated durations, revealing the availability of a longer climatic sample of the predictand and a shorter sample of operational forecasts.

## METHODS

### The mixed binary-continuous precipitation process

Herr & Krzysztofowicz (2005) derived a generic bivariate probability distribution developed for forecasting precipitation in space with a single predictor. The bivariate distribution function is of mixed discrete-continuous form, with the discrete component modelling the occurrence or non-occurrence of precipitation and the continuous component modelling the amount of precipitation given that some precipitation occurs. A closed form of the distribution function of the predictand conditional on the predictor was derived with the meta-Gaussian approach in terms of the mutually conditional PoP occurrence, four marginal conditional distributions and a dependence parameter. The extension of the bivariate case to the multivariate one, as required in the case of multiple or ensemble predictors, poses some problems because of the large number of conditional PoP combinations and conditional marginal distributions involved (Reggiani & Boyko 2019). Here, we propose coupling the bivariate conditional dependence structure with the UMCP (Biondi & Todini 2018) for predictive probability estimation in the case of multiple forecasts, adapting the methodology to the specific operational constraints of the selected case study. The UMCP was designed for the post-processing of ensemble forecasts by using the expected value of the ensemble as the predictor, thus allowing its integration within the general framework of the bivariate dependence case.

Nevertheless, given the peculiarity of the case study and the related operational constraints, the approach presented here refers to the case of non-zero predictors; indeed, the available forecasts are strictly positive integer values and daily bulletins do not always include evaluations for each duration, so that missing values are difficult to interpret. With richer information, the same methodology would allow treating the ensemble of precipitation forecasts considering also the zero case for the predictors, by integrating the UMCP in the complete bivariate conditional model (assuming the mean value of the ensemble as the predictor) described in Herr & Krzysztofowicz (2005).

Referring to the specific case study, let *X* denote the amount of observed precipitation and consider the *m* ensemble predictions, which are strictly positive given that the provided forecasts, when issued, are always greater than zero; the average value of the predictions is also defined.

Constructing the conditional predictive distribution according to its definition in Equation (8) requires five elements: the mutually conditional PoP occurrence and the four marginal conditional distributions.

Our aim is to estimate the predictive distribution by modelling the conditional distribution through the UMCP, which allows ensemble forecasts to be considered in the above-described bivariate dependence structure and takes advantage of relationships in the normal space.

### Univariate model conditional processor (UMCP)

Based on the properties of the multi-normal distribution, Todini (2008) developed the MCP for producing a full predictive probability distribution of the variable of interest that conveniently handles multivariate situations, making it suitable for multi-model, multisite and multi-lead time problems of streamflow prediction. The MCP provides a predictive density defined as the probability density of a future occurrence given (conditional on) all the available knowledge, encapsulated in one or more model forecasts (Krzysztofowicz 1999). The MCP, which under the lag-1 Markov process assumption made by Krzysztofowicz (1999) was proven (Todini 2013) to lead to exactly the same results as the hydrological uncertainty processor (HUP), allows for easier extensions to the multivariate problems arising from multi-model and multi-time forecasts.

The original MCP approach implies homoscedasticity and was further extended from the normal to truncated normal distributions in order to account for heteroscedasticity (Coccia & Todini 2011). So far, the MCP has been applied to the derivation of predictive densities for continuous random variables such as river stage and discharge, monthly average precipitation and temperature (Reggiani *et al.* 2016), and recently extended to multivariate mixed binary-continuous variables, primarily daily precipitation (Massari *et al.* 2019; Reggiani & Boyko 2019). Biondi & Todini (2018) introduced a univariate version of the MCP designed for the analysis of ensembles of streamflow forecasts that does not require reordering of the members and allows for an adequate assessment of the time heteroscedasticity of the estimated predictive distribution variance. Specifically, the predictive distribution is obtained by using the expected value of the ensemble as the predictor and by assuming the ensemble mean to be affected, at each point in time, by a different estimation error, whose variance is derived from the ensemble spread. The UMCP takes advantage of the flexibility of the meta-Gaussian approach, which allows any form of the marginal distribution functions and a closed form of the resulting predictive distribution. The fundamentals of the UMCP are described in the following; for more details, the reader can refer to Biondi & Todini (2018).

Given the marginal conditional distribution of the predictand, the observations at times *t* ∈ 1, …, *T*, where *T* is the length of the observation record, are converted into the Gaussian space via the normal quantile transform (NQT) to obtain the normal image of the observations. Similarly, the *m* × *T* ensemble predictions, with *t* ∈ 1, …, *T* and *i* ∈ 1, …, *m*, where *m* is the number of ensemble members, are converted into the Gaussian space to obtain the normal image of the ensemble forecasts. All the ensemble members are converted together to underline that no a priori preference is set on individual ensemble members, which are assumed equiprobable, and no reordering is required. In the Gaussian space, the converted variables result in standard normal variables, distributed as N(0, 1).
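The conversion into the Gaussian space via the NQT can be sketched as follows. This is a minimal version using a Weibull plotting position for the empirical non-exceedance probabilities, one common choice; the study may equally rely on the fitted marginal CDF or a different plotting position:

```python
import numpy as np
from scipy.stats import norm, rankdata

def nqt(sample):
    """Normal quantile transform (NQT): map each value to its empirical
    non-exceedance probability (Weibull plotting position), then to the
    corresponding standard normal quantile."""
    ranks = rankdata(sample, method="average")
    p = ranks / (len(sample) + 1.0)   # plotting positions in (0, 1)
    return norm.ppf(p)

rng = np.random.default_rng(1)
x = rng.gamma(0.8, 12.0, size=500)    # skewed, rainfall-like sample
eta = nqt(x)                          # normal image of the observations
```

The transform is rank-preserving, so the ordering of the original sample is carried over unchanged into the Gaussian space.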

To account for errors affecting both the predictand and the predictor, the MCP algorithm was modified into a Deming (1943) type regression in the standard normal space, providing the UMCP estimator of the conditional mean and variance of the predictive density of the normal image of the observations given that of the ensemble mean, expressed in terms of the covariance of the observations and the ensemble mean. The conditional variance, which varies at each time step with the ensemble spread, is not constant and the residuals are heteroscedastic.
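As a rough illustration of a Deming-type errors-in-variables regression, the slope can be estimated in closed form when the ratio δ of the error variances is assumed known. This is a sketch only; the actual UMCP estimator in Biondi & Todini (2018) additionally derives a time-varying conditional variance from the ensemble spread, which is not reproduced here:

```python
import numpy as np

def deming_slope(x, y, delta=1.0):
    """Deming regression slope when both variables carry error;
    delta is the (assumed known) ratio of the y- to x-error variances."""
    sxx, syy = np.var(x), np.var(y)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    d = syy - delta * sxx
    return (d + np.sqrt(d**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)

# synthetic check: a latent signal observed with noise in BOTH variables;
# ordinary least squares would be attenuated by the predictor noise
rng = np.random.default_rng(3)
t = rng.normal(size=20000)
x = t + rng.normal(scale=0.5, size=t.size)          # noisy predictor
y = 2.0 * t + rng.normal(scale=0.5, size=t.size)    # noisy predictand, true slope 2
slope = deming_slope(x, y, delta=1.0)                # recovers the slope of ~2
```

Unlike ordinary regression of y on x, which here would be biased towards zero by the predictor noise, the Deming estimator treats the errors in both variables symmetrically.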

It is also worth noting that assuming a joint multivariate cumulative distribution function of transformed data in the meta-Gaussian approach, corresponding to the well-known multivariate normal distribution, can also be viewed as adopting a Gaussian copula. The meta-Gaussian approach has several noteworthy properties, for example, its analytical tractability, which greatly reduces the computational burden, particularly when dealing with multiple dependent variables; but, as noted for the Gaussian copula, there are complex situations in which it may be inappropriate to model the dependence of the transformed variables by a simple covariance matrix (Renard & Lang 2007; Bárdossy & Pegram 2009). It is, therefore, necessary to check the adequacy of the dependence structure implied by the Gaussian model. Moreover, although the suitability of the Gaussian dependence model has been proved within the observation range in many applications, another subtle problem may arise from the tail dependence properties of the variables, which may have a non-negligible influence on the computation of probabilities related to extreme values. The multivariate Gaussian model implies asymptotic independence of the margins and presents limitations in preserving the high-order joint moments; thus, the actual joint probability of observing very extreme values, outside the range of observations, can be higher than that suggested by the Gaussian multivariate model (Renard & Lang 2007; Dimitriadis & Koutsoyiannis 2018; Koutsoyiannis 2021). Among the different approaches that can be used to improve results, Coccia & Todini (2011), within the MCP framework, divided the entire normal domain into sub-domains where truncated normal distributions (TNDs) can be used.

The post-processing approach implies that an ensemble of QPFs is available at each time step. Unfortunately, the Civil Protection Department provides only two forecasting bounds, which are associated with specific quantiles of the ensembles. For this reason, in order to use the UMCP approach, a two-parameter Gamma distribution was fitted to the mentioned QPF forecasting bounds and used to generate a larger ensemble with equiprobable members at each time step. We use the predictive distribution from the UMCP approach as an estimate of the continuous component in the binary-continuous process of Equation (11) to derive the conditional CDF of the predictand.
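The fitting of a two-parameter Gamma to the two QPF bounds, interpreted as quantiles, and the generation of an equiprobable ensemble can be sketched as follows. The 0.3/0.7 non-exceedance levels mirror the exceedance probabilities discussed later in the paper, while the 20 and 45 mm bounds are illustrative values:

```python
import numpy as np
from scipy.stats import gamma
from scipy.optimize import brentq

def fit_gamma_to_bounds(q_lo, q_hi, p_lo=0.3, p_hi=0.7):
    """Fit a two-parameter Gamma to two QPF bounds interpreted as quantiles
    at non-exceedance levels p_lo and p_hi. The quantile ratio depends on
    the shape only, so solve for the shape first, then rescale."""
    target = q_hi / q_lo
    f = lambda a: gamma.ppf(p_hi, a) / gamma.ppf(p_lo, a) - target
    shape = brentq(f, 0.01, 100.0)            # ratio decreases with shape
    scale = q_lo / gamma.ppf(p_lo, shape)
    return shape, scale

# example: QPF range 20-45 mm; generate 100 equiprobable ensemble members
shape, scale = fit_gamma_to_bounds(20.0, 45.0)
members = gamma.ppf((np.arange(100) + 0.5) / 100, shape, scale=scale)
```

Because the members are Gamma quantiles at equally spaced probability levels, they form an equiprobable, already ordered sample from the fitted distribution.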

### Performance metrics

#### Assessment of forecast accuracy, bias and reliability

In Equations (15) and (16), the observed value at time *t* is compared with the value of a deterministic predictor.

Considering the raw QPFs, provided at each time step in the form of lower and upper bounds, their mean value or one of the bounds can be assumed as the deterministic predictor.

When dealing with post-processors providing a probabilistic forecast of continuous predictands, a single deterministic value, in this case the expected value of the predictive distribution, is considered as the predictor in Equations (15) and (16).

Nevertheless, to allow a complete and fair evaluation of probabilistic forecast performance, the use of other appropriate diagnostic approaches is recommended. To assess the reliability of probabilistic forecasts, with the aim of verifying the statistical consistency between the observed time series of the predictand and its predictive distribution (Gneiting *et al.* 2007), Laio & Tamea (2007) proposed the use of a diagnostic plot based on the probability integral transform (PIT). The approach plots the transformed values *z*, i.e., the predictive CDF Φ calculated at the observed values, against their empirical CDF; if a forecasting system is reliable, the uniform distribution U(0, 1) is appropriate for the constructed sample of *z* and the plot follows the 1:1 line. Moreover, analysing the shape of the resulting curve provides useful insights into the properties of the predictive distribution (i.e., problems related to under- or over-dispersion, positive or negative bias).
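A numerical sketch of the PIT-based check follows. In this synthetic example the observations are drawn from the forecast distribution itself, so the *z* sample should be nearly uniform and the diagram close to the 1:1 line; all parameter values are illustrative:

```python
import numpy as np
from scipy.stats import gamma

# "perfectly reliable" toy system: observations drawn from the forecast CDF
shape, scale = 0.8, 12.0
rng = np.random.default_rng(2)
obs = gamma.rvs(shape, scale=scale, size=1000, random_state=rng)

z = np.sort(gamma.cdf(obs, shape, scale=scale))   # sorted PIT values z_t
empirical = (np.arange(z.size) + 1) / z.size      # empirical CDF of the z sample

# maximum deviation from the 1:1 line (a Kolmogorov-Smirnov-type distance)
deviation = float(np.max(np.abs(z - empirical)))
```

For an unreliable (e.g., under-dispersive) system, the *z* sample would pile up near 0 and 1 and the curve would depart systematically from the diagonal.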

The accuracy of the entire predictive distribution is assessed through the continuous ranked probability score:

*CRPS* = ∫ [Φ(*ϖ*) − H(*ϖ* − *x*)]² d*ϖ*

where Φ is the predictive CDF, *x* is the observed value and H(*ϖ* − *x*) is the Heaviside step function, equal to 1 when *ϖ* ≥ *x* and 0 otherwise. The *CRPS* has the dimension of the variable of interest and has a minimum value of zero, which is obtained only in the case of a perfect deterministic forecast. The average value over all time steps is then evaluated.
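The CRPS can be estimated directly from ensemble members using the standard sample-based identity CRPS = E|X − x| − ½ E|X − X′|, with X, X′ independent draws from the predictive distribution; a minimal plug-in sketch:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Sample-based CRPS estimator: E|X - x| - 0.5 * E|X - X'|, with X, X'
    drawn from the predictive distribution and x the verifying observation."""
    m = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(m - obs))                       # E|X - x|
    term2 = 0.5 * np.mean(np.abs(m[:, None] - m[None, :]))  # 0.5 * E|X - X'|
    return term1 - term2

# a sharp ensemble centred on the observation scores lower than a wide one
score_sharp = crps_ensemble([9.0, 10.0, 11.0], 10.0)
score_wide = crps_ensemble([5.0, 10.0, 15.0], 10.0)
```

A perfect deterministic forecast (all members equal to the observation) scores exactly zero, consistent with the minimum of the integral definition.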

#### Assessment of forecast skills of probability thresholds

Finally, the added operational value of the proposed approach is assessed through the ability to forecast rainfall exceeding the critical thresholds set for the WVZ. In the case of deterministic forecasts, for example, considering the average value of the QPFs, skill is evaluated from the contingency table (Mason 1982) of the four possible outcomes of a binary classifier: true positive (TP), false positive (FP), true negative (TN) and false negative (FN). Using values from the contingency table, it is possible to calculate the true positive rate (TPR), i.e., an estimate of the probability that a critical rainfall will be correctly forewarned (TPR = TP/(TP + FN)), and the false positive rate (FPR), i.e., how often an actual negative instance is classified as positive (FPR = FP/(FP + TN)). In the case of probability forecasts, the prediction of a severe rainfall occurrence translates into the definition of a threshold on the exceedance probability of the critical rainfall value (e.g., De Luca & Versace 2017). The optimal choice of the probability threshold is task-specific and can be determined through an assessment of the economic value for decision-making (Mylne 2002), for example, by taking the decision corresponding to the minimum (or maximum) expected value of a Bayesian cost utility function (Martina & Todini 2009). In this study, the optimal probabilistic threshold is determined by assessing the ROC curves, which relate TPR and FPR evaluated for different exceedance probability thresholds; see Wilks (2005) for details on ROC curves.
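The contingency-table statistics and the ROC sweep over probability thresholds can be sketched as follows; the forecast probabilities, outcomes and threshold values are toy numbers, purely illustrative:

```python
import numpy as np

def roc_points(p_exceed, event, thresholds):
    """(FPR, TPR) pairs for warnings issued when the forecast probability
    of exceeding the critical rainfall is at or above each threshold."""
    points = []
    for th in thresholds:
        warn = p_exceed >= th
        tp = np.sum(warn & event)     # warned and the event occurred
        fp = np.sum(warn & ~event)    # warned but no event (false alarm)
        fn = np.sum(~warn & event)    # event missed
        tn = np.sum(~warn & ~event)   # correct rejection
        points.append((float(fp / (fp + tn)), float(tp / (tp + fn))))
    return points

# toy forecasts: probability of exceeding the critical value, and outcomes
p = np.array([0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05])
y = np.array([True, True, False, True, False, False, False, False])
pts = roc_points(p, y, thresholds=[0.05, 0.3, 0.5, 0.75])
```

Lowering the probability threshold moves the operating point towards the upper-right corner of the ROC diagram (more warnings: higher TPR but also higher FPR), which is exactly the trade-off the ROC analysis resolves.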

## RESULTS

### Comparing QPFs and observed rainfall

Figure 2 shows, for the different durations, scatter plots of the maximum values observed among the rain gauges of the WVZ on the day to which the forecast refers against the QPF ranges along with their mean values. In the following, we refer to the QPF mean value and to the lower and upper QPF range limits. The available observations and forecasts generally show a very low level of correlation (Pearson correlation coefficients *r* ranging from 0.488 to 0.585), with the QPFs overestimating the low observed precipitation values and, conversely, markedly underestimating the higher ones.

### Marginal distributions

As is apparent from Equations (6) and (7), the marginal distribution functions of the observations *X* and of the predictor can be constructed from the conditional distributions and from the PoP occurrence for observed precipitation given a non-null prediction.

The continuous component of the mixed process, referring to the amount of precipitation, is here assumed to be Gamma distributed with shape parameter *α* and scale parameter *β*. The Gamma distribution has been widely used to model the probability distribution of precipitation amounts (e.g., Coe & Stern 1982; Wilks 1990; Hamill & Colucci 1998; Sloughter *et al.* 2007) as it can fit skewed data and is quite flexible (Heo *et al.* 2019).

Gamma distributions are fitted to the datasets of observations and forecasts available for each duration. Moreover, the PoP occurrence is estimated, for each duration, from the observation sample corresponding to the available QPF issuances.

Specifically, regarding the forecasts, at each time step *t* a Gamma distribution is fitted to the available QPF limits and their associated probabilities of exceedance (after November 2013, probabilities of exceedance of 0.7 and 0.3 are assumed for the lower and upper limits, respectively, these being the values most frequently observed in the available sample), and the resulting mean value and variance are then estimated. The dataset of QPF average values resulting from the Gamma fitting is further modelled with a Gamma distribution to estimate the conditional CDFs. It is worth noting that the distribution parameters are estimated from the mean and the variance (see Tables 2 and 3) that are, in turn, evaluated from the mean of the first- and second-order moments of the Gamma distributions at each time step.

To apply the UMCP and allow for a better representation of the ensemble spread, an ensemble of 100 members was generated at each time step according to the Gamma distributions fitted to the quantile values provided by the Civil Protection Department; the ensemble members were then grouped and analysed together in a single sample, to avoid reordering of the ensembles, under the assumption of exchangeability (Fraley *et al.* 2010).

Table 2 summarizes the sample statistics (i.e., mean and standard deviation (S.D.)) and distribution parameters estimated from the samples of QPFs and the strictly positive corresponding precipitation observations, as well as the statistics and distribution parameters derived for the conditional distribution of the generated ensembles. Please note that the relevant S.D. is estimated as the square root of the total variance (Mahler & Dean 2001; Weiss 2006): when a forecast is issued at each time step in the form of a probability distribution, one cannot limit the estimation of the variance to the variance of the mean forecasts, but must estimate the total variance, namely the sum of the variance of the mean forecast over the time steps and the mean variance of all the time-variant predictive densities.
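The total variance mentioned above follows the law of total variance. A minimal numeric illustration, where the per-issue forecast means and variances are made-up numbers:

```python
import numpy as np

# per-issue Gamma forecasts: mean and variance of each time step's distribution
means = np.array([12.0, 30.0, 5.0, 22.0])
variances = np.array([40.0, 90.0, 10.0, 60.0])

# total variance = variance of the per-step means + mean of the per-step variances
total_var = float(np.var(means) + np.mean(variances))
total_sd = float(np.sqrt(total_var))   # the S.D. reported in Table 2
```

Using only `np.var(means)` would ignore the spread of each issued distribution and systematically understate the forecast S.D.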

Overall statistics of observations and precipitation forecasts suggest that underestimates the actual rainfall in the mean and generally shows lower variances, with the exception of *d* = 3 h. Consequently, both the shape and the scale parameter values for the observations differ from those of the forecasts. This statistical distinction, for the investigated example, can be traced back to inadequacies of the forecasting process, to scaling issues and/or to the limited information gathered from the joint samples. Statistics of the generated are, as expected, closer to those of .

Moreover, given the peculiarity of the case study, the samples provide high values of the PoP for all the investigated durations.

Similarly, the distribution of conditional on null observations, as well as the corresponding conditional distribution of , is estimated; the parameters and the relative statistics are summarized in Table 3.

### Accuracy and reliability performance of raw and post-processed forecasts

In order to assess forecast accuracy, Table 4 compares the RMSE of the raw forecasts with that of the post-processed outcomes. Regarding QPFs, the RMSE is evaluated not only for , but also for both and ; the RMSE of the post-processed forecasts is estimated considering the expected values of the predictive distribution . The RMSE for the mean values of the UMCP outcomes (), back-transformed into the real space, is also evaluated.

Results show that provides the best performance for *d* = 3 h, while its RMSE values are the highest for longer durations (i.e., *d* = 6, 12 h). Regarding the unprocessed forecasts, performs better for *d* = 6, 12 h. Accuracy metrics of , not shown here, are significantly worse. shows a performance similar to for *d* = 6 and 12 h. RMSEs estimated in the real variable space reveal that the mean value of the predictive distribution outperforms the best of the unprocessed forecasts for longer durations (*d* = 6, 12 h) and achieves a performance comparable to for *d* = 3 h. RMSEs for the mean values of the UMCP-derived are found to be within the range of values obtained for all the other investigated deterministic predictors. It should be noted that the UMCP is used to estimate the conditional distribution defined and calibrated for the subset .

The bias of , and of the expected value of the predictive distribution is shown in Table 5. It is evident that the raw forecasts are more biased than the post-processed forecasts. The bias of tends to be negative (e.g., −15.23 mm for *d* = 12 h), indicating a propensity to underestimate rainfall amounts. The bias of is positive for *d* = 3 h and negative, although less marked than that of , for longer durations. Post-processing almost eliminates the biases, particularly for *d* = 6 and 12 h (e.g., −0.05 mm for *d* = 12 h).

Further insight into the calibration and reliability of the probabilistic forecasts is gained from the graphical representations in Figure 3. For each forecasting occasion, the PITs of the predictive distribution function, calculated at the observed values, are plotted against their empirical cumulative distribution function. Here, only strictly positive observations, which constitute the vast majority of the investigated samples, are considered. The unprocessed QPFs are also included in this analysis by assuming, at each time step, a Gamma distribution estimated from the QPF limits and the probabilities assigned to them, as described above. An ideal processor producing perfectly reliable forecasts would lead to a graph with markers lying on the bisector (i.e., the black solid line corresponding to a theoretical uniform distribution). The Gamma distributions assumed for the two-member ensembles deviate from this ideal behaviour, although the points fall within the Kolmogorov bands (black dashed lines). A modest positive bias is detected at *d* = 3 h (Figure 3(a)), as apparent from the points lying above the bisector of the diagram, while a slightly under-dispersive behaviour is recognized for *d* = 6 and 12 h, i.e., the predictions are too narrow around the central values. Conversely, the predictive distributions obtained via post-processing are always within the 5% Kolmogorov significance bands and closer to the bisector. The diagrams thus show that post-processing the QPFs improves the reliability of the probability forecasts and reduces both underdispersion and frequency bias.
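
The PIT diagram construction can be sketched as follows. The Gamma form of the predictive CDF and the asymptotic 5% Kolmogorov band coefficient (1.358/sqrt(n)) are assumptions of this illustration:

```python
import numpy as np
from scipy.stats import gamma

def pit_values(obs, shapes, scales):
    """PIT: the predictive (here Gamma) CDF evaluated at the observation
    for each forecasting occasion."""
    return gamma.cdf(np.asarray(obs, float), np.asarray(shapes, float),
                     scale=np.asarray(scales, float))

def kolmogorov_check(pit, band_coeff=1.358):
    """Kolmogorov-Smirnov distance between the empirical CDF of the PITs
    and the uniform CDF, compared with the asymptotic 5% band 1.358/sqrt(n)."""
    pit = np.sort(np.asarray(pit, float))
    n = pit.size
    ecdf_hi = np.arange(1, n + 1) / n   # ECDF just at/above each PIT
    ecdf_lo = np.arange(0, n) / n       # ECDF just below each PIT
    d = max(np.abs(ecdf_hi - pit).max(), np.abs(pit - ecdf_lo).max())
    return d, bool(d <= band_coeff / np.sqrt(n))
```

Plotting the sorted PITs against `ecdf_hi` reproduces the diagrams of Figure 3; points above the bisector indicate positive bias, an S-shape around it indicates under- or over-dispersion.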

The results can be better interpreted by looking at the predictive distributions in Figure 4. Figure 4(a), 4(c) and 4(e) depict, for *d* = 3, 6 and 12 h, the median of the predictive distribution (red line) and the 50% as well as the 90% uncertainty bands (dark and light grey bands, respectively) in the real space, along with the observed precipitation. For the same rainfall durations, Figure 4(b), 4(d) and 4(f) compare the observed precipitation with the QPF ranges (full coloured bands) and with the 90% credible intervals and the expected value of the Gamma distribution fitted to the forecasts.

First, we observe that the predictive distribution in the real space assumes the shape of a skewed Gamma-type distribution, as do the QPF credible intervals derived from the Gamma assumption for the distribution of the forecasts at each occasion; it is worth noting that, by construction, may fall outside the QPF limits. The width of the uncertainty bands of the QPF Gamma distribution reveals a higher sharpness for *d* = 6 and 12 h compared to *d* = 3 h, confirming the under-dispersive behaviour shown in the PIT plots of Figure 3(b) and 3(c). While narrow uncertainty bands can, at least in principle, be considered a good property, the sharpness of a distribution gives no information about the correspondence between forecasts and observations, and is in this case mitigated by less satisfactory reliability and by the observed bias. The percentage of observations falling within the interquartile range (IQR) of the QPF Gamma distribution is 39% for *d* = 3 h, 49% for *d* = 6 h and 44% for *d* = 12 h; the percentages of observations included within the IQR of the post-processed predictive distribution significantly increase to 45% for *d* = 3 h, 56% for *d* = 6 h and 52% for *d* = 12 h.

The mean CRPS values in Table 5 confirm that the post-processed probabilistic forecasts yield better results than the Gamma distributions fitted to the QPF limits. The of the QPFs increases with duration, varying from 9.38 for *d* = 3 h to 22.61 for *d* = 12 h. The proposed post-processing approach leads to a slight improvement in , which shows the same tendency to increase with duration, from 8.78 for *d* = 3 h to 21.52 for *d* = 12 h. The bias and the sharpness of the distribution, as for the QPFs, generally increase the difference between the associated probability distributions and the Heaviside functions and consequently result in higher *CRPS* values.
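
CRPS values of the kind reported in Table 5 can be estimated from ensemble (or Gamma-sampled) forecasts with the standard sample estimator CRPS = E|X − y| − ½ E|X − X′|; a sketch with hypothetical inputs, not the paper's exact computation:

```python
import numpy as np

def crps_sample(forecast_sample, obs):
    """Sample-based CRPS estimator: CRPS = E|X - y| - 0.5 E|X - X'|
    (lower is better), with X, X' independent draws from the forecast."""
    x = np.sort(np.asarray(forecast_sample, dtype=float))
    n = x.size
    term1 = np.mean(np.abs(x - obs))
    # 0.5 * E|X - X'| for the empirical sample, via the sorted-sum identity
    i = np.arange(1, n + 1)
    term2 = np.sum((2 * i - n - 1) * x) / n ** 2
    return float(term1 - term2)
```

For a single-member (deterministic) forecast the second term vanishes and the CRPS reduces to the absolute error, which is why the CRPS is directly comparable across deterministic and probabilistic forecasts.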

### Skills in forecast of critical rainfall occurrence

The ability of the PQPF derived with the proposed post-processor to identify the occurrence of potentially critical rainfall on the day of issuance is also tested. Moving to a probabilistic perspective, an event is forecast when the predictive probability of exceeding an assumed critical rainfall exceeds a defined probability level. We choose to consider as critical, and indicate it with , the rainfall value associated with the lowest critical rainfall threshold (ordinary level) over the alert zone (Table 6). It should be noted that the events exceeding higher rainfall thresholds are a subset of the events exceeding this threshold.

The first step consisted of establishing, through a ROC analysis, the probability threshold value best suited to identify critical rainfall. In the case of probabilistic forecasts, to be able to calculate the associated TPR and FPR, a probability threshold is considered as the classifier discriminating between event and non-event for the forecasts. To evaluate the requested performance statistics, the forecast binary outcomes were compared with the actual occurrences of the observed rainfall exceeding .

The ROC curves in Figure 5(a) for *d* = 3, 6, 12 h plot the TPR versus the FPR for a set of possible thresholds, obtained by assuming different , i.e., different values for the probability of exceedance of the defined critical rainfall (in this application, *p* varies from 0.05 to 0.6). Here, we consider the D2PC defined in Equation (18) to achieve the trade-off between correct detections and false alarms. Solid circles on the ROC curves highlight the points corresponding to the minimum D2PC values obtained among the investigated probability thresholds. The optimal D2PC values at each duration are found for different values of *p* that hence define the probabilistic threshold : specifically, = 0.10 for *d* = 3 h, = 0.15 for *d* = 6 h and = 0.25 for *d* = 12 h.
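
The threshold-selection procedure can be sketched as follows. Equation (18) is not reproduced here, so this illustration assumes the common definition of D2PC as the Euclidean distance from the ROC point to the perfect-classification corner (TPR = 1, FPR = 0); the input arrays are hypothetical:

```python
import numpy as np

def tpr_fpr(prob_exceed, obs_exceed, p_star):
    """Contingency-table TPR/FPR when an event is forecast whenever
    P(rain > critical value) >= p_star."""
    fcst = np.asarray(prob_exceed, dtype=float) >= p_star
    obs = np.asarray(obs_exceed, dtype=bool)
    tp, fn = np.sum(fcst & obs), np.sum(~fcst & obs)
    fp, tn = np.sum(fcst & ~obs), np.sum(~fcst & ~obs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return float(tpr), float(fpr)

def d2pc(tpr, fpr):
    """Assumed D2PC: distance to the perfect-classification corner (1, 0)."""
    return float(np.hypot(1.0 - tpr, fpr))

def best_probability_threshold(prob_exceed, obs_exceed, p_grid):
    """Probability threshold minimizing D2PC over a grid of candidates."""
    scores = [d2pc(*tpr_fpr(prob_exceed, obs_exceed, p)) for p in p_grid]
    return p_grid[int(np.argmin(scores))]
```

Sweeping `p_grid` over 0.05 to 0.6 and plotting the resulting (FPR, TPR) pairs reproduces a ROC curve of the kind shown in Figure 5(a).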

We also evaluated the skill performance for the case of a deterministic forecast, considering as the predictor, and calculated the TPR as well as the FPR directly from the contingency table obtained by comparing and the observed rainfall with the rainfall threshold . The resulting statistics are shown in Table 6 and as square symbols in the ROC space of Figure 5(b). Both the TPR and the FPR show low values, with the TPR only slightly exceeding the FPR.

As evident from the performance metrics in Table 6, assuming a probability threshold as the binary classifier significantly improves the skill statistics for forecasting critical rainfall. Using probabilistic forecasts, TPR values are strongly improved at the cost of only a moderate increase in the corresponding FPR (always ≤ 0.3). For example, when considering the as a deterministic forecast for a duration of 6 h, the TPR is 0.20 (of all observed rainfall threshold exceedances, only 20% are forecast), while the FPR is 0 (no event is forecast when no event is observed). By using probabilistic forecasts, a much higher TPR, equal to 0.6, can be achieved while keeping the FPR at 0.28. This means that 60% of all events are correctly forecast, which represents a substantial improvement, particularly for early warning systems. Performances are also evaluated for and of the raw forecasts; compared to , only a slight improvement in terms of TPR and D2PC is found for at *d* = 12 h, as shown in Table 6 and by the triangles in Figure 5(b).

## SUMMARY AND CONCLUSIONS

The presented case study shows a framework aimed at testing a post-processing approach for ensemble precipitation forecasts and uncertainty assessment in the operational context of a WVZ of the national warning system in southern Italy. Specifically, we propose a parsimonious and computationally efficient methodology based on the inclusion of the UMCP in the mixed binary-continuous precipitation process for the derivation of the predictive uncertainty associated with multiple QPFs. It is worth noting that the analyses carried out on the QPF statistics do not refer to the effectiveness of the alert system, whose warnings also rely on several other pieces of information, and only represent a case study to test the performance of the approach as well as its potential implications for the operational forecasting of critical rainfall and rainfall-induced damaging events.

Our main purpose was to assess the ability and the robustness of the proposed post-processing technique to be accurate and to reliably quantify forecast uncertainty. Evaluating a reliable predictive uncertainty distribution is challenging (Berthet *et al.* 2020) and is crucial in decision-making scenarios in order to set up robust Bayesian decision-making approaches (see, for example, Martina *et al.* (2006)). Overall, the results suggest that post-processing not only shows advantages in terms of accuracy (measured by the RMSE) when considering the expected value of the predictive distribution, but can also yield a satisfactory correction of the biases in the raw forecasts and generate reliable PQPFs suited to further generating coherent ensemble members. In addition, looking at the overall quality, the probabilistic forecasts proved to provide enhanced average CRPS values. The proposed approach can be further extended by using the post-processor in a multi-temporal framework, making it possible to account for the time dependence of the forecasting errors and to derive a representation of the predictive density as the joint probability of precipitation at different future times conditional on the forecasts for the same time intervals. The multi-temporal version of the MCP approach and the results of its application to flow forecasts are presented in Barbetta *et al.* (2017). Here, the lack of sufficiently long time series of QPFs issued for different time intervals prevented the application of the multi-temporal approach, and future work is envisaged to assess its skill in the post-processing of rainfall forecasts. It is likewise important to stress the implications of some assumptions of the methodology (e.g., the use of light-tailed distributions) that need to be verified on longer time series of observations and that can lead to an underestimation of the probability of extreme events.

Moreover, the use of a predictive distribution that accounts for the inherently uncertain nature of the forecast (Montanari & Koutsoyiannis 2012) requires a change of perspective in the warning threshold definition to thoroughly benefit from probabilistic forecasts. To take full advantage of probabilistic forecasts, one should move from the commonly used deterministic thresholds to probabilistic ones, as advocated by Todini (2017, 2018). In this perspective, the predictive ability of the post-processed PQPF was compared to that of the raw forecasts by switching from the classic ‘deterministic threshold paradigm’ to the ‘probabilistic threshold paradigm’. The ‘deterministic’ rainfall threshold, represented by a critical rainfall value, is de facto replaced by a threshold on the probability of exceedance of a critical rainfall value. For the tested instances, assuming a probability threshold as the binary classifier to determine the occurrence of potentially critical rainfall improved the skill statistics compared to the standard use of deterministic rainfall forecasts and deterministic thresholds. In particular, the tested approach improved the ability to correctly forecast exceedances of the critical rainfall, i.e., it enhanced the TPR values at the cost of a moderate increase in the corresponding FPR. Overall, the results confirm the benefit of post-processing forecasts in early warning systems, reinforce the view that moving to the ‘probabilistic threshold paradigm’ can increase the accuracy of the alerts, and may inspire its use as a valuable approach in predicting imminent or developing emergencies and disasters.

## ACKNOWLEDGEMENTS

This work was partially carried out within the cooperation agreement between the Italian National Civil Protection and Camilab-DIMES Unical on ‘The development of knowledge, methodologies and technologies useful for the implementation, at the functional centres, of national monitoring, forecasting and surveillance systems, as well as for the organization of the technical-scientific support function within the National Civil Protection Service; the development, for the benefit of the network of functional centres, of operational products aimed at the estimation and forecasting of precipitation over the national territory’.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.