Abstract
Postprocessing of the ensemble precipitation data improves the bias and uncertainty induced in the numerical weather prediction (NWP) due to perturbations of the initial condition of atmospheric models. The evaluation of the NWP of short-range quantitative weather forecasts provided by the NCMRWF archived in TIGGE, for the Vishwamitri River Basin. The aim of the study is to perform univariate statistical postprocessing using six parametric methods and compare to find the best suitable approach among censored non-homogeneous logistic regression (cNLR), Bayesian model averaging (BMA), logistic regression (logreg), heteroscedastic logistic regression (hlogreg), heteroscedastic extended logistic regression (HXLR), and ordered logistic regression (OLR) methods for the Vishwamitri River Basin. The Brier score (BS), the area under curve (AUC) of receiver operator characteristics (ROC) plots, Brier decomposition, and reliability plots were used for the verification of the probabilistic forecasts. In agreement with the BS and AUC, the cNLR approach for postprocessing performed very well for calibration at all five grids and is preferred, whereas BMA and hlogreg approaches showed relatively poor performance for the Vishwamitri basin. The best post-processed ensemble precipitation will further be employed as an input for the generation of the hydrological forecasts in operational flood forecasting in the Vishwamitri River Basin.
HIGHLIGHTS
Parametric postprocessing of ensemble precipitation forecasts using cNLR, BMA, logreg, hlogreg, HXLR, OLR for the Vishwamitri River Basin.
Evaluation of the post-processed ensemble forecasts to find the best-fit model for the calibration of the short-range ensemble precipitation forecasts.
Comparison of different postprocessing methods using verification metrics, viz. BS, Brier decomposition, reliability plots and AUC of ROC plots.
Graphical Abstract
INTRODUCTION
Ensembles became part of the implementation suites at the Europe Centre of Medium-range Weather Forecasts (ECMWF) and National Centers for Environmental Predictions (NCEP) in 1992, eventually bringing a fundamental change from deterministic to ensemble-based probabilistic forecasting (Demeritt et al. 2010; Buizza 2018). The perturbations in the initial conditions of the climatic variable result in a set of members called ensemble members. The numerical weather prediction (NWP) by the ensemble prediction systems (EPSs) is classified as short-range forecasts (lead time ranging from 6 h to 3 days), medium-range forecasts (3–15 days lead time), and long-range forecasts (up to a few weeks). The short- to medium-range forecasts show better skills and added value in hydrological forecasting.
TIGGE stood for ‘THORPEX Interactive Grand Global Ensemble’ when first introduced as a project, but when the project was completed in 2014, it continued and is now known as ‘The International Grand Global Ensemble’ (Nobert et al. 2010). It provides the common platform for NWP datasets generated by different EPSs over the world such as ECMWF by Europe, CMA by China, KMA by Korea, NCEP by the US, NCMRWF by India, BoM, CPTEC by Brazil, ECCC, IMD by India, JMA by Japan, France, and UKMO by the United Kingdom (Bao & Zhao 2012; Duan et al. 2012; Jingyue et al. 2012; Liu et al. 2017) with different ranges of ensemble members.
The skill of the operational probability-based flood forecasting system is basically driven by the two major sources of uncertainty, i.e. uncertainty introduced due to the meteorological and hydrological aspects (Bourgin et al. 2014). The propagation of the NWP forecasts inputs as the perturbed members or ensembles into the hydrological modelling systems introduces meteorological uncertainty in the forecasting system (Buizza 2009). So, for a reliable forecasting system, it is of utmost importance to use a well-calibrated precipitation dataset. Recent studies show the implementation of statistical postprocessing approaches to evaluate and improve the skill of the ensemble flood forecasting systems (Schmeits & Kok 2010; Buizza 2018; Wilks 2018; Liu et al. 2019). The statistical postprocessing of the meteorological data (also called pre-processing of the data with respect to the input of the hydrological model) is carried out to address the uncertainty and bias of the verification period observed and ensemble data by taking into account the uncertainty in the past data.
The development, operation, and constant improvement of the EPS require a large avenue, rather than the improvement in the skills of the forecasts through the postprocessing of the ensemble numerical weather predictions which is presumably more economical. Even though EPS forecasts consist of system errors in the mean and spread of their forecast distribution, (Verkade et al. 2013), it is recurrently highly suggested to postprocess the weather predictions before feeding them as input to the hydrological model (Olsson & Lindström 2008; Awol et al. 2021). The requirement of the postprocessing is to do the bias correction and fine-tune their dispersion to produce appropriately calibrated forecast probabilities. For rectifying the systematic biases to produce improved calibrated forecasts, statistical postprocessing (or model output statistics (MOS)) utilises the relationship between a historical set of weather predictions and the corresponding observation.
Several parametric and non-parametric methods of postprocessing for meteorological parameters like temperature, precipitation, or wind, etc., are used for the correction and prediction of calibrated forecasts based on past observed data. In contrast to the statistical postprocessing of the meteorological forecasts like wind, temperature, etc., the postprocessing of ensemble precipitation is more thought-provoking due to the high variability of rainfall in terms of the magnitude and higher number of zero rainfall. The higher rainfall events recorded are not very frequent, so a large training dataset is required to be considered (Han & Coulibaly 2019). While the setup of the model, the forecast uncertainty typically rises with the consideration of a particular magnitude of precipitation. Recent studies show the implementation of the parametric methods, namely BMA (Raftery et al. 2005; Liang et al. 2013; Liu & Xie 2014), cNLR (Messner et al. 2014b; Scheuerer 2014; Gebetsberger et al. 2017), logreg (Hamill et al. 2008; Verkade et al. 2013; Gebetsberger et al. 2019; Medina et al. 2019), hlogreg (Messner et al. 2016), heteroscedastic extended logistic regression (HXLR) (Schmeits & Kok 2010; Scheuerer & Büermann 2014), and ordered logistic regression (OLR) (Gebetsberger et al. 2018) for postprocessing.
Messner et al. (2014a) compared the extended logistic regression with three closely related regression models of censored logistic regression, OLR, and heteroscedastic OLR for 10 European weather stations. The results showed that censored logistic regression performed better than extended logistic regression. Liu & Xie (2014) compared the performance of the raw ensembles, BMA, and the logreg method with the concluding remark of an advantage of BMA over logreg is the prediction of a full probability density function. Messner et al. (2014b) extended the logistic regression method to develop HXLR and presented the advantage of using the ensemble spread effectively to improve forecasts. The performance criteria showed good predictions by BMA and logreg as compared to the raw ensembles using BS metrics. An approach to postprocess the ensemble forecasts using logistic regression and OLR is carried out with results showing slight improvement in the forecasts using the OLR method (Hemri et al. 2016). Williams et al. (2014) carried out a comparative study of different postprocessing methods and concluded that the Bayesian model averaging (BMA) and non-homogeneous Gaussian regression perform similarly, while logistic regression performs less well. In the comparative study of the postprocessing methods by Schmeits & Kok (2010) the results showed that BMA and extended LR are much more skilful than the raw EPS, but the difference in skill between the two postprocessing methods does not seem to be statistically significant for the Netherlands. To the author's knowledge, similar studies of comparison of the parametric postprocessing methods to find the best suitable method have not been conducted for Indian regions.
This paper addressed the challenge of identifying the most suitable postprocessing technique and assessed the performance of the six parametric statistical postprocessing methods, i.e. cNLR, BMA, logreg, hlogreg, HXLR, and OLR for the Vishwamitri River Basin using short-range precipitation forecasts by NCMRWF EPS. The data retrieved from TIGGE for daily accumulation of 1-day and 3-day lead times were used for model fitting, prediction, and verification for various postprocessing methods using Brier score (BS), AUC of receiver operating curve (ROC) plots, reliability plots, and Brier decomposition of the post-processed ensembles with the objective to compare and find the best-fit model for the calibration of ensemble forecasts for the Vishwamitri River Basin.
Study area
The Vishwamitri River Basin is located between 73°0′0″ E to 73°30′0″ E longitude and 22°0′0″ N to 22°45′0″ N latitude. Hilly topography with an elevation of 149–480 m above sea level may be found in the northeast region of the city, while the major part of the basin has a flat terrain. The Vishwamitri river originates in the Pavagadh, 43 km northeast of Vadodara. The river's flow length is roughly 71 km, with about 59 km flowing through the Vadodara district. The river is seasonal but has a history of floods almost every year during the monsoon. The river flows through the heart of Vadodara City and due to the insufficient stormwater drainage systems, rapid urbanisation, and frequent constructions in the city (Sarthak et al. 2015), the issue of flooding arises nearly every monsoon. Vadodara City suffered major flood events in 2005, 2008, 2014, and 2019 which have even resulted in 8 deaths and the evacuation of more than 5,000 people. The total rainfall recorded in the verification period considered is 932.7 mm which is higher than the average annual rainfall of Vadodara, i.e. 868 mm.
DATA AND METHODS
The high-resolution 0.25° × 0.25° gridded data are prepared based on quality-controlled recorded rainfall data from more than 6,000 rain gauges available in India by the Indian Meteorological Department (IMD) (Roy Bhowmik et al. 2007; Deshpande et al. 2021; Seenu & Jayakumar 2021). The IMD daily gridded data at this resolution are prepared by employing data for the period of 1901–2010 from the National Data Centre, IMD, Pune using the Shepard interpolation technique. The observed rainfall data are derived from the high-resolution IMD gridded data for the years 2007–2021 for the study area.
The postprocessing for the improvement of the ensemble precipitation datasets is carried out in three phases. The first phase of data preparation comprised of retrieval of the NCMRWF dataset from the TIGGE portal in grib file format and thus extracting the data of the grids falling in the study region by using the shapefile of Vishwamitri River basin. The extraction of the data is carried in R studio using the raster and rgdal package. The grids falling in the region are G1(73.14, 22.18), G2(73.14, 22.30), G3(73.32, 22.30), G4(73.32, 22.42), and G5(73.5, 22.42).
The excel sheets of the ensemble datasets for 24-h accumulated rainfall of 1-day and 3-day lead times with the corresponding observed re-gridded IMD rainfall were prepared for all grids. The first column containing the date is followed by the regridded IMD observed data column and the corresponding 11 ensemble members. In phase 2, these files are the input for the model fitting and prediction using six parametric postprocessing methods to test and find the best suitable method of postprocessing for the Vishwamitri River basin using R programming (Messner 2018). The 24-h accumulated precipitation of five grids of both 1-day and 3-day lead times are calibrated separately for each postprocessing method. Prior to the postprocessing of the ensembles, the dataset is divided into training and testing periods. The models are fitted for the defined training period. For model fitting of cNLR the package crch is utilised by employing continuous distributions for the response with separate linear predictors for the ensemble mean (location model) and variance (scale model). Further details of crch may be found at https://CRAN.R-project.org/package=crch. The BMA model is fitted using the command fitBMA of the ensembleBMA package. The precipitation data are cube-root transformed while fitting this model. Being a member of the generalised linear model family, binomial has been set while fitting the logistic regression model using the glm package. The ensemble mean has been employed as the only predictor. The hlogreg model is fitted with predictors as the ensemble mean and standard deviation using the hetglm command of the glmx package. The model for HXLR is developed using the command hxlr from the crch package. In HXLR, the ensemble spread is directly used to predict the dispersion of the predictive distribution. The clm command of the ordinal library is utilised for the fitting of the model for OLR.
The parametric univariate postprocessing methods of the ensemble precipitation forecasts employed in this paper are discussed in detail.
Censored non-homogeneous logistic regression
The concept of censoring can be understood as considering a certain minimum precipitation amount, which is referred to as the occurrence of the rain event, if the precipitation falling under this ‘minimum’ amount is recorded as ≤ minimum, then the data are censored at ‘minimum’. If the model is zero censored, it means any probability for rainfall less than zero will be allocated as 0 rainfall.
Bayesian model averaging
Logistic regression (logreg)
The training-data predictands are binary while fitting logistic regression parameters, if the statement on the left-hand side in Equation (10) is true or false then one and zero, respectively.
Heteroscedastic logistic regression (hlogreg)
Heteroscedastic extended logistic regression
Ordered logistic regression
OLR also provides coherent forecasts of category probabilities (Messner 2018). Unlike HXLR it has ordered intercepts and requires the estimation of more coefficients. However, it differs from HXLR as in OLR no CPD is presumed or specified by the model. Also, only the probability of exceedance forecasts can be derived from the OLR models, which do not facilitate density or quantile predictions.
Once the model is fitted for cNLR, BMA, logreg, hlogreg, HXLR, and OLR then the predictions for the model are derived using R programming. The location and scale parameters for cNLR were derived for the test dataset for different quantiles using predict command in R. The predictions of the BMA are generated using the command cdf of the ensembleBMA or fitBMA package which computes the cumulative distribution function. The modelling functions estimate model parameters from training data via the EM algorithm for the mixture models of gamma distributions with a point mass at 0 (appropriate for quantitative precipitation) to the cube-root transformation of the ensemble forecasts and observed data. In the case of the prediction of binary or heteroscedastic logistic regression, the computation of the probabilities is carried out using predict command and setting the type as ‘response’. The type in the predict command controls whether location (‘response"/"location’), scale (‘scale’) or quantiles (‘quantile’) are predicted. Quantile Further, the sapply command is used to predict the logistic regression model. For HXLR and OLR the prediction is done using predict command giving the input type as ‘cumprob’ and ‘cum.prob’ respectively. For HXLR crch and for predictions by OLR models’ Ordinal package is used in R programming.
cNLR, BMA and HXLR provide a full predictive distribution but the other logistic regression method only predicts the threshold probabilities.
Model verification metrics
The first part of the Equation (15) is reliability, the middle part is resolution, and the last part of the equation is uncertainty. Higher reliability signifies a higher contribution to the higher BS and thus poorer forecasts. The ROC deals with the performance of a forecast for the occurrence of a distinct event, with reference to a threshold value (Regonda et al. 2013; Fan et al. 2014). For a probabilistic forecast, the ROC curve deals with the quality of a decision on the basis of forecast probability. The area under the ROC plot represents the performance forecast skill, the higher value of AUC signifies better skill, and the diagonal line or 0.5 AUC represents no or zero skill. Different verification tools were used for the conduction of the verification of the precipitation post-processed outputs such as for Reliability plots were generated using ReliabilityDiagram command in the SpecsVerification package. The generation and analysis of the ROC plots that include the computation of the area under the curve are carried out using the pROC package in R programming. Another alternative is to use the verification package for the generation of ROC plots using roc.plot command. The BS is estimated by the command brier in the verification package and its decomposition is carried out using the command BrierDecomp in SpecsVerification in R programming software.
RESULTS AND DISCUSSION
Precipitation plays an important role in operational flood forecasting systems. The evaluation of NWP from NCMRWF is conducted for Western India, where the major rainfall occurs in the monsoon period (June–September) of the year. The Vishwamitri basin region is mostly hit by southwest monsoon which results in major rainfall in the duration from June to September, where around 85–90% of the annual rainfall occurs in this duration (Karuna Sagar et al. 2017). Due to the higher number of zero precipitation in the non-monsoon period, the validation of the calibrated forecasts over the period of 25 July to 30 September, 2021.
The comparison of the quality of the forecasts by the postprocessing methods used in the paper is done on the basis of the performance criteria by using the verification metrics such as BS, reliability, resolution, AUC of ROC plots, and reliability diagram criteria.
1-day lead time of the G1 grid shows the BS in the range of 0.10–0.22, Figure 5(a) showing the clear uplift of BMA and hlogreg boxplot as compared with the other postprocessing methods showing slightly poorer performance than others. The cNLR and logistic regression performed better than the rest. Figure 5(f) shows the better performance of hlogreg and cNLR for G1, 3-day forecasts with BMA and logreg performing poorer than the rest of other approaches. At G2 shown in Figure 5(b) and 5(h) logistic regression and cNLR performed comparatively well than the other postprocessing approaches applied. BMA and hlogreg performed slightly poorly on both days. The postprocessing method BMA for Figure 5(c) G3 1 day, Figure 5(h) G3 3 days, and Figure 5(d) G4 1 day also showed slightly poorer results than others. For G5, referring to Figure 5(e) and 5(j) BMA and OLR methods performed poorly for 1-day and 3-day lead times. The cNLR method performed the best for both lead times with the 25th and 75th quantile ranging from 0.14 to 0.17 with the median nearly at 0.16. Overall for all the short-range forecasts, the cNLR method outperforms the other postprocessing methods applied for the grids falling in the study area. The cNLR method uses each individual predictand value in the training dataset instead of the selected category probabilities. Messner et al. (2014a) also found cNLR provides good category probability forecasts while requiring fewer coefficients and additionally specifying full predictive distributions. For different grids, hlogreg gave mixed performances according to BS, so for the Vishwamitri River Basin, it is recommended not to prefer hlogreg. Overall, considering the performances determined by the BS of different approaches the cNLR shows the best performance as compared with other parametric methods.
AUC varied from 0.857 to 0.876 at G4 of 1-day and 3-day lead times represented in Figure 8(d) and 8(i). The cNLR method shows the best AUC of 0.867 and 0.876 at 1 day and 3 days, respectively. Referring Figure 8(e) and 8(j), the AUC showed good performance by all postprocessing approaches adopted, with the best AUC noted by cNLR of 0.871 and 0.908 for 1 day and 3 days, respectively. In comparison with 1 day, the 3-day forecasts were better calibrated at G5. At grid G1, the 1-day forecasts show better skills (higher hit rate) than 3-day forecast. Excluding grid G1, all the grids showed better performance (higher AUC corresponds to 3 days) of all the postprocessing methods for higher lead time forecasts (Gomez et al. 2019).
CONCLUSIONS
Based on the parametric postprocessing methods used worldwide for the postprocessing of the ensemble forecasts, the comparison of six postprocessing methods is shown in this paper using different verification metrics for probabilistic forecasts such as BS, reliability plots, and AUC under ROC plots grid-wise for the G1, G2, G3, G4, and G5 grids falling in the Vishwamitri River Basin. The results can be summarised as follows:
The average BS shows that an overall 50–70% improvement is noted from raw to post-processed forecasts, signifying the necessity of the statistical postprocessing of the ensemble forecasts for better and more reliable calibrated forecasts.
For the overall performance, in G1, G2, G3, G4, and G5 of short-range forecasts (here shown as 1 day and 3 days, due to the cause of brevity, 2-day results are not presented in the paper), the cNLR method performed comparatively better than the other five postprocessing methods adopted for the study.
The BS of BMA at all the grids showed relatively poor performance than other methods. Other methods performed moderately well in calibration, wherein the cNLR outperformed for G1, G2, G3, G4, and G5 for 1 day and 3 days with logreg giving a good performance along with cNLR at G5 1-day, G2 3-day, and G3 3-day forecasts.
The reliability of the forecast is evidently better for the forecasts predicted by the cNLR and logreg for G1, G2, G4, and G5 of 1-day forecasts.
The plots of AUC of ROC show the well-calibrated forecasts by cNLR at different grids falling in the Vishwamitri River basin for short-range ensemble forecasts by NCMRWF EPS.
The cNLR method slightly outperformed and provided the full predictive continuous density function for the prediction of future forecasts instead of only predicting the probability of exceedance of specified thresholds as in the case of logreg, hlogreg, and OLR.
The study has explored a wide number of packages for the evaluation of the ensemble precipitation forecasts such as ensembleBMA, ensemblepp, ensembleMOS, Ordinal, gamlss, and verification packages such as easyverification, pROC, s2dverification, scoringrules, SpecsVerification, and verification in R programming. It would be an interesting study to carry out the evaluation for multi-model or grand ensembles and evaluate the performances of short–medium-range forecasts in future studies. Furthermore, the best-calibrated datasets shall be used for the development of the flood forecasting system for the Vishwamitri River Basin which will be very beneficial to addressing the floods and early preparedness in the case of flood in Vadodara City.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories. NCMRWF data can be downloaded from https://apps.ecmwf.int/datasets/data/tigge/levtype%3Dsfc/type%3Dcf/. Gridded IMD rainfall data can be downloaded from https://imdpune.gov.in/cmpg/Griddata/Rainfall_25_Bin.html.
CONFLICT OF INTEREST
The authors declare there is no conflict.