An effective postprocessing approach is examined to improve the skill of North American Multi-Model Ensemble (NMME) precipitation forecasts in the Karoon basin, Iran. The Copula–Bayesian approach was used along with the Normal Kernel Density marginal distribution and the Kernel Copula function. This process yields more than one candidate postprocessed precipitation value (first pass). A similar process is used in a second pass to obtain preprocessed values based on the candidate inputs, which helps identify the most suitable postprocessed value. Applying the technique for order preference by similarity to the ideal solution (TOPSIS) to the conditional probability distribution functions of the first and second passes selects the final improved forecast from among the existing candidates. To validate the results, data from 1982–2010 and 2011–2018 were used for the calibration and forecast periods, respectively. The results show that while the GFDL and CFS2 models tend to overestimate precipitation, most other NMME models underestimate it. Postprocessing improves the accuracy of forecasts for most models by 20%–40%. Overall, the proposed Copula–Bayesian postprocessing approach could provide more reliable forecasts with higher spatial and temporal consistency, better detection of extreme precipitation values, and a significant reduction in uncertainties.

  • The precipitation forecasts of Karoon river watershed in southwest Iran as a flood-prone area are investigated.

  • A new postprocessing approach is presented for North American Multi-Model Ensemble (NMME) precipitation estimations.

  • The proposed method is based on the Copula–Bayesian approach.

  • The method is desirable for detection of the extreme precipitation values.

  • Significant increases in forecast skill of improved NMME data are provided.

Nowadays, the importance of precipitation forecasting has resulted in the development of various dynamic models to predict global precipitation at hourly, daily, monthly, and seasonal time-scales (Dehban et al. 2020). The North American Multi-Model Ensemble (NMME) is one of the best-known ensembles of general circulation models providing effective seasonal precipitation forecasts (Slater et al. 2019; Becker et al. 2020; Roy et al. 2020). However, the NMME models' predictions are accompanied by uncertainties due to initial assumptions, model limitations, and inaccurate forecast ensembles (Rayner et al. 2005; Wu et al. 2011; Tao et al. 2014). As a result, researchers have attempted to improve the predictions by using different statistical postprocessing approaches. In these methods, establishing a good relationship between the observed data and the NMME models' predictions is the most important task. In this regard, joint distributions and functions (Kelly & Krzysztofowicz 1997; Robertson et al. 2013), neural networks (Pakdaman et al. 2020), hybrid models (Xu et al. 2018; Yazdandoost et al. 2020), and copula-based models (AghaKouchak et al. 2010; Sadegh et al. 2017; Khajehei et al. 2018) are widely used. These methods lead to improved forecast data, but challenges remain in evaluating extreme events, which is an essential issue.

Most of the postprocessing methods used for precipitation forecasting employ parametric distributions to simulate the behavior of each variable (observation or predicted data) or parametric bivariate functions. However, copula-based models have the unique ability to use parametric, nonparametric, and semiparametric distributions to fit the bivariate distribution that best fits the observed and predicted variables (Chen & Huang 2007). The fully parametric estimation of copulas involves determining parametric models for both marginal distributions and copulas (first suggested by Oakes (1982) in the context of the Clayton copula). Later, semiparametric estimation was suggested, which uses nonparametric marginal distributions to achieve a parametric copula (Oakes 1986; Genest et al. 1995). However, one deficiency of these methods is the elimination of the relation between extreme observed data and forecasts or vice versa. Consequently, they may not accurately predict the extreme precipitation values. Accurate estimation of extreme values is crucial in flood-prone areas (Exum et al. 2018). The use of nonparametric estimations for both the marginal distributions and copula function (such that they are parameter-free) can overcome this weakness (Chen & Huang 2007; Bouri et al. 2019). Therefore, proposing a nonparametric estimator may provide a more effective approach than parametric copula models.

Based on the literature, while preparing for possible future floods has always been emphasized, there has been limited academic attention to the assessment and improvement of NMME models' skill in forecasting extreme precipitation, which may lead to flooding (Slater et al. 2019). To address this gap, this study aims to improve NMME precipitation forecasts through postprocessing using a nonparametric Copula–Bayesian approach, with a specific focus on improving the forecasts of extreme values.

Previously, the Copula–Bayesian approach has been applied in different studies as a method for improving forecasts. This method yields a conditional probability density function (CPDF), which describes the probability of an observed value given each raw forecast value. Various approaches have been proposed for selecting the improved data based on the CPDF (Madadgar & Moradkhani 2012; Khajehei & Moradkhani 2017). One of the most common methods is to select the maximum-likelihood point of the CPDF as the desired improved forecast. In the parametric method, this is a single point, but in semiparametric and nonparametric methods, there may be multiple relative maximum points in the CPDF. In that case, there is no guarantee that the selected value is the most accurate one. The main contribution of this paper is to introduce a novel method that uses a CPDF with several relative maxima to identify the most suitable improved value, particularly for extreme precipitation, by distinguishing between the relative maxima and selecting the best one. The effectiveness of the proposed method has been evaluated in one of Iran's largest and most important watersheds, the Karoon River basin, which experienced widespread floods and landslides in the spring of 2019 due to unprecedented rainfall within a brief duration, resulting in extensive damage. This event highlighted the importance of timely and accurate precipitation forecasts, especially for extreme precipitation values, as they can contribute to advances in flood early-warning systems. Furthermore, the longer the forecast horizon for precipitation data, the better the preparedness for possible warnings.
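The difference between a single global mode and several relative maxima can be illustrated with a small sketch. The bimodal density below is a hypothetical stand-in for a nonparametric CPDF (a two-component Gaussian mixture chosen only for illustration); the function names and numbers are assumptions, not values from this study.

```python
import math

def gaussian_mixture_pdf(x, components):
    """Density of a 1-D Gaussian mixture; components = [(weight, mean, std), ...]."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in components
    )

def relative_maxima(xs, ys):
    """Return (x, y) pairs where y is strictly greater than both neighbours."""
    return [
        (xs[i], ys[i])
        for i in range(1, len(ys) - 1)
        if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]
    ]

# Hypothetical bimodal CPDF over candidate observed precipitation (mm).
components = [(0.6, 40.0, 8.0), (0.4, 110.0, 10.0)]
grid = [float(v) for v in range(0, 161)]
density = [gaussian_mixture_pdf(x, components) for x in grid]

candidates = relative_maxima(grid, density)          # all relative maxima
global_mode = max(candidates, key=lambda p: p[1])    # the usual maximum-likelihood pick
```

Here the maximum-likelihood rule returns only the lower peak, while the list of candidates also retains the higher-precipitation peak, which is the kind of candidate the proposed method keeps for further ranking.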

The rest of this article is structured as follows. First, the study area and the data used are presented in Section 2. Subsequently, Section 3 presents the proposed research methodology in four subsections: (1) preparing the input data for the following steps, (2) estimating the CPDF based on the Copula–Bayesian method for each raw forecast value, (3) describing the proposed new approach for selecting the most-improved data, and (4) introducing some statistical criteria for evaluating the skill of forecast data. In Section 4, the obtained results are presented, and finally, in Section 5, the concluding remarks are summarized.

Study area

This study investigates the Karoon river watershed in southwest Iran, with an area of about 67,000 km² (Figure 1). According to Iran's hydrological divisions, this region is part of the Persian Gulf basin. The watershed climate has a great diversity, as classified by the De Martonne aridity index (de Martonne 1926). The Karoon River is the largest river in the region in terms of discharge, and it is separated into two main branches, namely, the Arvand and Bahmanshir rivers, before flowing into the Persian Gulf.
Figure 1

The location of the Karoon basin in Iran as study area. The Bing aerial imagery is used as a base map.

In recent years, extreme precipitation events have been observed in this watershed. One of these devastating events occurred in 2019, when several consecutive floods inundated large parts of the region following heavy rainfall. During this incident, many cities in the Karoon basin, such as Khorram Abad, Ahvaz, and Shahrekord, experienced more than 50% of their average annual precipitation. Therefore, accurate precipitation forecasts can be used as effective management tools to avoid or reduce the damage caused by such natural disasters, prevent further degradation of natural resources, and promote sustainable development. Table 1 presents the observed long-term monthly precipitation data.

Table 1

Observed long-term monthly precipitation

Month               Jan    Feb    Mar   Apr   May    Jun   Jul   Aug   Sep   Oct    Nov    Dec
Precipitation (mm)  66.11  46.74  52.4  31.3  16.13  9.62  9.44  9.71  9.48  14.77  39.07  59.75

Data sources

Observation data

The Global Precipitation Climatology Centre (GPCC) dataset was first created in 1989 by the German Weather Service (DWD) on behalf of the World Meteorological Organization (Yazdandoost et al. 2020). It is a precipitation dataset based on data from around 80,000 observation stations from several different sources, providing gridded data with different spatial resolutions (2.5° × 2.5°, 1.0° × 1.0°, 0.5° × 0.5°, and 0.25° × 0.25°). In this study, monthly GPCC records at 1° spatial resolution (matching the resolution of the forecasts) from 1982 to 2018 were used as reliable surrogate gridded reference data for observed precipitation (Rezayi Banafsheh et al. 2011; Azizi et al. 2015; Darand & Zand Karimi 2016).

Forecast data

Five NMME models were used at a spatial resolution of 1° to provide monthly precipitation forecasts over the study area. Table 2 provides more information about the NMME models used. The ensemble means of each model from 1982 to 2010 (hindcast or reforecast period) were used to fit the nonparametric copula and perform the reforecast-based calibration. The validity of the presented postprocessing procedure was assessed by applying it directly to each model's ensemble mean for the forecast period of 2011–2018.

Table 2

The five used NMME models and their characteristics

Model        Hindcast period  Forecast period  Ensemble size
NCEP-CFSv2   1982–2010        2012 to present  24 (28)*
GFDL         1980–2010        2011 to present  12
CMC1-CanCM3  1981–2010        2011 to present  10
CMC2-CanCM4  1981–2010        2011 to present  10
NCAR-CCSM4   1982–2010        2011 to present  10

*The value in parentheses is the ensemble size for the forecast period.

Figure 2 presents the proposed three-step postprocessing approach based on the Copula–Bayesian method. As mentioned in the studies by Khajehei et al. (2018) and Yazdandoost et al. (2021), the proposed approach is based on the presence of a general association between the historical raw forecast and observational data. In the first step, the data are prepared to obtain fitted distributions of the forecast and observational time series. In the second step, referred to as pass 1, the CPDF is prepared (as CPDFf) for each specific raw forecast value. In the third step, pass 2, the new approach focuses on determining the improved data. Each of the three steps is described in detail in the following sections. To illustrate the applied method and its novelty, a worked example, given as a problem statement followed by a description of its solution, is provided in Appendix 2 of the Supplementary Information.
Figure 2

The dominant perspective of postprocessing.

Input data preparation

First, the available data must be classified into observational and forecast time series for each month of the year. To this end, the GPCC and raw NMME ensemble data time series are formed for each month separately. Then, the obtained time series of monthly precipitation data (with annual time steps for each 1° cell) are used to prepare the marginal distributions. The marginal distribution functions are estimated separately for the historical observations and for each model in the analysis period based on the normal kernel density distribution, which has been shown to be efficient in previous studies (Yazdandoost et al. 2020; Yazdandoost et al. 2021). The marginal distributions and the kernel Copula function are fitted on the historical period from 1982 to 2010. Finally, as input variables, the obtained fitted distributions are evaluated for the forecast period (2011–2018).
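As a minimal sketch of the marginal-fitting step, the code below builds a normal-kernel density estimate and its cumulative distribution for one month and one cell. The sample values are hypothetical, and the normal-reference rule-of-thumb bandwidth is an assumption made for illustration; the bandwidth choice used in this study is not stated in the text.

```python
import math
import statistics

def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth for a Gaussian kernel (an assumed choice)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-1 / 5)

def kde_pdf(x, sample, h):
    """Normal-kernel density estimate at x."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm

def kde_cdf(x, sample, h):
    """Cumulative distribution of the normal-kernel density estimate at x."""
    return sum(
        0.5 * (1 + math.erf((x - s) / (h * math.sqrt(2)))) for s in sample
    ) / len(sample)

# Hypothetical January precipitation totals (mm) for one 1-degree cell.
january_obs = [66.0, 41.5, 88.2, 52.3, 70.1, 35.4, 95.0, 60.7, 48.9, 77.3]
h_obs = rule_of_thumb_bandwidth(january_obs)

# Marginal probability u = F(x) that would be passed on to the copula.
u = kde_cdf(60.0, january_obs, h_obs)
```

The same two functions would be applied to the forecast series of each model, so that every observed and forecast value is mapped to a probability in (0, 1) before the dependence structure is estimated.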

Estimation of CPDF

The dependence between the observational and predicted precipitation data can be completely described by copula-based methods. Let $F_i(x_i)$ be the marginal distribution of each ith variable $X_i$ ($i = 1, \dots, n$). Sklar's theorem (Sklar 1959) assures the existence of a function C on the unit cube, such that:
$$F(x_1, \dots, x_n) = C\left(F_1(x_1), \dots, F_n(x_n)\right) = C(u_1, \dots, u_n) \qquad (1)$$

See the study by Nelsen (2006) for a comprehensive overview of copulas and their mathematical properties.

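The probability density function of the copula, with $u_i = F_i(x_i)$, is calculated according to Equation (2):
$$c(u_1, \dots, u_n) = \frac{\partial^n C(u_1, \dots, u_n)}{\partial u_1 \cdots \partial u_n} \qquad (2)$$
Also, the joint density function ($f(x_1, \dots, x_n)$), which will be used in predicted precipitation improvement, is given as follows, where $f_i$ is the marginal density of $X_i$:
$$f(x_1, \dots, x_n) = c(u_1, \dots, u_n) \prod_{i=1}^{n} f_i(x_i) \qquad (3)$$
By considering the kernel copula function (nonparametric estimation of the copula) as the joint distribution function in the Bayesian equation (Equation (4a)), the CPDF is obtained as the likelihood of an observational value, given each raw forecast value (Equation (4b)):
$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)} \qquad (4a)$$
$$f(x_1 \mid x_2) = c(u_1, u_2)\, f_1(x_1) \qquad (4b)$$
For the bivariate joint distribution function of forecasts and observations, the CPDF in Equation (4b) can be calculated as follows:
$$f(o_t \mid f_t) = c(u_{f,t}, u_{o,t})\, f_O(o_t) \qquad (5)$$

In Equation (5), $f(o_t \mid f_t)$ is the CPDF, $u_{f,t}$ and $u_{o,t}$ are the marginal distributions of the samples from the forecast and observation at time t, respectively, and $f_O$ is the marginal density of the observations. The sample data consist of 500 random values drawn from the same distribution as the variable in the hindcast period (Khajehei & Moradkhani 2017). The process described above for creating the CPDF (as CPDFf) is carried out for each specific raw forecast value. Readers are referred to Sklar (1959) and Yazdandoost et al. (2021) for further details.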
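The conditional density can be sketched numerically as the copula density evaluated along the observation axis for a fixed forecast probability; multiplying by the marginal density of the observations would then give the CPDF in precipitation units. The estimator below is one possible kernel copula (a Gaussian product kernel applied to normal scores) and is an assumption, since the exact kernel copula form and bandwidth used in this study are not given in the text; the data are synthetic.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(sample):
    """Map a sample to normal scores via rank-based pseudo-observations."""
    n = len(sample)
    order = sorted(range(n), key=lambda i: sample[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores

def kernel_copula_density(u_f, u_o, z_f, z_o, h=0.4):
    """Kernel estimate of the copula density c(u_f, u_o).

    A Gaussian product kernel estimates the joint density of the normal
    scores; dividing by the standard-normal marginals maps it back to
    the unit square. The bandwidth h is an assumed value.
    """
    a, b = STD_NORMAL.inv_cdf(u_f), STD_NORMAL.inv_cdf(u_o)
    total = sum(
        math.exp(-0.5 * (((a - zf) / h) ** 2 + ((b - zo) / h) ** 2))
        for zf, zo in zip(z_f, z_o)
    )
    joint = total / (len(z_f) * 2 * math.pi * h * h)
    return joint / (STD_NORMAL.pdf(a) * STD_NORMAL.pdf(b))

# Synthetic, positively dependent "forecast" and "observation" samples.
random.seed(0)
forecast = [random.gauss(0, 1) for _ in range(200)]
observed = [0.8 * f + 0.6 * random.gauss(0, 1) for f in forecast]
z_f, z_o = normal_scores(forecast), normal_scores(observed)

# Conditional density of the observation probability given a wet forecast (u_f = 0.9).
u_grid = [k / 20 for k in range(1, 20)]
cond = [kernel_copula_density(0.9, u, z_f, z_o) for u in u_grid]
mode_u = u_grid[cond.index(max(cond))]
```

With positively dependent data, a high forecast probability shifts the conditional density toward high observation probabilities, which is the behaviour the CPDF is meant to capture.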

Determination of improved forecast data

The CPDFf gives the likelihood of each sample observation for a particular raw forecast. As discussed earlier, the unresolved question when using nonparametric or semiparametric distributions is how to select the improved forecast from the CPDFf. This study offers a novel method to identify the best improvement for the predicted data among the relative maxima of the calculated CPDFf in the semiparametric or nonparametric approaches. To do this, Equation (6) is established for each pair of correlated events, which has the same form as Equation (4b), with $u_i = F_i(x_i)$ and $f_2$ the marginal density of $x_2$:
$$f(x_2 \mid x_1) = c(u_1, u_2)\, f_2(x_2) \qquad (6)$$
So, for each sample observation ($o_i$), which is a relative maximum of CPDFf, we have:
$$\mathrm{CPDF}_{o_i} = f(f_s \mid o_i) = c(u_{f_s}, u_{o_i})\, f_F(f_s) \qquad (7)$$
where $f_s$ is 500 random sample data with the forecast time series distribution, $f_F$ is the marginal density of the forecasts, and $\mathrm{CPDF}_{o_i}$ refers to the CPDF of each ith relative maximum ($o_i$) of CPDFf.

Next, the relative maxima of $\mathrm{CPDF}_{o_i}$ that are closer to the raw forecast are chosen. Each $o_i$ and its selected forecast value has a unique conditional likelihood of occurrence based on CPDFf and $\mathrm{CPDF}_{o_i}$. In this study, the technique for order preference by similarity to ideal solution (TOPSIS) method is used as a decision-making tool to prioritize the candidates $o_i$. This method, which was developed by Hwang & Yoon (1981) and Yoon & Hwang (1995), is presented in Appendix 1 of the Supplementary Information. In this multicriteria decision-making method, the alternatives (relative maximum points) are ranked based on their degree of similarity to the desired values (observation), and the one with the highest rank is introduced as the best candidate. See the study by Garg (2019) for the related equations and method properties. The four input criteria of TOPSIS are the likelihood of the forecast given the observation, the likelihood of the observation given the forecast, and the PDFs of each sample observation and forecast; the sample observation with the highest relative closeness to the ideal solution is selected.
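The ranking step can be sketched with a compact TOPSIS implementation. The decision matrix below is hypothetical: each row is a candidate relative maximum and each column is one of the four criteria, all treated as benefit criteria with equal weights. The weights and the criterion values are assumptions for illustration, not values from this study.

```python
import math

def topsis(matrix, weights, benefit):
    """Return the relative closeness to the ideal solution for each row.

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True where larger is better, False where smaller is better
    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical criteria per candidate:
# [likelihood of obs given forecast, likelihood of forecast given obs,
#  PDF of sample observation, PDF of sample forecast]
candidates = [
    [0.020, 0.015, 0.012, 0.018],
    [0.030, 0.028, 0.010, 0.016],
    [0.010, 0.008, 0.014, 0.011],
]
scores = topsis(candidates, weights=[0.25] * 4, benefit=[True] * 4)
best = scores.index(max(scores))
```

The candidate with the largest closeness score is taken as the improved forecast; changing the weights would change how strongly each conditional likelihood influences the choice.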

Assessment of the method validation

Validation of forecast data against observations is fundamental to scientific advancements, algorithm/model developments, and integration of data into applications. The Kling–Gupta efficiency (KGE) criterion is used to evaluate the reliability of NMME raw data (Equation (8)). The KGE combines the three components of Nash–Sutcliffe efficiency of model errors (i.e., correlation, bias, ratio of variances, or coefficients of variation) in a more balanced way. It has been widely used for model evaluation in recent years. According to Khajehei & Moradkhani (2017), KGE values of more than 0.6 can be considered acceptable. For the lower values, the importance of the postprocessing increases. Validation and quantification of the raw or improved NMME models' uncertainties are fundamental issues for evaluating their performance.
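A short sketch of the KGE calculation is given below. Table 3 describes the variability term as a ratio of variances; the code uses the ratio of standard deviations from the commonly cited form of the KGE, which is an assumption, and the input series are hypothetical.

```python
import math
import statistics

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mean_s, mean_o = statistics.fmean(sim), statistics.fmean(obs)
    std_s, std_o = statistics.pstdev(sim), statistics.pstdev(obs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / len(obs)
    r = cov / (std_s * std_o)      # linear correlation
    alpha = std_s / std_o          # variability ratio (standard deviations, assumed)
    beta = mean_s / mean_o         # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical monthly precipitation (mm): observations and a forecast that doubles them.
obs = [66.0, 47.0, 52.0, 31.0, 16.0, 10.0]
doubled = [2 * o for o in obs]

perfect_score = kge(obs, obs)
doubled_score = kge(doubled, obs)
```

A forecast that is perfectly correlated but twice as large scores well below the 0.6 acceptability level, which shows how the bias and variability terms penalize systematic over- or underestimation even when the temporal pattern is right.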

The skill of the NMME models' forecasts (raw and improved values), particularly in extreme value detection, is investigated using four volumetric indices (Equations (9)–(12) in Table 3) developed by AghaKouchak & Mehran (2013).

Table 3

Equations used for validation and quantification

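$\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}$  (8)
$\mathrm{VHI} = \dfrac{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t)}{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t) + \sum_{i=1}^{n} (OBS_i \mid SIM_i \le t \,\&\, OBS_i > t)}$  (9)
$\mathrm{VFAR} = \dfrac{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i \le t)}{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t) + \sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i \le t)}$  (10)
$\mathrm{VMI} = \dfrac{\sum_{i=1}^{n} (OBS_i \mid SIM_i \le t \,\&\, OBS_i > t)}{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t) + \sum_{i=1}^{n} (OBS_i \mid SIM_i \le t \,\&\, OBS_i > t)}$  (11)
$\mathrm{VCSI} = \dfrac{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t)}{\sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i > t) + \sum_{i=1}^{n} (OBS_i \mid SIM_i \le t \,\&\, OBS_i > t) + \sum_{i=1}^{n} (SIM_i \mid SIM_i > t \,\&\, OBS_i \le t)}$  (12)
Parameters
$r$       Correlation
$\alpha$  The ratio of the variance of the forecast to the variance of the observation
$\beta$   The ratio bias
$SIM$     The NMME model value
$OBS$     The observation
$n$       Total number of observation (or NMME) data in the desired time series
$t$       Threshold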

The first index, the volumetric hit index (VHI), is the volume of correctly detected improved precipitation relative to the sum of the correctly detected volume and the volume of missed observations. The volumetric false alarm ratio (VFAR) is the volume of false simulations (inaccurate improved data) relative to the total volume of simulations. The volumetric miss index (VMI) is the volume of missed observations relative to the sum of the correctly detected simulation volume and the missed observations. Finally, the volumetric critical success index (VCSI) gives an overall measure of volumetric performance, including volumetric hits, false alarms, and misses (AghaKouchak & Mehran 2013). One benefit of volumetric indices is their ability to decompose the biases of improved data by evaluating different thresholds. Given the main purpose of this study, these indices are a suitable tool to evaluate the performance of NMME data postprocessing and extreme precipitation value detection. In this study, the threshold value applied to the volumetric index calculations is selected based on extreme value detection. Extreme precipitation should be rarer than the 10th or 90th percentile of the observed precipitation probability density function (Shaffie et al. 2019). For areas with a high risk of flooding, the threshold can be set to 0.9 (the 90th percentile), as suggested by Shaffie et al. (2019).
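The four volumetric indices can be sketched directly from their definitions. The series and the fixed threshold below are hypothetical; in practice the threshold would be a percentile of the observations, for example `statistics.quantiles(obs, n=10)[-1]` for the 90th percentile.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when no event exceeds the threshold."""
    return numerator / denominator if denominator > 0 else float("nan")

def volumetric_indices(sim, obs, t):
    """Volumetric hit, false alarm, miss, and critical success indices above threshold t."""
    hit = sum(s for s, o in zip(sim, obs) if s > t and o > t)      # detected volume
    miss = sum(o for s, o in zip(sim, obs) if s <= t and o > t)    # missed observed volume
    false = sum(s for s, o in zip(sim, obs) if s > t and o <= t)   # false-alarm volume
    return {
        "VHI": safe_ratio(hit, hit + miss),
        "VFAR": safe_ratio(false, hit + false),
        "VMI": safe_ratio(miss, hit + miss),
        "VCSI": safe_ratio(hit, hit + miss + false),
    }

# Hypothetical monthly precipitation (mm) and threshold.
obs = [10.0, 20.0, 80.0, 90.0, 15.0, 100.0]
sim = [12.0, 85.0, 70.0, 95.0, 5.0, 40.0]
indices = volumetric_indices(sim, obs, t=50.0)
```

In this example the hit volume is 165 mm, the missed volume is 100 mm, and the false-alarm volume is 85 mm, so VHI and VMI sum to one and VCSI is lower than VHI because it also penalizes the false alarms.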

To assess the accuracy of the raw precipitation data of each individual model ensemble mean, the consistency of the data with the GPCC data for the rainiest months was determined. Table 4 shows the KGE values for the entire study area during 1982–2018 for the six rainiest months. As seen, the KGE value of the study area is less than 0.6 in all months; therefore, it is important to perform postprocessing for each cell.

Table 4

The KGE values for the six rainiest months

Model        January  February  March  April  November  December
NCEP-CFSv2   0.35     0.21      0.27   −0.07  0.25      0.13
GFDL         −0.33    −0.25     −0.14  −0.44  −0.53     −0.08
CMC1-CanCM3  −0.27    −0.24     −0.08  −0.85  −0.48     −0.2
CMC2-CanCM4  −0.32    −0.12     0.005  −0.53  −0.4      −0.19
NCAR-CCSM4   0.25     0.31      0.31   −0.08  0.2       0.22

So as not to restrict the scope of this study, the proposed postprocessing procedure is applied to raw forecasts with lengths of 1–6 months. The analysis for the one-month length is described in detail first. Figure 3 shows the spatial distribution of precipitation for January, the rainiest month (for the one-month length). In this figure, the raw data are presented in the first row, the improved data in the second row, and the GPCC data in the last row. According to the figure, the raw precipitation predicted by CanCM4 is much lower than the GPCC data, while the raw predicted values of the GFDL model show better performance. Postprocessing has brought all models' raw data values closer to the GPCC data, and even the spatial patterns of the postprocessed precipitation are approximately similar to the GPCC pattern, confirming that most of the rainfall occurs along the mountainous (eastern) parts of the region.
Figure 3

The spatial distribution of Karoon basin precipitation for January in the forecast period. First row: the raw precipitation data of various models. Second row: the improved precipitation data of each model based on double Copula–Bayesian approach. Third row: the GPCC data as the observations.

To assess the skill of the postprocessing method, Table 5 compares each model's raw and improved NMME forecasts against the GPCC data, calculated over the extent of the study area. In this table, the columns refer to the results of one-month (January) to six-month (January to June) forecast lengths starting from January of the years 2011 to 2018. As shown in this table, the raw data of CanCM3 and CanCM4 had the lowest estimates, while CCSM4 slightly underestimated and GFDL slightly overestimated. However, applying the double Copula postprocessing method revealed that no single model consistently had the best performance for each forecast length. Therefore, investigating the skill of all models is necessary to address the involved uncertainties. As expected, the model error after postprocessing was significantly reduced in most models and for the different studied periods. Since accurate estimates of future precipitation are crucial for managing water resources, our results suggest that for short-term planning (1–2 months), the CFS2 and CanCM3 models are the most reliable, while for estimating resources over the 3–6-month horizon, CanCM4 outperforms the others. Interestingly, the raw data from the CanCM4 model had the poorest accuracy compared with the other models, but its postprocessing proved highly effective. The findings also indicate that correcting raw data in all models will lead to an overestimation of precipitation compared with actual values (except for the CanCM4 model). Therefore, considering this in future planning is critical.

Table 5

Comparison of the raw and improved NMME models against the GPCC data (% error)

NMME model  Data      Length (month)
                      J      JF     JFM    JFMA   JFMAM   JFMAMJ
CanCM4      Raw       −68    −71.4  −77    −72.2  −66.7   −66.1
            Improved  29.5   23.2   −0.89  −6.9   0.2     3.4
CanCM3      Raw       −40.7  −25    −22.9  −12.6  −4.5    −3.1
            Improved  18.5   −3.7   1.7    17.7   14.88   17.55
CFS2        Raw       −12.2  37.7   33.4   49.8   55.9    56.6
            Improved  −11.8  10.2   −2.8   0.5    14.9    −5
CCSM4       Raw       −20    −15.8  −26.1  −18.7  −11.77  −10.9
            Improved  7.5    23.1   12.8   12     13.91   2.01
GFDL        Raw       29.7   26.2   32.67  49.5   61.63   62.9
            Improved  14.4   10.7   17.6   13.86  15.29   11.5

Figure 4 uses the Taylor diagram as an efficient instrument to display the quality of model improvements against the GPCC values for one- to six-month lengths for the various forecast models. According to the displayed results, the shorter the forecast length, the higher the dispersion of the Taylor diagram's estimated variables for the raw data. In other words, over longer periods, almost all models (except CFSv2) present closer results. In terms of the correlation coefficient (CC), for all forecast lengths, most of the improved NMME models show a significant positive correlation with the observational data, while among the raw model data, only CCSM4 has a positive correlation with the GPCC data. In terms of root mean square error, the estimated values of improved data show a significant increase compared with the raw model data. Generally, it seems that CCSM4 had the best and CanCM3 had the worst performance among the raw data.
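The three statistics summarized on a Taylor diagram (standard deviations, correlation, and centered root mean square error) are linked by a law-of-cosines relation, which the following sketch computes and checks. The two series are hypothetical and the function name is an assumption for illustration.

```python
import math
import statistics

def taylor_stats(sim, obs):
    """Standard deviations, correlation, and centered RMSE used in a Taylor diagram."""
    n = len(obs)
    mean_s, mean_o = statistics.fmean(sim), statistics.fmean(obs)
    dev_s = [s - mean_s for s in sim]
    dev_o = [o - mean_o for o in obs]
    std_s = math.sqrt(sum(d * d for d in dev_s) / n)
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    r = sum(a * b for a, b in zip(dev_s, dev_o)) / (n * std_s * std_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_s, dev_o)) / n)
    return {"std_sim": std_s, "std_obs": std_o, "r": r, "crmse": crmse}

# Hypothetical January precipitation (mm) for eight forecast years.
obs = [66.0, 41.0, 88.0, 52.0, 70.0, 35.0, 95.0, 61.0]
sim = [58.0, 50.0, 79.0, 47.0, 75.0, 44.0, 83.0, 66.0]
stats = taylor_stats(sim, obs)

# Law-of-cosines identity underlying the diagram's geometry.
identity_rhs = (
    stats["std_sim"] ** 2
    + stats["std_obs"] ** 2
    - 2 * stats["std_sim"] * stats["std_obs"] * stats["r"]
)
```

Because the centered RMSE removes the mean bias, a model can sit close to the reference point on the diagram and still show a large percentage error in Table 5; the two views are complementary.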
Figure 4

The Taylor diagram of raw and postprocessed NMME models with the one- to six-month periods in Karoon basin. The red points illustrate the raw data and the green points present the postprocessed NMME data. Please refer to the online version of this paper to see this figure in colour: http://dx.doi.org/10.2166/wcc.2023.277.

To evaluate the temporal performance of NMME models, we compared each model's skill against observational data at each 1° pixel. Figure 5 presents an example of the temporal distributions of precipitation predicted by the five NMME models with a one-month horizon for the study region. In this figure, the predicted intensity of precipitation during different months of years (throughout the entire study period 1982–2018) for all models is presented. Each subplot shows a comparison of observational precipitation data with each raw and postprocessed NMME model forecast. As seen, the number of months with heavy rainfall has remained consistent over the years, with most of these events occurring from October to April.
Figure 5

The temporal distribution of one-month length precipitation in the region during 1982–2018.

Furthermore, the intercomparison of models shows that the raw GFDL model often tends to overestimate precipitation. In addition, it has estimated the highest precipitation values compared with the observed values, and the CFSv2 model has the most similarity to the observation data. It is also shown that the raw data of the CanCM4 model were unable to predict periods of high precipitation intensity and the raw data of the CCSM4 model showed the highest pattern similarity of estimating extreme precipitation. Moreover, the results showed that the postprocessed data were more consistent with the observational data, indicating the effectiveness of the double Copula postprocessing method.

Therefore, it can be claimed that the proposed method is truly able to improve the overall results and identify extreme precipitation values. Since the intensity or frequency of heavy rains in the region has not significantly increased, it can be considered that the cause of recent floods in the region may be due to increased urbanization and/or changes in land use/cover, which subsequently cause changes in runoff.

To investigate more precisely the ability of postprocessing to recognize extreme precipitation, the KGE and the four volumetric index values mentioned above are examined for January (one-month length). Figure 6 (a StarPie diagram) shows the different models' performance over the hindcast and forecast periods. In this figure, each pie ranges from 0 to 1 (like the volumetric indices). The closer the index values are to 1, the better the model performance. For the VMI and VFAR indices, the values of 1 − VMI and 1 − VFAR are used as substitutes. Negative KGE values are indicated by a star sign (*). As shown, the double Copula method has a high ability to improve the raw NMME data and detect extreme values. Among the raw data results in Figure 6, the estimated KGE values for the raw CFSv2 and CCSM4 models are higher than for the others. Due to the other models' negative KGE values, using them for other hydrological calculations will lead to unreliable results.
Figure 6

The KGE and volumetric indices results of NMME models for January. The left block contains the hindcast raw and improved data. The right block shows the forecast raw and improved data. Each side of pie star is equal to 1.

According to the volumetric indices shown in the figure, during the forecast period, the extreme precipitation values are better identified in the raw GFDL model. As seen in the second row of Figure 6, the postprocessed precipitation values in both the hindcast and forecast periods have been significantly improved. Among the indices calculated for these modified precipitation values, the KGE has the lowest value, and the CFSv2 model has the lowest KGE of all. In other words, the modified forecasts can still be improved to gain sufficient certainty. However, in terms of the other indices, the postprocessing algorithm has been very successful.

Seasonal precipitation forecasts are one of the main inputs of hydrological forecasting models. If these forecasts are reliable, they can provide useful information for decision-makers in water resources management. In this paper, a Copula-based Bayesian approach is used as the postprocessing method to evaluate the use of nonparametric distributions for both the marginal distributions and the copula function to improve NMME precipitation forecasts. Here, the normal kernel density distribution function is employed as the marginal distribution, and the kernel Copula function is used as the bivariate function to create the conditional probability distribution function (CPDF). The presence of several relative maximum likelihoods in the CPDF can make it challenging to choose the most-improved data. To address this issue, a novel method is suggested that chooses the relative maximum points and applies the Copula–Bayesian approach a second time, in the reverse direction, to the selected values to recover the initial forecast value for the given sample observations. Finally, the TOPSIS decision-making method is applied to pick out the most likely sample observation. The proposed postprocessing method is examined on the Karoon basin, one of the regions in Iran that have experienced significant flood damage. Five NMME models for the hindcast (1982–2010) and forecast (2011–2018) periods are used. The GPCC observational data are used to evaluate the accuracy of the ensemble means. The KGE values indicate the necessity of improving the NMME model output. By investigating the obtained results for forecasts with one- to six-month lengths, we found the following:

  1. The spatial and temporal distributions of the raw GFDL model are more similar to the observational data.

  2. The raw GFDL model performs relatively better than the other models. Conversely, the CanCM3 and CanCM4 models miss the early months of each year.

  3. The spatial and temporal distributions of the postprocessed data are highly consistent with the observations. However, in constructing the time series and in the subsequent processing, the spatial and temporal coherence of the data with adjacent cells is ignored.

  4. Based on the volumetric indices used (VHI, VMI, VFMR, and VCSI), the raw forecasts have low skill in estimating extreme precipitation values. After postprocessing, the strength of the double Copula method in determining extreme values was demonstrated.

  5. The higher accuracy and correlation of the improved NMME data imply lower uncertainties than in the raw estimations.
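The TOPSIS selection step summarized before these findings can be illustrated with a minimal sketch. It assumes that each candidate postprocessed value is scored on two benefit criteria, namely its conditional density from the first pass and from the second pass; the criteria, weights, and the function name `topsis` are assumptions for illustration and are not taken from the study's implementation.

```python
import numpy as np


def topsis(scores, weights, benefit):
    """Relative closeness of each alternative to the ideal solution.

    scores : (n_alternatives, n_criteria) decision matrix
    weights: criterion weights (normalised internally)
    benefit: True where larger is better, False where smaller is better
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    # Vector-normalise each criterion column, then apply the weights
    norms = np.linalg.norm(scores, axis=0)
    norms[norms == 0] = 1.0
    v = scores / norms * (weights / weights.sum())

    # Ideal and anti-ideal points, criterion by criterion
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))

    # Euclidean distances to both points
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)

    # Closeness in [0, 1]; 0.5 if all alternatives are identical
    denom = d_pos + d_neg
    return np.divide(d_neg, denom, out=np.full_like(d_neg, 0.5), where=denom > 0)


# Hypothetical candidates: [first-pass density, second-pass density]
candidates = [[0.80, 0.60], [0.50, 0.90], [0.30, 0.20]]
closeness = topsis(candidates, weights=[1, 1], benefit=[True, True])
best = int(np.argmax(closeness))  # index of the selected candidate
```

The candidate with the highest closeness is the one nearest the ideal point and farthest from the anti-ideal point, which is the selection rule of the classical TOPSIS method.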

These findings help answer the following important questions: (1) which model estimates the total amount of precipitation more accurately, (2) which models have higher accuracy in months with heavy precipitation, (3) which models perform better over the short-term horizon versus the long-term horizon, and (4) which models tend to underestimate or overestimate the actual values, and what the corresponding error margins are for each. Such information is crucial for effective water resource management planning and the development of reliable warning systems.

Given the methodology used in this study and the resulting findings, there are several potential areas for future research that would complete the presented postprocessing method and contribute to the improvement of forecasts. For example, spatial copulas could be used to account for the dependence between neighboring cells, which was not considered in this study. Also, the NMME data investigated here are monthly, so the time series in each cell is constructed for each month's data separately. Daily NMME forecasts are now available, however. Using daily NMME data instead of monthly data and developing new strategies to incorporate data dependency and seasonality trends could increase forecast skill. This could involve the use of temporal copulas to capture the temporal dependency of precipitation data and its impact on postprocessing outcomes. Finally, precipitation is a function of various variables, and the authors propose applying multidimensional copulas to consider other influential parameters such as temperature and topography.

All relevant data are included in the paper or its Supplementary Information.

The authors declare there is no conflict.

AghaKouchak A., Bárdossy A. & Habib E. 2010 Copula-based uncertainty modelling: application to multisensor precipitation estimates. Hydrological Processes 24 (15), 2111–2124.
Azizi G., Miri M., Mohammadi H. & Pourhashemi M. 2015 Analysis of relationship between forest decline and precipitation changes in Ilam Province. Iranian Journal of Forest and Poplar Research 23 (3), 505–515.
Becker E., Kirtman B. P. & Pegion K. 2020 Evolution of the North American Multi-Model Ensemble. Geophysical Research Letters 47 (9), e2020GL087408.
Bouri E., Gupta R., Lau C. K. M. & Roubaud D. 2019 Risk Aversion and Bitcoin Returns in Normal, Bull, and Bear Markets, RePEc Working Paper 201927. Department of Economics, University of Pretoria, Pretoria, South Africa.
Chen S. X. & Huang T. M. 2007 Nonparametric estimation of copula functions for dependence modelling. Canadian Journal of Statistics 35 (2), 265–282.
Darand M. & Zand Karimi S. 2016 Evaluation of the accuracy of the Global Precipitation Climatology Center (GPCC) data over Iran. Iranian Journal of Geophysics 10 (3), 95–113.
Dehban H., Ebrahimi K., Araghinejad S. & Bazrafshan J. 2020 Development of monthly ensemble precipitation forecasting system in Sefidrud Basin, Iran. Iranian Journal of Soil and Water Research 51 (8), 1881–1893.
de Martonne E. 1926 Une nouvelle function climatologique: L'indice d'aridité. Meteorologie 2, 449–459.
Exum N. G., Betanzo E., Schwab K. J., Chen T. Y. J., Guikema S. & Harvey D. E. 2018 Extreme precipitation, public health emergencies, and safe drinking water in the USA. Current Environmental Health Reports 5 (2), 305–315.
Hwang C. & Yoon K. 1981 Multiple Attribute Decision Making: Methods and Applications. A State-of-the-Art Survey. Springer-Verlag, Berlin, West Germany.
Kelly K. S. & Krzysztofowicz R. 1997 A bivariate meta-Gaussian density for use in hydrology. Stochastic Hydrology and Hydraulics 11 (1), 17–31.
Madadgar S. & Moradkhani H. 2012 Towards an improved postprocessing of hydrological forecast ensembles using copula. In: EGU General Assembly Conference Abstracts, p. 13757.
Nelsen R. B. 2006 An Introduction to Copulas, 2nd edn. Springer Science & Business Media, New York, USA.
Oakes D. 1982 A model for association in bivariate survival data. Journal of the Royal Statistical Society: Series B (Methodological) 44 (3), 414–422.
Pakdaman M., Falamarzi Y., Babaeian I. & Javanshiri Z. 2020 Post-processing of the North American multi-model ensemble for monthly forecast of precipitation based on neural network models. Theoretical and Applied Climatology 141 (1), 405–417.
Rayner S., Lach D. & Ingram H. 2005 Weather forecasts are for wimps: why water resource managers do not use climate forecasts. Climatic Change 69 (2–3), 197–227.
Rezayi Banafsheh M., Jahanbakhsh S., Bayati Khatibi M. & Zeinali B. 2011 Forecast of autumn and winter precipitation of west Iran by use from summer and autumn Mediterranean sea surface temperature. Physical Geography Research Quarterly 42 (4), 47–62.
Robertson D. E., Shrestha D. L. & Wang Q. J. 2013 Post-processing rainfall forecasts from numerical weather prediction models for short-term streamflow forecasting. Hydrology and Earth System Sciences 17 (9), 3587–3603.
Roy T., He X., Lin P., Beck H. E., Castro C. & Wood E. F. 2020 Global evaluation of seasonal precipitation and temperature forecasts from NMME. Journal of Hydrometeorology 21 (11), 2473–2486.
Sklar M. 1959 Fonctions de repartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Universite de Paris 8, 229–231.
Tao Y., Duan Q., Ye A., Gong W., Di Z., Xiao M. & Hsu K. 2014 An evaluation of post-processed TIGGE multimodel ensemble precipitation forecast in the Huai river basin. Journal of Hydrology 519, 2890–2905.
Wu L., Seo D.-J., Demargne J., Brown J. D., Cong S. & Schaake J. 2011 Generation of ensemble precipitation forecast from single-valued quantitative precipitation forecast for hydrologic ensemble prediction. Journal of Hydrology 399 (3–4), 281–298.
Xu L., Chen N., Zhang X. & Chen Z. 2018 An evaluation of statistical, NMME and hybrid models for drought prediction in China. Journal of Hydrology 566, 235–249.
Yazdandoost F., Moradian S., Zakipour M., Izadi A. & Bavandpour M. 2020 Improving the precipitation forecasts of the North-American multi model ensemble (NMME) over Sistan basin. Journal of Hydrology 590, 125263.
Yoon K. P. & Hwang C. 1995 Multiple Attribute Decision Making: An Introduction. Quantitative Applications in the Social Sciences 104, Sage Publications, Thousand Oaks, CA, USA, pp. 38–45.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
