Flood attributes such as the water level may depend on multiple forcing variables that arise from common meteorological conditions. To correctly estimate flood risk in these situations, it is necessary to account for the joint probability distribution of all the relevant forcing variables. An example of a joint probability approach is the design variable method, which focuses on the extremes of the forcing variables, and approximates the hydraulic response to forcing variables with a water level table. In practice, however, application of the design variable method is limited, even for the bivariate case, partly because of the high computational cost of the hydrologic/hydraulic simulations. We develop methods to minimise the computational cost and assess the appropriate extent and resolution of the water level table in a bivariate context. Flood risk is then evaluated as a bivariate integral, which we implement as an equivalent line integral. The line integral is two orders of magnitude quicker and therefore beneficial to settings that require multiple evaluations of the flood risk (e.g., optimisation studies or uncertainty analyses). The proposed method is illustrated using a coastal case study in which floods are caused by extreme rainfall and storm tide. An open-source R package has been developed to facilitate the uptake of joint probability methods among researchers and practitioners.
Numerous studies have demonstrated that extreme water levels may be attributed to multiple forcing variables. Examples include high ocean levels caused by concurrent waves and storm tides (Hawkes et al. 2002), coastal floods involving both elevated storm tides and high river flows (Zheng et al. 2013; Kew et al. 2013), and fluvial floods resulting from extreme rainfall occurring on an already wet catchment (Pathiraja et al. 2012). Statistical dependence between the forcing variables is common due to the significant spatial and temporal dependences that exist in the climate system (Leonard et al. 2014). Methods that incorporate these dependent forcing variables into flood risk estimation are often referred to as joint probability analyses (Hawkes 2008).
There are three general methods that encapsulate the joint probability of forcing variables in a statistically rigorous manner: the response variable method, the continuous simulation method and the design variable method (Hawkes 2008; Zheng et al. 2014). Each method differs in its assumptions, data requirements and computational complexity. For example, the response variable method directly estimates flood risk as a univariate frequency analysis of the response variable observations at the location of interest (e.g., flood level or flood volume). In this method, the impact of the dependence between the forcing variables on flood risk has been implicitly accounted for by the response variable. Despite being relatively simple, application of the response variable method is limited in practice because recorded data are rarely available at the location of interest.
The continuous simulation method obtains the distribution of the response variable (whether extreme or not) by means of simulation, followed by a univariate frequency analysis (Cai et al. 2008). Similar to the response variable method, this approach does not require complex statistical modelling of extremes, but it does require long and continuous sequences of the forcing variables (such as rainfall and storm tide) that are often not available. It can also be computationally demanding to predict the response variable with hydrologic/hydraulic models. Given that computational budgets are typically limited in practice, the continuous simulation method can be costly to implement.
The design variable method significantly differs from the response variable and continuous simulation methods in that it requires explicit modelling of the extremal dependence of the forcing variables (and associated joint probability density), followed by an integral of the joint density to estimate the flooding risk (Coles & Tawn 1994). Since this method focuses on the extremes of the response variable, it is appreciably less computationally demanding compared to the continuous simulation method that requires simulation of both extreme and non-extreme situations. Widespread implementation of the design variable method is still limited, however, due to difficulty in quantifying the extremal dependence between forcing variables, and computational challenges associated with implementation. This study focuses on the latter.
To address the computational challenges of joint probability analyses, a number of probabilistic computation methods have been developed. For example, the first-order reliability method and the second-order reliability method were developed to approximate the failure region for efficient probability calculation (Zhao & Ono 1999). Monte Carlo importance sampling, directional sampling and subset simulation methods generated samples in the region of interest rather than the entire space (Schueremans et al. 2011), and adaptive response surface techniques were proposed to minimise the number of model runs (Nguyen et al. 2009). In contrast to these approaches, the design variable method uses a response surface to approximate the behaviour of the hydraulic model at a fixed set of boundary conditions and estimate the failure probability using a numeric integration (Coles & Tawn 1994). To improve its implementation efficiency in a bivariate context, this study proposes a multi-step approach as well as a method for transforming the bivariate integral to an equivalent line integral.
Improved efficiency is especially important whenever aspects of a design require repeated evaluation. Examples include: systems having many locations, return periods or durations of interest; optimisation studies (Nicklow & Hellman 2004); and uncertainty analyses (Fu & Kapelan 2013). Two key issues for enhancing the efficiency of the design variable method are: (i) the construction of a water level table (a table of water levels from different combinations of two forcing variables) that uses the minimum number of hydraulic model evaluations required for the design; and (ii) the estimation of exceedance probabilities for a given water level. We address the first issue by using a multi-step approach combined with techniques that assess the adequacy of the table's extent and resolution, and the second issue by converting the exceedance probability from a bivariate integral to an equivalent univariate line integral. Flood risk is normally defined as the probability of the hazard times the consequences. This study considers the probability of the hazard only, measured in terms of water levels.
RATIONALE FOR JOINT PROBABILITY ANALYSES
Figure 1 illustrates the difference between the univariate method (top panel) and the design variable method as an example of a joint probability method (bottom panel) for a hypothetical scenario in which floods are caused by two forcing variables X and Y. In the top panel, the grey shaded region represents the exceedance probability where h (the dashed line) is determined by a single forcing variable (e.g., rainfall or storm tides). The grey shaded region in the bottom panel illustrates the exceedance probability for the region Ah where h (the solid line) depends on both forcing variables. The probability is evaluated as the integral of the joint density (contours, lower panel) across the whole failure region Ah (Equation (3)) and thus allows for the statistical dependence between the two forcing variables.
To enable comparison, the failure region determined by the univariate method (the grey shaded region in the top panel) is also shown in the bivariate setting (the bottom panel) by the grey shaded region to the right side of the dashed vertical line. It is clearly seen that the univariate failure region is smaller than the Ah indicated by the joint probability method, suggesting that the univariate method will underestimate the flood risk in this case.
A MULTI-STEP METHOD FOR EFFICIENT EVALUATION OF JOINT PROBABILITY
The proposed method comprises five steps: (1) perform a preliminary analysis, (2) specify the bivariate probability distribution of the forcing variables, (3) determine the failure region for the given water level h, (4) integrate the joint density in the failure region to obtain the flood risk estimate and (5) check the bounds on the estimates due to the water level table approximation. Where the bounds are too large for the given design context, further iteration of Steps 3–5 is required.
Step 1: Perform a preliminary analysis
Joint probability analyses require additional computational effort when compared to conventional univariate methods for flood risk estimation and this is especially the case whenever gridded (two-dimensional) hydraulic models are used. Therefore, a preliminary analysis is proposed, which aims to determine the potential improvement in water level estimates to be gained by undertaking a joint probability analysis.
The default (Case 0) for the preliminary analysis is the conservative assumption of complete dependence between the respective distributions of the boundary values (e.g., rainfall and tide). Thus a water level with Y year average recurrence interval (ARI) is determined from a Y year ARI rainfall event (from which streamflow is derived) coinciding with a Y year ARI storm tide event at the river's lower reaches. To evaluate the complete independence case, two alternative situations are modelled: Case 1, in which a Y year ARI storm tide event occurs in isolation (i.e., with no rainfall); and Case 2, in which a Y year ARI rainfall event occurs in isolation (i.e., with the mean sea level (MSL)). This results in a total of three runs of the hydraulic model before proceeding with more analyses for a particular ARI of interest for the water level.
The three runs are compared to determine whether the differences between complete dependence and complete independence of the forcing variables are large enough for the given design context to warrant the additional modelling effort. The tolerance is defined by the user and it is a trade-off between the benefit of having a more accurate assessment versus the cost of implementing the joint probability analysis. In other words, the additional computational cost of the joint probability analysis should be at least proportional to the benefit of the additional precision. This trade-off may vary according to different locations and design problems. If the differences are small, the conservative assumption of complete dependence might serve as a suitable and expedient estimate for the location of interest and hence Steps 2–5 given below can be skipped, otherwise the entire multi-step method (Steps 2–5) needs to be performed for estimating flood risk. The same approach can be repeated for multiple ARIs.
The preliminary analysis can be especially beneficial when multiple locations are considered as a single hydrodynamic simulation can often provide water levels for all locations. The impacts of the joint probability of the forcing variables to flood risk (e.g., rainfall dominating, surge dominating or transition zones) can be simultaneously obtained for all these locations without additional computational overheads compared to considering a single location.
Step 2: Specify the joint cumulative distribution function F(x, y)
An appropriate threshold u needs to be determined for each margin and is critical for the use of the threshold-excess method. This is because a value of u that is too low may produce biased estimates due to an invalid asymptotic justification, and a value that is too high will result in a high variance due to the limited sample size (see Coles (2001) for details). The marginal thresholds were selected based on diagnostic plots such as mean residual life plots and scale and shape parameter plots (Coles 2001; Zheng et al. 2013).
Step 3: Approximate the hydraulic response
Many different combinations of x and y can result in an identical water level h. In the design variable method, the failure region for a given water level h is interpolated from a water level table. This table is obtained from hydraulic modelling and represents an approximation of the hydraulic response since it is only evaluated for certain pairs of boundary values across a finite range of ARIs (focused on extreme events). An example of the water level table is given later for the case study.
Step 4: Estimate the exceedance probability of water levels
Step 5: Check the impact of hydraulic approximations on the water level estimate
The values in the water level table are obtained using hydrologic/hydraulic models and obtaining larger tables requires greater computational and modelling effort. To minimise the modelling effort while maximising precision it is important to determine the influence of the selected boundaries and resolution of the table. The methods used here are diagnostic tools to evaluate the adequacy of any given water level table (no matter how coarse it may be), allowing it to be refined subsequently with additional runs if necessary.
One issue relates to the specification of the upper boundary of the water level table when it does not capture the entirety of the boundary function B(x, y). To address this issue, we compute the return probabilities of two alternate cases. A water level of h =2 m is taken from the case study results to illustrate the two cases as shown in Figure 2 (solid black lines). The first case (left panel) projects the h =2 m line horizontally back to the y-axis at the storm tide intersection ARI = 500 years (dotted black line). This caps the largest storm tide at ARI = 500 years, which will result in an overestimate of the return probability (the failure region Ah is shown by the grey shaded region in the left panel of Figure 2). The second case extends the black solid line of h =2 m vertically to infinity (black dotted line in the right panel), i.e., assumes the largest water level produced by the storm tides is h =2 m, even if the storm tide . For this case, the failure region Ah is given by the shaded region to the right side of the black solid and dotted lines, and the return probability is underestimated.
Another issue relates to the specification of the lower boundary of the water level table. The lower boundary of the water level table should give the case of ‘rainfall only’ flooding when the tide is not extreme, and ‘tide only’ flooding when there is no rain. The sensitivity of the water level estimates to these cases can be tested by removing them and using the lowest ARI tide/rain values in their place.
A third issue relates to whether the resolution of the water level table leads to sufficient precision in the boundary function interpolation for a given water level. Variability in the specification of the water level table leads to potential variation in the interpolated boundary function as well as variation in the associated estimated exceedance probability. The limited resolution of the water level table represents a trade-off between the precision in the water level estimates and the computational cost of running hydrologic/hydraulic models. Quantifying the sensitivity of the results to the resolution of the water level table will allow for judicious selection of additional model runs used to refine the table. We implement a simple method based on adjacent values from the water level table as upper and lower bounds on the interpolated boundary function.
Based on the results in Step 5, one can determine whether or not the water level table needs to be further increased in extent and resolution. For example, if the upper boundary of the water level table causes significant variations in estimates, further large ARIs of the forcing variables are required to construct the water level table; if the lower boundary (no-rainfall or MSL) results in a large difference in water level estimates, this indicates that the inclusion of the lower boundary is critical, offering guidance for building water level tables. Similarly, a higher-resolution water level table is needed when the estimation differences between upper and lower bounds of the interpolated boundary function are substantial.
CASE STUDY RESULTS AND DISCUSSION
The proposed method is demonstrated for a drainage network of a small coastal catchment located in Perth, Western Australia. This drainage network discharges into the Fremantle region near the outlet of the Swan River to the Indian Ocean. Water levels in the drainage system at the catchment outlet are influenced by the concurrent rainfall on the catchment and tide levels. Flows in the urban drainage network are modelled using the Infoworks CS package (Innovyze Inc. 2011).
The design event method is employed to determine flows according to Australian design guidelines (ARR 1987), using rainfall parameters and initial-loss/continuing-loss parameters for the Perth region as specified by the guidelines. The design method accounts for antecedent moisture conditions by means of the loss parameters (Rahman et al. 2002). The method was repeated for several storm-burst durations and a 1-hour duration was ultimately selected as the critical duration as this duration produced the maximum water levels for all ARIs. Only the results for the 1-hour duration are presented in this paper. The dependence analysis was conducted for the cumulative 1-hour rainfall paired with the maximum concurrent storm tide (Zheng et al. 2013). It is noted that the ‘water level’ terminology was used for the case study, and floods only occur when water overflows from the drainage network.
Table 1 shows the results of the preliminary analysis (Step 1 of the proposed method), considering ARI = 2, 20 and 100 years of the marginal variables (rainfall and storm tides). The ‘no rainfall’ and ‘MSL’ cases are represented in grey shades along each margin of the water level table while the diagonal terms are the cases of complete dependence. It can be seen that the diagonal values are considerably higher than either margin for all ARIs considered. This indicates that the interaction of the two variables has a significant influence on the water level at this study location and that a joint probability analysis is necessary.
|Rainfall events (ARIs)|
|Storm tide events (ARIs)||Mean sea level||1.16||1.76||2.24|
|Rainfall events (ARIs)|
|Storm tide events (ARIs)||Mean sea level||1.16||1.76||2.24|
Estimation of dependence
Having determined the necessity for a joint probability analysis, it is necessary to estimate the dependence between the two forcing variables (rainfall and storm tides). Using the method outlined in Step 2, Equations (4)–(8) are used to estimate the dependence parameter between the extreme rainfall and storm tides. A comprehensive analysis of dependence has been documented in Zheng et al. (2013) and hence we do not present the details in the current study. We estimated the dependence parameter to be using a large number of rainfall-storm tide data sets from the case study catchment (Zheng et al. 2013). The fitted joint density function for is shown by the blue contours in Figure 3.
Approximation of hydraulic response
In Step 3 of the method, a water level table has to be constructed to specify the boundary of the failure region as necessary for the density integral above a given water level h. Table 2 shows a relatively higher-resolution water level table for a range of ARIs (10 tide ARIs × 10 rainfall ARIs gives 100 model runs). The temporal patterns of the extreme rainfall (1 hour duration) events in Table 2 with different ARIs were taken from Australian Rainfall and Runoff guidance (ARR 1987).
|Storm tide (ARI)||MSL||0.79||0.88||1.16||1.43||1.57||1.76||1.97||2.24||2.55||3.07|
|Storm tide (ARI)||MSL||0.79||0.88||1.16||1.43||1.57||1.76||1.97||2.24||2.55||3.07|
The interpolated contours from Table 2 are shown in Figure 3 (solid red lines). The overlay of the probability density (blue contours) with the boundary function is a useful visual aid for conceptualising the mechanics of Equation (3): the exceedance probability is the summation of the probability density function (blue contours) for all combinations of x and y that exceed the specified water level h. In the format of Figure 3 it is straightforward to visualise that the exceedance probability depends on the shape of the density function via the estimated parameter , so that the more (less) strongly dependent the two variables are, the more (less) density occurs along the increasing diagonal, and accordingly, the exceedance probability is greater (lower).
It is also straightforward to see that for some contours the exceedance probability is influenced mostly by one forcing variable (such as the 3 m near-vertical contour) whereas for other contours it is influenced by combinations of both variables (such as the 1 m contour). This suggests that very high water levels of this case study are dominated by the rainfall while more frequent flood events are increasingly influenced by the joint dependence between extreme rainfall and storm tides.
Water level estimation
The water level estimates for the case study are presented in Figure 4 (solid black line). These results required 1.32 seconds to evaluate 100 water levels ranging from 1 to 3 m using R3.0.1 on a 3.4 GHz Pentium PC. We also implemented the direct integral method specified in Equation (3) through nested use of the standard 1-D integrate function from the R ‘stats’ package. To obtain the results (to similar accuracy) given in Figure 4, the bivariate integral method of Equation (3) required 131 seconds for the same range and number of water levels, which is approximately two orders of magnitude slower than the proposed method specified in Equation (9).
Although the difference between the proposed integration and the direct integral methods is on the order of seconds (contrasted with hydraulic models running on the order of days), the cumulative efficiency gain is significant given that many water levels, locations and duration scales may be considered. Furthermore, many applications involve numerous iterations of the estimation procedure, such as when optimising the design of flood infrastructure and when performing uncertainty analyses. For example, 1,000 samples of marginal and dependence parameters were generated from a posterior distribution and were used to conduct a parametric uncertainty at the location corresponding to Table 1. The analysis, at this one location, required 20 minutes using Equation (9) as compared to 1.5 days using Equation (3).
For comparison with the estimated dependence case , Figure 4 shows the cases where the rainfall and storm tide are completely dependent (, red dotted line) and completely independent (, blue dot dash line). There are substantial differences between the estimated ARIs of the independence and complete dependence cases for all given water levels. For example, for the water level of 2.0 m, the estimated ARI with is approximately 18 years, the ARI with is 40 years (assuming complete independence), and the ARI with is approximately 8 years (assuming complete dependence).
Since no recorded water level data are available for this case study catchment, the verification on the estimated water levels from the design variable method cannot be made. However, the accuracy of the design variable method has been demonstrated by a number of previous studies, such as Coles & Tawn (1994) and Bruun & Tawn (1998). The focus of this study is on the efficiency improvement of the bivariate integration as well as the proposal to effectively construct water level tables for flood risk estimation though the design variable method.
Upper boundary of the water level table
Using the approach outlined in Figure 2, the two approaches for extrapolating the water level table above the highest estimated values were adopted, and which by design will either over- or under-estimate the flood risk. These are shown in Figure 4 (the solid black and dashed lines). For the given ARIs, the differences in the water levels from the two cases are relatively minor, suggesting that the provided upper boundaries in Table 2 with ARI = 500 for each margin are sufficient for estimating water levels with ARI up to 100 years. If this difference were greater, it would be necessary to expand the water level table using larger ARIs of the rainfall and storm tides (e.g., ARI = 1,000 years). The proposed specification method can be used as a diagnostic to verify whether the given upper boundary is sufficient or not for estimating the specified water level range.
Lower boundary of the water level table
To consider the influence of the lower boundary, a basic analysis of the sensitivity to variations of Table 2 was conducted. The results are summarised in Figure 5 where the solid black line is the same as that in Figure 4. The green dashed and red dotted lines represent the cases where the lower boundaries have been removed (i.e., the removal of the no-rainfall column and the ‘MSL’ row in Table 2, respectively). The results show that the estimates of the case study are sensitive to the lower boundary: the green dashed line shows that estimates of low water levels are sensitive to excluding the no-rainfall case, but that all estimates are sensitive to the removal of the MSL case as shown by the red dotted line. This indicates that it is important to include the low boundaries of no-rainfall and mean sea level in the water level table.
The significant impact of the lower boundary is expected, because in practice, rainfall or surge events will frequently occur separately as a result of their weak dependence (Zheng et al. 2013). Therefore the removal of the no-rainfall (or MSL) case indicates that a significant proportion of the extreme surge (or rainfall) events are ignored, resulting in substantial increases of water level estimates using Equation (9). As shown in Figure 3, the rainfall has large impact on water levels across all ARIs, while the storm surge only influences the water levels with low ARIs for the case study. This explains the significant effects of the MSL case to water level estimates across all ARIs (red dotted line in Figure 5) as well as the limited impact of the no-rainfall case on water levels with high ARIs (green dashed line).
Resolution of the water level table
The method to assess the resolution of the water level table is illustrated in Figure 6 using two water levels h =1.52 and h =2.54 m (solid red lines) – the same as in Figure 3. All values in the water level table (Table 2) are shown in Figure 6 to facilitate the interpretation of the proposed method.
From Figure 6, the respective boundary functions for h =1.52 and h =2.54 m are shown as the linearly interpolated solid lines. The interpolation bounds are obtained from the bordering values of Table 2 for the two separate cases. The lower bound case uses all the values in the water level table that are immediately lower than the specified water level h, resulting in a boundary function that has a larger failure region and consequently a larger exceedance probability (red dotted lines in Figure 6). The upper bound case takes all the values in the water level table that are immediately larger than the water level h, producing a boundary function with a smaller failure region and hence a smaller exceedance probability (red dashed lines in Figure 6).
Figure 7 shows the lower and upper bounds for the estimated water levels with ARI ranging from 1 to 100 years using two different sized water level tables. The red dashed lines are based on the 10 × 10 water level table presented as Table 2 whereas the blue dotted lines are from a coarser 5 × 5 water level table (MSL/no-rainfall, 2, 10, 50 and 500 ARIs). The range between the bounds is a measure of the coarseness of the water level table in its approximation of the hydraulic response and the improvement in the bounds on the estimate that arises from using a 10 × 10 table over a 5 × 5 table can clearly be seen. Note that the 10 × 10 table was the initial implementation in this case study, so the 5 × 5 comparison was not strictly used as a diagnostic.
To reduce the bound range, it is possible to perform additional runs of the hydraulic model and refine the water level table. The amount of refinement would depend on whether the bounds are acceptable for a specific water level of interest. If they are deemed too large, then it may be necessary to increase the resolution in the rows or columns of the table immediately above/below the water level of interest. For example, if one is interested to reduce the bound range of h =2.54 m as shown in Figure 7 for the 10 × 10 table, additional model runs should target the region between the dotted and dashed lines associated with h =2.54 m as illustrated in Figure 6.
While the ranges of the respective bounds in Figure 7 are large at places, it is important to emphasise that the lines represent the utmost extreme cases of departure from the interpolated ‘best’ line. In other words, they should not be interpreted in a statistical manner (as with confidence limits). Furthermore, unless there is a specific ‘higher-resolution’ feature in the hydraulic model that might induce a large shift in the water level, it is reasonable to expect subsequent refinements of the water level table to give interpolations that lie toward the centre of the bounded range than to the outer extent.
The method demonstrated in Figure 7 has a diagnostic ability since it allows the sensitivity of the range of water levels to be evaluated for any given degree of resolution of the water level table. Along with the diagnostic checks for the extent of the water level table (Figures 4 and 5) it is possible to assess the adequacy of the water level table as an approximation of the hydraulic response and any impact it may have on the results. This method allows for development of the water level table in a manner that minimises the number of evaluations of computationally expensive hydraulic models.
The dependence of floods on multiple forcing variables complicates the calculation of flood exceedance probabilities, because a single water level can be caused by many different combinations of the forcing variables. The current study implements an efficient joint probability method (the design variable method) by using a water level table combined with a univariate line integral. For studies involving many repeated design scenarios (locations, nodes, parameters) the cumulative saving of this integration method can be substantial. Additionally, a range of techniques are implemented to further minimise the computational effort in the use of the design variable method. These include: (i) a preliminary analysis to determine the necessity of a joint probability analysis at the location of interest; and (ii) diagnostic tools to evaluate the adequacy of the water level table in terms of its extent and resolution.
The proposed method has been demonstrated using a coastal flooding case study affected by both rainfall and storm tides. Results for this case study show that although the dependence between extreme rainfall and storm tides appears to be weak, it has significant implications for flood risk. This suggests that it is critical to incorporate the dependence between extreme rainfall and storm tides to correctly estimate the coastal flood risk.
While the proposed method has been illustrated here using extreme rainfall and extreme storm tides, it can also be used with other hydrological variables, such as waves and tides in coastal regions, rainfall and downstream river heights in urban drainage systems, or flows from two tributaries in large-scale catchments. An open-source R package (‘jointPm’ http://cran.r-project.org) has been developed so that this method can be accessible to a wide audience. Important future focuses along this research line include: (i) extension of the proposed method to multivariate cases of more than two forcing variables; and (ii) incorporation of uncertainty in the selection of rainfall durations and temporal patterns.
This project was made possible by funding from the Federal Government through Geoscience Australia as part of the revision of Australian Rainfall and Runoff by Engineers Australia. This paper and the associated project are the result of a significant amount of in kind hours provided by Engineers Australia members. We are grateful to the WA Water Corporation in providing data for the case study.