Three different methods are compared for estimating the instantaneous peak flow (IPF) from the corresponding maximum daily flow (MDF), since daily data are more often available at the gauges of interest and often cover longer recording periods. In the first approach, simple linear regression is applied to calculate IPF from MDF using probability weighted moments and quantile values. In the second method, stepwise multiple linear regression analysis is used to identify the most important catchment descriptors of the study basin; the resulting equation can then be applied to transfer MDF into IPF. In the third method, the temporal scaling properties of annual maximum flow series are investigated under the hypothesis of piecewise simple scaling combined with the generalized extreme value distribution. The scaling formulas developed from three 15 min stations in the Aller-Leine river basin of Germany are transferred to all daily stations to estimate the IPF. The method based on stepwise multiple linear regression gives the best results of the three methods. The simple regression method is the easiest to apply, given sufficient peak flow data, while the scaling method is the most efficient with regard to data use.
INTRODUCTION
The severity of floods is often poorly captured by conventional networks of daily flow gauges, which cannot fully describe the temporal characteristics of flood events; the resulting underestimation of design floods may increase the risk of failure (Pilon 2004). This poor knowledge arises because peak discharges are unknown in the catchments of concern, or because the records are too short for a flood frequency analysis. At the same time, the design of hydraulic structures often depends on instantaneous peak flows (IPFs), which may differ considerably from mean daily flow values, especially in small basins (Fill & Steiner 2003). Investigating the link between the IPF and the maximum daily flow (MDF) can therefore extend our knowledge for flood risk management.
The first known study of the relationship between the annual IPF and its corresponding MDF is that of Fuller (1914). He analyzed flow data from 24 river basins with drainage areas ranging from 3.06 to 151,592 km² in the eastern United States and developed an equation that calculates IPF from MDF as a function of basin area. Following Fuller's method, several studies have investigated the relationship between the IPF/MDF ratio and basin area. For instance, Canuti & Moisello (1982) studied a group of basins in Tuscany and determined the probability distribution of the IPF on the basis of the MDF by expressing the IPF/MDF ratio as a function of geomorphic catchment characteristics such as mean altitude, basin magnitude, basin relief and channel slope; basin area did not appear to be the most important predictor.
Taguas et al. (2008) developed a three-step procedure to find possible linear relationships of IPF–MDF in semi-arid areas of Spain. The first step is a preliminary analysis and involves categorizing basins into different groups according to the coefficient of determination (R2) of the regressions between IPF and MDF. The second step is the identification of the most important hydrological and topographic characteristics of the watersheds that will be included in IPF–MDF regional equations by using principal components analysis. The third step is the development of regional equations to calculate the peak flow based on the influence of different attributes of the studied areas. The results showed a significant improvement in comparison to the traditional method of Fuller. The influence of the catchment properties on the hydrological response is an accepted concept in hydrology, upon which many models are built (Beven et al. 1988). Muñoz et al. (2012) developed a reliable peak flow estimation method for the design of hydraulic structures in Chile to relate IPF with rainfall and several other geomorphological descriptors. They assumed that the peak discharge response of a catchment is dependent on the regional runoff mechanism and climate and is also a reflection of different watershed characteristics, such as river slope, land use, land cover and rainfall intensity.
To make effective use of the available flow data in IPF–MDF studies, other techniques for relating IPF and MDF are needed. One approach is to use only the sequence of mean daily flows to examine the link between IPF and MDF. This was first reported by Langbein (1944), who showed that the IPF/MDF ratio is a function of two ratios: the discharge of the preceding day to that of the maximum day, and the discharge of the succeeding day to that of the maximum day. Because this method does not permit generalized conclusions, it was further improved by Sangal (1983) and Fill & Steiner (2003).
Over the last decades, soft-computing approaches have found several applications in hydrology. Dastorani et al. (2013) employed artificial neural networks and adaptive neuro-fuzzy inference systems (ANFIS) to derive IPF from MDF and compared their results with the methods of Fuller (1914), Sangal (1983) and Fill & Steiner (2003); ANFIS showed the highest accuracy of all methods. A general limitation of machine learning techniques is their need for training data, so at least a short IPF time series is required.
Another potential approach is based on the concept of scaling, which has recently received increasing attention in geophysics and hydrology. Gupta & Waymire (1990) introduced the ‘multiplicative cascade’ modeling framework for studying the spatial variability of the probabilistic structure of the rainfall process, introducing the concepts of simple and multiple scaling. Smith (1992) developed a lognormal cascade model to represent the role of basin scale in flood peak distributions and fitted a lognormal multiscaling model to observed flood series in the Appalachian region of the United States. Gupta et al. (1994) applied multiscaling theory at the regional scale, based on log-Lévy stable distributions, to study the invariance of the probability distributions of peak flows. De Michele & Rosso (1995) assessed regionalization procedures to investigate whether simple scaling can discriminate among flood probabilities in relatively small basins in Italy with highly variable climatic and geological features. Yu et al. (2004) investigated regional intensity–duration–frequency formulas based on scaling theory, combining the hypothesis of piecewise simple scaling with the Gumbel distribution and analyzing the temporal scaling properties of annual maximum precipitation series for various durations and return periods in northern Taiwan. These studies show that scaling theory can synthesize regularities across different temporal scales to characterize the probabilities of extreme storms. It is therefore adopted here to investigate the scaling properties of extreme flows at different time scales and to estimate IPF from MDF data.
Recently, several disaggregation methods have been developed and applied in different regions to construct streamflow series at finer temporal scales (Stedinger & Vogel 1984; Tarboton et al. 1998; Kumar et al. 2000; Xu et al. 2003; Acharya & Ryu 2014). Many of these disaggregation algorithms are reported to be capable of producing streamflow realizations. However, the uncertainty in parameter estimation, the high dimensionality of the disaggregation problem and the intensive computational requirements considerably limit their application.
Given the limited findings in this field, this study investigates the link between annual IPF and MDF in a more comprehensive yet simpler manner. The first approach applies simple regression between observed peak and daily data with respect to their probability weighted moments (PWMs) and quantile values; it indicates the differences between IPF and MDF and the significance of their correlation. In the second method, a multiple linear regression analysis is performed, and the most important catchment descriptors are selected to construct a functional linear relationship between IPF and MDF. Both methods are easy to apply in a meso-scale catchment, provided that adequate peak flow records are available. The third method builds scaling equations from three 15 min flow stations; to our knowledge, no such approach has previously been used in the literature to relate IPF to MDF. This scale-invariant model transfers flow data from one temporal resolution to another, offering new insight into overcoming the lack of IPF data and into regionalization.
The paper is organized as follows: following this introduction, the three selected methodologies for estimating the IPF from the corresponding MDF are explained. In the following section the study area and the flow data are described. A discussion of results is then carried out, followed by conclusions.
METHODS
Simple regression using quantiles and probability weighted moments
The simple regression model consists of three steps:
(1) select the proper frequency distribution function for fitting the annual extreme flow series by the Chi-square test;
(2) estimate the flood quantiles (T = 10, 20, 50, 100 years) for the annual maximum daily and annual peak flow series, respectively;
(3) apply linear regression without intercept to obtain a regional regression model based on the quantile and PWM values derived in (2).
In the first step, eight probability distributions commonly employed for extreme values are considered as candidates: the two-parameter Weibull (Wei), gamma (Gam), three-parameter generalized extreme value (GEV), Pearson type III, generalized Pareto, kappa, five-parameter Wakeby and generalized logistic distributions. In the Chi-square goodness-of-fit procedure, the acceptability of each distribution function is judged from its p-value at a 95% confidence level (α = 0.05): the null hypothesis that the sample follows the candidate distribution is rejected if the p-value is smaller than the 5% significance level. Following Hosking & Wallis (1997), L-moments are used for parameter estimation in the second step.
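The L-moment fitting of step (2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: it computes the unbiased sample PWMs b_0 to b_2, converts them to L-moments and fits the GEV with Hosking's rational approximation for the shape parameter, using Hosking's sign convention (k < 0 means a heavy upper tail). The function names `sample_pwms`, `fit_gev_lmoments` and `gev_quantile` are hypothetical, and the example series is made up.

```python
import math


def sample_pwms(data, nmax=3):
    """Unbiased sample PWMs b_0 .. b_{nmax-1} (Hosking & Wallis 1997)."""
    x = sorted(data)  # ascending order statistics
    n = len(x)
    pwms = []
    for r in range(nmax):
        denom = math.comb(n - 1, r)
        # weight C(j-1, r) / C(n-1, r) for the j-th smallest value, j = 1..n
        total = sum(math.comb(j - 1, r) / denom * x[j - 1] for j in range(1, n + 1))
        pwms.append(total / n)
    return pwms


def fit_gev_lmoments(data):
    """GEV parameters (xi, alpha, k) from the first three L-moments."""
    b0, b1, b2 = sample_pwms(data, 3)
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    t3 = l3 / l2  # L-skewness
    c = 2 / (3 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2  # Hosking's approximation for the shape
    if abs(k) < 1e-6:  # Gumbel limit
        alpha = l2 / math.log(2)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1 + k)
    alpha = l2 * k / ((1 - 2 ** (-k)) * g)
    xi = l1 - alpha * (1 - g) / k
    return xi, alpha, k


def gev_quantile(T, xi, alpha, k):
    """Design value for return period T (years), i.e. F = 1 - 1/T."""
    y = -math.log(1 - 1 / T)
    if abs(k) < 1e-6:
        return xi - alpha * math.log(y)
    return xi + alpha / k * (1 - y ** k)


# Made-up annual maximum series (m3/s), for illustration only
annual_max = [212.0, 180.5, 260.1, 150.2, 310.7, 198.3, 225.9, 175.4, 240.8, 205.6]
params = fit_gev_lmoments(annual_max)
for T in (10, 20, 50, 100):
    print(T, round(gev_quantile(T, *params), 1))
```

The same PWMs b_0 to b_3, computed for both the MDF and the IPF series, are what the PWM regression of step (3) compares across gauges.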
Multiple linear regression using physical catchment descriptors
As mentioned earlier, this method builds on previous findings that peak flow can be described as a function of catchment properties. It consists of the following three steps:
(1) obtain an overview of the relationships between the extreme flow and the selected explanatory variables, as well as the interrelations among the latter, by a descriptive statistical analysis;
(2) perform a stepwise regression analysis based on the catchment characteristics selected in (1);
(3) eliminate redundant explanatory variables within the selected regression models through a partial correlation analysis, and compare the final multiple regression model with the traditional Fuller equation.
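The partial correlation analysis of step (3) can be illustrated with the first-order formula, which measures how strongly two descriptors remain related once a third one is held fixed. The study does not state which variables were controlled for, so this sketch, which controls for a single variable, is an assumption; `pearson` and `partial_corr` are hypothetical helpers written in plain Python.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Two descriptors that co-vary only through a third one (toy values)
z = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 1.0, 2.0, 5.0]
y = [0.0, 5.0, 0.0, 5.0]
print(pearson(x, y), partial_corr(x, y, z))
```

In this toy case x and y are positively correlated, but the partial correlation is zero because their common variation is explained entirely by z; a predictor behaving like this adds little once the controlling variable is already in the model.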
The stepwise regression procedure adds variables to or removes them from the regression equation step by step, eliminating the variable that contributes least to the prediction of the target variable. It is guided by the Akaike information criterion (AIC), whose scores allow an immediate ranking of the candidate models; the model with the minimum AIC has the smallest estimated divergence from the true model (see Bozdogan 1987). In addition, the adjusted coefficient of determination (Adj. R²) and the residual sum of squares (RSS), i.e. the sum of the squared differences between the observed and estimated IPF, are considered. The regression models are grouped according to their number of predictors, and the best-performing model in each group is selected for further analysis.
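The selection of the best model per group of predictors can be sketched as follows. This is a hedged illustration, not the study's procedure: it replaces stepwise selection with an exhaustive best-subset search (feasible for a handful of candidate descriptors), fits ordinary least squares with an intercept, and uses the common least-squares form AIC = n ln(RSS/n) + 2p with constants dropped. The study does not state its AIC variant or whether an intercept was used, so the scores need not reproduce Table 2. All function names are hypothetical.

```python
import math
from itertools import combinations


def ols_rss(columns, y):
    """RSS of a least-squares fit of y on `columns` plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # normal equations (X'X) beta = X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return sum((y[r] - sum(X[r][j] * beta[j] for j in range(p))) ** 2 for r in range(n))


def aic(rss, n, n_coef):
    """Gaussian-likelihood AIC for least squares, additive constants dropped."""
    return n * math.log(rss / n) + 2 * n_coef


def adjusted_r2(rss, y, n_pred):
    n = len(y)
    my = sum(y) / n
    tss = sum((v - my) ** 2 for v in y)
    return 1 - (rss / (n - n_pred - 1)) / (tss / (n - 1))


def best_model_per_size(candidates, y):
    """For each number of predictors, the subset with the lowest AIC.

    candidates: dict mapping descriptor name -> list of values per gauge.
    Returns {size: (names, AIC, adjusted R2)}.
    """
    n = len(y)
    best = {}
    for size in range(1, len(candidates) + 1):
        for names in combinations(sorted(candidates), size):
            rss = ols_rss([candidates[v] for v in names], y)
            score = aic(rss, n, size + 1)  # +1 for the intercept
            if size not in best or score < best[size][1]:
                best[size] = (names, score, adjusted_r2(rss, y, size))
    return best
```

With the study's data, `candidates` would map descriptor symbols such as HQ2, shape_len, lst_fp and Elv_ds to their values at the 45 gauges, and `y` would hold HQ1.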
Piecewise simple scaling of moments
Because scaling theory has been studied far less for runoff than for extreme rainfall, we introduce it with the basic equations used in rainfall studies, replacing the symbol for rainfall (P) with that for runoff (Q).
When the scaling exponents are linearly related to the order of the moments, the process is said to exhibit simple scaling, or ‘scaling in the strict sense’ (see Gupta & Waymire 1990).
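For reference, a standard statement of simple scaling (Gupta & Waymire 1990) in the runoff notation used here is given below. The study's own equations, including the Equation (6) cited later, are not reproduced in this section, so this form is an assumption. Here the first relation denotes equality in distribution, H is the scaling exponent, F_λh is the distribution function of Q_λh and β_k is the PWM of order k.

```latex
\begin{aligned}
Q_{\lambda h} &\overset{d}{=} \lambda^{H}\, Q_{h}, \qquad \lambda > 0,\\
\mathrm{E}\left[Q_{\lambda h}^{k}\right] &= \lambda^{kH}\, \mathrm{E}\left[Q_{h}^{k}\right],\\
\beta_{k}(\lambda h) = \mathrm{E}\left[Q_{\lambda h}\, F_{\lambda h}(Q_{\lambda h})^{k}\right] &= \lambda^{H}\, \beta_{k}(h),\\
\log \beta_{k}(\lambda h) &= \log \beta_{k}(h) + \alpha_{k} \log \lambda .
\end{aligned}
```

The second line gives the linear dependence of the exponent on the moment order k. Because a PWM is linear in Q and F_λh(λ^H q) = F_h(q), the third line shows that simple scaling yields the same exponent H for every PWM order. Under piecewise simple scaling, the last line holds separately within each duration piece, with an exponent α_k that equals H when scaling is strictly simple.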
The method consists of five steps:
(1) select the peak flow series using the peaks-over-threshold (POT) method (see Katz et al. 2002);
(2) assess the scaling properties of the maximum flows of various durations by a scaling analysis of their PWMs;
(3) validate the scaling hypothesis for runoff;
(4) determine the scaling formulas from the fractal (scale-invariant) property;
(5) estimate the quantiles of the IPFs from the PWMs computed in step (4) and the GEV distribution. Applying step (5) to gauges without IPF data assumes that the relationship is spatially consistent within the transfer region.
Performance assessment of the IPF–MDF models
Leave-one-out cross validation (LOOCV) is the limiting case of k-fold cross validation in which a single observation from the original sample serves as the validation set and the remaining observations serve as training data. Each observation is thus used once as validation data, and the limited sample does not have to be split into independent calibration and validation data sets. LOOCV has also been shown to be an almost unbiased estimator of prediction risk (Cawley & Talbot 2003). Here, it is used to assess the performance of all three models, with the root mean square error (RMSE) and the bias as evaluation criteria.
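A minimal LOOCV sketch for the simple regression model (IPF = b · MDF, fitted without intercept) is shown below. Because the results are reported in percent, it assumes relative errors, (estimate - observed) / observed, averaged over the left-out gauges; the exact error definition is not given in this section, and the function names are hypothetical.

```python
import math


def fit_through_origin(x, y):
    """Least-squares slope b of y = b * x (regression without intercept)."""
    return sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)


def loocv_errors(mdf, ipf):
    """Leave-one-out relative RMSE and bias (%) of the model IPF = b * MDF."""
    rel = []
    for i in range(len(mdf)):
        # calibrate on all gauges except gauge i ...
        b = fit_through_origin(mdf[:i] + mdf[i + 1:], ipf[:i] + ipf[i + 1:])
        # ... and validate on gauge i
        rel.append((b * mdf[i] - ipf[i]) / ipf[i])
    rmse = 100 * math.sqrt(sum(e * e for e in rel) / len(rel))
    bias = 100 * sum(rel) / len(rel)
    return rmse, bias
```

Setting b = 1 instead of calibrating it would give the "observed" reference error, i.e. using the MDF directly as the IPF estimate.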
STUDY AREA AND DATA
The Aller-Leine river basin is located in the federal state of Lower Saxony in northern Germany. The total basin area is 15,803 km², and the basin is physiographically diverse, with elevations ranging from 5 m.a.s.l. (meters above sea level) to 1,140 m.a.s.l. (see Figure 1). In the northern part of the basin the highest elevation is 169 m.a.s.l., whereas in the Harz middle mountains in the south elevations reach up to 1,140 m.a.s.l. In the Harz mountains most aquifers are fractured, with some areas of karst. The flatland around the Lüneburger Heide in the north is characterized by sandy soils, porous Quaternary aquifers and heath and moor vegetation. Agriculture accounts for 58.2% of the total area and forest for 32.5%. The climate is characterized by high annual precipitation, with mean annual precipitation ranging from 500 to approximately 1,600 mm. Frost occurs in the winter season.
Based on the locations of the discharge gauges, 45 subbasins are delineated, and 16 catchment descriptors are derived for each of them (see Table 1). Geomorphologic characteristics such as drainage area, minimum and maximum elevation, longest flow path, river slope and basin slope are extracted from a digital elevation model with a resolution of 10 m. The shape length is taken as the perimeter of each catchment, and the time of concentration (Tc) is derived from Kirpich's formula. The soil properties effective field capacity (FC) and saturated hydraulic conductivity (Kf) are estimated from the German digital soil database BÜK1000 (see Hartwich et al. 1995). The proportions of the different land use types are derived from the CORINE2000 land cover map (see EUR 1994). Mean annual areal precipitation is computed from point observations at 244 daily precipitation stations, which are interpolated onto a 1 × 1 km raster by ordinary kriging and spatially aggregated. Finally, mean daily temperature is interpolated from all available climate stations by external drift kriging, with elevation as additional information.
Variable description | Symbol | Units |
---|---|---|
100-year peak flow | HQ1 | m³/s |
100-year maximum daily flow | HQ2 | m³/s |
Shape length of subbasins | shape_len | m |
Subbasin area | Area | km² |
Basin slope | basin_sl | ‰ |
Mean concentration time | Tc | h |
Longest flow path | lst_fp | m |
River slope | river_sl | |
Max elevation | Elv_up | m |
Min elevation | Elv_ds | m |
Mean elevation | mean_Elv | m |
Mean conductivity | Kf | mm/h |
Mean field capacity | FC | Vol-% |
Proportion of urban area | City | % |
Proportion of agriculture | agriculture | % |
Proportion of forest | Forest | % |
Annual precipitation | PCP | mm/year |
Mean daily temperature | Tem | °C |
Figure 2 displays box plots of the hydrological attributes of all 45 subbasins: minimum elevations vary between 33 and 580 m, catchment areas between 40 and 1,000 km², and the longest flow path between 10 and 70 km.
The annual flood series are sampled so that the IPF and MDF always belong to the same event. The daily and peak discharge data analyzed here come from a total of 45 flow stations, shown in Figure 1. However, only three continuous 15 min flow stations can be used for the scaling analysis, since the 15 min records of all other stations are too short. The records of these three stations cover 2000 to 2008, 2002 to 2008 and 2003 to 2008, respectively. As Figure 1 shows, all three selected 15 min stations are located in the higher-elevation part of the catchment.
RESULTS
In this study, three different models have been presented to explore the relationship between IPF and MDF regarding their frequency analysis. The performance of these models is measured by leave-one-out cross validation based on RMSE and bias criteria for a total number of 45 flow stations in northern Germany. It is important to note that for the first two models long-term observed annual IPF and MDF data series are used, whereas for the third model the observed MDF and the short-term 15 min continuous flow data for three flow stations are used.
Simple regression approach
In the first method, a simple regression model is obtained by directly comparing the observed annual IPF and MDF series in terms of their quantiles and PWMs. Because of space limitations, only the four best-fitting probability distributions according to the Chi-square test performed at each of the 45 discharge gauges are shown here (see Figure 3). The red solid line in the figure indicates the 5% significance level (the full colour version of Figure 3 is available online at http://www.iwaponline.com/nh/toc.htm). The GEV distribution fits best: the null hypothesis that the GEV fits the flow data is hardly ever rejected (p-value > 0.05) at any gauge, for both the MDF and the IPF series.
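The Chi-square screening behind Figure 3 can be sketched as follows. Assumptions, since the study does not describe its class definition: classes are equiprobable under the fitted distribution, there are roughly five expected counts per class, and the degrees of freedom are reduced by the number of fitted parameters (an approximation when parameters come from L-moments). The p-value uses the series expansion of the regularized incomplete gamma function, so no external statistics library is needed; all names are hypothetical.

```python
import math


def gev_cdf(x, xi, alpha, k):
    """GEV distribution function in Hosking's parameterisation."""
    if abs(k) < 1e-9:
        return math.exp(-math.exp(-(x - xi) / alpha))
    t = 1 - k * (x - xi) / alpha
    if t <= 0:  # outside the support
        return 1.0 if k > 0 else 0.0
    return math.exp(-t ** (1 / k))


def chi2_sf(stat, dof):
    """P(X > stat) for X ~ chi-square(dof), via the series for P(a, x)."""
    a, x = dof / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_gof(data, cdf, n_params, n_classes=None):
    """Chi-square statistic and p-value using classes equiprobable under cdf."""
    n = len(data)
    m = n_classes or max(n // 5, n_params + 2)  # heuristic: about 5 per class
    counts = [0] * m
    for v in data:
        counts[min(int(cdf(v) * m), m - 1)] += 1
    expected = n / m
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, chi2_sf(stat, m - 1 - n_params)
```

A distribution is retained at a gauge when the returned p-value exceeds 0.05; with a fitted GEV one would pass `lambda v: gev_cdf(v, xi, alpha, k)` and `n_params=3`.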
Figure 4 shows the linear relationship between peak flow and daily flow for the four quantiles (T = 10, 20, 50, 100 years; Figure 4(a)) and for the first four PWMs (Figure 4(b)) for all 45 flow stations in the Aller-Leine catchment.
Peak flow and MDF are highly correlated in terms of both their quantiles and their PWMs, with correlation coefficients larger than 0.95 for the four return periods and the first four PWM orders. It is therefore reasonable to estimate the design IPF by correcting the underestimation inherent in the corresponding MDF. Moreover, the regressions based on PWMs and on quantiles are of very similar strength, so the two simple regression models are expected to perform similarly, as confirmed in Figure 5.
Figures 5(a) and 5(b) display the RMSE and bias of the simple regression model obtained by cross validation. The IPFs based on the observed annual peak flow series serve as the reference for assessing the performance of all three models. The first (blue) column shows the observed error, i.e. the direct comparison between the peak flows and MDFs from the observed IPF and MDF series, while the red and yellow columns show the estimation error of the peak flows simulated from the observed annual MDF series (the full colour version of Figure 5 is available online at http://www.iwaponline.com/nh/toc.htm). Averaged over the four recurrence intervals, the RMSE of the simple regression model is approximately 20%, compared with an observed error of over 32%. Figure 5(b) shows that the biases of the simple regression model are approximately –10%, compared with observed errors of around –25%. The simple regression model thus considerably reduces the error in estimating the design peak flow compared with using the observed MDF data directly, although a negative bias remains.
Multiple regression analysis
Figure 6 shows a scatter plot matrix, combined with correlation statistics, for HQ1 and all explanatory variables. The symbols of the explanatory variables are given on the diagonal of the matrix and are defined in Table 1. The cells above the diagonal give the correlation coefficients and p values, and the cells below it show scatter plots of the variable pairs. The scales of each variable are indicated on the margins of the matrix.
As an arbitrary decision rule, a predictor is considered significantly related to HQ1 if its p value is below 0.05, even if the absolute value of its simple correlation coefficient is less than 0.5. For instance, Elv_up is selected as an important predictor although its simple correlation coefficient with HQ1 is only 0.36, because its p value is less than 0.05. The top two rows show that Area has the strongest positive correlation with HQ1 and HQ2, followed by lst_fp and shape_len. In addition to these three variables, Elv_ds is retained for the next step because it has the strongest negative correlation with HQ2 (r = –0.29), even though its p value (0.057) is slightly above the 0.05 limit. Below the second row, many of the explanatory variables are highly interrelated (e.g. Area ∼ lst_fp, Area ∼ shape_len), which must be taken into account in the further analysis and in the possible exclusion of variables from the final model.
A stepwise multiple regression analysis is carried out next. According to the number of predictors, six regression models are selected from their respective combination groups (see Table 2). The adjusted coefficients of determination of the last five models are almost equal, whereas the first model, with HQ2 as the only predictor, has the lowest value (Adj. R² = 0.94). The first two models also produce almost twice the RSS of the other models. Considering both simplicity and overall performance, the fourth model, with HQ2, shape_len, lst_fp and Elv_ds as predictors, appears to be the most suitable, having the lowest AIC (189.47); the third model is the next most suitable.
Number | Variables | Adj. R² (–) | RSS (m³/s)² | AIC (–) |
---|---|---|---|---|
1 | HQ2 | 0.9423 | 4,607 | 208.65 |
2 | HQ2, Elv_ds | 0.9553 | 4,575 | 191.21 |
3 | HQ2, lst_fp, Elv_ds | 0.9602 | 2,650 | 190 |
4 | HQ2, shape_len, lst_fp, Elv_ds | 0.9630 | 2,600 | 189.47 |
5 | HQ2, shape_len, lst_fp, Elv_ds, Area | 0.9636 | 2,551 | 190.64 |
6 | HQ2, shape_len, lst_fp, Elv_ds, Area, Elv_up | 0.9648 | 2,539 | 192.44 |
Variables | Maximum daily flow (HQ2) | Shape length (shape_len) | Longest flow path (lst_fp) | Minimum elevation (Elv_ds) |
---|---|---|---|---|
HQ2 | 1 | –0.045 | 0.005 | 0.012 |
shape_len | | 1 | 0.921 | –0.188 |
lst_fp | | | 1 | –0.044 |
Elv_ds | | | | 1 |
A comparison of the performance between the Fuller equation and the developed multiple regression model, through the RMSE and bias, is presented in Figure 7.
Both methods reduce the error compared with predicting IPF directly from the observed annual MDF series. However, the multiple regression model outperforms the traditional Fuller equation. For the Fuller equation, the RMSE increases with the return period, from about 20% for the 10-year recurrence interval to about 27% for the 100-year recurrence interval, and the bias grows from about –10% to –18%, which implies a larger underestimation for higher return periods. In contrast, the errors of the multiple regression model proposed here are essentially independent of the return period. Some physical explanations for the selected regression are as follows:
The differences between peak flow and daily flow are due to the catchment retention. Therefore, basins with longer flow paths have greater potential to show differences between MDF and IPF.
The significance of other commonly used predictors becomes weaker for larger basins: differences in mean climate, soil or basin slope characteristics between such basins are not pronounced, whereas the distinctive minimum elevation of each basin can be more representative.
According to the initial investigation in our study area, the available MDF data are the most important source of information for estimating the IPFs.
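For comparison, the Fuller benchmark can be written in one line. The sketch assumes the commonly quoted metric form IPF = MDF · (1 + 2.66 A^(-0.3)) with the basin area A in km², which corresponds to Fuller's coefficient of 2 with A in square miles; the coefficients actually used in this study are not stated in this section and may differ, and `fuller_ipf` is a hypothetical name.

```python
def fuller_ipf(mdf, area_km2, a=2.66, b=0.3):
    """Fuller-type estimate IPF = MDF * (1 + a * A**(-b)), with A in km2.

    a = 2.66 and b = 0.3 are the commonly quoted metric coefficients of
    Fuller (1914); they are an assumption here, not values from this study.
    """
    return mdf * (1 + a * area_km2 ** (-b))


# Peak-to-daily ratio for the smallest and largest subbasin sizes (40 and 1,000 km2)
for area in (40.0, 1000.0):
    print(area, round(fuller_ipf(1.0, area), 2))
```

Because the ratio depends on area alone, the equation cannot reflect descriptors such as flow path length or minimum elevation, which is one reason a multiple regression can do better.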
Scaling analysis
In the first step of the scaling analysis, peak flow series are extracted from the three short-term 15 min flow stations. The observed 15 min flow data at each gauge are aggregated to 27 different time scales (see Table 4). For each time scale, the POT method is used to extract approximately 30 extreme values; that is, peaks are extracted from the aggregated flow series at an average rate of about four events per year. Based on the characteristics of the flood events in our study basin, a minimum separation of 45 days between selected peaks is imposed to ensure their independence within each year. The PWMs of the runoff extremes are then computed for all 27 time scales.
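The aggregation and peak selection can be sketched as below. Assumptions not stated in the text: each duration is formed by a moving average of consecutive 15 min values, and the POT selection is implemented as greedy declustering that keeps the largest values lying at least the minimum separation apart, which is equivalent to choosing the threshold that yields the requested number of peaks. The names `aggregate`, `pot_peaks` and `q15` are hypothetical.

```python
def aggregate(series, window):
    """Moving average over `window` consecutive 15 min values.

    A window of w steps corresponds to a duration of w / 4 hours.
    """
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window by one step
        out.append(total / window)
    return out


def pot_peaks(series, n_peaks, min_sep):
    """Largest values that are at least `min_sep` time steps apart.

    Candidates are visited in descending order; a candidate is kept only if it
    lies far enough from every peak already kept (simple declustering).
    """
    order = sorted(range(len(series)), key=series.__getitem__, reverse=True)
    chosen = []
    for i in order:
        if all(abs(i - j) >= min_sep for j in chosen):
            chosen.append(i)
            if len(chosen) == n_peaks:
                break
    return sorted(series[i] for i in chosen)


# With 15 min data, 45 days correspond to 45 * 96 = 4320 time steps, e.g.
# peaks_2h = pot_peaks(aggregate(q15, 8), n_peaks=30, min_sep=45 * 96)
```

Repeating this for each of the 27 windows in Table 4 (1 to 96 steps) yields the samples whose PWMs enter the scaling analysis.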
Time scale | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Flow duration (h) | 0.25 | 0.5 | 0.75 | 1 | 1.25 | 1.5 | 1.75 | 2 | 2.5 | 3 | 3.5 | 4 | 4.5 | 5 | 5.5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
Next, the simple scaling property of runoff of various durations at the three selected stations is examined according to Equation (6). Here, the base duration h is 0.25 h (15 min), the scale parameter λ is a multiplier converting the duration h to λh, and Q_λh denotes the runoff intensity over λh hours. Figure 8 displays the relationship between the log-transformed PWMs of various orders and the log-transformed durations for the three 15 min flow stations. The whole range of durations (15 min–24 h) can be visually divided into three pieces (15 min–2 h, 2–10 h and 10–24 h); within each piece, the slope of the linear regression line gives the scaling exponent α_k for PWM order k. Within each piece, the regression lines of the different orders are almost parallel, which implies that the scaling exponents (the slopes of the regression lines) of a piece are very similar for the various PWM orders, as demonstrated in Figure 9.
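The slopes read from Figure 8 can be computed as least-squares slopes in log-log space, one per duration piece. This sketch assumes the piece limits of 0.25, 2, 10 and 24 h identified above, lets each limit belong to both adjacent pieces, and handles one PWM order at a time; `ols_slope` and `piecewise_exponents` are hypothetical names.

```python
import math


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


def piecewise_exponents(durations, pwm, breaks=(0.25, 2.0, 10.0, 24.0)):
    """Scaling exponent of one PWM order in each duration piece.

    durations: durations in hours (e.g. those of Table 4);
    pwm: PWM of the chosen order at each duration;
    breaks: piece limits in hours; a limit belongs to both adjacent pieces.
    """
    exponents = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        idx = [i for i, d in enumerate(durations) if lo <= d <= hi]
        xs = [math.log(durations[i]) for i in idx]
        ys = [math.log(pwm[i]) for i in idx]
        exponents.append(ols_slope(xs, ys))  # slope in log-log space
    return exponents
```

Since log β_k(λh) = log β_k(h) + α_k log λ with λ = d/h, the slope against log d equals α_k; repeating the call for orders k = 0 to 3 gives the points plotted in Figure 9.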
Figure 9 shows the relationship between the scaling exponents and the PWM order for the three pieces at the three 15 min flow stations. The relationship is clearly linear. The scaling exponents increase slightly with PWM order in the second and third pieces (2–10 h and 10–24 h) but remain stable in the first piece (15 min–2 h). This indicates that the runoff rate exhibits the simple scaling property at the three analyzed stations. Furthermore, the scaling properties at stations 1 and 2 are more similar to each other than to those at station 3, possibly because the first two stations are located close to each other (see Figure 1).
Using the scaling exponents α_k obtained for the various PWM orders and pieces at the three 15 min flow stations, the three resulting sets of scaling equations are applied separately to all 45 daily flow stations in our study catchment to compute the PWMs of the annual IPFs. The GEV distribution is then used to estimate the design peak flows for four return periods (T = 10, 20, 50, 100 years).
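The transfer step can be sketched by chaining the piecewise power laws from 24 h down to 15 min and refitting the GEV. Assumptions: the PWMs of the MDF series are treated as PWMs at the 24 h duration, the exponents per piece and order come from one 15 min station, and the GEV is fitted from L-moments as in the simple regression method. The names `transfer_pwms`, `gev_from_pwms`, `daily_b` and `exps` are hypothetical.

```python
import math


def transfer_pwms(daily_pwms, exponents, breaks=(0.25, 2.0, 10.0, 24.0)):
    """Scale PWMs of 24 h maxima down to the shortest duration piece by piece.

    exponents[j][k]: exponent of PWM order k in piece j, pieces ordered from
    the shortest to the longest duration as in `breaks`.
    """
    scaled = []
    for k, beta in enumerate(daily_pwms):
        for j in reversed(range(len(breaks) - 1)):  # 10-24 h, then 2-10 h, ...
            beta *= (breaks[j] / breaks[j + 1]) ** exponents[j][k]
        scaled.append(beta)
    return scaled


def gev_from_pwms(b0, b1, b2):
    """GEV (xi, alpha, k) from PWMs via L-moments (Hosking's approximation)."""
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    c = 2 / (3 + l3 / l2) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c ** 2
    if abs(k) < 1e-6:  # Gumbel limit
        alpha = l2 / math.log(2)
        return l1 - 0.5772156649 * alpha, alpha, 0.0
    g = math.gamma(1 + k)
    alpha = l2 * k / ((1 - 2 ** (-k)) * g)
    return l1 - alpha * (1 - g) / k, alpha, k


def gev_quantile(T, xi, alpha, k):
    """Design value for return period T (years)."""
    y = -math.log(1 - 1 / T)
    return xi - alpha * math.log(y) if abs(k) < 1e-6 else xi + alpha / k * (1 - y ** k)


# daily_b = [b0, b1, b2] of the MDF series at a gauge; exps from a 15 min station
# ipf_100 = gev_quantile(100, *gev_from_pwms(*transfer_pwms(daily_b, exps)))
```

When all exponents are equal, every PWM is multiplied by the same factor, so the GEV shape is unchanged and the design values scale by that factor; order-dependent exponents also change the shape of the fitted distribution.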
The cross validation results of the simple scaling method are shown in Figure 10, which compares the RMSE and bias obtained with the scaling equations of the three 15 min resolution flow stations. The reference error (Observed) is the same as for the two regression models: for each of the four return periods, it is the difference between the maximum daily flow and the IPF of the same event.
It can be seen from Figure 10(a) that the three simple scaling models also produce good results considering the flood frequencies. Generally, the difference of RMSE between station 1 and 2 is small, with an average RMSE of approximately 20% for the four return periods. It reduces the observed error by 15% and no obvious dependence of RMSE exists with increasing return period. The most significant difference between station 3 and the other two stations is that station 3 exhibits greater RMSE (by 6%) at the T = 10 year recurrence interval, but for the more important larger return periods, it performs as well as the other two 15 min flow stations. Compared with the RMSE, the differences in bias (see Figure 10(b)) between the three stations are more apparent, ranging from 0 to 4%, –10 to –6% and –8 to 10%, respectively, for each station. For the longer return periods (T = 50, 100 years), the bias results indicate an underestimation of the design peak flows for all three scaling models. In summary, the scaling model generated from station 1 displays the best overall performance according to the above performance analysis and is recommended for application here. Owing to the sparse station network and limited record length of the 15 min runoff gauges, the relationship between the selected station locations and scaling attributes could not be investigated in more detail.
Comparative results for the three methods
We compare the three methods in terms of their performance in predicting the IPF from the MDF. The final comparative results for the four return periods are given in Table 5. The observed error in the second column denotes the direct comparison between the observed MDF and its corresponding peak flow. Its RMSE (around 30% for each return period) suggests that, without correction, the MDF will be much lower than the corresponding peak flow in some catchments. The multiple regression analysis clearly performs best, with an average RMSE of 15% and an average bias of 0.29%. The simple regression approach performs poorest, since its estimation error is larger (around 20% for each return period) and it underestimates the peak flows considerably (bias of about –10%). Overall, the scaling analysis based on the 15 min flow stations performs well; the first and third stations perform better than the second one regarding bias, although the second station yields a slightly smaller RMSE (18.5%).
| Metric / return period | Observed | Simple regression: quantile regression | Simple regression: PWM regression | Multiple regression: Fuller | Multiple regression: Q ∼ QLE | Scaling: station 1 | Scaling: station 2 | Scaling: station 3 |
|---|---|---|---|---|---|---|---|---|
| RMSE (%) | | | | | | | | |
| T = 10 years | 32.18 | 19.71 | 19.69 | 19.67 | 15.84 | 21.21 | 17.59 | 26.14 |
| T = 20 years | 32.47 | 19.53 | 19.72 | 21.27 | 15.28 | 19.57 | 18.17 | 20.26 |
| T = 50 years | 33.10 | 19.88 | 20.15 | 24.26 | 14.79 | 19.2 | 18.78 | 17.86 |
| T = 100 years | 33.82 | 20.98 | 20.94 | 26.92 | 15.09 | 20.19 | 19.77 | 20.25 |
| Bias (%) | | | | | | | | |
| T = 10 years | –25.41 | –10.52 | –10.65 | –9.85 | 0.42 | 4.05 | –8.42 | 9.46 |
| T = 20 years | –25.45 | –10.21 | –10.57 | –12.06 | 0.37 | 1.89 | –9.14 | 2.87 |
| T = 50 years | –25.52 | –10.17 | –10.55 | –15.24 | 0.24 | 0.1 | –8.13 | –4.67 |
| T = 100 years | –25.56 | –10.51 | –10.45 | –17.57 | 0.11 | –0.71 | –6.25 | –9.7 |
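As a quick arithmetic check of the averages quoted in the comparison, the snippet below transcribes Table 5 and averages each column over the four return periods. For Q ∼ QLE (keyed here as the ASCII string `Q~QLE`) it gives a mean RMSE of 15.25% and a mean bias of 0.285%, consistent with the rounded values in the text.

```python
from statistics import mean

# RMSE and bias (%) transcribed from Table 5, ordered T = 10, 20, 50, 100 years.
RMSE = {
    "Observed": [32.18, 32.47, 33.10, 33.82],
    "Quantile": [19.71, 19.53, 19.88, 20.98],
    "PWM": [19.69, 19.72, 20.15, 20.94],
    "Fuller": [19.67, 21.27, 24.26, 26.92],
    "Q~QLE": [15.84, 15.28, 14.79, 15.09],
    "Station 1": [21.21, 19.57, 19.2, 20.19],
    "Station 2": [17.59, 18.17, 18.78, 19.77],
    "Station 3": [26.14, 20.26, 17.86, 20.25],
}
BIAS = {
    "Observed": [-25.41, -25.45, -25.52, -25.56],
    "Quantile": [-10.52, -10.21, -10.17, -10.51],
    "PWM": [-10.65, -10.57, -10.55, -10.45],
    "Fuller": [-9.85, -12.06, -15.24, -17.57],
    "Q~QLE": [0.42, 0.37, 0.24, 0.11],
    "Station 1": [4.05, 1.89, 0.1, -0.71],
    "Station 2": [-8.42, -9.14, -8.13, -6.25],
    "Station 3": [9.46, 2.87, -4.67, -9.7],
}

mean_rmse = {m: mean(v) for m, v in RMSE.items()}
mean_bias = {m: mean(v) for m, v in BIAS.items()}

# List the methods from smallest to largest mean RMSE.
for m in sorted(mean_rmse, key=mean_rmse.get):
    print(f"{m:10s} mean RMSE {mean_rmse[m]:6.2f}%  mean bias {mean_bias[m]:7.2f}%")
```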
DISCUSSION
In this paper, three different methods are proposed that provide relatively simple ways of estimating IPFs from MDFs in northern Germany.
The first method, a simple regression model, provides a coefficient that corrects the underestimation of design peak flows derived from daily flows. The expression is based on the linear relationship between the quantile and PWM values of IPF and MDF. This method allows a straightforward comparison of IPF and MDF and is computationally inexpensive. The final RMSE and bias results show it to be a useful approach for estimating peak flows in flood studies.
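A minimal sketch of such a correction coefficient, assuming a single multiplicative factor fitted by least squares through the origin across stations (IPF ≈ c · MDF); the regression in the paper may be specified differently, for example with an intercept. The same function applies to quantiles at a fixed return period or to PWMs of a fixed order; the function names are hypothetical.

```python
def fit_coefficient_through_origin(mdf_values, ipf_values):
    """Least-squares slope c of ipf = c * mdf with zero intercept:
    c = sum(x * y) / sum(x * x)."""
    sxy = sum(x * y for x, y in zip(mdf_values, ipf_values))
    sxx = sum(x * x for x in mdf_values)
    if sxx == 0:
        raise ValueError("MDF values must not all be zero")
    return sxy / sxx


def correct_mdf(mdf_value, c):
    """Estimate an IPF quantile (or PWM) from the corresponding MDF value."""
    return c * mdf_value
```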
In the second method, a stepwise multiple regression model, special attention is given to the selection of suitable predictors. Using stepwise regression and considering partial correlations, the MDF, the longest flow path and the minimum elevation were selected as predictors in the final regression equation. The multiple regression analysis shows that the longest flow path is highly correlated with peak flow and also strongly interrelated with basin area, the descriptor most strongly related to flow. This explains why the longest flow path is one of the final explanatory variables. However, unlike in previous studies (e.g. Fuller 1914; Taguas et al. 2008), the minimum elevation performed better than the catchment area. A physical explanation lies in the structure of the catchment: area alone does not sufficiently differentiate between headwaters in the upper and lower parts of the catchment. This study confirms that the scaling of the response is closely related to hillslope and channel properties, for which the elevation of the outlet is useful information. A practical advantage of this result is that the lowest point of the catchment is easy to obtain, because it approximately equals the geodetic elevation of the gauging station, which is often published with the maximum flow data.
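The predictor selection can be illustrated with a simplified forward stepwise procedure. This is a sketch, not the paper's procedure: instead of F-test or partial-correlation entry criteria it adds, at each step, the candidate descriptor that most increases the adjusted R² of an ordinary least squares fit, and stops when the gain falls below a hypothetical threshold `min_gain`. Whether the descriptors are log-transformed beforehand is left to the caller; the predictor names are placeholders.

```python
import numpy as np


def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y on the columns of X (intercept added)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(y, X, names, min_gain=0.01):
    """Greedy forward selection of predictor columns by adjusted R^2 gain."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining:
        # Score every model that adds one more candidate to the current set.
        scores = {j: adjusted_r2(y, X[:, selected + [j]]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break  # the best remaining candidate does not improve the fit enough
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return [names[j] for j in selected], best_score
```

Greedy forward selection is easy to follow but, like any stepwise scheme, can miss combinations of predictors that only perform well jointly, which is one reason to also inspect partial correlations as described above.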
Compared with the classical Fuller equation, the proposed multiple regression model noticeably improves the accuracy of the estimates. Despite a slight overestimation, the multiple regression model performs best among the three methods, and its advantage becomes even more pronounced for longer return periods.
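For reference, the sketch below uses the commonly quoted form of Fuller's (1914) relation, IPF = MDF · (1 + 2.66 A^(-0.3)), in which the correction factor depends on catchment area A alone. The coefficients are exposed as parameters, and the area unit is an assumption that must match the convention of the source being followed (the original relation was derived with areas in square miles, while some later applications use km²).

```python
def fuller_ipf(mdf, area, coef=2.66, exponent=-0.3):
    """Fuller-type estimate: IPF = MDF * (1 + coef * area ** exponent).

    The area must be given in the unit the coefficients were derived for.
    """
    if area <= 0:
        raise ValueError("area must be positive")
    return mdf * (1.0 + coef * area ** exponent)
```

Because the factor decreases monotonically with area, the relation cannot distinguish catchments of similar size that differ in, for example, outlet elevation.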
The last method, a piecewise simple scaling model, provides promising insight into the temporal relationship between the peak flow and its corresponding MDF. The hypothesis of piecewise simple scaling combined with the GEV distribution is used to link the PWMs of IPF and MDF, based on short records of continuous 15 min flow data from three discharge gauges. The formulas obtained from the three 15 min flow stations are then applied individually to the 45 daily gauges. The validation results show that the three piecewise simple scaling models are capable of deriving peak flows when only MDFs are available. Compared with the regression models, the scaling model is more efficient in terms of data use, because its parameters can be determined from a single station with sufficient continuous high-resolution flow data.
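A minimal sketch of how such piecewise exponents might be estimated at a single high-resolution station, assuming that annual maxima of flows averaged over a set of durations d (e.g. 15 min to 24 h, derived from the continuous 15 min record) are already available and that the breakpoint duration is known. For each duration it computes the standard unbiased sample PWMs; for each PWM order it fits the slope of ln b_r against ln d separately below and above the breakpoint, and that slope is taken as the scaling exponent θ_r of the piece. Function names and the breakpoint are hypothetical, and the paper's own procedure may differ, for instance in how the breakpoint is chosen.

```python
import math


def sample_pwms(annual_maxima, max_order=2):
    """Unbiased sample PWMs b_0..b_max_order of an annual maximum series."""
    x = sorted(annual_maxima)  # ascending order, rank j = 1..n
    n = len(x)
    if n <= max_order:
        raise ValueError("series too short for the requested PWM order")
    pwms = []
    for r in range(max_order + 1):
        total = 0.0
        for j, xj in enumerate(x, start=1):
            w = 1.0
            for m in range(1, r + 1):  # weight = prod over m of (j - m) / (n - m)
                w *= (j - m) / (n - m)
            total += w * xj
        pwms.append(total / n)
    return pwms


def log_log_slope(durations, values):
    """Least-squares slope of ln(values) against ln(durations)."""
    xs = [math.log(d) for d in durations]
    ys = [math.log(v) for v in values]
    xm, ym = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    return sxy / sxx


def piecewise_exponents(durations, pwms_by_duration, d_break, order):
    """Scaling exponents (theta_lower, theta_upper) of one PWM order, fitted
    below and above d_break (the breakpoint is shared by both pieces)."""
    pairs = [(d, p[order]) for d, p in zip(durations, pwms_by_duration)]
    lower = [(d, b) for d, b in pairs if d <= d_break]
    upper = [(d, b) for d, b in pairs if d >= d_break]
    if len(lower) < 2 or len(upper) < 2:
        raise ValueError("each piece needs at least two durations")
    return log_log_slope(*zip(*lower)), log_log_slope(*zip(*upper))
```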
CONCLUSIONS
It is difficult to represent IPFs perfectly with MDFs in flood frequency analysis. For all three proposed methods, the MDF is the main explanatory variable. As discussed, all of them give significantly better results than approximating IPFs directly by MDFs, and they are easy to apply. The first two methods depend heavily on the availability of peak flow data for a sufficiently large set of stations, which may restrict their use in areas with poor peak flow data. The third method can be applied favorably when single catchments are considered and some high-resolution flow data from a nearby station are available. However, criteria for selecting a suitable high-resolution donor station are not yet clear, and further investigation is required to establish regional scaling formulas. Although this case study was carried out for the Aller-Leine catchment in northern Germany, the knowledge gained and the methods can be applied to other areas as well. Our future work on the derivation of IPFs will combine hydrological models with rescaling approaches so that land use or climate changes can be considered in the estimation of design flows.
ACKNOWLEDGEMENTS
The authors thank their colleagues and student assistants for their valuable comments and suggestions. Special thanks go to Ana Callau Poduje and Markus Wallner. We are also grateful for the right to use data from the German National Weather Service (DWD), NLWKN Niedersachsen, Landesamt für Geoinformation und Landentwicklung Niedersachsen, Landesamt für Bergbau and the funding from the China Scholarship Council (CSC). The constructive comments made by two anonymous reviewers helped to improve the paper.