Abstract
The peak-discharge and drainage area power-law relation has been widely used in regional flood frequency analysis for more than a century. The coefficients and can be obtained by nonlinear or log-log linear regression. To illustrate the deficiencies of applying log-transformation in peak-discharge power-law analyses, we studied 52 peak-discharge events observed in the Iowa River Basin in the United States from 2002 to 2013. The results show that: (1) the estimated scaling exponents by the two methods are remarkably different; (2) for more than 80% of the cases, the power-law relationships obtained by log-log linear regression produce larger prediction errors of peak discharge in the arithmetic scale than that predicted by nonlinear regression; and (3) logarithmic transformation often fails to stabilize residuals in the arithmetic domain, it assigns higher weight to data points representing smaller peak discharges and drainage areas, and it alters the visual appearance of the scatter in the data. The notable discrepancies in the scaling parameters estimated by the two methods and the undesirable consequences of logarithmic transformation raise caution. When conducting peak-discharge scaling analysis, especially for prediction purposes, applying nonlinear regression on the arithmetic scale to estimate the scaling parameters is a better alternative.
INTRODUCTION
Power-law relations between peak flow and drainage area have been widely observed for centuries and have been applied in analyzing both regional flood frequencies and individual rainfall-runoff events (e.g., O'Connell 1868; Fuller 1914; Allen 1960; Gupta & Dawdy 1995; Ogden & Dawdy 2003; Griffis & Stedinger 2007; Dawdy et al. 2012; Eash et al. 2013; Ayalew et al. 2015; Furey et al. 2016). Studies that pertain to understanding the physical basis of this relation have advanced our understanding of flood generation mechanisms (Gupta et al. 2007, 2010). Such studies involve estimating and interpreting the scaling slopes and intercepts through data analyses (e.g., Ogden & Dawdy 2003; Gupta et al. 2010), numerical modeling (e.g., Furey & Gupta 2007; Mandapaka et al. 2009; Ayalew et al. 2015), and theoretical considerations (Furey et al. 2016). The scaling slopes and intercepts are often estimated in these studies by fitting a straight line to the logarithmic transformations of drainage area and river discharge data using ordinary or generalized least squares techniques. Although the discharge and drainage area power-law relation has been widely applied to estimate flood flows for flood hazard mapping and has attracted growing attention from the research community, the appropriateness of using linear regression of logarithms to estimate power function parameters tends to be overlooked.
Form of the peak-discharge scaling relation
Parameters and are termed as scaling intercept and exponent of the peak-discharge power-law relations.
Fitting the power function
The conventional method uses the logarithmic transformation to obtain the parameters and , and uses the following procedure: (i) transform the original data for Q and A to logarithms (e.g., of base 10); (ii) fit a straight line to the logarithms using ordinary or general least squares techniques; (iii) display the straight line and data points in a scatter plot with the logarithmic scale and report the coefficient of determination (R2) as the evaluation of the reliability of the estimated parameters α and θ, and thus the accuracy of the prediction equation (Equation (2)). Estimates of Equation (2) can be used to explore the variability of the scaling parameters as results of the interactions between watershed characteristics and climatic forcing. Similarly, one can use Equation (2) to predict peak discharge at ungauged locations.
Accordingly, the scaling intercept is assigned as 10 to the power of the intercept in the regression equation (Equation (3)) and the scaling exponent is equaled to the slope of Equation (3). In most cases, the coefficient of determination R2 obtained from the fit of Equation (3) takes a value less than unity, indicating that there are discrepancies between paired logarithms of observed and predicted peak-discharge values. Equation (3), when back-transformed to Equation (2), implicitly assumes a multiplicative error. For easier presentation, we call the conventional method log-log linear regression hereafter. The true values of the coefficients of Equation (2) are unknown for a number of reasons. First, only rarely can the existence of the power law be strictly proven (Newman 2005; Broido & Clauset 2019). Second, data of Q and A used to estimate Equation (2) are corrupted with observational errors and their sample size is limited. Therefore, the coefficients can only be estimated.
The conventional method minimizes the sum of squares of the logarithms, but in many predictive applications of Equation (2) our main interest is in estimating Q and not its logarithm. As an alternative, a nonlinear least squares regression approach estimates the scaling slope and intercept directly, i.e., without a logarithmic transformation, through Equation (2). This is a numerical fitting approach that uses the minimum sum of squares of modeling errors (differences between data and model output) as the optimization criterion to obtain the values for parameters. The nonlinear least squares method therefore assumes additive errors for the power-law model. Nonlinear fitting used to take greater computational time, but this is no longer a limitation with the advances in computers and software. Initial values can be assigned as the estimates obtained from the conventional linear regression on logarithms.
Aim of this work
When examining or applying the peak-discharge power-law relation in the arithmetic scale, the existing reports (e.g., Eash 2001; Eash et al. 2013) and research papers present the peak-discharge-area relationship in the log-log coordinates (e.g., Ogden & Dawdy 2003; Mandapaka et al. 2009; Gupta et al. 2010; Ayalew et al. 2015; Furey et al. 2016). But in one of the earliest studies on peak-discharge power-law relation, O'Connell (1868) plotted peak discharge against drainage areas in the original units. In O'Connell's study, logarithmic transformation was not needed because he fixed the scaling exponent to be 0.5.
Using a variable scaling exponent, however, brings the problem of fitting the nonlinear power-law function into view in later studies. Before the invention of computers and access to nonlinear fitting algorithms, linearizing the power-law function to Equation (3) was a convenient solution to this problem. However, statisticians (Miller 1984; Osbourne 2002) and researchers in other fields (Richards 1973; Smith 1984; McCuen et al. 1990; Packard et al. 2011; Xiao et al. 2011; Packard 2014) have demonstrated that logarithmic transformation does fundamentally transform the nature of the variables, making the interpretation of the results somewhat more complex (Asselman 2000). They further call for reducing transformation bias (Pattyn & Van Huele 1998; Packard 2013) in curve fitting or using logarithmic transformation with caution (Miller 1984).
The appropriateness of applying logarithmic transformations in peak-discharge power-law analyses tends to have been overlooked. This study explores some implications of the logarithmic transformation and calls for caution in future peak-discharge power-law analyses. The authors pursue this objective by first illustrating the overlooked discrepancies between the peak-discharge power-law models fitted by the least squares log-log linear and nonlinear regressions, and then analyzing the causes for the discrepancies. We use the observed event-based peak-discharge data as examples.
This article is organized as follows. We describe the study area and data used in this study immediately below. Then, the following section shows the discrepancies in the fitted relationships by the log-log linear and the nonlinear regressions, the underlying reasons for the discrepancies, and the problems of log-log transformation. The next section discusses the implications for peak-discharge power-law data analyses and this is followed by concluding remarks in the final section.
DATA AND METHOD
In this study, data for peak-discharge and drainage area of 52 individual rainfall-runoff events in the Iowa River Basin are taken from the study by Ayalew et al. (2015). The authors identified 52 events, over the period from 2002 to 2013 using both radar rainfall information and instantaneous streamflow measurements, to investigate how the duration, depth, and temporal structure of rainfall control the flood scaling intercept and exponent. Based on the information available, they reported that the entire Iowa River Basin received rainfall for these events. The Iowa River Basin drains an area of about 33,000 km2 at its confluence with the Mississippi River and its longest river flows about 600 km. About 85% of the basin has a surface slope of less than 5% and an average river bed slope of about 0.6‰. By assuming flow velocities varying from 0.5 to 1.5 m/s in the channel, the in-channel time of concentration of the Iowa River Basin ranges from about 5 to 15 days. Flooding has been frequent in Iowa in the past three decades, including the disastrous events in 1993 and 2008. Location and description of the streamflow gauges in the Iowa River Basin are provided in the Supplementary material (Figure S1 and Table S1).
The fitted power-law models with smaller LOOE are assumed to have better predictive skills.
RESULTS
Different estimates of scaling parameters by the two regression methods
Analysis of a single event
Analysis of an example event over the period of 14 June to 4 July 2010 reveals remarkable discrepancy between the estimated parameters by the two regression methods. High coefficient of determination in the log-log linear regression does not guarantee a good fit of the back-transformed model to the original untransformed data. Figure 1 illustrates the fitted peak-discharge power-law functions by the log-log linear and nonlinear regressions. For this event, the maximum stream discharge values were extracted at 34 river gauges. The gray and black colors signify the data, the fitted lines, and the models associated with the log-log linear regression and nonlinear regression, respectively. Figure 1(a) shows the fitted straight line to the logarithmic transformation of data for the upstream basin area and peak discharge, with a scaling exponent of 0.56. The power-law function that is back-transformed from the log-log linear fitting equation is presented at the lower right corner. The linear model fitted to the log-transformations, with a coefficient of determination of 0.85 and favorable pattern in its residual plot (as shown later in Figure 2(a)), strongly supports the power-law relationship between drainage area and peak-discharge compared with some reported in the literature. Visual inspection by rotating the line counterclockwise suggests a steeper slope, which tends to coincide with the scaling exponent of 0.83 given by the nonlinear regression (Figure 1(b)). The difference between the estimates of the scaling exponent is surprising. The discrepancy between the estimated scaling intercepts is also obvious.
Figure 1(b) compares the original data and the curves fitted by the log-log linear (light gray) and nonlinear (black) regressions. The power-law function we obtained from the nonlinear fitting is presented at the lower right corner. Although Figure 1(a) shows satisfactory fitting in the log-log scale, when back-transformed to the original arithmetic scale, the equation seems to be an adequate fit for basins smaller than 10,000 km2 but would seriously underestimate the peak flow for large basins. In contrast, the function resulting from the nonlinear regression describes the peak discharge well over the full range of upstream basin areas. This visual comparison is supported by the fact that the LOOE is smaller for the equation obtained from the nonlinear regression.
Analyses of multiple events
To show that the event analyzed in the preceding section is not an exception and that the values of scaling exponent estimated by the two regression methods are different, Table 1 compares the estimated parameters and presents the performances of the two regression methods. Estimates of scaling parameters in Equation (2) for each peak-discharge event were obtained using both the log-log linear and nonlinear regressions. The average values of the scaling exponent for all of the 52 peak-discharge events are 0.83(±0.17) from the log-log linear regression and 0.86(±0.14) from the nonlinear regression. Paired t-test accepted the null hypothesis that the mean difference between the paired scaling exponents estimated by the log-log linear and the nonlinear regressions from the 52 events is zero with a p value of 0.52. However, for about 60% of the 52 events, their absolute values of difference in the estimated scaling exponents by the two regression methods are greater than one standard deviation (0.14). Fourteen events have absolute differences greater than two standard deviations.
Event # . | . | . | LOOE (m3/s) . | . | . | . | LOOE (m3/s) . | . | . | . | . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
log . | nls . | log . | nls . | log . | nls . | Event # . | log . | nls . | log . | nls . | log . | nls . | . | . | . | . | |
20020409 | 0.0212 | 0.0027 | 0.85 | 1.10 | 10.1 | 5.9 | 20071018 | 1.4963 | 0.0909 | 0.55 | 0.88 | 57.3 | 28.0 | ||||
20020428 | 0.0546 | 0.0275 | 0.81 | 0.89 | 10.7 | 8.8 | 20080425 | 0.6123 | 0.3878 | 0.76 | 0.83 | 150.0 | 150.6 | ||||
20020604 | 0.0635 | 0.0005 | 0.81 | 1.39 | 47.6 | 29.0 | 20080603 | 1.9778 | 1.0269 | 0.75 | 0.83 | 252.7 | 226.4 | ||||
20020711 | 0.0447 | 0.0020 | 0.79 | 1.16 | 17.2 | 9.9 | 20090427 | 0.2029 | 0.2853 | 0.82 | 0.78 | 59.0 | 60.0 | Center of the 52 events: | |||
20030509 | 0.3169 | 0.1749 | 0.75 | 0.82 | 28.4 | 26.7 | 20090515 | 0.3619 | 0.0240 | 0.70 | 1.02 | 43.8 | 24.0 | mean | median | ||
20030606 | 0.0122 | 0.0441 | 1.02 | 0.88 | 15.9 | 13.6 | 20090527 | 0.0455 | 0.1532 | 0.95 | 0.82 | 34.0 | 31.6 | log | 0.4031 | 0.1050 | |
20030626 | 0.0171 | 0.0589 | 0.97 | 0.83 | 14.5 | 14.1 | 20090620 | 0.4789 | 0.01o87 | 0.70 | 1.05 | 60.5 | 59.0 | nls | 0.2334 | 0.1488 | |
20030711 | 0.0497 | 0.1876 | 0.91 | 0.76 | 26.4 | 23.9 | 20091023 | 0.8486 | 0.0671 | 0.61 | 0.90 | 47.9 | 30.7 | log | 0.83 | 0.82 | |
20040305 | 0.5134 | 0.0405 | 0.65 | 0.94 | 32.4 | 32.5 | 20091030 | 2.5929 | 0.2160 | 0.54 | 0.83 | 75.7 | 43.8 | nls | 0.86 | 0.83 | |
20040326 | 0.0899 | 0.0086 | 0.78 | 1.06 | 26.5 | 17.4 | 20100309 | 0.5563 | 1.1907 | 0.79 | 0.70 | 110.8 | 107.2 | LOOE | log | 57.3 | 42.9 |
20040530 | 0.2468 | 0.5909 | 0.88 | 0.79 | 203.3 | 197.9 | 20100513 | 0.9480 | 0.1491 | 0.63 | 0.85 | 75.1 | 58.7 | nls | 49.0 | 32.3 | |
20050412 | 0.0514 | 0.1699 | 0.87 | 0.73 | 14.3 | 10.3 | 20100623 | 2.4067 | 0.2502 | 0.56 | 0.83 | 92.1 | 59.1 | ||||
20050422 | 0.0073 | 0.2372 | 1.12 | 0.73 | 44.5 | 32.1 | 20100731 | 0.0390 | 0.1484 | 0.99 | 0.85 | 37.6 | 42.3 | Spread of the 52 events: | |||
20050513 | 0.1560 | 0.3369 | 0.82 | 0.72 | 36.5 | 27.0 | 20100811 | 0.4960 | 0.0581 | 0.68 | 0.94 | 76.5 | 52.3 | standard deviation | inter quantile range | ||
20050629 | 0.0104 | 0.1263 | 1.13 | 0.86 | 71.8 | 62.5 | 20100924 | 0.1143 | 1.0967 | 0.84 | 0.62 | 74.1 | 81.7 | ||||
20050924 | 0.0017 | 0.1887 | 1.19 | 0.70 | 40.3 | 38.7 | 20110426 | 0.0954 | 0.1382 | 0.89 | 0.84 | 29.5 | 26.8 | ||||
20060309 | 0.0894 | 0.0743 | 0.73 | 0.76 | 11.1 | 9.8 | 20110529 | 0.4401 | 0.1196 | 0.71 | 0.85 | 29.5 | 19.6 | log | 0.6313 | 0.4468 | |
20060406 | 0.0625 | 0.3798 | 0.92 | 0.72 | 56.0 | 54.9 | 20110615 | 0.0956 | 0.0119 | 0.86 | 1.08 | 32.5 | 33.7 | nls | 0.2782 | 0.2182 | |
20060501 | 0.0149 | 0.0784 | 1.06 | 0.86 | 34.6 | 25.4 | 20110621 | 0.1190 | 0.1099 | 0.83 | 0.85 | 19.7 | 19.6 | log | 0.17 | 0.20 | |
20060918 | 0.0006 | 0.0681 | 1.29 | 0.77 | 20.3 | 14.2 | 20110728 | 0.0569 | 0.1538 | 0.84 | 0.77 | 42.0 | 44.6 | nls | 0.14 | 0.13 | |
20070311 | 0.2690 | 0.8126 | 0.81 | 0.68 | 72.6 | 46.0 | 20120229 | 0.0225 | 0.4799 | 1.06 | 0.74 | 107.8 | 110.6 | LOOE | log | 49.6 | 44.0 |
20070401 | 0.0939 | 0.1250 | 0.89 | 0.87 | 37.3 | 37.4 | 20120504 | 0.0686 | 0.0051 | 0.85 | 1.14 | 27.8 | 27.8 | nls | 46.9 | 36.2 | |
20070426 | 0.4359 | 0.2273 | 0.72 | 0.81 | 57.6 | 42.9 | 20130410 | 0.2112 | 0.0212 | 0.74 | 1.01 | 31.0 | 21.6 | ||||
20070524 | 0.0098 | 0.1066 | 1.09 | 0.82 | 28.4 | 13.3 | 20130505 | 0.6650 | 0.4348 | 0.69 | 0.75 | 45.3 | 42.8 | ||||
20070820 | 0.4305 | 0.2367 | 0.73 | 0.81 | 59.7 | 57.5 | 20130527 | 2.2308 | 0.6200 | 0.65 | 0.80 | 193.0 | 163.3 | ||||
20071008 | 0.0094 | 0.2054 | 1.11 | 0.76 | 49.4 | 28.9 | 20130624 | 0.7074 | 0.3700 | 0.72 | 0.81 | 108.6 | 103.7 |
Event # . | . | . | LOOE (m3/s) . | . | . | . | LOOE (m3/s) . | . | . | . | . | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
log . | nls . | log . | nls . | log . | nls . | Event # . | log . | nls . | log . | nls . | log . | nls . | . | . | . | . | |
20020409 | 0.0212 | 0.0027 | 0.85 | 1.10 | 10.1 | 5.9 | 20071018 | 1.4963 | 0.0909 | 0.55 | 0.88 | 57.3 | 28.0 | ||||
20020428 | 0.0546 | 0.0275 | 0.81 | 0.89 | 10.7 | 8.8 | 20080425 | 0.6123 | 0.3878 | 0.76 | 0.83 | 150.0 | 150.6 | ||||
20020604 | 0.0635 | 0.0005 | 0.81 | 1.39 | 47.6 | 29.0 | 20080603 | 1.9778 | 1.0269 | 0.75 | 0.83 | 252.7 | 226.4 | ||||
20020711 | 0.0447 | 0.0020 | 0.79 | 1.16 | 17.2 | 9.9 | 20090427 | 0.2029 | 0.2853 | 0.82 | 0.78 | 59.0 | 60.0 | Center of the 52 events: | |||
20030509 | 0.3169 | 0.1749 | 0.75 | 0.82 | 28.4 | 26.7 | 20090515 | 0.3619 | 0.0240 | 0.70 | 1.02 | 43.8 | 24.0 | mean | median | ||
20030606 | 0.0122 | 0.0441 | 1.02 | 0.88 | 15.9 | 13.6 | 20090527 | 0.0455 | 0.1532 | 0.95 | 0.82 | 34.0 | 31.6 | log | 0.4031 | 0.1050 | |
20030626 | 0.0171 | 0.0589 | 0.97 | 0.83 | 14.5 | 14.1 | 20090620 | 0.4789 | 0.01o87 | 0.70 | 1.05 | 60.5 | 59.0 | nls | 0.2334 | 0.1488 | |
20030711 | 0.0497 | 0.1876 | 0.91 | 0.76 | 26.4 | 23.9 | 20091023 | 0.8486 | 0.0671 | 0.61 | 0.90 | 47.9 | 30.7 | log | 0.83 | 0.82 | |
20040305 | 0.5134 | 0.0405 | 0.65 | 0.94 | 32.4 | 32.5 | 20091030 | 2.5929 | 0.2160 | 0.54 | 0.83 | 75.7 | 43.8 | nls | 0.86 | 0.83 | |
20040326 | 0.0899 | 0.0086 | 0.78 | 1.06 | 26.5 | 17.4 | 20100309 | 0.5563 | 1.1907 | 0.79 | 0.70 | 110.8 | 107.2 | LOOE | log | 57.3 | 42.9 |
20040530 | 0.2468 | 0.5909 | 0.88 | 0.79 | 203.3 | 197.9 | 20100513 | 0.9480 | 0.1491 | 0.63 | 0.85 | 75.1 | 58.7 | nls | 49.0 | 32.3 | |
20050412 | 0.0514 | 0.1699 | 0.87 | 0.73 | 14.3 | 10.3 | 20100623 | 2.4067 | 0.2502 | 0.56 | 0.83 | 92.1 | 59.1 | ||||
20050422 | 0.0073 | 0.2372 | 1.12 | 0.73 | 44.5 | 32.1 | 20100731 | 0.0390 | 0.1484 | 0.99 | 0.85 | 37.6 | 42.3 | Spread of the 52 events: | |||
20050513 | 0.1560 | 0.3369 | 0.82 | 0.72 | 36.5 | 27.0 | 20100811 | 0.4960 | 0.0581 | 0.68 | 0.94 | 76.5 | 52.3 | standard deviation | inter quantile range | ||
20050629 | 0.0104 | 0.1263 | 1.13 | 0.86 | 71.8 | 62.5 | 20100924 | 0.1143 | 1.0967 | 0.84 | 0.62 | 74.1 | 81.7 | ||||
20050924 | 0.0017 | 0.1887 | 1.19 | 0.70 | 40.3 | 38.7 | 20110426 | 0.0954 | 0.1382 | 0.89 | 0.84 | 29.5 | 26.8 | ||||
20060309 | 0.0894 | 0.0743 | 0.73 | 0.76 | 11.1 | 9.8 | 20110529 | 0.4401 | 0.1196 | 0.71 | 0.85 | 29.5 | 19.6 | log | 0.6313 | 0.4468 | |
20060406 | 0.0625 | 0.3798 | 0.92 | 0.72 | 56.0 | 54.9 | 20110615 | 0.0956 | 0.0119 | 0.86 | 1.08 | 32.5 | 33.7 | nls | 0.2782 | 0.2182 | |
20060501 | 0.0149 | 0.0784 | 1.06 | 0.86 | 34.6 | 25.4 | 20110621 | 0.1190 | 0.1099 | 0.83 | 0.85 | 19.7 | 19.6 | log | 0.17 | 0.20 | |
20060918 | 0.0006 | 0.0681 | 1.29 | 0.77 | 20.3 | 14.2 | 20110728 | 0.0569 | 0.1538 | 0.84 | 0.77 | 42.0 | 44.6 | nls | 0.14 | 0.13 | |
20070311 | 0.2690 | 0.8126 | 0.81 | 0.68 | 72.6 | 46.0 | 20120229 | 0.0225 | 0.4799 | 1.06 | 0.74 | 107.8 | 110.6 | LOOE | log | 49.6 | 44.0 |
20070401 | 0.0939 | 0.1250 | 0.89 | 0.87 | 37.3 | 37.4 | 20120504 | 0.0686 | 0.0051 | 0.85 | 1.14 | 27.8 | 27.8 | nls | 46.9 | 36.2 | |
20070426 | 0.4359 | 0.2273 | 0.72 | 0.81 | 57.6 | 42.9 | 20130410 | 0.2112 | 0.0212 | 0.74 | 1.01 | 31.0 | 21.6 | ||||
20070524 | 0.0098 | 0.1066 | 1.09 | 0.82 | 28.4 | 13.3 | 20130505 | 0.6650 | 0.4348 | 0.69 | 0.75 | 45.3 | 42.8 | ||||
20070820 | 0.4305 | 0.2367 | 0.73 | 0.81 | 59.7 | 57.5 | 20130527 | 2.2308 | 0.6200 | 0.65 | 0.80 | 193.0 | 163.3 | ||||
20071008 | 0.0094 | 0.2054 | 1.11 | 0.76 | 49.4 | 28.9 | 20130624 | 0.7074 | 0.3700 | 0.72 | 0.81 | 108.6 | 103.7 |
Table 1 also compares the prediction errors of the power functions fitted by the log-log linear and nonlinear regressions. The prediction errors were evaluated in the arithmetic scale using the untransformed data. For 42 out of the 52 events, the fit by the nonlinear regression has smaller LOOE than that by the log-log linear regression, indicating a better fit of the former from the perspective of prediction skill. This quantitative assessment is consistent with our graphical comparisons between the two regressions. As shown in Figure 1(b), the curve fitted by nonlinear regression traces the untransformed data better for the example event. Similarly, a visual inspection for all the 52 events suggests that, for about 80% of the events, the function resulting from the nonlinear regression better describes the peak discharge over the full range in upstream basin area.
Potential problems of log-transformation
Log-transformation may not stabilize variance in arithmetic scale
Figure 2(a) plots the SR from the log-log linear regression in the logarithmic scale (the same example event used in Figure 1). The SR display no compelling pattern with respect to the logarithm of fitted peak flow. The Shapiro–Wilk test of the null hypotheses that the residuals are normally distributed had p= 0.20. The Spearman rank correlation analysis between absolute values of SR and logarithm of fitted peak flow showed that the null hypothesis of zero correlation coefficient had p= 0.76. Visual inspection of the residual plot and the quantitative tests appears to suggest that the linear regression is statistically satisfactory in the logarithmic scale.
However, the residuals from a power-law equation, which is back-transformed from the log-log linear equation, tend to be heteroscedastic in the arithmetic scale. Figure 2(b) plots the SR from the back-transformed power-law function in the arithmetic scale. The SR exhibit a ‘megaphone’ shape (the upper side only) and systematic underestimation of the peak flow of large basins. The Spearman rank correlation analysis between absolute values of SR and fitted peak flow rejects the constant variance with a p value of 10−5.
For this exemplary event, the logarithmic transformation tends to stabilize the variance of the residuals of the log-log linear regression, but it does not fix the problem of heteroscedastic variance of residuals from the back-transformed power function. For comparison, we show the residual plots of the power-law equation fitted by nonlinear regression. The SR appear to be randomly distributed with respect to the fitted peak flows (Figure 2(c)) in the arithmetic scale. Spearman rank correlation analysis showed that the null hypothesis of zero correlation coefficient had p= 0.12. However, our analyses show that for most of the 52 events, neither the nonlinear regression nor the log-log linear regression produced constant variance of residuals in the arithmetic scale.
Log-transformation may make ill-suited data look extraordinary
Figure 3(a) plots peak flow and upstream basin area in log-log scale for the event recorded over the period of 9 March to 19 March 2010. Although the scatter at the lower left corner of the plot can be a concern, the overall pattern in the bivariate plot is fairly good and suggests that fitting a straight line to the observations would be appropriate. The equation obtained by ordinary least squares linear regression (R2 = 0.92, n = 34) appears to be good.
Figure 3(b) plots the original observations for the same event with arithmetic coordinates and the power function back-transformed from the log-log linear regression. Strikingly, this data set seems to be ill-suited for peak-discharge power-law analysis. On the one hand, peak flows for 4 of the 34 river gauges in the sample form a nearly horizontal band in the middle of the scatterplot. Points for the remaining 31 gauges tend to have noticeable scatter. On the other hand, the back-transformed power-law equation is a poor fit to the pattern over the full range of upstream basin areas. This poor fitting is reflected by the fact that the LOOE (110.8 m3/s) for this event is relatively large among all the events listed in Table 1. The power-law equation estimated using scaling parameters by the nonlinear regression gives a little better fit (LOOE = 107.2 m3/s, not shown). Nevertheless, we consider this event inappropriate for peak-discharge power-law analysis. The problem of this event would not be noticed if we solely plotted the data in the log-log scale rather than in the arithmetic coordinates.
Additionally, the four points at the upper right corner of Figure 3(a) appear to suggest a line segment with a mild slope, while the remaining observations seem to follow a linear model with a much steeper slope. This break in the scaling exponents (i.e., slopes of line segments) has been called ‘multiscaling of flood peaks’ and has been interpreted as an indication of changing dominant physical processes (e.g., Gupta et al. 1994). Apparently, the ‘multiscaling’ observed in the log-log space herein should be interpreted with great caution given the ill-suited data.
Causes of the discrepancy in the estimated scaling parameters
Log-transformation alters the pattern of data points
Comparisons of the data patterns plotted in the log-log and arithmetic scales in Figures 1 and 3 show that logarithmic transformation is monotonic, i.e., the log-transformation does not alter the order of the original data. However, the relative distances between adjacent points are changed. Taking Figure 1(b) as an example, the data points plotted in the arithmetic scale are clustered into three groups: one point for observation with upstream basin area greater than 30,000 km2, 29 with areas less than 10,000 km2, and the remaining four with areas in-between. Along the horizontal axis, the group with 29 points takes up about 25% of the plotting space. In contrast, after the logarithmic transformation (Figure 1(a)), along the x-axis the distribution of the 29 points with areas less than 10,000 km2 expands and occupies about 70% of the plotting space. It is evident from comparing Figure 1(a) and 1(b) that logarithmic transformation compresses the larger numbers much more than the smaller numbers. These imply that the logarithmic transformation fundamentally changed the pattern of the untransformed data.
Log-log linear regression models the geometric mean response
Equation (7) indicates that the log-log linear regression equation, and thus the back-transformed power function, models the geometric mean of peak flow at each given value of upstream basin area. Since geometric mean is always smaller than arithmetic mean, the back-transformed power function likely underestimates peak flows (see Figure 1 for an example).
Leverage low-value data points in log-log linear regression
Data points on the upper right corner tend to be of low leverage in the log-log linear regression of peak flow against upstream drainage area. Figure 4(a) compares the log-log linear regressions with (solid line in light gray) or without (dashed line in light gray) the five basins with an area greater than 10,000 km2 (lighter gray dots). Both visual inspection and the fitted equations in the lower right corner of Figure 4(a) show a small discrepancy between the fits, indicating these log-log linear regressions are dominated by the 29 data points (darker gray dots) at the lower left corner. The back-transformed power-law equations (curves in light gray) are also plotted in the arithmetic scale and the discrepancy due to removing the five data points is relatively small (Figure 4(b)). Figure 4 was based on the same peak-discharge event as used in Figure 1.
Assigning high leverage to data points at the lower-left corner in the log-log linear regression may at least partially explain that the back-transformed model reflects the features of the data points representing small values, but not the overall relationship between the peak discharge and drainage area over the full range. In contrast, the equations estimated by nonlinear regression with (solid line in black) and without (dashed line in black) the five data points are remarkably different, implying that these five data points have apparent influences on the fits. Again, for this peak flow event, the nonlinear regressions fit the data better in the units of measurements than those back-transformed from log-log linear regression when all the data points are used.
DISCUSSION
Peak-discharge power-law relation has been widely used in regional flood frequency analysis and explored in the event-based analysis for physical interpretation. In both applications, the real interest lies in the nonlinear relationship between the original variables of peak-discharge and drainage area, that is, in specifying a representative curve in the arithmetic scale rather than in the double-logarithmic scale. However, the double-logarithmic scale, i.e., logarithmic transformation, has been frequently adopted. Using 52 peak-discharge events observed in the Iowa River Basin, the United States over the period from 2002 to 2013, this article illustrates the deficiencies of applying log-transformation in peak-discharge power-law analyses.
On the use of log-transformation for peak-discharge power-law analysis
Logarithmic transformation was introduced as a notional method to estimate scaling parameters of the peak-discharge power-law relationship for: (1) better distribution of data spanning across few orders (e.g., four in this study) of magnitude for graphical presentation; (2) simplicity in calculation when computers were not available; and (3) coping with the multiplicative residuals or heteroscedastic variance of residuals.
Presenting and analyzing peak-discharge power-law relations using log-transformations, however, may lead to side effects. Figures 1, 3 and 4 and Figures S2 and S3 (similar analyses for another peak-discharge event in the Supplementary material) in this work show that logarithmic transformation compresses the higher numbers much more than the lower numbers. This observation supports the criticism of statisticians (e.g., Miller 1984; Osbourne 2002) and researchers in other fields (e.g., Richards 1973; Smith 1984; McCuen et al. 1990; Packard et al. 2011; Packard 2014) that logarithmic transformation fundamentally changes the nature of the untransformed data. In Figure 1(b) and Figure S2(b), the theoretical analysis (see section ‘Log-log linear regression models the geometric mean response’) demonstrates that as one of its side effects, the power function back-transformed from log-log linear regression may underestimate peak flows for larger basins. Similar underestimation due to log-transformation was reported by Asselman (2000), who studied the fitting of sedimentation-discharge power-law relation. In addition, Figure 3 illustrates that the log-transformation makes poor fitting look better. Furthermore, Figure 4 and Figure S3 shows that log-transformation may make the fitting less sensitive to the data points representing larger basins. Our analyses of peak-discharge data and studies on similar power-law relation in other fields (e.g., Pattyn & Van Huele 1998; Pandey & Nguyen 1999; Asselman 2000; Packard & Boardman 2008; Packard 2017) all demonstrate that although the linear fitting in the logarithmic scale appears pleasing graphically, this does not guarantee that the power function estimated by back-transformation will describe the data in the arithmetic scale.
On the other hand, Figure 2(a) and Figure S4(a) indicate that although log-transformation tends to succeed in stabilizing the variance of the residuals in the logarithmic space, it fails to alleviate the problem of heteroscedastic variance of residuals in the arithmetic scales of practical interest (Figure 2(b) and Figure S4(b)). This heteroscedastic variance problem remains for the nonlinear regression method.
Implications for analyzing peak-discharge power-law relation
Apparently, the aforementioned problems of log-transformation could lead to misinterpretations of the underlying relationships in the original data. Accordingly, the implication from this work is to at least use logarithmic transformations with greater care in peak-discharge power-law analyses. Nonlinear regression seems to fit better the data in the arithmetic scales, which is of practical interest in power-law applications. It also helps to disrepute the data that are not appropriate for power-law analysis of peak discharge. However, as pointed out by one of our reviewers, nonlinear least squares regressions might weight data points representing large values more than those representing smaller values. Nevertheless, we recommend that the peak-discharge power-law relationships should be displayed, evaluated, and applied in the arithmetic scale instead of log-log scale. This could increase the fidelity of inferences drawn from future peak-discharge power-law analyses.
CONCLUSION
In this study, we investigated the overlooked problem of adopting logarithmic transformation in peak-discharge power-law analysis. Our findings, through analyzing 52 peak-discharge events observed in the Iowa River Basin, the United States over the period from 2002 to 2013, are as follows:
- (1)
The discrepancy between the parameters estimated by the log-log linear and nonlinear regression methods is remarkable.
- (2)
High coefficient of determination (R2) of log-log linear regression does not guarantee high accuracy of the back-transformed peak-discharge power-law model in the arithmetic scale.
- (3)
Log-log transformation of discharge and area data may mislead the observation of power-law relation and multiscaling of flood peaks.
- (4)
Log-log linear regression may assign high leverage to data points at the lower-left corner, alter the visual appearance of the scatter in the data, and fail to stabilize variance and predict the median response in the arithmetic scale. These potential problems at least partially explain that the back-transformed model reflects the features of the data points representing lower values but not the overall relationship between peak discharge and drainage area over the full range.
- (5)
The peak-discharge power functions estimated by nonlinear fitting tend to give smaller prediction errors (LOOE) and better follow the track of data points in the arithmetic scales in most cases.
Recognizing its importance to the field of flood hydrology, this article addresses the use of logarithmic transformations in future peak-discharge power-law analyses. When applying the regression equations of peak-discharge vs drainage area to predict flood flows for ungauged locations, or to investigate the connections between natural processes and the dynamics of peak-discharge, the fitted peak-discharge power-law relationships should be displayed, evaluated, and applied in the arithmetic scale of practical interest. Accordingly, we recommend using logarithmic transformations in peak-discharge power-law analyses with greater care. The nonlinear regression, which often fits the data better in the arithmetic scale, could be an alternate. This cautionary note may help increase the prediction accuracy of peak-discharges at ungauged locations and improve our understanding of flood generating mechanisms retrieved from peak-discharge scaling analyses. We investigated the issues of using log-log linear regression to fit power-law functions in the context of Q-A relationships, while the findings herein may also be valid for other applications, including but not limited to, fitting discharge-stage, discharge-sediment, and hydraulic geometry relationships.
We also recommend more research into the issues that affect the statistical consideration of peak-discharge modeling via regression. These include the probability distribution of peak-discharge data, and the statistical (distributional) properties of the residuals. One special topic that is often ignored is the spatial dependence of the peak-discharge. With the covariance known, it would be interesting to explore the estimation framework of the generalized least squares.
ACKNOWLEDGEMENTS
The first author acknowledges the partial financial support provided by the projects (Grant No. 2017YFA0604903, 41321001 and 41501020). The third author acknowledges support from the Rose & Joseph Summers endowment and the Iowa Flood Center. The corresponding author acknowledges the partial financial support provided by the projects (Grant No. 2017YFC050530302, 41301201 and CKSF2019292/SH + TB). We thank our editor Professor Nevil Wyndham Quinn, reviewer Professor Daniel B. Wright and the other two anonymous reviewers for their valuable suggestions that greatly helped us to improve the quality of this work. Thanks to Professor Jin Liu at Duke-NUS Medical School for discussions that were beneficial to the authors.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/nh.2019.108.