Abstract

The power-law relation between peak discharge and drainage area has been widely used in regional flood frequency analysis for more than a century. The scaling coefficients α and θ can be obtained by nonlinear or log-log linear regression. To illustrate the deficiencies of applying log-transformation in peak-discharge power-law analyses, we studied 52 peak-discharge events observed in the Iowa River Basin in the United States from 2002 to 2013. The results show that: (1) the scaling exponents estimated by the two methods are remarkably different; (2) for more than 80% of the cases, the power-law relationships obtained by log-log linear regression produce larger prediction errors of peak discharge in the arithmetic scale than those obtained by nonlinear regression; and (3) logarithmic transformation often fails to stabilize residuals in the arithmetic domain, assigns higher weight to data points representing smaller peak discharges and drainage areas, and alters the visual appearance of the scatter in the data. The notable discrepancies in the scaling parameters estimated by the two methods and the undesirable consequences of logarithmic transformation raise caution. When conducting peak-discharge scaling analysis, especially for prediction purposes, applying nonlinear regression on the arithmetic scale to estimate the scaling parameters is a better alternative.

INTRODUCTION

Power-law relations between peak flow and drainage area have been widely observed for more than a century and have been applied in analyzing both regional flood frequencies and individual rainfall-runoff events (e.g., O'Connell 1868; Fuller 1914; Allen 1960; Gupta & Dawdy 1995; Ogden & Dawdy 2003; Griffis & Stedinger 2007; Dawdy et al. 2012; Eash et al. 2013; Ayalew et al. 2015; Furey et al. 2016). Studies of the physical basis of this relation have advanced our understanding of flood generation mechanisms (Gupta et al. 2007, 2010). Such studies involve estimating and interpreting the scaling slopes and intercepts through data analyses (e.g., Ogden & Dawdy 2003; Gupta et al. 2010), numerical modeling (e.g., Furey & Gupta 2007; Mandapaka et al. 2009; Ayalew et al. 2015), and theoretical considerations (Furey et al. 2016). The scaling slopes and intercepts are often estimated in these studies by fitting a straight line to the logarithmic transformations of drainage area and river discharge data using ordinary or generalized least squares techniques. Although the discharge and drainage area power-law relation has been widely applied to estimate flood flows for flood hazard mapping and has attracted growing attention from the research community, the appropriateness of using linear regression of logarithms to estimate power function parameters tends to be overlooked.

Form of the peak-discharge scaling relation

Peak-discharge scaling studies hypothesize a general power function structure as:
$$Q = \alpha A^{\theta} X_1^{\beta_1} X_2^{\beta_2} \cdots X_m^{\beta_m} \tag{1}$$
in which Q [m3/s] is the predicted value of peak discharge as the dependent variable, A [km2] is the upstream drainage area of a specific location, α, θ, and β1, …, βm are regression coefficients, and m is the number of predictor variables. X1, …, Xm are the predictor variables, including but not limited to watershed characteristics such as river length, basin slope, and land use, and climatic variables such as the amount of precipitation. For simplicity, for the remainder of this paper, we limit our considerations to the single-variable case of relating Q and A.
Drainage area has often been found to be the dominant predictor variable for basin discharge; therefore, Equation (1) is frequently simplified to a two-parameter power function:
$$Q = \alpha A^{\theta} \tag{2}$$

Parameters α and θ are termed the scaling intercept and the scaling exponent, respectively, of the peak-discharge power-law relation.

Fitting the power function

The conventional method uses the logarithmic transformation to obtain the parameters α and θ through the following procedure: (i) transform the original data for Q and A to logarithms (e.g., of base 10); (ii) fit a straight line to the logarithms using ordinary or generalized least squares techniques; (iii) display the straight line and data points in a scatter plot with logarithmic scales and report the coefficient of determination (R2) as the evaluation of the reliability of the estimated parameters α and θ, and thus of the accuracy of the prediction equation (Equation (2)). Estimates of Equation (2) can be used to explore the variability of the scaling parameters as a result of the interactions between watershed characteristics and climatic forcing. Similarly, one can use Equation (2) to predict peak discharge at ungauged locations.

The linear regression equation is typically written as:
$$\log_{10} Q = \log_{10}\alpha + \theta \log_{10} A \tag{3}$$

Accordingly, the scaling intercept is obtained as 10 raised to the power of the intercept in the regression equation (Equation (3)), and the scaling exponent equals the slope of Equation (3). In most cases, the coefficient of determination R2 obtained from the fit of Equation (3) takes a value less than unity, indicating that there are discrepancies between paired logarithms of observed and predicted peak-discharge values. Equation (3), when back-transformed to Equation (2), implicitly assumes a multiplicative error. For ease of presentation, we refer to the conventional method as log-log linear regression hereafter. The true values of the coefficients of Equation (2) are unknown for a number of reasons. First, only rarely can the existence of the power law be strictly proven (Newman 2005; Broido & Clauset 2019). Second, the data for Q and A used to estimate Equation (2) are corrupted by observational errors, and their sample size is limited. Therefore, the coefficients can only be estimated.
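
The conventional procedure can be sketched in a few lines. The following is a minimal illustration, not the code used in this study: it fits a straight line to base 10 logarithms by ordinary least squares and back-transforms the intercept to the scaling intercept. The function name `fit_loglog` and the data values are hypothetical; the data are generated without noise from a power law with a scaling intercept of 0.25 and a scaling exponent of 0.8, so the fit should recover those values.

```python
import math

def fit_loglog(areas, peaks):
    """OLS fit of log10(Q) = log10(alpha) + theta * log10(A).

    Returns (alpha, theta, r2), where r2 is computed in the logarithmic scale.
    """
    x = [math.log10(a) for a in areas]
    y = [math.log10(q) for q in peaks]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta = sxy / sxx                    # slope = scaling exponent
    intercept = y_bar - theta * x_bar    # intercept in log10 units
    alpha = 10 ** intercept              # back-transformed scaling intercept
    r2 = sxy ** 2 / (sxx * syy)
    return alpha, theta, r2

# Hypothetical, noise-free data generated from Q = 0.25 * A**0.8
areas = [50.0, 200.0, 1000.0, 5000.0, 30000.0]
peaks = [0.25 * a ** 0.8 for a in areas]
alpha, theta, r2 = fit_loglog(areas, peaks)
print(alpha, theta, r2)
```

Note that the coefficient of determination returned here describes the fit to the logarithms only; it says nothing directly about errors in the original units of discharge.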

The conventional method minimizes the sum of squared residuals of the logarithms, but in many predictive applications of Equation (2) our main interest is in estimating Q and not its logarithm. As an alternative, a nonlinear least squares regression approach estimates the scaling slope and intercept directly, i.e., without a logarithmic transformation, through Equation (2). This is a numerical fitting approach that uses the minimum sum of squares of modeling errors (differences between data and model output) as the optimization criterion to obtain the values for the parameters. The nonlinear least squares method therefore assumes additive errors for the power-law model. Nonlinear fitting used to require greater computational time, but this is no longer a limitation given the advances in computers and software. Initial values can be assigned as the estimates obtained from the conventional linear regression on logarithms.
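
To make the arithmetic-scale criterion concrete, the sketch below minimizes the sum of squared errors in the original units. It is an illustration under stated assumptions and not the fitting algorithm used in this study: for a fixed exponent the best intercept has a closed form, so the problem reduces to a one-dimensional search over the exponent, done here with a golden-section search. The function names, the search bounds (0 to 2), and the data values are hypothetical, and the search assumes the error curve has a single minimum within the bounds.

```python
import math

def profile_sse(theta, areas, peaks):
    """Best alpha and arithmetic-scale SSE for a fixed exponent theta."""
    basis = [a ** theta for a in areas]
    alpha = sum(q * b for q, b in zip(peaks, basis)) / sum(b * b for b in basis)
    sse = sum((q - alpha * b) ** 2 for q, b in zip(peaks, basis))
    return alpha, sse

def fit_nonlinear(areas, peaks, lo=0.0, hi=2.0, tol=1e-10):
    """Golden-section search on theta; assumes SSE is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if profile_sse(c, areas, peaks)[1] < profile_sse(d, areas, peaks)[1]:
            hi = d   # minimum lies in [lo, d]
        else:
            lo = c   # minimum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    theta = (lo + hi) / 2.0
    alpha, sse = profile_sse(theta, areas, peaks)
    return alpha, theta, sse

# Hypothetical observations (drainage area in km2, peak discharge in m3/s)
areas = [50.0, 200.0, 1000.0, 5000.0, 12000.0, 30000.0]
peaks = [8.0, 20.0, 60.0, 240.0, 500.0, 1300.0]
alpha, theta, sse = fit_nonlinear(areas, peaks)
print(alpha, theta, sse)
```

Because the criterion is the squared error in discharge units, gauges with large peak flows influence this fit far more than they influence a fit to logarithms.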

Aim of this work

When examining or applying the peak-discharge power-law relation in the arithmetic scale, the existing reports (e.g., Eash 2001; Eash et al. 2013) and research papers present the peak-discharge-area relationship in the log-log coordinates (e.g., Ogden & Dawdy 2003; Mandapaka et al. 2009; Gupta et al. 2010; Ayalew et al. 2015; Furey et al. 2016). But in one of the earliest studies on peak-discharge power-law relation, O'Connell (1868) plotted peak discharge against drainage areas in the original units. In O'Connell's study, logarithmic transformation was not needed because he fixed the scaling exponent to be 0.5.

Using a variable scaling exponent, however, brings the problem of fitting the nonlinear power-law function into view in later studies. Before the invention of computers and access to nonlinear fitting algorithms, linearizing the power-law function to Equation (3) was a convenient solution to this problem. However, statisticians (Miller 1984; Osbourne 2002) and researchers in other fields (Richards 1973; Smith 1984; McCuen et al. 1990; Packard et al. 2011; Xiao et al. 2011; Packard 2014) have demonstrated that logarithmic transformation does fundamentally transform the nature of the variables, making the interpretation of the results somewhat more complex (Asselman 2000). They further call for reducing transformation bias (Pattyn & Van Huele 1998; Packard 2013) in curve fitting or using logarithmic transformation with caution (Miller 1984).

The appropriateness of applying logarithmic transformations in peak-discharge power-law analyses has largely been overlooked. This study explores some implications of the logarithmic transformation and calls for caution in future peak-discharge power-law analyses. We pursue this objective by first illustrating the overlooked discrepancies between the peak-discharge power-law models fitted by the least squares log-log linear and nonlinear regressions, and then analyzing the causes of the discrepancies. We use observed event-based peak-discharge data as examples.

This article is organized as follows. We describe the study area and data used in this study immediately below. Then, the following section shows the discrepancies in the fitted relationships by the log-log linear and the nonlinear regressions, the underlying reasons for the discrepancies, and the problems of log-log transformation. The next section discusses the implications for peak-discharge power-law data analyses and this is followed by concluding remarks in the final section.

DATA AND METHOD

In this study, data for peak-discharge and drainage area of 52 individual rainfall-runoff events in the Iowa River Basin are taken from the study by Ayalew et al. (2015). The authors identified 52 events, over the period from 2002 to 2013 using both radar rainfall information and instantaneous streamflow measurements, to investigate how the duration, depth, and temporal structure of rainfall control the flood scaling intercept and exponent. Based on the information available, they reported that the entire Iowa River Basin received rainfall for these events. The Iowa River Basin drains an area of about 33,000 km2 at its confluence with the Mississippi River and its longest river flows about 600 km. About 85% of the basin has a surface slope of less than 5% and an average river bed slope of about 0.6‰. By assuming flow velocities varying from 0.5 to 1.5 m/s in the channel, the in-channel time of concentration of the Iowa River Basin ranges from about 5 to 15 days. Flooding has been frequent in Iowa in the past three decades, including the disastrous events in 1993 and 2008. Location and description of the streamflow gauges in the Iowa River Basin are provided in the Supplementary material (Figure S1 and Table S1).

We analyzed the fitted power-law models by examining the regression residuals and the prediction errors. First, we used both the log-log linear regression and the nonlinear least squares regression to obtain the peak-discharge scaling coefficients and thus their associated power-law models. Then we used the standardized residuals (SR), defined as:
$$SR_k = \frac{R_k - \bar{R}}{\sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(R_j - \bar{R}\right)^2}} \tag{4}$$
to characterize the fit. In Equation (4), Rk is the residual, i.e., the difference between Qobs(k) and Qpred(k), at the kth streamflow gauge, R̄ is the mean of Rk (k = 1, 2, 3, …, n), and SRk represents the associated standardized residual. Variables Qobs(k) and Qpred(k) are the observed and predicted peak discharges at the kth streamflow gauge, and n is the number of gauges at which peak discharge is recorded for an event. All log-transformations were base 10, and nonlinear regressions were implemented using the ‘nls’ function of the R programming language.
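
As a minimal sketch of the standardized residuals, the snippet below centers the residuals on their mean and scales them by their standard deviation. The function name and the data values are hypothetical, and the use of the sample standard deviation (denominator n − 1) is an assumption of this sketch.

```python
import statistics

def standardized_residuals(observed, predicted):
    """Center residuals on their mean and scale by their standard deviation."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_r = statistics.mean(residuals)
    sd_r = statistics.stdev(residuals)  # assumption: sample SD with n - 1
    return [(r - mean_r) / sd_r for r in residuals]

# Hypothetical observed and predicted peak discharges (m3/s)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
sr = standardized_residuals(observed, predicted)
print(sr)
```

The same function applies in either scale: passing logarithms of observed and predicted values gives the residuals of the log-log fit, while passing untransformed values gives the residuals in the arithmetic scale.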
A leave-one-out cross-validation technique is adopted in this study to compare the ability of the fitted power-law models to predict peak-discharge data that were not used in estimating them. In the practice of conservative flood analysis, these ‘predictions’ are mostly made for ungauged locations with drainage areas within the original observation range. To avoid extrapolation, we always kept the peak-discharge data associated with the largest and smallest drainage areas in the training set for each individual event. The leave-one-out prediction errors (LOOE), in the original unit of streamflow for each peak-discharge event, are quantified as:
formula
(5)

The fitted power-law models with smaller LOOE are assumed to have better predictive skills.
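
The cross-validation loop can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: the gauges with the smallest and largest drainage areas are never held out, each remaining gauge is left out in turn, and the held-out errors are summarized as a root-mean-square error in discharge units. The root-mean-square summary, the function names, and the data are assumptions of this sketch. A log-log fit is used for brevity; any function returning a scaling intercept and exponent can be passed instead.

```python
import math

def fit_loglog(areas, peaks):
    """OLS on base 10 logarithms; returns (alpha, theta)."""
    x = [math.log10(a) for a in areas]
    y = [math.log10(q) for q in peaks]
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    theta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return 10 ** (y_bar - theta * x_bar), theta

def leave_one_out_error(areas, peaks, fit):
    """RMSE of held-out predictions in the arithmetic scale (assumed summary)."""
    i_min = areas.index(min(areas))
    i_max = areas.index(max(areas))
    squared_errors = []
    for k in range(len(areas)):
        if k in (i_min, i_max):
            continue  # extremes stay in the training set to avoid extrapolation
        train_a = areas[:k] + areas[k + 1:]
        train_q = peaks[:k] + peaks[k + 1:]
        alpha, theta = fit(train_a, train_q)
        squared_errors.append((peaks[k] - alpha * areas[k] ** theta) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: an exact power law with one perturbed gauge
areas = [50.0, 200.0, 1000.0, 5000.0, 30000.0]
peaks = [0.25 * a ** 0.8 for a in areas]
peaks[2] *= 1.5
looe = leave_one_out_error(areas, peaks, fit_loglog)
print(looe)
```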

RESULTS

Different estimates of scaling parameters by the two regression methods

Analysis of a single event

Analysis of an example event over the period of 14 June to 4 July 2010 reveals a remarkable discrepancy between the parameters estimated by the two regression methods. A high coefficient of determination in the log-log linear regression does not guarantee a good fit of the back-transformed model to the original untransformed data. Figure 1 illustrates the peak-discharge power-law functions fitted by the log-log linear and nonlinear regressions. For this event, the maximum stream discharge values were extracted at 34 river gauges. The gray and black colors signify the data, the fitted lines, and the models associated with the log-log linear regression and nonlinear regression, respectively. Figure 1(a) shows the straight line fitted to the logarithmic transformation of data for the upstream basin area and peak discharge, with a scaling exponent of 0.56. The power-law function that is back-transformed from the log-log linear fitting equation is presented at the lower right corner. The linear model fitted to the log-transformations, with a coefficient of determination of 0.85 and a favorable pattern in its residual plot (as shown later in Figure 2(a)), supports the power-law relationship between drainage area and peak discharge more strongly than some fits reported in the literature. However, visual inspection suggests that rotating the line counterclockwise to a steeper slope would track the data better, which tends to coincide with the scaling exponent of 0.83 given by the nonlinear regression (Figure 1(b)). The difference between the two estimates of the scaling exponent is surprising. The discrepancy between the estimated scaling intercepts is also obvious.

Figure 1

Different estimates of scaling parameters for the peak-discharge event identified over the period of 14 June to 4 July 2010 by the two regression methods. (a) Fitting a straight line (light gray) to the base 10 log-transformations of the observed peak flow and upstream basin area (gray dots) by ordinary least squares technique. The keys at the lower right corner are the power function back-transformed from the log-log linear regression, the coefficient of determination of the log-log linear regression (R2), and the leave-one-out prediction error (LOOE) from the back-transformed equation. (b) The power function estimated by the nonlinear regression applied to the original untransformed peak flow and upstream basin area data (black curve), which differs from that estimated by preceding analysis (light gray curve), and the former tends to follow the track of the data points better. For this analysis, the same set of peak flow and upstream basin area data recorded at the 34 river gauges are available and used. The power functions were fitted using data from all gauges.

Figure 2

Residual plots. (a) SR for log-log regression are within the acceptable limits of ±2 and appear to have homoscedastic variance in the logarithmic scale. (b) SR for power-law equation that is back-transformed from log-log linear regression exhibit a megaphone shape, indicating heteroscedastic variance in arithmetic scale. (c) SR for power-law equation obtained from nonlinear regression seem to have homoscedastic variance in arithmetic scale. This figure analyzed the same peak-discharge event as used in Figure 1.

Figure 1(b) compares the original data and the curves fitted by the log-log linear (light gray) and nonlinear (black) regressions. The power-law function we obtained from the nonlinear fitting is presented at the lower right corner. Although Figure 1(a) shows satisfactory fitting in the log-log scale, when back-transformed to the original arithmetic scale, the equation seems to be an adequate fit for basins smaller than 10,000 km2 but would seriously underestimate the peak flow for large basins. In contrast, the function resulting from the nonlinear regression describes the peak discharge well over the full range of upstream basin areas. This visual comparison is supported by the fact that the LOOE is smaller for the equation obtained from the nonlinear regression.

Analyses of multiple events

To show that the event analyzed in the preceding section is not an exception and that the values of the scaling exponent estimated by the two regression methods are different, Table 1 compares the estimated parameters and presents the performance of the two regression methods. Estimates of the scaling parameters in Equation (2) for each peak-discharge event were obtained using both the log-log linear and nonlinear regressions. The average values of the scaling exponent over all 52 peak-discharge events are 0.83 (±0.17) from the log-log linear regression and 0.86 (±0.14) from the nonlinear regression. A paired t-test did not reject the null hypothesis that the mean difference between the paired scaling exponents estimated by the log-log linear and the nonlinear regressions from the 52 events is zero (p = 0.52). However, for about 60% of the 52 events, the absolute difference between the scaling exponents estimated by the two regression methods is greater than one standard deviation (0.14). Fourteen events have absolute differences greater than two standard deviations.

Table 1

Comparison of the estimated scaling coefficients α and θ and leave-one-out prediction errors (LOOE) for 52 events observed in the Iowa River Basin of the United States, obtained from the log-log linear regression (log) and nonlinear regression (nls)

| Event # | α (log) | α (nls) | θ (log) | θ (nls) | LOOE (m3/s) (log) | LOOE (m3/s) (nls) |
|---|---|---|---|---|---|---|
| 20020409 | 0.0212 | 0.0027 | 0.85 | 1.10 | 10.1 | 5.9 |
| 20020428 | 0.0546 | 0.0275 | 0.81 | 0.89 | 10.7 | 8.8 |
| 20020604 | 0.0635 | 0.0005 | 0.81 | 1.39 | 47.6 | 29.0 |
| 20020711 | 0.0447 | 0.0020 | 0.79 | 1.16 | 17.2 | 9.9 |
| 20030509 | 0.3169 | 0.1749 | 0.75 | 0.82 | 28.4 | 26.7 |
| 20030606 | 0.0122 | 0.0441 | 1.02 | 0.88 | 15.9 | 13.6 |
| 20030626 | 0.0171 | 0.0589 | 0.97 | 0.83 | 14.5 | 14.1 |
| 20030711 | 0.0497 | 0.1876 | 0.91 | 0.76 | 26.4 | 23.9 |
| 20040305 | 0.5134 | 0.0405 | 0.65 | 0.94 | 32.4 | 32.5 |
| 20040326 | 0.0899 | 0.0086 | 0.78 | 1.06 | 26.5 | 17.4 |
| 20040530 | 0.2468 | 0.5909 | 0.88 | 0.79 | 203.3 | 197.9 |
| 20050412 | 0.0514 | 0.1699 | 0.87 | 0.73 | 14.3 | 10.3 |
| 20050422 | 0.0073 | 0.2372 | 1.12 | 0.73 | 44.5 | 32.1 |
| 20050513 | 0.1560 | 0.3369 | 0.82 | 0.72 | 36.5 | 27.0 |
| 20050629 | 0.0104 | 0.1263 | 1.13 | 0.86 | 71.8 | 62.5 |
| 20050924 | 0.0017 | 0.1887 | 1.19 | 0.70 | 40.3 | 38.7 |
| 20060309 | 0.0894 | 0.0743 | 0.73 | 0.76 | 11.1 | 9.8 |
| 20060406 | 0.0625 | 0.3798 | 0.92 | 0.72 | 56.0 | 54.9 |
| 20060501 | 0.0149 | 0.0784 | 1.06 | 0.86 | 34.6 | 25.4 |
| 20060918 | 0.0006 | 0.0681 | 1.29 | 0.77 | 20.3 | 14.2 |
| 20070311 | 0.2690 | 0.8126 | 0.81 | 0.68 | 72.6 | 46.0 |
| 20070401 | 0.0939 | 0.1250 | 0.89 | 0.87 | 37.3 | 37.4 |
| 20070426 | 0.4359 | 0.2273 | 0.72 | 0.81 | 57.6 | 42.9 |
| 20070524 | 0.0098 | 0.1066 | 1.09 | 0.82 | 28.4 | 13.3 |
| 20070820 | 0.4305 | 0.2367 | 0.73 | 0.81 | 59.7 | 57.5 |
| 20071008 | 0.0094 | 0.2054 | 1.11 | 0.76 | 49.4 | 28.9 |
| 20071018 | 1.4963 | 0.0909 | 0.55 | 0.88 | 57.3 | 28.0 |
| 20080425 | 0.6123 | 0.3878 | 0.76 | 0.83 | 150.0 | 150.6 |
| 20080603 | 1.9778 | 1.0269 | 0.75 | 0.83 | 252.7 | 226.4 |
| 20090427 | 0.2029 | 0.2853 | 0.82 | 0.78 | 59.0 | 60.0 |
| 20090515 | 0.3619 | 0.0240 | 0.70 | 1.02 | 43.8 | 24.0 |
| 20090527 | 0.0455 | 0.1532 | 0.95 | 0.82 | 34.0 | 31.6 |
| 20090620 | 0.4789 | 0.0187 | 0.70 | 1.05 | 60.5 | 59.0 |
| 20091023 | 0.8486 | 0.0671 | 0.61 | 0.90 | 47.9 | 30.7 |
| 20091030 | 2.5929 | 0.2160 | 0.54 | 0.83 | 75.7 | 43.8 |
| 20100309 | 0.5563 | 1.1907 | 0.79 | 0.70 | 110.8 | 107.2 |
| 20100513 | 0.9480 | 0.1491 | 0.63 | 0.85 | 75.1 | 58.7 |
| 20100623 | 2.4067 | 0.2502 | 0.56 | 0.83 | 92.1 | 59.1 |
| 20100731 | 0.0390 | 0.1484 | 0.99 | 0.85 | 37.6 | 42.3 |
| 20100811 | 0.4960 | 0.0581 | 0.68 | 0.94 | 76.5 | 52.3 |
| 20100924 | 0.1143 | 1.0967 | 0.84 | 0.62 | 74.1 | 81.7 |
| 20110426 | 0.0954 | 0.1382 | 0.89 | 0.84 | 29.5 | 26.8 |
| 20110529 | 0.4401 | 0.1196 | 0.71 | 0.85 | 29.5 | 19.6 |
| 20110615 | 0.0956 | 0.0119 | 0.86 | 1.08 | 32.5 | 33.7 |
| 20110621 | 0.1190 | 0.1099 | 0.83 | 0.85 | 19.7 | 19.6 |
| 20110728 | 0.0569 | 0.1538 | 0.84 | 0.77 | 42.0 | 44.6 |
| 20120229 | 0.0225 | 0.4799 | 1.06 | 0.74 | 107.8 | 110.6 |
| 20120504 | 0.0686 | 0.0051 | 0.85 | 1.14 | 27.8 | 27.8 |
| 20130410 | 0.2112 | 0.0212 | 0.74 | 1.01 | 31.0 | 21.6 |
| 20130505 | 0.6650 | 0.4348 | 0.69 | 0.75 | 45.3 | 42.8 |
| 20130527 | 2.2308 | 0.6200 | 0.65 | 0.80 | 193.0 | 163.3 |
| 20130624 | 0.7074 | 0.3700 | 0.72 | 0.81 | 108.6 | 103.7 |

Center and spread of the 52 events:

| Quantity | Method | Mean | Median | Standard deviation | Interquartile range |
|---|---|---|---|---|---|
| α | log | 0.4031 | 0.1050 | 0.6313 | 0.4468 |
| α | nls | 0.2334 | 0.1488 | 0.2782 | 0.2182 |
| θ | log | 0.83 | 0.82 | 0.17 | 0.20 |
| θ | nls | 0.86 | 0.83 | 0.14 | 0.13 |
| LOOE (m3/s) | log | 57.3 | 42.9 | 49.6 | 44.0 |
| LOOE (m3/s) | nls | 49.0 | 32.3 | 46.9 | 36.2 |

Table 1 also compares the prediction errors of the power functions fitted by the log-log linear and nonlinear regressions. The prediction errors were evaluated in the arithmetic scale using the untransformed data. For 42 out of the 52 events, the fit by the nonlinear regression has smaller LOOE than that by the log-log linear regression, indicating a better fit of the former from the perspective of prediction skill. This quantitative assessment is consistent with our graphical comparisons between the two regressions. As shown in Figure 1(b), the curve fitted by nonlinear regression traces the untransformed data better for the example event. Similarly, a visual inspection for all the 52 events suggests that, for about 80% of the events, the function resulting from the nonlinear regression better describes the peak discharge over the full range in upstream basin area.

Potential problems of log-transformation

Log-transformation may not stabilize variance in arithmetic scale

Figure 2(a) plots the SR from the log-log linear regression in the logarithmic scale (the same example event used in Figure 1). The SR display no compelling pattern with respect to the logarithm of fitted peak flow. The Shapiro–Wilk test of the null hypothesis that the residuals are normally distributed gave p = 0.20. The Spearman rank correlation analysis between the absolute values of SR and the logarithm of fitted peak flow gave p = 0.76 for the null hypothesis of zero correlation. Visual inspection of the residual plot and the quantitative tests therefore suggest that the linear regression is statistically satisfactory in the logarithmic scale.

However, the residuals from a power-law equation that is back-transformed from the log-log linear equation tend to be heteroscedastic in the arithmetic scale. Figure 2(b) plots the SR from the back-transformed power-law function in the arithmetic scale. The SR exhibit a ‘megaphone’ shape (the upper side only) and systematic underestimation of the peak flow of large basins. The Spearman rank correlation analysis between the absolute values of SR and fitted peak flow rejects constant variance with a p value of 10⁻⁵.

For this example event, the logarithmic transformation tends to stabilize the variance of the residuals of the log-log linear regression, but it does not fix the problem of heteroscedastic variance of residuals from the back-transformed power function. For comparison, we show the residual plot of the power-law equation fitted by nonlinear regression. The SR appear to be randomly distributed with respect to the fitted peak flows (Figure 2(c)) in the arithmetic scale. Spearman rank correlation analysis gave p = 0.12 for the null hypothesis of zero correlation. However, our analyses show that for most of the 52 events, neither the nonlinear regression nor the log-log linear regression produced constant variance of residuals in the arithmetic scale.
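
The variance check described above can be sketched as a rank correlation between the absolute standardized residuals and the fitted values. The snippet below is a minimal, standard-library illustration with hypothetical function names and hypothetical numbers; it computes only the Spearman coefficient (with average ranks for ties) and does not compute the p value reported in the text.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical 'megaphone' pattern: residual spread grows with fitted peak flow
fitted = [10.0, 50.0, 100.0, 500.0, 1000.0]
abs_sr = [0.1, 0.2, 0.5, 1.0, 2.5]
rho = spearman_rho(abs_sr, fitted)
print(rho)
```

A coefficient near zero is consistent with constant variance, whereas a strongly positive coefficient indicates that the residual spread increases with the fitted peak flow.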

Log-transformation may make ill-suited data look extraordinary

Figure 3(a) plots peak flow and upstream basin area in log-log scale for the event recorded over the period of 9 March to 19 March 2010. Although the scatter at the lower left corner of the plot can be a concern, the overall pattern in the bivariate plot is fairly good and suggests that fitting a straight line to the observations would be appropriate. The equation obtained by ordinary least squares linear regression (R2 = 0.92, n = 34) appears to be good.

Figure 3

Plotting data in the log-log scale and arithmetic scale. (a) Values for base 10 log-transformations of peak flow and upstream basin area. The straight line was fitted to the log-transformations by ordinary least squares linear regression. (b) Power-law equation back-transformed from log-log linear regression equation shown against observations in arithmetic scale. Dark gray dots represent observations for peak flows over the period of 9 March to 19 March 2010 at 34 river gauges in the Iowa River Basin.

Figure 3(b) plots the original observations for the same event in arithmetic coordinates, together with the power function back-transformed from the log-log linear regression. Strikingly, this data set seems to be ill-suited for peak-discharge power-law analysis. On the one hand, peak flows for 4 of the 34 river gauges in the sample form a nearly horizontal band in the middle of the scatterplot, and points for the remaining 30 gauges tend to have noticeable scatter. On the other hand, the back-transformed power-law equation is a poor fit to the pattern over the full range of upstream basin areas. This poor fit is reflected by the fact that the LOOE (110.8 m3/s) for this event is relatively large among all the events listed in Table 1. The power-law equation estimated using the scaling parameters from the nonlinear regression gives a slightly better fit (LOOE = 107.2 m3/s, not shown). Nevertheless, we consider this event inappropriate for peak-discharge power-law analysis. The problem with this event would not be noticed if we plotted the data solely in the log-log scale rather than in arithmetic coordinates.

Additionally, the four points at the upper right corner of Figure 3(a) appear to suggest a line segment with a mild slope, while the remaining observations seem to follow a linear model with a much steeper slope. This break in the scaling exponents (i.e., slopes of line segments) has been called ‘multiscaling of flood peaks’ and has been interpreted as an indication of changing dominant physical processes (e.g., Gupta et al. 1994). Apparently, the ‘multiscaling’ observed in the log-log space herein should be interpreted with great caution given the ill-suited data.

Causes of the discrepancy in the estimated scaling parameters

Log-transformation alters the pattern of data points

Comparisons of the data patterns plotted in the log-log and arithmetic scales in Figures 1 and 3 show that logarithmic transformation is monotonic, i.e., the log-transformation does not alter the order of the original data. However, the relative distances between adjacent points are changed. Taking Figure 1(b) as an example, the data points plotted in the arithmetic scale are clustered into three groups: one point for the observation with upstream basin area greater than 30,000 km2, 29 with areas less than 10,000 km2, and the remaining four with areas in between. Along the horizontal axis, the group with 29 points takes up about 25% of the plotting space. In contrast, after the logarithmic transformation (Figure 1(a)), the 29 points with areas less than 10,000 km2 spread out along the x-axis and occupy about 70% of the plotting space. It is evident from comparing Figure 1(a) and 1(b) that logarithmic transformation compresses the larger numbers much more than the smaller numbers. These observations imply that the logarithmic transformation fundamentally changes the pattern of the untransformed data.
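
The compression effect can be illustrated numerically. The sketch below computes the fraction of the axis range that lies below a threshold area, first in the arithmetic scale and then after a base 10 log-transformation. The area values are hypothetical (in particular, the smallest area of 10 km2 is assumed), so the fractions differ from the approximate plotting-space shares quoted above, which also depend on the actual gauge areas and plot margins.

```python
import math

def axis_fraction(values, threshold, transform=lambda v: v):
    """Fraction of the axis range (min to max) that lies below the threshold."""
    lo = transform(min(values))
    hi = transform(max(values))
    return (transform(threshold) - lo) / (hi - lo)

# Hypothetical drainage areas in km2
areas = [10.0, 100.0, 1000.0, 10000.0, 33000.0]

arithmetic_share = axis_fraction(areas, 10000.0)
logarithmic_share = axis_fraction(areas, 10000.0, math.log10)
print(round(arithmetic_share, 2), round(logarithmic_share, 2))
```

With these assumed values, areas below 10,000 km2 span roughly a third of the arithmetic axis but most of the logarithmic axis, which is the same qualitative shift seen between Figure 1(b) and Figure 1(a).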

Log-log linear regression models the geometric mean response

Linear regression models the mean response at given values of the predictors, and in this sense Equation (3) can be rewritten as: 
formula
(6)
where E() denotes the expected value. By definition, 
formula
(7)

Equation (7) indicates that the log-log linear regression equation, and thus the back-transformed power function, models the geometric mean of peak flow at each given value of upstream basin area. Since the geometric mean is never larger than the arithmetic mean (the two are equal only when all values are identical), the back-transformed power function likely underestimates peak flows (see Figure 1 for an example).
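The argument can be written out as a short derivation. This is a sketch under assumed notation: Q_i denotes the peak discharges associated with a given upstream basin area A, n their number, and \alpha and \theta the intercept and scaling exponent of the power law. These symbols are assumptions made here and may differ from those used in Equations (3), (6) and (7).

```latex
\begin{align}
E(\log Q \mid A) &= \log\alpha + \theta\,\log A, \\
\frac{1}{n}\sum_{i=1}^{n}\log Q_i
  &= \log\Bigl[\Bigl(\prod_{i=1}^{n} Q_i\Bigr)^{1/n}\Bigr], \\
\Bigl(\prod_{i=1}^{n} Q_i\Bigr)^{1/n}
  &\le \frac{1}{n}\sum_{i=1}^{n} Q_i .
\end{align}
```

The second line identifies the mean of the logarithms with the logarithm of the geometric mean. The third line is the inequality of arithmetic and geometric means, with equality only if all Q_i coincide, which is why back-transforming the fitted line tends to fall below the arithmetic mean response.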

Leverage of low-value data points in log-log linear regression

Data points in the upper right corner tend to have low leverage in the log-log linear regression of peak flow against upstream drainage area. Figure 4(a) compares the log-log linear regressions with (solid line in light gray) and without (dashed line in light gray) the five basins with an area greater than 10,000 km2 (lighter gray dots). Both visual inspection and the fitted equations in the lower right corner of Figure 4(a) show only a small discrepancy between the two fits, indicating that these log-log linear regressions are dominated by the 29 data points (darker gray dots) in the lower left corner. The back-transformed power-law equations (curves in light gray) are also plotted in the arithmetic scale, and the discrepancy due to removing the five data points is again relatively small (Figure 4(b)). Figure 4 is based on the same peak-discharge event as Figure 1.

Figure 4

Leverage in log-log linear regression. (a) Two straight lines fitted by log-log linear regression with (light gray, solid line) and without (light gray, dashed line) the five basins with drainage area greater than 10,000 km2. (b) The curves fitted by nonlinear regression with (black, solid line) and without (black, dashed line) the five basins with drainage area greater than 10,000 km2. The light gray lines are back-transformed fits from the log-log linear regression shown in (a). The lighter gray dots are data for the five basins with drainage area greater than 10,000 km2 and the darker gray dots are for the remaining 29 basins. The keys in the bottom right corner are the fitted power-law equations. This figure analyzes the same peak-discharge event as Figure 1.

Assigning high leverage to data points in the lower-left corner in the log-log linear regression may at least partially explain why the back-transformed model reflects the features of the data points representing small values, but not the overall relationship between peak discharge and drainage area over the full range. In contrast, the equations estimated by nonlinear regression with (solid line in black) and without (dashed line in black) the five data points are remarkably different, implying that these five data points have a clear influence on the fits. Again, for this peak flow event, when all the data points are used, the nonlinear regression fits the data better in the units of measurement than the power function back-transformed from log-log linear regression.
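The contrast in sensitivity can be reproduced on synthetic data. The sketch below is not the procedure used in this study: the data are made up, the function names `fit_loglog` and `fit_nonlinear` are hypothetical, and the nonlinear fit uses a simple golden-section search over the exponent (with the coefficient obtained in closed form for each trial exponent), which assumes the sum of squared errors has a single minimum on the search interval.

```python
import math

def fit_loglog(A, Q):
    """Ordinary least squares on (log A, log Q); returns (alpha, theta)."""
    x = [math.log(a) for a in A]
    y = [math.log(q) for q in Q]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    theta = sxy / sxx
    return math.exp(my - theta * mx), theta

def fit_nonlinear(A, Q, lo=0.0, hi=2.0, iters=100):
    """Least squares on the arithmetic scale for Q = alpha * A**theta.

    For a fixed theta the best alpha has a closed form, so only theta is
    searched (golden-section search, assuming one minimum in [lo, hi]).
    """
    def alpha_for(theta):
        num = sum(q * a ** theta for a, q in zip(A, Q))
        den = sum(a ** (2 * theta) for a in A)
        return num / den

    def sse(theta):
        alpha = alpha_for(theta)
        return sum((q - alpha * a ** theta) ** 2 for a, q in zip(A, Q))

    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if sse(c) < sse(d):
            hi = d
        else:
            lo = c
    theta = (lo + hi) / 2
    return alpha_for(theta), theta

# Synthetic data (illustrative only): small basins follow Q = 2 * A**0.5,
# large basins sit well above that trend.
small_A = [10, 30, 100, 300, 1000, 3000, 9000]
small_Q = [2.0 * a ** 0.5 for a in small_A]
large_A = [12000, 20000, 32000]
large_Q = [6.0 * a ** 0.5 for a in large_A]

ll_small = fit_loglog(small_A, small_Q)
ll_all = fit_loglog(small_A + large_A, small_Q + large_Q)
nl_small = fit_nonlinear(small_A, small_Q)
nl_all = fit_nonlinear(small_A + large_A, small_Q + large_Q)

print(f"log-log   theta: without large = {ll_small[1]:.3f}, with large = {ll_all[1]:.3f}")
print(f"nonlinear theta: without large = {nl_small[1]:.3f}, with large = {nl_all[1]:.3f}")
```

In this constructed example the exponent from the nonlinear fit shifts more than the exponent from the log-log fit when the large basins are added, which mirrors the behaviour discussed for Figure 4; real data may behave differently.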

DISCUSSION

The peak-discharge power-law relation has been widely used in regional flood frequency analysis and explored in event-based analysis for physical interpretation. In both applications, the real interest lies in the nonlinear relationship between the original variables of peak discharge and drainage area, that is, in specifying a representative curve in the arithmetic scale rather than in the double-logarithmic scale. However, the double-logarithmic scale, i.e., logarithmic transformation, has frequently been adopted. Using 52 peak-discharge events observed in the Iowa River Basin in the United States over the period from 2002 to 2013, this article illustrates the deficiencies of applying log-transformation in peak-discharge power-law analyses.

On the use of log-transformation for peak-discharge power-law analysis

Logarithmic transformation was introduced as a notional method to estimate scaling parameters of the peak-discharge power-law relationship for three reasons: (1) better distribution, for graphical presentation, of data spanning several orders of magnitude (four in this study); (2) simplicity of calculation when computers were not available; and (3) coping with multiplicative residuals or heteroscedastic variance of residuals.

Presenting and analyzing peak-discharge power-law relations using log-transformations, however, may lead to side effects. Figures 1, 3 and 4 and Figures S2 and S3 (similar analyses for another peak-discharge event in the Supplementary material) show that logarithmic transformation compresses the higher numbers much more than the lower numbers. This observation supports the criticism of statisticians (e.g., Miller 1984; Osbourne 2002) and researchers in other fields (e.g., Richards 1973; Smith 1984; McCuen et al. 1990; Packard et al. 2011; Packard 2014) that logarithmic transformation fundamentally changes the nature of the untransformed data. Figure 1(b), Figure S2(b), and the theoretical analysis (see section ‘Log-log linear regression models the geometric mean response’) demonstrate that, as one of its side effects, the power function back-transformed from log-log linear regression may underestimate peak flows for larger basins. Similar underestimation due to log-transformation was reported by Asselman (2000), who studied the fitting of sediment-discharge power-law relations. In addition, Figure 3 illustrates that log-transformation makes a poor fit look better. Furthermore, Figure 4 and Figure S3 show that log-transformation may make the fit less sensitive to the data points representing larger basins. Our analyses of peak-discharge data and studies on similar power-law relations in other fields (e.g., Pattyn & Van Huele 1998; Pandey & Nguyen 1999; Asselman 2000; Packard & Boardman 2008; Packard 2017) all demonstrate that, although a linear fit in the logarithmic scale may appear pleasing graphically, this does not guarantee that the power function estimated by back-transformation will describe the data in the arithmetic scale.

On the other hand, Figure 2(a) and Figure S4(a) indicate that although log-transformation tends to succeed in stabilizing the variance of the residuals in the logarithmic space, it fails to alleviate the problem of heteroscedastic variance of residuals in the arithmetic scales of practical interest (Figure 2(b) and Figure S4(b)). This heteroscedastic variance problem remains for the nonlinear regression method.
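A small numerical illustration shows why residuals that are stable in logarithmic space need not be stable in the arithmetic scale. The sketch below assumes a hypothetical power law with purely multiplicative errors of equal size on the log scale; the parameter values and areas are made up, and residuals are taken about the known trend rather than about a fitted model.

```python
import math

# Hypothetical power law with multiplicative errors (illustrative values only).
alpha, theta = 2.0, 0.5
areas = [10, 100, 1000, 10000, 30000]      # drainage areas, km2
eps = [0.3, -0.3, 0.3, -0.3, 0.3]          # log-scale errors of equal magnitude

trend = [alpha * a ** theta for a in areas]
peaks = [t * math.exp(e) for t, e in zip(trend, eps)]

# Residuals about the known trend in both scales.
log_resid = [math.log(q) - math.log(t) for q, t in zip(peaks, trend)]
arith_resid = [q - t for q, t in zip(peaks, trend)]

for a, lr, ar in zip(areas, log_resid, arith_resid):
    print(f"A = {a:>6} km2  log residual = {lr:+.2f}  arithmetic residual = {ar:+8.1f} m3/s")
```

The log residuals have constant magnitude, whereas the arithmetic residuals grow with drainage area, so homoscedasticity in one scale does not carry over to the other.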

Implications for analyzing peak-discharge power-law relation

Apparently, the aforementioned problems of log-transformation could lead to misinterpretations of the underlying relationships in the original data. Accordingly, the implication of this work is that logarithmic transformations should at least be used with greater care in peak-discharge power-law analyses. Nonlinear regression seems to fit the data better in the arithmetic scale, which is of practical interest in power-law applications. It also helps to identify data that are not appropriate for power-law analysis of peak discharge. However, as pointed out by one of our reviewers, nonlinear least squares regression might weight data points representing large values more than those representing smaller values. Nevertheless, we recommend that peak-discharge power-law relationships be displayed, evaluated, and applied in the arithmetic scale instead of the log-log scale. This could increase the fidelity of inferences drawn from future peak-discharge power-law analyses.
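Evaluating a fitted power law in the arithmetic scale can be sketched as a leave-one-out check. The code below is an assumption-laden illustration: it takes the prediction error to be a root-mean-square leave-one-out error in the units of discharge, which may differ from the LOOE definition used in this article, and the data and function names (`fit_loglog`, `loo_rmse`) are hypothetical.

```python
import math

def fit_loglog(A, Q):
    """Ordinary least squares on (log A, log Q); returns (alpha, theta)."""
    x = [math.log(a) for a in A]
    y = [math.log(q) for q in Q]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    theta = sxy / sxx
    return math.exp(my - theta * mx), theta

def loo_rmse(A, Q, fit):
    """Root-mean-square leave-one-out prediction error, in the units of Q."""
    squared = []
    for i in range(len(A)):
        # Refit without gauge i, then predict its peak discharge.
        alpha, theta = fit(A[:i] + A[i + 1:], Q[:i] + Q[i + 1:])
        squared.append((Q[i] - alpha * A[i] ** theta) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Synthetic data (illustrative only).
areas = [10, 30, 100, 300, 1000, 3000, 9000]
exact = [2.0 * a ** 0.5 for a in areas]
factors = [1.3, 0.8, 1.2, 0.75, 1.25, 0.8, 1.3]
noisy = [q * f for q, f in zip(exact, factors)]

err_exact = loo_rmse(areas, exact, fit_loglog)
err_noisy = loo_rmse(areas, noisy, fit_loglog)
print(f"leave-one-out error, exact power law: {err_exact:.2e}")
print(f"leave-one-out error, noisy data:      {err_noisy:.2f}")
```

Any fitting function that returns the coefficient and exponent can be passed as `fit`, so the same check can be applied to a nonlinear fit and the two errors compared in the units of measurement.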

CONCLUSION

In this study, we investigated the overlooked problem of adopting logarithmic transformation in peak-discharge power-law analysis. Our findings, from analyzing 52 peak-discharge events observed in the Iowa River Basin in the United States over the period from 2002 to 2013, are as follows:

  (1) The discrepancy between the parameters estimated by the log-log linear and nonlinear regression methods is remarkable.

  (2) A high coefficient of determination (R2) of log-log linear regression does not guarantee high accuracy of the back-transformed peak-discharge power-law model in the arithmetic scale.

  (3) Log-log transformation of discharge and area data may mislead the observation of power-law relations and multiscaling of flood peaks.

  (4) Log-log linear regression may assign high leverage to data points in the lower-left corner, alter the visual appearance of the scatter in the data, and fail to stabilize variance and predict the mean response in the arithmetic scale. These potential problems at least partially explain why the back-transformed model reflects the features of the data points representing lower values but not the overall relationship between peak discharge and drainage area over the full range.

  (5) The peak-discharge power functions estimated by nonlinear fitting tend to give smaller prediction errors (LOOE) and to follow the data points more closely in the arithmetic scale in most cases.

Recognizing its importance to the field of flood hydrology, this article addresses the use of logarithmic transformations in future peak-discharge power-law analyses. When applying regression equations of peak discharge against drainage area to predict flood flows for ungauged locations, or to investigate the connections between natural processes and the dynamics of peak discharge, the fitted peak-discharge power-law relationships should be displayed, evaluated, and applied in the arithmetic scale of practical interest. Accordingly, we recommend using logarithmic transformations in peak-discharge power-law analyses with greater care. Nonlinear regression, which often fits the data better in the arithmetic scale, could be an alternative. This cautionary note may help increase the prediction accuracy of peak discharges at ungauged locations and improve the understanding of flood-generating mechanisms gained from peak-discharge scaling analyses. We investigated the issues of using log-log linear regression to fit power-law functions in the context of Q-A relationships, but the findings herein may also be valid for other applications, including but not limited to fitting discharge-stage, discharge-sediment, and hydraulic geometry relationships.

We also recommend more research into the issues that affect the statistical consideration of peak-discharge modeling via regression. These include the probability distribution of peak-discharge data, and the statistical (distributional) properties of the residuals. One special topic that is often ignored is the spatial dependence of the peak-discharge. With the covariance known, it would be interesting to explore the estimation framework of the generalized least squares.

ACKNOWLEDGEMENTS

The first author acknowledges the partial financial support provided by the projects (Grant No. 2017YFA0604903, 41321001 and 41501020). The third author acknowledges support from the Rose & Joseph Summers endowment and the Iowa Flood Center. The corresponding author acknowledges the partial financial support provided by the projects (Grant No. 2017YFC050530302, 41301201 and CKSF2019292/SH + TB). We thank our editor Professor Nevil Wyndham Quinn, reviewer Professor Daniel B. Wright and the other two anonymous reviewers for their valuable suggestions that greatly helped us to improve the quality of this work. Thanks to Professor Jin Liu at Duke-NUS Medical School for discussions that were beneficial to the authors.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/nh.2019.108.

REFERENCES

Allen H. E. 1960 Flood-Frequency Analyses Manual of Hydrology: Part 3 Flood Flow Techniques. US Government Printing Office, Washington, DC, USA.
Asselman N. E. M. 2000 Fitting and interpretation of sediment rating curves. Journal of Hydrology 234 (3), 228–248. https://doi.org/10.1016/S0022-1694(00)00253-5.
Ayalew T. B., Krajewski W. F., Mantilla R. 2015 Analyzing the effects of excess rainfall properties on the scaling structure of peak discharges: insights from a mesoscale river basin. Water Resources Research 51 (6), 3900–3921. doi:10.1002/2014WR016258.
Broido A. D., Clauset A. 2019 Scale-free networks are rare. Nature Communications 10 (1), 1017. doi:10.1038/s41467-019-08746-5.
Dawdy D., Griffis V., Gupta V. 2012 Regional flood-frequency analysis: how we got here and where we are going. Journal of Hydrologic Engineering 17 (9), 953–959. doi:10.1061/(ASCE)HE.1943-5584.0000584.
Eash D. A. 2001 Techniques for Estimating Flood-Frequency Discharges for Streams in Iowa. U.S. Geological Survey Water-Resources Investigations Report (2000-4233).
Eash D. A., Barnes K. K., Veilleux A. G. 2013 Methods for Estimating Annual Exceedance-Probability Discharges for Streams in Iowa, Based on Data Through Water Year 2010. U.S. Geological Survey Scientific Investigations Report 2013-5086.
Fuller W. E. 1914 Flood flows. Transactions of the American Society of Civil Engineers 77 (1), 564–617.
Furey P. R., Gupta V. K. 2007 Diagnosing peak-discharge power laws observed in rainfall–runoff events in Goodwin Creek Experimental Watershed. Advances in Water Resources 30 (11), 2387–2399. https://doi.org/10.1016/j.advwatres.2007.05.014.
Furey P. R., Troutman B. M., Gupta V. K., Krajewski W. F. 2016 Connecting event-based scaling of flood peaks to regional flood frequency relationships. Journal of Hydrologic Engineering 21 (10), 04016037. doi:10.1061/(ASCE)HE.1943-5584.0001411.
Griffis V., Stedinger J. 2007 The use of GLS regression in regional hydrologic analyses. Journal of Hydrology 344 (1), 82–95.
Gupta V. K., Dawdy D. R. 1995 Physical interpretations of regional variations in the scaling exponents of flood quantiles. Hydrological Processes 9 (3–4), 347–361. doi:10.1002/hyp.3360090309.
Gupta V. K., Mesa O. J., Dawdy D. R. 1994 Multiscaling theory of flood peaks: regional quantile analysis. Water Resources Research 30 (12), 3405–3421. doi:10.1029/94wr01791.
Gupta V. K., Troutman B. M., Dawdy D. R. 2007 Towards a nonlinear geophysical theory of floods in river networks: an overview of 20 years of progress. In: Nonlinear Dynamics in Geosciences (Tsonis A. A., Elsner J. B., eds). Springer, New York, USA, pp. 121–151.
Gupta V. K., Mantilla R., Troutman B. M., Dawdy D., Krajewski W. F. 2010 Generalizing a nonlinear geophysical flood theory to medium-sized river networks. Geophysical Research Letters 37 (11), L11402. doi:10.1029/2009GL041540.
Mandapaka P. V., Krajewski W. F., Mantilla R., Gupta V. K. 2009 Dissecting the effect of rainfall variability on the statistical structure of peak flows. Advances in Water Resources 32 (10), 1508–1525. http://dx.doi.org/10.1016/j.advwatres.2009.07.005.
McCuen R. H., Leahy R. B., Johnson P. A. 1990 Problems with logarithmic transformations in regression. Journal of Hydraulic Engineering 116 (3), 414–428. doi:10.1061/(ASCE)0733-9429(1990)116:3(414).
Miller D. M. 1984 Reducing transformation bias in curve fitting. The American Statistician 38 (2), 124–126. doi:10.1080/00031305.1984.10483180.
Newman M. E. J. 2005 Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46 (5), 323–351. doi:10.1080/00107510500052444.
O'Connell P. P. L. 1868 On the relation of the freshwater floods of rivers to the areas and physical features of their basins. Minutes of the Proceedings of the Institution of Civil Engineers 27 (1868), 204–217. doi:10.1680/imotp.1868.23121.
Ogden F., Dawdy D. 2003 Peak discharge scaling in small Hortonian watershed. Journal of Hydrologic Engineering 8 (2), 64–73. doi:10.1061/(ASCE)1084-0699(2003)8:2(64).
Osbourne J. W. 2002 Notes on the use of data transformation. Practical Assessment Research & Evaluation 8 (6), 1–7.
Packard G. C. 2013 Is logarithmic transformation necessary in allometry? Biological Journal of the Linnean Society 109 (2), 476–486. doi:10.1111/bij.12038.
Packard G. C. 2014 On the use of log-transformation versus nonlinear regression for analyzing biological power laws. Biological Journal of the Linnean Society 113 (4), 1167–1178. doi:10.1111/bij.12396.
Packard G. C. 2017 Misconceptions about logarithmic transformation and the traditional allometric method. Zoology 123 (Supplement C), 115–120. https://doi.org/10.1016/j.zool.2017.07.005.
Packard G. C., Boardman T. J. 2008 A comparison of methods for fitting allometric equations to field metabolic rates of animals. Journal of Comparative Physiology B 179 (2), 175. doi:10.1007/s00360-008-0300-x.
Packard G. C., Birchard G. F., Boardman T. J. 2011 Fitting statistical models in bivariate allometry. Biological Reviews 86 (3), 549–563. doi:10.1111/j.1469-185X.2010.00160.x.
Pattyn F., Van Huele W. 1998 Power law or power flaw? Earth Surface Processes and Landforms 23 (8), 761–767.
Richards K. S. 1973 Hydraulic geometry and channel roughness; a non-linear system. American Journal of Science 273 (10), 877–896. doi:10.2475/ajs.273.10.877.
Smith R. J. 1984 Allometric scaling in comparative biology: problems of concept and method. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 246 (2), R152–R160. doi:10.1152/ajpregu.1984.246.2.R152.
Xiao X., White E. P., Hooten M. B., Durham S. L. 2011 On the use of log-transformation vs. nonlinear regression for analyzing biological power laws. Ecology 92 (10), 1887–1894. doi:10.1890/11-0538.1.
