Parameter uncertainty in hydrologic modeling is commonly evaluated, but assessing the impact of spatial input data uncertainty in spatially descriptive ‘distributed’ models is not common. This study compares the significance of uncertainty in spatial input data and model parameters on the output uncertainty of a distributed hydrology and sediment transport model, HYdrology Simulation using Time-ARea method (HYSTAR). The Shuffled Complex Evolution Metropolis (SCEM-UA) algorithm was used to quantify parameter uncertainty of the model. Errors in elevation and land cover layers were simulated using the Sequential Gaussian/Indicator Simulation (SGS/SIS) techniques and then incorporated into the model to evaluate their impact on the outputs relative to those of the parameter uncertainty. This study demonstrated that parameter uncertainty had a greater impact on model output than did errors in the spatial input data. In addition, errors in elevation data had a greater impact on model output than did errors in land cover data. Thus, for the HYSTAR distributed hydrologic model, accuracy and reliability can be improved more effectively by refining parameters rather than further improving the accuracy of spatial input data and by emphasizing the topographic data over the land cover data.

## INTRODUCTION

Mathematical models simulate watershed (catchment) hydrology based on the assumptions, relationships, parameters, boundary conditions, and descriptive spatial data that define the physical characteristics of the watershed. The spatial features of the landscape are commonly represented by averaging conditions or parameters to a single ‘lumped’ value. A ‘distributed’ watershed model considers a spatially descriptive representation of the landscape and is commonly implemented as a grid structure with separate parameters and data values for each cell. For any model, input data and parameters contain uncertainty because they represent characteristics of a complex and dynamic system that the model depicts at specific temporal and spatial scales. Errors are inherent in the system description when single deterministic values are used to describe a system that is highly nonlinear, heterogeneous, and stochastic. Thus, input data, model algorithms, equations, data used for calibration, and temporal and spatial scale are all contributing sources of model output error in watershed modeling (Shirmohammadi *et al.* 2006).

A distributed model requires many parameters and extensive input data to effectively reflect landscape heterogeneity of a watershed in simulating hydrologic processes. Thus, a distributed model tends to be over-parameterized, with the potential for propagating more error to model outputs due to the spatially detailed input data. A number of tools have been developed and used in uncertainty analysis of distributed models. The Generalized Likelihood Uncertainty Estimator (GLUE) provides a framework and software tools that have been used in assessing uncertainty in distributed parameter models (Beven 2006). Another tool that has been applied to watershed models is the Shuffled Complex Evolution Metropolis (SCEM-UA) algorithm, which is a Markov Chain Monte Carlo (MCMC) sampler (Vrugt *et al.* 2003). Feyen *et al.* (2007) assessed uncertainty in parameters and outputs of a distributed hydrologic model, LISFLOOD, using the SCEM-UA algorithm. Zhang *et al.* (2012) demonstrated that distributed hydrologic simulations of SACramento Soil Moisture Accounting (SAC-SMA) were sensitive to source data types and methods used for deriving spatially varied parameters. A few studies have compared different uncertainty analysis methods for distributed hydrologic models (Yang *et al.* 2008).

Use of geospatial data such as the National Elevation Dataset (NED) (USGS 2009b) and National Land Cover Dataset (NLCD) (USGS 2009c) is common in hydrologic simulation, especially with distributed models. Geospatial data, as a digital representation or model of some characteristic of the landscape, implicitly incorporate error due to the limited spatial, temporal, and spectral resolution of source data and to errors in sampling, analysis, and processing of the data. Identifying the best spatial scale for raster-based watershed modeling has been a question of weighing the effect of spatial resolution on the accuracy of geospatial data against improvements in representing hydrologic processes. Studies have used sensitivity analysis to examine the influence of resolution and data sources on watershed modeling (Shrestha *et al.* 2006; Wu *et al.* 2007; Wang *et al.* 2014).

Effects of errors in spatial input data on model results have been rarely reported even though errors in some publicly available geographic information system (GIS) data (or geospatial layers) are explicitly defined. In particular, the vertical error of digital elevation models (DEMs) and classification error in land use data may add a great amount of uncertainty in modeling results (Miller *et al.* 2007; Wu *et al.* 2008). The impacts of the errors have not been compared to each other, and their relative significance compared to that of parameter uncertainty is unknown. The scarcity of studies can be attributed to the limited number of techniques available for modeling errors in spatially correlated variables and the computational challenges of multi-dimensional uncertainty analysis. Although sequential simulation including sequential Gaussian/indicator simulation (SGS/SIS) techniques has been widely used to generate realizations of a spatially distributed variable in geostatistical simulation, its application to uncertainty analysis of a watershed model is seldom found in the literature (Hengl *et al.* 2010).

The objective of this study was to examine and compare for a distributed hydrologic model the relative contributions of spatial data error and parameter uncertainty on model output uncertainty. We used the HYdrology Simulation using Time-ARea method (HYSTAR) model (Her & Heatwole 2016a, 2016b) as the platform for the analysis, and examined spatial data error in elevation and land cover data inputs to the model. The overall goal is to demonstrate and evaluate techniques for uncertainty analysis with distributed models and to provide specific direction for improving the efficiency and accuracy of implementation of the HYSTAR model, which may also provide inference for application to other distributed hydrologic models.

## METHODOLOGY

### The HYSTAR model and base application

*et al.* 1989). The NLCD 1992 (USEPA 2010) shows the land cover of the ORD catchment as predominately pasture/hay (59%) and forest (26%) (Figure 1). Runoff and sediment load measurements (Mostaghimi *et al.* 1989) made at the outlet of ORD were used in calibrating the model.

The HYSTAR model is capable of simulating direct runoff, base flow, soil moisture, groundwater recharge, and sediment load within a grid-based spatial data model in a distributed manner and in an hourly time step using readily available GIS layers such as NED and NLCD. The model employs a newly developed two-dimensional overland and channel ‘time-area routing’ scheme to redistribute direct runoff volume along flow paths over a watershed in every time interval based on flow hydraulics. In the model, a soil water accounting model is integrated with the routing scheme to continuously track soil water variations considering spatially distributed infiltration of surface flow so as to enable seamless predictions of runoff and sediment load for a long simulation period. Such features allow the model to effectively consider spatiotemporal changes in watershed processes in hydrology and sediment transport simulations. Thus, model predictions are responsive to variations of spatial input data layers, providing the basis for investigating the impact of GIS layer uncertainty on model outputs. A detailed description of the model and its calibration can be found in Her & Heatwole (2016a, 2016b), and only a brief description is included here.

Fourteen parameters for hydrology and sediment simulation (Table 1) were calibrated using root mean squared error (RMSE) and relative error as the objective functions for the hydrology and sediment simulations, respectively. The calibrated model provided good monthly runoff and sediment load predictions for the entire 6-year simulation period (1990 to 1995), with Nash–Sutcliffe efficiency coefficients of 0.60 and 0.58 and coefficients of determination of 0.83 and 0.79, respectively. Table 1 describes the parameters and lists their calibrated values.
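The goodness-of-fit measures named above can be illustrated with a minimal sketch. The functions below follow the standard definitions of RMSE and the Nash–Sutcliffe efficiency; the `obs` and `sim` values are hypothetical and are not data from this study.

```python
import math

def rmse(observed, simulated):
    """Root mean squared error between paired observations and simulations."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)

def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical monthly runoff values (m^3/s), for illustration only
obs = [10.0, 40.0, 25.0, 5.0]
sim = [12.0, 35.0, 27.0, 6.0]
fit_rmse = rmse(obs, sim)
fit_nse = nash_sutcliffe(obs, sim)
print(round(fit_rmse, 3), round(fit_nse, 3))
```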

| Process | Parameter | Description | Feature | Range | Calibrated value |
|---|---|---|---|---|---|
| Direct runoff | CNF | Curve number scale factor | Spatial scale factor | 0.5–1.5 | 1.094 |
| | MNO | Manning's roughness coefficient scale factor for overland areas | Spatial scale factor | 0.5–1.5 | 1.054 |
| | MNC | Manning's roughness coefficient scale factor for channel | Spatial scale factor | 0.5–1.5 | 1.171 |
| | THA | Threshold area that defines initiation points of channel networks | Unit: ha | 1–100 | 17.91 |
| Soil moisture | GCL | Coefficient L of the van Genuchten equation | – | 0.25–0.75 | 0.662 |
| | GCM | Coefficient M of the van Genuchten equation | – | 0.25–0.75 | 0.325 |
| | BCC | Basal crop coefficient scale factor | Spatial scale factor | 0.5–1.5 | 0.512 |
| | EFS | Effective fraction of soil surface covered by vegetation scale factor | Spatial scale factor | 0.5–1.5 | 1.187 |
| | RZD | Root zone depth scale factor | Spatial scale factor | 0.1–2.0 | 0.339 |
| | SAR | Soil anisotropy ratio | – | 0.1–2.0 | 0.164 |
| Sediment transport | CSO | Critical Shields parameter for overland flow | – | 0.1–10.0 | 0.744 |
| | CSC | Critical Shields parameter for channel flow | – | 0.035–2.0 | 0.057 |
| | SDR | Soil detachability ratio scale factor | Spatial scale factor | 0.1–2.0 | 0.891 |
| | SCR | Soil cohesion ratio scale factor | Spatial scale factor | 0.1–2.0 | 1.066 |

Of the fourteen calibration parameters, eight are spatially varied: curve number, Manning's roughness coefficient for overland and channel flow, standard basal crop coefficient and fraction of the evaporable soil surface, root zone depth, and soil detachability and cohesion ratios. In order to reduce the number of parameters to be calibrated, spatial scale factors (CNF, MNO, MNC, BCC, EFS, RZD, SDR, and SCR) were introduced for the spatially varied parameters (Table 1). Values are adjusted by changing the corresponding scale factors while fixing their spatial variability in parameter calibration. The values for these parameters were determined from the spatially descriptive elevation (NED), land cover (NLCD), and soil (SSURGO: Soil Survey Geographic) data.
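The scale-factor approach can be sketched as follows: one scalar multiplies every cell of a spatially varied parameter grid, so the spatial pattern is preserved while the overall magnitude changes. This is an illustrative sketch, not the HYSTAR implementation; the grid values are hypothetical, and capping the curve number at 100 is an assumption made here because 100 is the upper limit of the curve number scale.

```python
def apply_scale_factor(grid, factor, lower=None, upper=None):
    """Multiply every cell by one factor, optionally clipping to given bounds."""
    scaled = []
    for row in grid:
        new_row = []
        for value in row:
            v = value * factor
            if lower is not None:
                v = max(lower, v)
            if upper is not None:
                v = min(upper, v)
            new_row.append(v)
        scaled.append(new_row)
    return scaled

# Hypothetical 2 x 3 curve number grid derived from land cover and soil data
cn_grid = [[60.0, 75.0, 88.0],
           [55.0, 70.0, 95.0]]

cnf = 1.094  # calibrated CNF from Table 1
scaled_cn = apply_scale_factor(cn_grid, cnf, upper=100.0)  # cap is an assumption
print(scaled_cn)
```

Because only the single factor is calibrated, the number of free parameters does not grow with the number of grid cells.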

On the other hand, the other six parameters are spatially uniform in the watershed. The threshold area (THA) is used to define the stream network of a watershed by identifying its initiation points. A smaller THA produces a denser stream network and thus a shorter overall travel time for the watershed, and vice versa. The two coefficients (GCL and GCM) that define the relationship between saturated and unsaturated hydraulic conductivity come from the van Genuchten equation, and the soil anisotropy ratio (SAR) represents the ratio of vertical to horizontal hydraulic conductivity. For the sediment transport simulation, CSO and CSC were introduced to control critical Shields parameters for overland and channel conditions, respectively.
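The effect of THA on stream network density can be sketched with a simple thresholding of contributing area. The flow accumulation grid below is hypothetical, the 0.09 ha cell area assumes a 30 m grid, and marking every cell at or above the threshold as a channel cell is a simplification of how initiation points are identified.

```python
CELL_AREA_HA = 0.09  # one 30 m x 30 m cell (assumed grid resolution)

def channel_mask(flow_accumulation, threshold_area_ha):
    """Mark cells whose contributing area reaches the threshold as channel cells."""
    return [
        [cells * CELL_AREA_HA >= threshold_area_ha for cells in row]
        for row in flow_accumulation
    ]

# Hypothetical flow accumulation grid (number of contributing cells)
accumulation = [
    [1, 2, 50],
    [1, 120, 250],
    [3, 200, 900],
]

dense = channel_mask(accumulation, 10.0)    # smaller THA
sparse = channel_mask(accumulation, 17.91)  # calibrated THA from Table 1

n_dense = sum(cell for row in dense for cell in row)
n_sparse = sum(cell for row in sparse for cell in row)
print(n_dense, n_sparse)
```

Lowering the threshold adds channel cells further upstream, which is why a smaller THA shortens the overall travel time.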

### Parameter uncertainty analysis – SCEM-UA

The MCMC technique is one of the Bayesian samplers commonly used for uncertainty analysis. The goal of MCMC is to sample parameter values from the posterior distribution by simulating a random process using a Markov chain that has the posterior distribution as its stationary distribution. MCMC is known to provide a solution to the difficult problem of sampling from a high-dimensional posterior distribution (Kanso *et al.* 2006). The number of iterations required for convergence of the Markov chain is determined by a diagnostic criterion such as the Gelman–Rubin statistic, while the speed of convergence is affected by the efficiency of the sampling strategy and the proposal distribution. When the Gelman–Rubin statistic falls below 1.2, the distribution of samples is regarded as stationary (Vrugt *et al.* 2003).
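The convergence check can be illustrated with a minimal sketch of the Gelman–Rubin statistic in its common textbook form, which compares between-chain and within-chain variance for one parameter. Several variants of the statistic exist, so this is not necessarily the exact expression used in SCEM-UA; the chains below are synthetic.

```python
import math
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B / n: sample variance of the chain means
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)

rng = random.Random(1)
# Four synthetic chains sampling the same distribution (converged case)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Four synthetic chains stuck around two different values (not converged)
stuck = [[rng.gauss(mu, 1.0) for _ in range(500)] for mu in (0.0, 0.0, 3.0, 3.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(r_mixed < 1.2, r_stuck < 1.2)
```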

Vrugt *et al.* (2003) introduced an advanced MCMC method, called the SCEM-UA algorithm, to improve efficiency in updating a proposal distribution in the MCMC. This algorithm operates by merging the strengths of the Metropolis algorithm and the Shuffled Complex Evolution (SCE-UA) algorithm of Duan *et al.* (1992). In order to prevent the collapse of the SCE-UA algorithm into the relatively small region of a single best parameter set, the SCEM-UA algorithm replaced the Simplex search method with the Metropolis method. It is known as one of the most efficient and robust algorithms for identifying uncertainty of model parameters and output (Vrugt *et al.* 2003; Feyen *et al.* 2007). In this study, the SCEM-UA algorithm was employed for examining uncertainty of parameters and reliability of the modeling output. The detailed procedures of SCEM-UA are described below and summarized in Figure 2.

Here, *O* is a set of observations, *N* is the length of the data, and *e* is a vector of errors with zero expectation and constant variance (Vrugt *et al.* 2003). It is assumed that the errors are distributed normally and are independent (Feyen *et al.* 2007).
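Under the assumption of independent, normally distributed errors with zero mean and constant variance, a Metropolis-type sampler compares parameter sets through a Gaussian likelihood. The sketch below shows that likelihood and the basic Metropolis acceptance rule. It is a generic illustration, not the SCEM-UA implementation: the error standard deviation `sigma` is a hypothetical known value, and a symmetric proposal with a flat prior is assumed.

```python
import math
import random

def gaussian_log_likelihood(observed, simulated, sigma):
    """Log-likelihood for independent normal errors with zero mean and std sigma."""
    n = len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)

def metropolis_accept(log_like_current, log_like_proposed, rng):
    """Accept with probability min(1, L_proposed / L_current)."""
    if log_like_proposed >= log_like_current:
        return True
    return rng.random() < math.exp(log_like_proposed - log_like_current)

# Hypothetical observations and two candidate simulations
obs = [10.0, 40.0, 25.0]
sim_current = [14.0, 33.0, 29.0]
sim_proposed = [11.0, 38.0, 26.0]

rng = random.Random(0)
ll_current = gaussian_log_likelihood(obs, sim_current, sigma=2.0)
ll_proposed = gaussian_log_likelihood(obs, sim_proposed, sigma=2.0)
accepted = metropolis_accept(ll_current, ll_proposed, rng)
print(accepted)
```

A proposal that fits the observations better is always accepted; a worse one is accepted only occasionally, which lets the chain explore the posterior instead of collapsing onto a single best parameter set.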

A set of values for the fourteen parameters for hydrology and sediment transport simulations of HYSTAR was found by calibrating the model to observed data (Table 1). The same rainfall events, simulation conditions, and objective function as those used in the calibration were applied in the uncertainty analysis (Her & Heatwole 2016a, 2016b). The Gelman–Rubin statistic was used to assess convergence of the Markov chain to a stationary distribution (Vrugt *et al.* 2003). The SCEM-UA algorithm was run with the same number of complexes, population size for each complex, and total sample population as those of the SCE-UA algorithm used for calibration. For instance, the number of complexes, the population size for each complex, and the total sample population were set to 25, 13, and 325, respectively, in the uncertainty analysis for the hydrology simulation. The sampling stopped after 1,000 iterations, so each parameter had a set of 1,000 values sampled by the Markov chain. The posterior distributions of the parameters were then developed from the 900 sets sampled after the Markov chain had converged to its stationary distribution, which occurred within the first 100 samples.

### GIS input data uncertainty analysis – sequential simulation

HYSTAR employs three GIS layers as input data: elevation (DEM), land cover (NLCD), and soil properties (SSURGO). The impact of uncertainty in the DEM and NLCD input data on model output was examined through sensitivity analysis, assuming vertical measurement error and classification error are major sources of uncertainty in the DEM and NLCD, respectively.

The common source of elevation data in the United States is the NED, which is developed and distributed by the US Geological Survey (USGS). The reported overall absolute vertical RMSE of the NED is 2.4 m, and the errors range from −42.64 to 18.74 m (USGS 2009a). If error at any point occurs independently of that at any other point, RMSE can be a way to represent the variance of the overall error in DEM appropriately. When errors are spatially variable and correlated, however, a single scalar index like RMSE cannot distinguish areas with more or less uncertainty or assess spatial autocorrelation.
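The limitation of a single scalar index can be shown with a small worked example: two hypothetical one-dimensional error transects have the same RMSE (2.4 m, matching the reported NED value only for illustration) but opposite spatial structure, which the lag-1 autocorrelation reveals and the RMSE does not.

```python
import math

def rmse(errors):
    """Root mean squared error of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def lag1_autocorrelation(errors):
    """Sample autocorrelation between each error and its neighbor along a transect."""
    mean = sum(errors) / len(errors)
    dev = [e - mean for e in errors]
    numerator = sum(a * b for a, b in zip(dev, dev[1:]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator

# Hypothetical error transects (m): same magnitudes, different arrangement
clustered = [2.4] * 4 + [-2.4] * 4
alternating = [2.4, -2.4] * 4

for name, errors in (("clustered", clustered), ("alternating", alternating)):
    print(name, round(rmse(errors), 2), round(lag1_autocorrelation(errors), 3))
```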

The NLCD provides 21 different land cover classes at a 30-meter grid resolution derived by classification of Landsat satellite imagery (USEPA 2010). For this study, we used the NLCD-1992 dataset which has an overall classification accuracy that ranges between 70 and 85% (Wickham *et al.* 2004). The error matrix is commonly used to quantify classification errors. However, Steele *et al.* (1998) pointed out that the error matrix does not provide information on the spatial structure of error in a classification and does not have the ability to describe the variation of accuracy across a classified map.

SGS and SIS were utilized to simulate spatially correlated errors and produce multiple realizations of the DEM and NLCD, respectively. Elevation errors were simulated using SGS for randomly selected cells, with the total area corresponding to 30% of the watershed. For the NLCD dataset, classification errors were simulated using SIS for randomly selected cells. The total target area was initially set to 50% of the watershed. In cases where different land cover classes were simulated for a cell, the class with higher probability was assigned to the cell. Thus, in the induced error grids, the land cover class was altered in around 30% of the area. A total of 100 ‘disturbed’ datasets with induced errors were generated for the DEM and NLCD.
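The idea behind SGS can be sketched in a few dozen lines: cells are visited along a random path, and each value is drawn from a normal distribution whose mean and variance come from simple kriging on the cells already simulated. The sketch below is an unconditional, whole-grid simplification and not the procedure used in this study. The exponential covariance model, the correlation range of 3 cells, the neighbor limit, and the function names are all assumptions made for illustration; only the 1.24 m standard deviation is taken from the results reported here.

```python
import math
import random

def exp_cov(h, sill, corr_range):
    """Exponential covariance model (assumed here)."""
    return sill * math.exp(-h / corr_range)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def sgs_error_field(nrows, ncols, sill, corr_range, max_neighbors=12, seed=0):
    """Unconditional sequential Gaussian simulation of a zero-mean error grid."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(nrows) for c in range(ncols)]
    rng.shuffle(cells)  # random visiting path
    done = []           # (row, col, value) for cells already simulated
    field = [[0.0] * ncols for _ in range(nrows)]
    for (r, c) in cells:
        # The nearest previously simulated cells condition the new draw
        near = sorted(done, key=lambda p: math.hypot(p[0] - r, p[1] - c))[:max_neighbors]
        if near:
            cov = [[exp_cov(math.hypot(p[0] - q[0], p[1] - q[1]), sill, corr_range)
                    for q in near] for p in near]
            c0 = [exp_cov(math.hypot(p[0] - r, p[1] - c), sill, corr_range) for p in near]
            w = solve(cov, c0)  # simple kriging weights
            mean = sum(wi * p[2] for wi, p in zip(w, near))
            var = max(sill - sum(wi * ci for wi, ci in zip(w, c0)), 0.0)
        else:
            mean, var = 0.0, sill
        value = rng.gauss(mean, math.sqrt(var))
        field[r][c] = value
        done.append((r, c, value))
    return field

# One realization of elevation error (m) on a hypothetical 10 x 10 grid
errors = sgs_error_field(10, 10, sill=1.24 ** 2, corr_range=3.0, seed=42)
print(round(errors[0][0], 2))
```

Adding such an error grid to the original DEM, cell by cell, gives one disturbed realization; repeating with different seeds gives an ensemble. Categorical layers such as land cover need the indicator variant (SIS), which is not shown here.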

## RESULTS

### Parameter uncertainty

As seen in Figure 4, the posterior distributions of the parameters for soil moisture simulation have multiple modes. The calibrated values of the parameters lie in the vicinity of modes of their posterior distributions but do not always coincide with them, so the results of the uncertainty analysis did not always correspond to the calibration results. Such outcomes are possible because the calibration and the uncertainty analysis employed different algorithms for different purposes: the former identifies a single parameter set that gives the best performance statistics, and the latter derives the posterior distributions of the parameters. However, it is worth noting that the calibrated values of the most sensitive parameters for hydrology and sediment transport simulations, CNF and CSO, corresponded well with their posterior distributions, which supports the soundness of sampling in both algorithms.

| Item | Uncertainty measure | Runoff (m^{3}/s) | Runoff (%^{a}) | Sediment load (Mg) | Sediment load (%) |
|---|---|---|---|---|---|
| Max^{b} | Range | 291.1 | 798 | 935.5 | 2,475 |
| | CI | 9.2 | 25 | 21.2 | 56 |
| Average^{c} | Range | 86.6 | 237 | 150.5 | 398 |
| | CI | 2.2 | 6 | 3.5 | 9 |
| Average calibrated^{d} | | 36.5 | 100 | 37.8 | 100 |

^{a}Percentage of the uncertainty measures to the average of the calibrated monthly runoff and sediment load.

^{b}The maximum width of the band of the uncertainty measure.

^{c}Average width of the band of the uncertainty measure.

^{d}Average of the calibrated monthly runoff and sediment load.

In Figure 5, the uncertainty band (width of the range) derived using the SCEM-UA algorithm generally covers the calibrated and observed runoff and sediment loads. When the range was used as the uncertainty measure, the average widths of the uncertainty band were 86.6 m^{3}/s and 150.5 Mg for the simulated monthly runoff and sediment load, respectively, over the entire simulation period (Table 2). These correspond to 237 and 398% of the averages of the calibrated monthly runoff and sediment load, respectively. The confidence interval (CI) provided narrower uncertainty bands; the average CI widths were 2.2 m^{3}/s (6% of the calibrated monthly runoff) and 3.5 Mg (9% of the calibrated sediment load). The uncertainty band widens when a smaller significance level is applied. For example, the average width of the CI for the monthly runoff increased to 2.9 m^{3}/s (7.9%) at a significance level of 0.01.
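The two uncertainty measures can be sketched for a single month of ensemble output. The text does not give the CI formula, so the sketch assumes a normal-theory confidence interval for the ensemble mean; the function name and the sample values are hypothetical. Under that assumption, and if the narrower band corresponds to a significance level of 0.05 (also not stated), the ratio of critical values reproduces the reported widening from 2.2 to about 2.9 m^{3}/s.

```python
import statistics
from statistics import NormalDist

def band_widths(ensemble, alpha=0.05):
    """Return (range width, CI width of the mean) for one month of ensemble output."""
    value_range = max(ensemble) - min(ensemble)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci_width = 2.0 * z * statistics.stdev(ensemble) / len(ensemble) ** 0.5
    return value_range, ci_width

# Hypothetical ensemble of simulated runoff for one month (m^3/s)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_range, sample_ci = band_widths(sample)

# Widening of a normal-theory CI when alpha drops from 0.05 to 0.01
z_ratio = NormalDist().inv_cdf(0.995) / NormalDist().inv_cdf(0.975)
widened_ci = 2.2 * z_ratio  # compare with the reported 2.9 m^3/s

# Average range widths as percentages of the calibrated averages (Table 2)
pct_runoff = 100.0 * 86.6 / 36.5
pct_sediment = 100.0 * 150.5 / 37.8
print(round(widened_ci, 2), round(pct_runoff), round(pct_sediment))
```

The range grows with the number of ensemble members, while a CI of the mean shrinks, which is one reason the two measures differ so much in Table 2.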

### Impact of GIS data uncertainty on model outputs

The differences between elevations of the original and the disturbed DEMs range from −8.37 to 9.90 m (Table 3), which is within the reported error range (USGS 2009a), and the spatially averaged standard deviation of the errors is 1.24 m. In Table 3, the ranges of errors for the elevation classes show that the error variation is roughly symmetric and greater in the middle elevation classes, 90–120 m. On the other hand, the minimum, maximum, and mean statistics of the error for the classes indicate that the SGS algorithm produced biased errors for lower and higher elevations. The errors tend to be positive at lower elevations and negative at higher elevations, while the overall average error, 0.04 m (median error is 0.00 m), is close to zero (Figure 7(b)). In other words, the disturbed DEMs overestimated elevation in low areas and underestimated it in high areas. This implies that, as desired, the correlation between error and elevation was considered in the algorithm.

| Elevation class (m) | Area (ha) | Min (m) | Max (m) | Range (m) | Mean (m) |
|---|---|---|---|---|---|
| 80–85 | 4.95 | −1.94 | 6.01 | 7.96 | 0.85 |
| 85–90 | 24.93 | −5.94 | 6.66 | 12.61 | 0.29 |
| 90–95 | 48.24 | −6.13 | 9.90 | 16.03 | 0.28 |
| 95–100 | 42.57 | −7.43 | 9.88 | 17.30 | 0.10 |
| 100–105 | 52.38 | −8.37 | 8.71 | 17.09 | −0.05 |
| 105–110 | 51.66 | −6.84 | 6.00 | 12.84 | −0.13 |
| 110–115 | 35.55 | −7.52 | 8.79 | 16.32 | 0.03 |
| 115–120 | 40.32 | −6.63 | 7.31 | 13.94 | 0.00 |
| 120–125 | 23.13 | −6.52 | 4.36 | 10.89 | −0.27 |
| 125–130 | 5.13 | −7.30 | 2.72 | 10.02 | −0.40 |

| Land cover class | Original (ha) | Min (ha) | Max (ha) | Mean (ha) | Stdev^{a} (ha) | CV^{b} |
|---|---|---|---|---|---|---|
| Open water | 0.36 | 0.27 | 1.89 | 0.69 | 0.26 | 0.381 |
| Low intensity residential | 18.99 | 19.26 | 22.59 | 21.01 | 0.82 | 0.039 |
| Commercial/industrial | 1.98 | 1.17 | 3.15 | 1.93 | 0.35 | 0.184 |
| Deciduous forest | 50.04 | 50.67 | 55.44 | 52.82 | 1.02 | 0.019 |
| Evergreen forest | 21.78 | 21.78 | 25.65 | 23.59 | 0.81 | 0.034 |
| Mixed forest | 14.31 | 10.89 | 14.67 | 12.45 | 0.76 | 0.061 |
| Pasture/hay | 192.24 | 184.95 | 193.95 | 189.43 | 1.64 | 0.009 |
| Row crops | 29.16 | 24.30 | 28.89 | 26.94 | 0.90 | 0.033 |

^{a}Standard deviation.

^{b}Coefficient of variation.

The maximum ranges of the monthly runoff and sediment load (29.3 m^{3}/s and 64.6 Mg) simulated with the disturbed DEMs occurred when the model provided the greatest monthly direct runoff and sediment load (194 m^{3}/s and 264 Mg), March 1994 and March 1993, respectively. The maximum ranges of the monthly runoff and sediment load (17.0 m^{3}/s and 47.0 Mg) simulated with the disturbed NLCDs were also found in March 1994 and March 1993, respectively, which implies a proportional relationship between quantities of simulated runoff and sediment load and their uncertainty. Overall, the impact of the simulated error in the land use layer on runoff is relatively insignificant compared to that of the elevation layer.

The coefficients of determination (*R*^{2}) of the linear regression equations are close to one, meaning the simulated errors of the GIS layers do not significantly influence the runoff and sediment transport simulation.

| Dataset with error | Statistics | Runoff (m^{3}/s) | Runoff (%) | Sediment load (Mg) | Sediment load (%) |
|---|---|---|---|---|---|
| DEM | Max range | 29.3 | 80.3 | 64.6 | 170.0 |
| | Ave. range | 3.28 | 9.0 | 7.40 | 19.6 |
| | Max CI | 2.84 | 7.8 | 4.99 | 13.2 |
| | Ave. CI | 0.28 | 0.8 | 0.61 | 1.6 |
| NLCD | Max range | 17.0 | 46.7 | 47.0 | 124.4 |
| | Ave. range | 1.83 | 5.0 | 5.78 | 15.3 |
| | Max CI | 1.03 | 2.8 | 3.96 | 10.5 |
| | Ave. CI | 0.14 | 0.4 | 0.45 | 1.2 |
| DEM/NLCD | Max range | 30.7 | 84.1 | 75.2 | 199.1 |
| | Ave. range | 3.49 | 9.6 | 8.60 | 22.8 |
| | Max CI | 2.77 | 7.6 | 6.47 | 17.1 |
| | Ave. CI | 0.29 | 0.8 | 0.72 | 1.9 |
| Average runoff rate and sediment load^{a} | | 36.5 | 100.0 | 37.8 | 100.0 |

^{a}These values, 36.5 m^{3}/s and 37.8 Mg, were calculated by averaging monthly runoff rates and sediment loads simulated with the original DEM and NLCD using a calibrated HYSTAR model.

## DISCUSSION

The results of this study indicate that the runoff simulation of a spatially distributed hydrologic model is affected more by parameter uncertainty than by uncertainty in GIS layers (Tables 2 and 5). Parameters, particularly critical parameters such as CNF and CSO, have a strong influence on hydrologic components in specific directions (increasing/decreasing runoff volume or sediment load). On the other hand, elevation and land use data errors generated using the sequential simulation methods were locally distributed, particularly in the vicinity of stream networks and between boundaries of different land uses. Thus, impacts of an individual error could be easily intermingled with those of other errors and canceled out while routing runoff and sediment to the outlet along flow paths.

In this study, we also found that errors in elevation data are more influential on the hydrologic modeling outputs than are errors in land use data (Table 5). Topography affects the hydrologic response of a watershed to rainfall events through its influence on flow directions, stream networks, and flow velocity. Lindsay & Evans (2008) demonstrated that the ‘channel network morphometric’ of a watershed is very sensitive to elevation errors, and that a small elevation error can lead to a very different representation of the channel networks. On the other hand, an error in land use classification may influence only local conditions rather than watershed-wide processes (Dong *et al.* 2015); thus, its impact on hydrologic simulation tends to be limited.
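The relative sizes of these effects can be worked out directly from the average range widths reported in Tables 2 and 5. The short calculation below uses only those tabulated values; the variable names are illustrative.

```python
# Average range widths from Tables 2 and 5 (runoff in m^3/s, sediment load in Mg)
param_range = {"runoff": 86.6, "sediment": 150.5}    # Table 2, parameter uncertainty
dem_range = {"runoff": 3.28, "sediment": 7.40}       # Table 5, disturbed DEM
nlcd_range = {"runoff": 1.83, "sediment": 5.78}      # Table 5, disturbed NLCD
combined_range = {"runoff": 3.49, "sediment": 8.60}  # Table 5, DEM and NLCD disturbed

ratios = {}
for output in ("runoff", "sediment"):
    param_vs_gis = param_range[output] / combined_range[output]
    dem_vs_nlcd = dem_range[output] / nlcd_range[output]
    ratios[output] = (param_vs_gis, dem_vs_nlcd)
    print(f"{output}: parameters/GIS = {param_vs_gis:.1f}, DEM/NLCD = {dem_vs_nlcd:.2f}")
```

By this measure, the average band from parameter uncertainty is roughly 25 times (runoff) and 17 to 18 times (sediment load) the band from the combined spatial data errors, while the DEM band exceeds the NLCD band by a much smaller margin.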

Sequential simulation using the SGS/SIS techniques was an effective tool for generating spatially correlated errors of spatial input data for uncertainty analysis of a distributed watershed model (Figures 6–8 and Tables 3 and 4). Although the application of the sequential simulation techniques was limited to elevation and land use layers in this study, they could also be useful in simulating or interpolating soil features such as textures, hydraulic conductivity, and field capacity required as input data for hydrologic modeling. It would be interesting to see how errors in soil classifications, mapping and attributes would affect distributed modeling outputs.

There are many methods and sampling algorithms developed for parameter uncertainty analysis, and the SCEM-UA algorithm was selected for this study because of its proven applicability to a distributed hydrologic model (Vrugt *et al.* 2003; Feyen *et al.* 2007). Although this study did not intend to compare different methods of estimating parameter uncertainty, the GLUE method was also applied as another way to derive the posterior parameter distributions for the purpose of comparison. From the comparison, we observed that the GLUE method provided different uncertainty measures depending on a cut-off value used to identify ‘behavioral’ parameter sets (Beven 2006). While the results of the analysis with GLUE are not presented here, we found that the GLUE method led to the same conclusion that parameter uncertainty has a greater impact on model outputs than uncertainty in spatial data (Her 2011).

## CONCLUSIONS

This study compared the significance of parameter and spatial input data uncertainty on predictions of a two-dimensional continuous hydrology model, HYSTAR. Posterior distributions of the parameters were derived using the SCEM-UA algorithm, and errors in the elevation and land cover spatial data grids were simulated using SGS and SIS to assess the impact of these errors on hydrology and sediment transport predictions of the model. The study showed that the impact of parameter uncertainty was much greater than the impact of GIS data errors on model output. The average ranges of runoff and sediment loads simulated with parameter sets sampled from the posterior distributions were about 2.4 and 4.0 times the averages of the base simulation (237 and 398%, Table 2), indicating the greater significance of parameter uncertainty for model predictions. On the other hand, average variations of runoff and sediment loads simulated with error-added spatial input data layers were only 10–20% of the base simulation. Therefore, the accuracy and reliability of a distributed model will be improved more efficiently by refining the parameters than by pursuing better-quality GIS input data. This conclusion does not mean that the accuracy of the layers is unimportant to the model; rather, it indicates that the spatial input layers are accurate enough to yield acceptable levels of error in this distributed model application. In addition, the impact of the topographic data error on the model output was greater than that of the land cover data error. Thus, improving the topographic data should be emphasized over the land cover data for better accuracy in the model results.