Abstract
This work compares the accuracy of streamflow estimated by a data-driven artificial neural network (ANN) and the physically based soil and water assessment tool (SWAT). The models were applied in two small watersheds, one highly urbanized and the other primarily covered with evergreen forest and shrubs, in the San Antonio region of central south Texas, where karst geologic features are prevalent. Both models predicted daily streamflow in the urbanized watershed very well, with the ANN and SWAT having the Nash–Sutcliffe coefficient of efficiency (NSE) values of 0.76 and 0.72 in the validation period, respectively. However, both models predicted streamflow poorly in the nonurban watershed. The NSE values of the ANNs significantly improved when a time series autoregressive model structure using historical streamflow data was implemented in the nonurban watershed. The SWAT model only achieved trivial performance improvement after using the SWAT-CUP SUFI-2 calibration procedure. This result suggests that an ANN model may be more suitable for short-term streamflow forecasting in watersheds heavily affected by karst features where the complex processes of rapid groundwater recharge and discharge strongly influence surface water flow.
HIGHLIGHTS
The ANN models outperformed the SWAT models by varying degrees in the two karstified watersheds in central Texas.
The calibrated SWAT and ANN models performed better in the urban watershed while having poor statistical performance in the rural watershed.
Applying the correction factor approach caused different extents of improvement in statistical results when a relatively higher uncertainty level was assumed.
INTRODUCTION
The prediction of streamflow using rainfall-runoff models is vital in water resources management. It aims to improve decision-making for a wide range of hydrological problems. Streamflow is the result of complex natural processes at the watershed scale. In the past several decades, various computer-based hydrological models have been developed to simulate streamflow, most of which focus on capturing the rainfall-runoff process since precipitation is usually the primary driving force of a hydrological system (Moradkhani & Sorooshian 2009). As more and more models became available, hydrologists started to classify them into different categories based on their structure. One standard classification divides hydrological models into lumped or distributed models. A lumped hydrological model considers the study watershed as a single unit. The parameters representing spatial characteristics related to the rainfall-runoff process are averaged or ignored for the entire watershed (Brirhet & Benaabidate 2016). In a distributed model, the variation of watershed characteristics and hydrological processes in space are explicitly considered, usually through discretizing the watershed into a large number of rectangular grid cells or a limited number of subbasins based on the drainage and topographic features (Islam 2011). Another frequently used classification approach considers models as deterministic or stochastic. A deterministic model generates only a single output value with a given input dataset and model parameters (Beven 2011). In contrast, stochastic hydrological models usually provide probability distributions of the target variables (Beaumont 1979). Exploring the suitability and analyzing different models' limitations is a popular topic in modern hydrology.
One major issue in hydrological modeling is deciding the appropriate level of model complexity. Beven (2011) summarized two widely accepted views of modeling. The first suggests that hydrological models are merely tools for extrapolating available data in time and space. Therefore, if the model inputs and outputs can be successfully related, it is not essential to elaborate on the details of a watershed. The second view maintains that models should reflect the physical processes involved to the extent possible to ensure confidence when extrapolating beyond the existing observations. Consequently, more complex physically based hydrological models are usually adopted under the second perspective. The soil and water assessment tool (SWAT) and artificial neural network (ANN) are two widely adopted watershed-level hydrological models in recent literature (Kassem et al. 2020). The more complex SWAT model simulates hydrological processes using mathematical equations that describe important physical laws of conservation of mass, energy, and momentum (Arnold et al. 2012), while the ANN models simply identify the statistical connections between hydrological inputs and outputs (ASCE 2000b). A detailed comparison of these two models can provide insights into the issue of model complexity in hydrological simulation.
A few studies have compared SWAT and ANN regarding streamflow prediction. Kim et al. (2015) applied SWAT and ANN to impute missing streamflow observations in the Taehwa River Watershed in Korea and found that SWAT was better at simulating low flows, while the neural network model generally performed better at simulating high flows. Jimeno-Sáez et al. (2018) compared the accuracy of streamflow prediction between SWAT and ANN models at the Ladra River Basin and the Segura River Basin in Spain. The authors also concluded that ANN is better at estimating higher flows. Demirel et al. (2009) applied SWAT and ANN models to simulate streamflow in the Pracana River Basin in Portugal and concluded that ANN was more successful at forecasting peak streamflow, whereas the SWAT model performed better in terms of overall goodness-of-fit indicators. Srivastava et al. (2006) analyzed the performance of streamflow prediction from SWAT and ANN in the agricultural-dominated Honey Brook Watershed in Pennsylvania. They found that the ANN model produced simulation results with the Nash–Sutcliffe coefficient of efficiency (NSE) and coefficient of determination (R2) better than the SWAT model. In a recent study, Zakizadeh et al. (2020) used ANN and SWAT to simulate the rainfall-runoff relationship in a small watershed near Tehran city, Iran. The authors concluded that ANN produced simulations with minor error and uncertainty, although both models could achieve excellent predictive performance.
While previous studies have explored the suitability of SWAT and ANN models in various regions, none of these studies were conducted in the state of Texas, where its hot climate condition and growing population have made water sustainability a contentious issue. Additionally, the San Antonio region of central south Texas has extensive karst terrain, making hydrological modeling more difficult and rendering model performance patterns vastly different from non-karstified watersheds (Jakada & Chen 2020). Furthermore, few studies on SWAT and ANN have considered uncertainty in the measurement and modeling process, yet considering uncertainty estimates allows modelers to quantify their confidence in the observed and simulated values (Harmel et al. 2010). Considering model uncertainty is vitally essential for the stochastic ANN model, in which the simulation outcome of each data point is a distribution rather than a single determined value (Hu et al. 2001). Hence, the objectives of this study are (a) to parameterize SWAT and an ANN model to simulate streamflow in two small watersheds in the San Antonio Region, one rural and one urban; (b) to compare SWAT and ANN model performance in karstic watersheds; (c) to analyze SWAT and ANN performance under different dominant land-use type; and (d) to enhance the model evaluation using modified goodness-of-fit indicators that incorporate measurement and model uncertainty.
MATERIAL AND METHODS
Description of SWAT and ANN models
The SWAT is a physically based, semi-distributed, deterministic model developed to assess water quality and quantity in large river basins with varying soils, land uses, land cover types, and management practices (Arnold et al. 2012). The SWAT model divides a study watershed into a few subbasins. The model further groups land with homogeneous slopes, soil types, and land cover types into hydrologic response units (HRUs) at a finer spatial scale. The HRUs may not be spatially continuous and can be found at different locations within a subbasin. This strategy effectively simplifies the simulation of watershed processes (Gassman et al. 2007; Licciardello et al. 2011; Tuppad et al. 2011). The development of the SWAT model spans the last three decades, with new functions and routines continuously added to the model. The current SWAT model has been widely applied to complex systems' water, sediment, agricultural chemicals, and contaminant yields (Gassman et al. 2007; Abbaspour et al. 2015). Moreover, the SWAT model is closely integrated with a geographic information system (GIS) since most SWAT input data have spatial characteristics (Srinivasan & Arnold 1994; Jayakrishnan et al. 2005; Olivera et al. 2006). The ArcGIS and QGIS platforms are integrated with the SWAT model to date.
A physically based hydrological model is often referred to as a white box since its modeling processes are established on known scientific principles of mass and energy fluxes (Moradkhani & Sorooshian 2009). In contrast to the complex structure of physically based models, statistical models have gained popularity for rainfall-runoff modeling based on their simplicity and low computational resource demand. An ANN is a computing system that resembles the structure of the human biological neural network. It can identify nonlinear relationships from given patterns and fit nonparametric models on multivariate input data (Govindaraju & Rao 2013). ANNs have been applied to rainfall-runoff modeling since the 1990s. However, when used as a rainfall-runoff model, an ANN only identifies the relationship between historical inputs (meteorological data) and outputs (streamflow) without considering any of the physical processes involved, therefore falling into the category of a lumped model and is often referred to as a black-box model (ASCE 2000a). In addition, the ANNs are stochastic models, given that the fitted parameter values often vary from one training process to another for a fixed training dataset (Jain & Srinivasulu 2004), creating a distribution of outputs rather than one value.
The physically based SWAT model simulates streamflow by incorporating descriptive mathematical equations designed to conceptualize the hydrological processes in a watershed. The model requires grid-based geospatial data as inputs in addition to meteorological data and incorporates a significant number of model parameters (Fatichi et al. 2016; Noori & Kalin 2016). Solving the equations for the state variables at the level of HRUs for each time step can be time-consuming and computationally demanding (Noori & Kalin 2016; Jimeno-Sáez et al. 2018). In this regard, statistical models such as ANN hold a clear advantage. ANNs do not require a priori knowledge of the watershed's physical characteristics as model input, which reduces the model setup procedures. Meanwhile, the time it takes to train and select the best neural network is significantly shorter than calibrating a SWAT model (Minns & Hall 1996; Jimeno-Sáez et al. 2018), capable of achieving ‘unreasonable effectiveness’ in hydrological applications when sufficient training data are available (Worland et al. 2019). On the other hand, several studies have noted the disadvantages of applying ANN as a rainfall-runoff model. In particular, ANNs' lack of explanations for the underlying physical hydrological processes has generated much concern among hydrologists (Karunanithi et al. 1994; ASCE 2000a; Ha & Stenstrom 2003; Jain & Srinivasulu 2004; Yaseen et al. 2015). The inability to capture physical dynamics at the watershed level means that ANNs are unsuitable for environmental impact studies, such as modeling streamflow under changing climate conditions (Milly et al. 2008; Humphrey et al. 2016). Like other statistical models, extrapolating beyond the training data range often undermines ANNs' predictive performance (Minns & Hall 1996; Sahoo et al. 2006).
Study area
The two study watersheds were delineated in ArcSWAT using the digital elevation model (DEM). The outlets of both study watersheds were selected at the USGS gages with long-term consistent streamflow records. The Leon Creek Watershed (LCW) covers the western part of the San Antonio urban area. It is centered at 98.67° west longitude, 29.56° north latitude, and has a drainage area of 535.76 km2. The drainage area of the LCW is situated across the contributing, recharge, and artesian zones of the Edwards (BFZ) aquifer (Figure 1(a)). Elevation of the LCW decreases from north to south, ranging from 546 m in the northern part to 177 m near the watershed outlet (Figure 1(c)). The LCW is heavily urbanized. According to the 2011 National Land Cover Database (NLCD2011) land-use land cover (LULC) classification (Yang et al. 2018), 47.2% of the LCW is classified as developed urban areas with different levels of development intensity. Besides the extensive impervious surface of the city, urban afforestation has made up about 34.2% of the LCW evergreen or deciduous forest. Leon Creek is the main waterway in the LCW. It originates from multiple smaller creeks in the northern part of LCW and flows south before merging into the lower part of the Medina River.
The Upper Medina River Watershed (UMRW) is about 30 km northwest of San Antonio. It centers at 99.32° west longitude and 29.82° north latitude. The drainage area of the UMRW is 847.03 km2 and is located entirely within the contributing zone of the Edwards BFZ aquifer (Figure 1(a)). The UMRW is primarily rural, and its dominant land cover types are forest and shrub. Deciduous and evergreen forests cover 48.5% of the drainage area, while shrubland covers 38.9%, according to the NLCD2011 classification. Elevation of the UMRW declines from 727 m at its highest point in the northwest to 364 m at the lowest point in the southeast watershed outlet (Figure 1(b)). The upper Medina River is the main waterway in the UMRW. It follows the topographic decline of the UMRW and flows southeastwards. The Medina River is a tributary of the San Antonio River, and it eventually merges into the San Antonio River further south outside the Medina River Basin.
The authors used the hydrological simulations of the same period in the LCW demonstrated in this work as a part of an earlier study that explored the application of various gridded weather products (Mei et al. 2022). In this work, different ANN model configurations were explored in LCW. Meanwhile, unlike the earlier study, this work presented a different investigation of the model output to conduct paired watershed comparisons and incorporate model uncertainty analysis.
Data acquisition
All the data used for constructing hydrological models in this study were obtained from publicly accessible data sources. To set up model simulations, SWAT requires four main data files, including the DEM, LULC, soil, and meteorological data. The National Elevation Dataset (NED), which has a 30 m resolution, was used for the DEM data. The 2011 National Land Cover Dataset (NLCD2011) was collected as the LULC data. The NED and NLCD2011 datasets for the study area were obtained from the USDA Natural Resources Conversation Service (NRCS) geospatial data gateway (USDA-NRCS 2014). The state soil geographic (STATSGO) database preloaded with the ArcSWAT interface was used as the soil data. Most importantly, the Parameter-elevation Regressions on Independent Slopes Model (PRISM) daily spatial climate dataset AN81d (Daly et al. 2008) was acquired as meteorological inputs using the Google Earth Engine (GEE). This cloud-based geospatial analysis platform allows easy access to many government-supported accessible geospatial data archives (Gorelick et al. 2017). The area-averaged daily precipitation, minimum, maximum, and mean temperature of the PRISM dataset were collected by applying shapefile masks of the two study watersheds on GEE.
The USGS daily streamflow observations at Leon Creek (USGS 08181480) and Upper Medina River (USGS 08178880) from the years 2000 to 2009 were used (U.S. Geological Survey 2016) in this study. This study period was chosen as streamflow observations were available at both sites, ensuring a comparison between the watersheds. Observations from 2000 to 2006 were used to calibrate the SWAT model and provided a training target for the ANN model. Observations from 2007 to 2009 were used as standalone testing data.
SWAT modeling approach
SWAT formulates hydrological processes in a watershed using mathematical equations describing important physical laws of mass, energy, and momentum conservation. Water balance within the system at each time step is calculated to produce simulation results of hydrological and water quality (H/WQ) variables. A detailed description of the SWAT modeling process can be found in the SWAT theoretical documentation (Neitsch et al. 2011). This study used a DEM covering a much larger spatial extent than the study area to define the two study watersheds. Based on the topography provided by the DEM, the subbasin thresholds (minimum area for initiating stream networks) were applied to define the number and location of subbasins (Her et al. 2015). The subbasins were further discretized into HRUs, and a 10% threshold was applied to remove minor slope, soil, and land-use classes to restrict the total number of HRUs for improving computational efficiency (Femeena et al. 2022). As a result, 25 subbasins and 298 HRUs were defined for LCW, while 23 subbasins and 169 HRUs were defined for UMRW.
While SWAT adopts the more traditional approach of utilizing gauge weather data as inputs, the PRISM dataset for the contiguous United States is in a gridded format. This study used GEE to calculate the area-averaged meteorological data for the two study watersheds. The watershed centroids were used as ‘virtual rain gauges’ (Elhassan et al. 2016). The SWAT model simulation used daily precipitation and maximum and minimum temperatures from 1998 to 2009. The model simulations were set up on a daily time step for the 12-year simulation period. The calendar years of 1998–1999 were used for model warm-up, and 2000–2006 were used for model calibration. Model calibration minimizes the difference between model simulation and observation by adjusting model parameters. The models are then validated using observations from 2007 to 2009 without further change to the calibrated parameters. The SWAT model calibration was conducted in SWAT Calibration and Uncertainty Programs (SWAT-CUP) using the SUFI-2 procedure (Abbaspour 2011), with the NSE adopted as the calibration objective function. Table 1 summarizes 15 parameters selected for calibration, all considered sensitive for streamflow simulation according to the literature (Arabi et al. 2007; Qi et al. 2017; Koycegiz & Buyukyildiz 2019; Chen et al. 2020). Snow-melt parameters were left at default values since snow rarely occurs in the San Antonio region. Meanwhile, multiple groundwater parameters were adjusted due to their significant impact on modeling the recharge and discharge of the Edwards BFZ aquifer. More details of the adjusted parameters are discussed in the results section.
Hydrology input parameter . | Description . | File extension . | Type of change . | Initial value range . |
---|---|---|---|---|
CN2 | SCS runoff curve number for antecedent moisture condition II | .mgt | Relative | (−10%, 10%) |
ALPHA_BF | Base flow alpha factor (days) | .gw | Replace | (0, 1) |
GW_DELAY | Delay time for aquifer recharge (days) | .gw | Replace | (0, 500) |
GWQMN | Threshold depth of water in the shallow aquifer required for return flow to occur (mm H2O) | .gw | Replace | (0, 5,000) |
GW_REVAP | Groundwater ‘revap’ coefficient | .gw | Replace | (0.02, 0.2) |
REVAPMN | Threshold depth of water in the shallow aquifer for ‘revap’ or percolation to the deep aquifer to occur (mm H2O) | .gw | Replace | (0, 500) |
RCHRG_DP | Deep aquifer percolation fraction | .gw | Replace | (0, 1) |
SOL_AWC | Available water capacity of the soil layer (mm H2O/mm soil) | .sol | Relative | (−5%, 5%) |
SOL_K | Soil-saturated hydraulic conductivity (mm/h) | .sol | Relative | (−5%, 5%) |
ESCO | Soil Evaporation compensation factor | .hru | Replace | (0.6, 0.95) |
CANMX | Maximum canopy storage (mm H2O) | .hru | Replace | (0, 100) |
CH_K1 | Effective hydraulic conductivity in tributary channel alluvium (mm/h) | .sub | Replace | (5, 130) |
CH_K2 | Main channel hydraulic conductivity (mm/h) | .rte | Replace | (5, 130) |
CH_N2 | Manning's ‘n’ value for the main channel | .rte | Replace | (0.01, 0.3) |
SURLAG | Surface runoff lag coefficient (days) | .bsn | Replace | (1, 24) |
Hydrology input parameter . | Description . | File extension . | Type of change . | Initial value range . |
---|---|---|---|---|
CN2 | SCS runoff curve number for antecedent moisture condition II | .mgt | Relative | (−10%, 10%) |
ALPHA_BF | Base flow alpha factor (days) | .gw | Replace | (0, 1) |
GW_DELAY | Delay time for aquifer recharge (days) | .gw | Replace | (0, 500) |
GWQMN | Threshold depth of water in the shallow aquifer required for return flow to occur (mm H2O) | .gw | Replace | (0, 5,000) |
GW_REVAP | Groundwater ‘revap’ coefficient | .gw | Replace | (0.02, 0.2) |
REVAPMN | Threshold depth of water in the shallow aquifer for ‘revap’ or percolation to the deep aquifer to occur (mm H2O) | .gw | Replace | (0, 500) |
RCHRG_DP | Deep aquifer percolation fraction | .gw | Replace | (0, 1) |
SOL_AWC | Available water capacity of the soil layer (mm H2O/mm soil) | .sol | Relative | (−5%, 5%) |
SOL_K | Soil-saturated hydraulic conductivity (mm/h) | .sol | Relative | (−5%, 5%) |
ESCO | Soil Evaporation compensation factor | .hru | Replace | (0.6, 0.95) |
CANMX | Maximum canopy storage (mm H2O) | .hru | Replace | (0, 100) |
CH_K1 | Effective hydraulic conductivity in tributary channel alluvium (mm/h) | .sub | Replace | (5, 130) |
CH_K2 | Main channel hydraulic conductivity (mm/h) | .rte | Replace | (5, 130) |
CH_N2 | Manning's ‘n’ value for the main channel | .rte | Replace | (0.01, 0.3) |
SURLAG | Surface runoff lag coefficient (days) | .bsn | Replace | (1, 24) |
Relative means the existing parameter value is multiplied by 1 plus the given value; Replace means the given value replaces the existing parameter value.
ANN modeling approach
Prediction scenario . | Input combination . | Output . |
---|---|---|
1 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Tt | Qt |
2 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Pn | Qt |
3 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Pn,Tt | Qt |
4 | Pt, Qt−1, Qt−2 | Qt |
5 | Pt, Qt−1, Qt−2, Qt−3 | Qt |
6 | Pt, Qt−1, Qt−2, Qt−3, Qt−4 | Qt |
Prediction scenario . | Input combination . | Output . |
---|---|---|
1 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Tt | Qt |
2 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Pn | Qt |
3 | Pt, Pt−1, Pt−2, Pt−3, Pt−4, Pn,Tt | Qt |
4 | Pt, Qt−1, Qt−2 | Qt |
5 | Pt, Qt−1, Qt−2, Qt−3 | Qt |
6 | Pt, Qt−1, Qt−2, Qt−3, Qt−4 | Qt |
Cross-validation is the simplest and most widely used method for estimating prediction errors in statistical modeling (Hastie et al. 2009). However, since the time dependency and autocorrelation of a streamflow time series contradict the basic assumption of traditional cross-validation that the data are independent and identically distributed (Bergmeir & Benítez 2012), this study proposes to use the blocked cross-validation (BlockedCV) for determining the best model structure. Using BlockedCV, the data points are grouped into consistent blocks, and their sequential order is preserved in the training/validation data split process. The data from 2000 to 2006 were used for modeling training and validation, while data from 2007 to 2009 were used for testing to match the SWAT model simulation period discussed in the previous section. Unlike the notations commonly used in physically based hydrological model analysis, the validation dataset in statistical cross-validation often refers to the dataset of which statistical outcome is used for model selection. In contrast, the testing dataset is used for standalone model verification without further changing the model parameters.
There is, in general, no commonly accepted rule for determining the number of hidden units for a three-layered neural network. However, some studies have discussed the size of hidden units to explore. For example, Ha & Stenstrom (2003) evaluated 1–9 hidden units in their study of water quality estimation. In another similar study for water quality prediction, Kalin et al. (2010) searched from 1 to 10 hidden units. Demirel et al. (2009) supported the number of hidden units to be two-thirds of the sum of the number of input and output nodes, while Gazzaz et al. (2012) suggested that the hidden units size falls between i and 2i + 1, where i represents the number of input nodes. This study investigated the number of hidden layer units from 1 to 10 for each prediction scenario in Table 2. The root mean square error (RMSE) of the validation dataset of all trained models was calculated to find the best model structure. The RMSE is commonly adopted for evaluating statistical models and represents how far the residuals are from 0 on average (Kuhn & Johnson 2013). This study used the R software (R Core Team 2019) for all ANN model training and structure selection.
Model performance measures
Previous studies have suggested that no single metric is sufficient to verify a hydrological model (Harmel et al. 2014; Yaseen et al. 2018). This study used two goodness-of-fit indicators to evaluate model performance: the NSE and percent bias (PBIAS). The NSE is a normalized statistic that determines the magnitude of residual variance compared to the observed data variance. It ranges from –∞ to 1.0, with an NSE = 1.0 representing an optimal fit (Nash & Sutcliffe 1970). The PBIAS measures the average tendency of model overestimation or underestimation. A smaller absolute PBIAS indicates a better model fit. We adopted the evaluation criteria Moriasi et al. (2007) proposed to evaluate the model performance. The rating criteria are slightly relaxed since the simulations in this study are conducted on the finer daily time step. A more detailed rating guideline is available at ASABE (2017). Table 3 displays the form of NSE and PBIAS and their corresponding model performance criteria.
Performance rating . | NSE . | PBIAS (%) . |
---|---|---|
Very good | ||
Good | ||
Satisfactory | ||
Unsatisfactory |
Performance rating . | NSE . | PBIAS (%) . |
---|---|---|
Very good | ||
Good | ||
Satisfactory | ||
Unsatisfactory |
is the simulated data; is the observed data; is the mean of the observed data.
Incorporating measurement and model uncertainty
RESULTS AND DISCUSSION
Statistical assessment of the observed streamflow
The statistical summary (i.e., mean, median, standard deviation, Std.Dev, coefficient of variation, Cv, maximum, Qmax, minimum, Qmin, first-, second-, third-, and fourth-order autocorrelation coefficients, r1, r2, r3, r4) of the observed streamflow for both study watersheds are displayed in Table 4. In LCW, the overall streamflow difference between the calibration and validation periods is smaller than in UMRW. This study proposes using the streamflow of previous time steps as one of the predictors in the ANN models. Hence, autocorrelation coefficients of the streamflow up to the fourth order were calculated and are presented in Table 4. The autocorrelation decreases significantly as the lag increases for all modeling periods and watersheds. The autocorrelation is more substantial in the LCW during the calibration period. In the UMRW, the autocorrelation of the calibration period is much weaker than that of the validation period.
Watershed . | Time period . | Streamflow data (m3/s) . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Mean . | Median . | Std.Dev . | Cv . | Qmax . | Qmin . | r1 . | r2 . | r3 . | r4 . | ||
LCW | Calibration | 1.581 | 0.185 | 16.419 | 10.386 | 580.150 | 0.025 | 0.621 | 0.490 | 0.296 | 0.112 |
Validation | 1.486 | 0.198 | 10.474 | 7.046 | 277.623 | 0.018 | 0.390 | 0.071 | 0.076 | 0.079 | |
UMRW | Calibration | 6.358 | 1.837 | 65.366 | 10.280 | 2,943.200 | 0.009 | 0.552 | 0.342 | 0.267 | 0.094 |
Validation | 3.992 | 0.819 | 10.420 | 2.610 | 163.291 | 0.000 | 0.707 | 0.607 | 0.538 | 0.499 |
Watershed . | Time period . | Streamflow data (m3/s) . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Mean . | Median . | Std.Dev . | Cv . | Qmax . | Qmin . | r1 . | r2 . | r3 . | r4 . | ||
LCW | Calibration | 1.581 | 0.185 | 16.419 | 10.386 | 580.150 | 0.025 | 0.621 | 0.490 | 0.296 | 0.112 |
Validation | 1.486 | 0.198 | 10.474 | 7.046 | 277.623 | 0.018 | 0.390 | 0.071 | 0.076 | 0.079 | |
UMRW | Calibration | 6.358 | 1.837 | 65.366 | 10.280 | 2,943.200 | 0.009 | 0.552 | 0.342 | 0.267 | 0.094 |
Validation | 3.992 | 0.819 | 10.420 | 2.610 | 163.291 | 0.000 | 0.707 | 0.607 | 0.538 | 0.499 |
SWAT model calibration and validation
The SUFI-2 algorithm is an iterative procedure that requires updating the parameter ranges after each iteration until model performance stabilizes. For both LCW and UMRW, five calibration iterations, each with 500 simulations, were run. The parameter range from the last calibration iteration was used without further change in the validation iteration, which was composed of 500 simulations as well. The calibrated parameter value range and the best-fitted parameter values for the validation period are summarized in Table 5.
Hydrology input parameter . | Default value in SWAT . | Calibrated parameter range . | Best-fitted validation value . | ||
---|---|---|---|---|---|
LCW . | UMRW . | LCW . | UMRW . | ||
CN2a | 35–98 | (−15.5%, −8.3%) | (12.0%, 16.4%) | −14.7% | 12.0% |
ALPHA_BF | 0.048 | (0.075, 0.224) | (0.751, 0.882) | 0.086 | 0.774 |
GW_DELAY | 31 | (228, 420) | (334, 411) | 363 | 370 |
GWQMN | 1,000 | (3,712, 4,378) | (2,521, 3,496) | 4,268 | 3,302 |
GW_REVAP | 0.02 | (0.125, 0.150) | (0.037, 0.076) | 0.136 | 0.056 |
REVAPMN | 750 | (331, 412) | (85, 225) | 348 | 154.2 |
RCHRG_DP | 0.05 | (0.049, 0.159) | (0.152, 0.286) | 0.157 | 0.179 |
SOL_AWCa | 0.01–0.42 | (−5.3%, 1.5%) | (−3.4%, −1.4%) | −2.6% | −2.8% |
SOL_Ka | 0–2,000 | (−3.2%, −1.6%) | (−4.4% −1.9%) | −2.1% | −4.4% |
ESCO | 0.95 | (0.862, 0.921) | (0.745, 0.797) | 0.872 | 0.796 |
CANMX | 0 | (65, 82) | (79, 100) | 79 | 81 |
CH_K1 | 0 | (77, 114) | (9, 24) | 107 | 23 |
CH_K2 | 0 | (43, 60) | (111, 130) | 51 | 114 |
CH_N2 | 0.014 | (0.010, 0.031) | (0.247, 0.300) | 0.014 | 0.273 |
SURLAG | 4 | (3.81, 9.75) | (19.41, 24.00) | 7.94 | 23.42 |
Hydrology input parameter . | Default value in SWAT . | Calibrated parameter range . | Best-fitted validation value . | ||
---|---|---|---|---|---|
LCW . | UMRW . | LCW . | UMRW . | ||
CN2a | 35–98 | (−15.5%, −8.3%) | (12.0%, 16.4%) | −14.7% | 12.0% |
ALPHA_BF | 0.048 | (0.075, 0.224) | (0.751, 0.882) | 0.086 | 0.774 |
GW_DELAY | 31 | (228, 420) | (334, 411) | 363 | 370 |
GWQMN | 1,000 | (3,712, 4,378) | (2,521, 3,496) | 4,268 | 3,302 |
GW_REVAP | 0.02 | (0.125, 0.150) | (0.037, 0.076) | 0.136 | 0.056 |
REVAPMN | 750 | (331, 412) | (85, 225) | 348 | 154.2 |
RCHRG_DP | 0.05 | (0.049, 0.159) | (0.152, 0.286) | 0.157 | 0.179 |
SOL_AWCa | 0.01–0.42 | (−5.3%, 1.5%) | (−3.4%, −1.4%) | −2.6% | −2.8% |
SOL_Ka | 0–2,000 | (−3.2%, −1.6%) | (−4.4% −1.9%) | −2.1% | −4.4% |
ESCO | 0.95 | (0.862, 0.921) | (0.745, 0.797) | 0.872 | 0.796 |
CANMX | 0 | (65, 82) | (79, 100) | 79 | 81 |
CH_K1 | 0 | (77, 114) | (9, 24) | 107 | 23 |
CH_K2 | 0 | (43, 60) | (111, 130) | 51 | 114 |
CH_N2 | 0.014 | (0.010, 0.031) | (0.247, 0.300) | 0.014 | 0.273 |
SURLAG | 4 | (3.81, 9.75) | (19.41, 24.00) | 7.94 | 23.42 |
aIndicate percent change to existing parameter values.
Due to their spatial heterogeneity, the SCS runoff curve number (CN2) and soil parameters are adjusted through a percentage change. Similarly, groundwater (.gw) and general HRU parameters (.hru) have spatial heterogeneity to the level of HRUs, and subbasin (.sub) and routing parameters (.rte) have spatial heterogeneity to the level of subbasins. However, this study adjusted all other parameters through direct replacement since their default values are fixed across the entire watershed in SWAT 2012. It should be noted that only the watershed outflows were adopted in the model calibration process due to the limitation in accessible observed streamflow data. Meanwhile, the two study watersheds were divided into many HRUs. Applying only one value for some spatially distributed parameters across all HRUs can cause model over-parameterization and equifinality.
The groundwater (.gw) parameters were the focus of this study as they control the recharge speed into the Edwards Aquifer. The baseflow alpha factor (ALPHA_BF) is slightly increased from the default in LCW and significantly increased in UMRW. The larger ALPHA_BF indicates that the groundwater flow response to the changes in recharge is more rapid in UMRW. The groundwater delay time (GW_DELAY) was extended in LCW and UMRW, which indicates that water exits the soil profile and enters the shallow aquifer relatively slowly in both watersheds. Meanwhile, the threshold depth of water in the shallow aquifer required for return flow to occur (GWQMN) is significantly increased from its default value, possibly reflecting a large storage capacity of the shallow aquifer in both study watersheds, which is very common in karstic regions. A larger groundwater ‘revap’ coefficient was found in LCW than in UMRW, suggesting that water movement from the shallow aquifer to the root zone is faster in LCW. The fast water discharge is likely occurring through sinkholes in the region when the hydraulic head of groundwater is high. In addition, a relatively small deep aquifer percolation fraction (RCHRG_DP) was found in LCW and UMRW, indicating that only a tiny proportion of water in the root zone was recharged into the deep aquifer for our calibrated models. A possible explanation might be that the deep aquifer in this karstic region mainly receives fast recharge through the sinkholes. The deep aquifer has a relatively high hydraulic head, which reduces the amount of water recharge by percolating the soil layers.
ANN model selection
The three-layer feed-forward neural network structure contains an input, hidden, and output layer. All nodes are fully connected to nodes in their adjacent layers with links that contain weight and bias information. As mentioned in the methodology section, for all six prediction scenarios presented in Table 2, the hidden layer size was explored from 1 to 10. The RMSE of the cross-validation dataset is calculated to select the best model structure. The models with the smallest RMSE for each prediction scenario are displayed in Table 6 for LCW and Table 7 for UMRW. Among the six prediction scenarios, scenario 6 of LCW and scenario 5 of UMRW had the smallest cross-validation RMSE and were selected as the best model for each study watershed.
Prediction scenario . | Hidden nodes . | Training . | Testing . | Validation RMSE (m3/s) . | ||
---|---|---|---|---|---|---|
NSE . | PBIAS (%) . | NSE . | PBIAS (%) . | |||
1 | 7 | 0.901 | 60.7 | 0.736 | −16.9 | 2.247 |
2 | 5 | 0.891 | 59.4 | 0.671 | 1.3 | 2.178 |
3 | 6 | 0.908 | 68.3 | 0.760 | −6.3 | 2.231 |
4 | 6 | 0.882 | 73.2 | 0.759 | −2.0 | 2.183 |
5 | 8 | 0.901 | 57.7 | 0.770 | −11.2 | 2.149 |
6 | 3 | 0.887 | 56.2 | 0.756 | −13.4 | 2.111 |
Prediction scenario . | Hidden nodes . | Training . | Testing . | Validation RMSE (m3/s) . | ||
---|---|---|---|---|---|---|
NSE . | PBIAS (%) . | NSE . | PBIAS (%) . | |||
1 | 7 | 0.901 | 60.7 | 0.736 | −16.9 | 2.247 |
2 | 5 | 0.891 | 59.4 | 0.671 | 1.3 | 2.178 |
3 | 6 | 0.908 | 68.3 | 0.760 | −6.3 | 2.231 |
4 | 6 | 0.882 | 73.2 | 0.759 | −2.0 | 2.183 |
5 | 8 | 0.901 | 57.7 | 0.770 | −11.2 | 2.149 |
6 | 3 | 0.887 | 56.2 | 0.756 | −13.4 | 2.111 |
Prediction scenario . | Hidden nodes . | Training . | Testing . | Validation RMSE (m3/s) . | ||
---|---|---|---|---|---|---|
NSE . | PBIAS (%) . | NSE . | PBIAS (%) . | |||
1 | 8 | 0.949 | 40.5 | −0.076 | −89.7 | 4.150 |
2 | 6 | 0.961 | 8.8 | −0.029 | −90.3 | 4.238 |
3 | 9 | 0.950 | 44.7 | −0.077 | −89.5 | 3.984 |
4 | 1 | 0.670 | 3.2 | 0.294 | −72.8 | 3.518 |
5 | 7 | 0.892 | 3.7 | 0.316 | −79.8 | 2.360 |
6 | 8 | 0.863 | −11.7 | 0.355 | −79.1 | 2.658 |
Prediction scenario . | Hidden nodes . | Training . | Testing . | Validation RMSE (m3/s) . | ||
---|---|---|---|---|---|---|
NSE . | PBIAS (%) . | NSE . | PBIAS (%) . | |||
1 | 8 | 0.949 | 40.5 | −0.076 | −89.7 | 4.150 |
2 | 6 | 0.961 | 8.8 | −0.029 | −90.3 | 4.238 |
3 | 9 | 0.950 | 44.7 | −0.077 | −89.5 | 3.984 |
4 | 1 | 0.670 | 3.2 | 0.294 | −72.8 | 3.518 |
5 | 7 | 0.892 | 3.7 | 0.316 | −79.8 | 2.360 |
6 | 8 | 0.863 | −11.7 | 0.355 | −79.1 | 2.658 |
The trained ANN models overall had very good performance in LCW (Table 6). All six scenarios had NSE values around 0.9 during the training period, although the PBIAS values were above 50%, indicating that streamflow is continuously overestimated during the training process. In the testing period, the NSE values ranged between 0.67 and 0.77, while all PBIAS values are below 25%, overall indicating very good streamflow estimation performance.
In UMRW (Table 7), the NSE and PBIAS performance for all six prediction scenarios in the training period is in the range of ‘good’ to ‘very good,’ with NSE ranging from 0.67 to 0.95 and PBIAS ranging from −11.7 to 44.7%. However, the predictive performance of the testing period is much worse. The NSE values were below 0 for scenarios 1 through 3, in which the observed mean is a better predictor than the model. The NSE values are near 0.3 for scenarios 4 through 6, near the boundary of unsatisfactory and satisfactory performance. The PBIAS of all six scenarios were below −70%, suggesting severe streamflow underestimation. As mentioned in the methodology section, prediction scenarios 4 through 6 included the streamflow of previous time steps as predictors, while scenarios 1 through 3 only used the meteorological data. It is apparent from Tables 6 and 7 that the inclusion of streamflow as one of the predictors did not cause a clear difference in the predictive performance in LCW but significantly improved the predictive performance in UMRW.
SWAT and ANN model performance comparison
The goodness-of-fit indicators, traditionally calculated and modified with the correction factor for SWAT and ANN models, are presented in Table 8. Similar to the results in Harmel et al. (2010), when the uncertainty level is low (Cv = 0.026), there was almost no noticeable improvement in the indicator values. However, when the more significant uncertainty level was applied (Cv = 0.256), both NSE and PBIAS values showed levels of improvement in the two study watersheds. Overall, incorporating the correction factors into the goodness-of-fit assessment yielded results consistent with the opinion of Harmel et al. (2010), as very good model performance did not improve considerably, while poor performance did not become satisfactory. In this study, the slightly improved values provided a more comprehensive assessment of the model performance.
Watershed . | Model . | Indicator . | Calibration (training) (2000–2006) . | Validation (testing) (2007–2009) . | ||||
---|---|---|---|---|---|---|---|---|
Traditional calculation . | Cv = 0.026 . | Cv = 0.256 . | Traditional calculation . | Cv = 0.026 . | Cv = 0.256 . | |||
LCW | ANN | NSE | 0.89vg | 0.89vg | 0.91vg | 0.76vg | 0.76vg | 0.79vg |
PBIAS (%) | 56.2s | 56.2s | 56.4s | −13.4vg | −13.3vg | −10.7vg | ||
SWAT | NSE | 0.90vg | 0.90vg | 0.92vg | 0.72vg | 0.72vg | 0.74vg | |
PBIAS (%) | 49.6g | 49.6g | 48.9g | 36.8g | 36.8g | 33.5g | ||
UMRW | ANN | NSE | 0.89vg | 0.89vg | 0.96vg | 0.32s | 0.32s | 0.34s |
PBIAS (%) | 3.7vg | 3.7vg | 5.8vg | −79.8u | −79.8u | −78.4u | ||
SWAT | NSE | 0.54g | 0.54g | 0.56g | −0.02u | −0.02u | 0.07u | |
PBIAS (%) | 4.3vg | 4.3vg | 3.5vg | 27.3g | 27.3g | 25.3g |
Watershed . | Model . | Indicator . | Calibration (training) (2000–2006) . | Validation (testing) (2007–2009) . | ||||
---|---|---|---|---|---|---|---|---|
Traditional calculation . | Cv = 0.026 . | Cv = 0.256 . | Traditional calculation . | Cv = 0.026 . | Cv = 0.256 . | |||
LCW | ANN | NSE | 0.89vg | 0.89vg | 0.91vg | 0.76vg | 0.76vg | 0.79vg |
PBIAS (%) | 56.2s | 56.2s | 56.4s | −13.4vg | −13.3vg | −10.7vg | ||
SWAT | NSE | 0.90vg | 0.90vg | 0.92vg | 0.72vg | 0.72vg | 0.74vg | |
PBIAS (%) | 49.6g | 49.6g | 48.9g | 36.8g | 36.8g | 33.5g | ||
UMRW | ANN | NSE | 0.89vg | 0.89vg | 0.96vg | 0.32s | 0.32s | 0.34s |
PBIAS (%) | 3.7vg | 3.7vg | 5.8vg | −79.8u | −79.8u | −78.4u | ||
SWAT | NSE | 0.54g | 0.54g | 0.56g | −0.02u | −0.02u | 0.07u | |
PBIAS (%) | 4.3vg | 4.3vg | 3.5vg | 27.3g | 27.3g | 25.3g |
Superscripts represent the performance levels, ‘vg’ – ‘very good’, ‘g’ – ‘good’, ‘s’ – ‘satisfactory’, ‘u’ – ‘unsatisfactory’.
In the urban LCW, both SWAT-LCW and ANN-LCW models produced very good simulation results, with the calibration (training) NSE reaching approximately 0.90 and validation (testing) NSE above 0.70. In addition, both SWAT-LCW and ANN-LCW models overestimated streamflow in the calibration (training) period. The SWAT-LCW model further overestimated the streamflow in the validation period, while the ANN model underestimated the streamflow in the testing period. Overall, the statistical indicators suggest that the ANN model slightly outperformed the SWAT in the urban LCW. These results corroborate the findings of some previous work comparing ANN and process-based approaches for streamflow modeling, which had often reported that the ANN models provided more accurate simulations using the criteria of statistical indicators (Srivastava et al. 2006; Jimeno-Sáez et al. 2018; Kassem et al. 2020).
In the rural UMRW, the ANN model performed significantly better than the SWAT model. The NSE performance of the training period was classified as very good for the ANN-UMRW model, using either the traditional calculation or the modified indicators. In addition, the testing NSE of the ANN-UMRW model was classified as satisfactory. In contrast, the calibration period SWAT-UMRW model only had good performance with NSE ranging from 0.54 to 0.56 from different uncertainty levels, and the validation period SWAT-UMRW model had unsatisfactory NSE performance on all uncertainty levels. Interestingly, the PBIAS statistics of the calibration (training) period for both SWAT-UMRW and ANN-UMRW models were classified as very good. However, the validation (testing) of PBIAS was similar to that in LCW, with ANN-UMRW underestimated streamflow and SWAT-UMRW made overestimation.
Additionally, the hydrographs in LCW show that the SWAT model overestimated the magnitude of almost every prominent streamflow peak in 2007 and 2009, while the magnitude of the ANN model predicted peak flow was much smaller. In the rural UMRW, SWAT and ANN models failed to capture the timing and magnitude of the significant storm events. More specifically, the ANN-UMRW model caught the timing of the storm events from 2007 to 2009 but often severely underestimated their magnitudes. On the other hand, the SWAT-UMRW model failed to capture the timing and magnitude of the storm events and falsely simulated a few nonexistent storm flow events. Additionally, the recession limbs simulated by SWAT-UMRW were much longer than those from the actual flow. It seems most likely that these results are due to the fast recharge through the sinkholes, which are found plenty in regions with karstic features.
The LCW and UMRW are situated above the Edwards BFZ Aquifer, with UMRW entirely located within the contribution zone and LCW located through the contributing, recharge, and artesian zone (Figure 1(a)). Karst geological features are prevalent in the Edwards BFZ Aquifer region, and a significant amount of surface runoff in the contributing zone infiltrates the water table aquifer (Loáiciga et al. 2000). In the rural UMRW, the karst terrain makes rainfall-runoff modeling particularly difficult. The observed streamflow time series did not correspond well with precipitation, mostly likely due to the fast infiltration or direct recharge through sinkholes. The water recharging into the Edwards BFZ Aquifer generally moves from west to east through large subterranean conduits (Loáiciga et al. 2000) and eventually discharges back to the surface through a few large springs far east outside of the watershed boundary of UMRW.
In the meantime, the two hydrological models employed in this study, especially the SWAT model, heavily relied on the input precipitation for estimating the watershed outflow. The daily streamflow simulation results suggest that the calibrated SWAT model failed to quantify the amount of groundwater recharge and discharge in UMRW, hence overestimating the watershed outflow. On the contrary, estimating watershed outflow in the urban LCW was much more successful for SWAT and ANN. These results are likely due to the extensive impervious surface in LCW, causing more direct surface runoff into the stream instead of recharging into the water table aquifer. Thereby, the rainfall-runoff process in the urban watershed was not evidently affected by the karst topography underneath. A few previous studies pointed out that the traditional SWAT models have generally not yielded satisfactory runoff estimates in karst watersheds, and some suggested applying additional modules or using the trace test method to quantify streamflow and infiltration (Spruill et al. 2000; Shao et al. 2019; Wang et al. 2019; Jakada & Chen 2020), which are beyond the scope of this study.
SUMMARY AND CONCLUSION
This study investigated watershed-level daily streamflow simulation using SWAT and ANN in two small watersheds in the San Antonio Region, where karst terrain is prevalent. This study set out to compare the efficacy of the SWAT and ANN models in estimating streamflow in the karstic watershed in central south Texas. The paired watershed approach assesses SWAT and ANN performance under different dominant land cover types. Additionally, we applied the correction factor approach to enhance model evaluation using modified goodness-of-fit indicators incorporating measurement and model uncertainty. The conclusions of this study are summarized as follows:
- (1)
Six ANN prediction scenarios were set up for the two study watersheds, and BlockedCV was applied to select the best neural network structure for each watershed. The model selection results (Tables 6 and 7) show that in urban LCW, the inclusion of previous time step streamflow as one of the predictors did not noticeably improve model performance. In contrast, in the rural UMRW, the model performance significantly improved when a time series autoregressive model structure using historical streamflow as input data was implemented.
- (2)
Considering the statistical performance and graphical comparison between the observed and simulated time series, both SWAT and ANN models made very good daily streamflow estimations in the urban LCW, while their performance in the rural UMRW was much less satisfactory. This discrepancy could be attributed to the extensive impervious surface in the urban watershed, causing more direct surface runoff into stream networks than in the rural watershed.
- (3)
The calibrated SWAT model has inferior performance in the rural UMRW, which is located within the contributing zone of Edwards BFZ Aquifer, where karst geological features are prevalent. This finding is consistent with a few previous studies, which suggested that the conventional SWAT model is less capable in a karst environment.
- (4)
The statistical indicators performance results for the standalone validation (testing) period suggest that the ANN model performed slightly better than the SWAT model in LCW and significantly better in UMRW. These findings, taken together with the fact that the data-driven ANN model has a short response time compared with SWAT, suggest that ANN is a better real-time simulator of streamflow, although it does not address the physical aspect of a hydrological system.
- (5)
Applying the correction factor approach to modify goodness-of-fit indicators to incorporate measurement and model uncertainty yielded similar results in the two study watersheds. When the uncertainty level was low (Cv = 0.026), the indicator values showed almost no visible improvement. However, when the more significant uncertainty level was applied (Cv = 0.256), both NSE and PBIAS values showed different levels of improvement for the ANN and SWAT models in both study watersheds.
This work was limited to hydrological simulations in small-sized watersheds in the San Antonio region. The study area was unique with its vast karstic groundwater aquifer with rapid recharge and discharge capabilities. An issue that was not addressed was whether the paired watershed study between urban and rural watersheds would yield closer model performance in an area without similar karst geological features. More importantly, although the results of this investigation show that ANNs achieved promising performance and outperformed the SWAT model by different margins, the most significant limitation lies in the fact that no spatially distributed watershed processes were considered for the ANNs in this work. Further research could usefully explore ANN forms that explicitly address the spatial distribution of hydrological processes within a watershed.
ACKNOWLEDGEMENTS
This manuscript is a chapter of the first author's dissertation.
AUTHOR CONTRIBUTIONS
X. M. and P. S. contributed significantly to developing the methodology applied in this study. X. M. conducted the model simulations and wrote the original draft. X. M., P. S. and J. L. analyzed the results and revised the manuscript. All authors have read and agreed to the published version of the manuscript.
FUNDING
This research was supported by the Yunnan Provincial Department of Education basic scientific research fund under grant NO. 2023J0446 and NO. 2023Y0884.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.