Abstract
Streamflow data are required for the effective management of flood damage; however, streamflow is rarely measured in small watersheds. In this study, a lumped conceptual model was adopted to generate streamflow values at small, ungauged watersheds using the spatial propagation concept. During the process of spatial propagation, parameters related to physical properties were fixed, while event-based initial conditions were optimized to ensure that error rates on the simulated streamflow data were comparable to those for measured data. Then the concept was validated using data from 21 flood events in the ChungJu Dam (CJD) watershed, Korea, which was divided into 22 small, ungauged watersheds. The propagated peak streamflow data at ungauged cross-validation points have average Nash-Sutcliffe efficiency (NSE) values of 0.91–0.99. Averages of NSE values for volume and time to peak streamflow are over 0.85 which satisfies the model criterion (NSE > 0.5). It is concluded that streamflow data for small, ungauged watersheds located upstream can be generated by one gauged downstream streamflow without an extensive amount of gauged streamflow data from other locations using the proposed concept. For accurate simulation, availability of rainfall data is essential for accurately modeling the spatial propagation of streamflow using a lumped conceptual model.
NOTATION
- A
watershed area (km2)
- Si
simulated streamflow (m3/s) at watershed i
- FF (
)
form factors
- f1
the initial runoff ratio
- fj
average inflow coefficient for channel flow
- Ij
inflow (m3/s)
- K and P
constant parameters
- L
river length
- Oi
observed streamflow (m3/s) at watershed i
- Q
streamflow (m3/s)
- Q(t)
streamflow (m3/s)
- Rsa
cumulative saturated rainfall (mm/h)
- r(t)
unit inflow (mm/h)
- re
effective rainfall (mm/h)
- RS
river slope
- S
storage (m3)
- Tl
lag time (h)
- t
time (h)
- WAS
watershed average slope
error rate at watershed i
INTRODUCTION
Rainfall-runoff modeling is used for hydrologic research, streamflow prediction, and engineering applications (Linsely 1981). A number of recent studies have concentrated on hydrological data prediction for engineering applications. With regard to streamflow predictions for ungauged watersheds, regionalization has been used to predict streamflow (1) by transferring parameters from gauged (i.e., donor) to ungauged (i.e., target) watersheds (Bloschl & Sivapalan 1995; Zhang & Chiew 2009), (2) by calibrating models (e.g., a hillslope model; an isotope model) related to discharge values (McGuire & McDonnell 2007; Leibundgut et al. 2009; McDonnell & Beven 2014; Yamanaka & Ma 2017) or (3) by estimating model parameters from various prior measured databases (Duan et al. 2006).
In the transferring of parameters, representative parameters for target watersheds are regionalized based on an extensive amount of information on physically proximate donor watersheds, or other donor watersheds with similar geo- and hydrological characteristics (Merz & Bloschl 2004; McIntyre et al. 2005; Parajka et al. 2005; Bardossy 2007; Oudin et al. 2008). In addition, there are various approaches to specified parameter identification related to the characteristics such as univariate regression, correlated regression, generalized regression (e.g., neural networks), symbolic regression (e.g., Genetic Program: Heřmanovský et al. 2017; Klotz et al. 2017), univariate weighted regression (Wagener et al. 2004), and statistical approaches (Durocher et al. 2015; Wazneh et al. 2015; Asadi et al. 2018).
For calibrating models, regionalization regression methods characterize watershed responses to attributes of donor watersheds and apply them to ungauged watersheds (Croke & Norton 2004; Parajka et al. 2005; Wagener & Wheater 2006; Boughton & Chiew 2007). Atieh et al. (2017) and Pugliese et al. (2016) generated a flow duration curve using an artificial neural network, gene expression programming, and geostatistical techniques for ungauged watersheds. A spatially distributed hydrology model is also used in the regionalization but, in terms of model calibration, it requires huge calibration processes and may produce over-parameterization (Troy et al. 2008; Klotz et al. 2017).
Beside streamflow prediction, regionalization functions can be utilized in streamflow record propagation by using a rainfall-runoff model. For this process, there remain a number of limitations: (1) regionalization requires extensive streamflow records from donor watersheds or watersheds with similar hydro/meteorological attributes (Wagener et al. 2004; Yamanaka & Ma 2017); however, for many regions only rainfall records are available (Javeed & Apoorva 2015; Ibrahim et al. 2015); (2) streamflow data generated from regionalization are currently concentrated on streamflow predictions from continuous daily rainfall-runoff models, which lack hourly flood records; and (3) streamflow data is measured at one gauge location for a large watershed which is a deficiency of spatial information.
In this study, a spatial propagation concept is used to generate streamflow records for proximate small and ungauged upstream watersheds, based on observed streamflow records at one downstream streamflow gauge. This newly developed method uses a conceptual hourly rainfall-runoff model, with the initial streamflow conditions based on past records (Figure 1). The remainder of this paper is organized as follows: The second and third sections describe the conceptual rainfall-runoff model and the concept of spatial propagation of streamflow records, respectively. The fourth section describes the selected sites and storm events. The fifth section presents the results and discussion of regionalized parameters and spatially propagated streamflow data. Finally, our conclusions are summarized.
HYDROLOGICAL MODEL
STREAMFLOW DATA GENERATION
Spatial propagation

For the purposes of this study, the chosen ungauged watersheds are spatially proximate to gauged watersheds, allowing for better streamflow estimation (Young 2006). The spatially proximate method of regionalization performs better than physical similarity and regression methods (e.g., Merz & Bloschl 2004; Parajka et al. 2005).
Procedure
In the procedure developed in this study, large watersheds (e.g., T; Figure 3) are divided into small watersheds (e.g., a, b, and c). Based on the observed streamflow values of the large watershed (e.g., T(a); Figure 3), streamflow values are generated for the small, ungauged watersheds using the concept of spatial propagation. If observed streamflow values are available for one of the small watersheds, these can be used for the streamflow data propagation instead of T(a).
During the process of spatial propagation of streamflow data (Figure 4), we first spatially positioned small watersheds within a large watershed using a lumped conceptual model. Parameters related to the physical characteristics of watersheds (i.e., K, P, and Tl in the SFM) were optimized iteratively for several years using measured streamflow values taken at the end of the large watershed (T). Generalized parameters for ungauged watersheds were fixed to regenerate streamflow data while event-based initial conditions (e.g., f1, Rsa in the SFM) were optimized.
Lumped conceptual model flowchart for the spatial propagation of streamflow data.
Lumped conceptual model flowchart for the spatial propagation of streamflow data.

STUDY SITE
For the purposes of this study, data were obtained from the ChungJu Dam (CJD) watershed, located in the southern Han river basin, South Korea. The watershed extends across 6,648 km2 and the longest river is ∼385 km in length with a 0.35 average slope. The CJD watershed contains five streamflow gauging stations (Figure 5): PanWoon (PW); YoungChoon (YC); YoungWol 1 (YW1); YoungWol 2 (YW2); and JungSun 2 (JS2). The inflow of the CJD is from two tributaries, one from the north and the other from the east (Figure 5(b)), which host the PW and YW1 streamflow gauge stations, and the JS2 and YW2 gauge stations, respectively. The YC station is located at the merged point of the two tributaries. Based on streamflow data availability, we selected YW1 and YW2 as cross-validation points for the streamflow data propagation.
(a) Map of the ChungJu Dam (CJD) watershed, with 22 small watersheds denoted. (b) Schematic of the study area water system.
(a) Map of the ChungJu Dam (CJD) watershed, with 22 small watersheds denoted. (b) Schematic of the study area water system.
We utilized an SFM-based rainfall-runoff system, the Coordinated Operation System for Flood control In Multi-reservoirs (COSFIM), developed by K-water (MLIT 2004). The CJD watershed was divided into 22 small watersheds, which include the five streamflow gauging stations. As we propagated streamflow data for ungauged locations, the 22 streamflow gauges also received streamflow information. To perform the spatial propagation of the streamflow data, 21 flood events from 1993 to 2009 were selected from observed hourly inflow data at CJD. The criteria for flood event selection included: (1) more than two events per year; (2) peak inflow >3,000 m3/s; and (3) high quality inflow data in terms of the similarity of hydrograph trends for the same watershed.
RESULTS AND DISCUSSION
Regionalized parameters
For the generalization of physical-based parameters, the Golden Section Search Method (GSSM; Kiefer 1953) was applied to the observed CJD inflow data for the 21 flood events. Initial values for the physical-based parameters using GSSM were taken from the empirical equations generated from the Tone river basin located in Japan (Kikkawa et al. 1975) as Duan et al. (2001) suggested for parameter estimation procedure. The initial values of physical-based parameters for the 22 watersheds were simultaneously optimized in terms of error rate change. The finalized parameter values (K, P, and Tl) for the 21 events were generalized specifically for the CJD watershed by searching relationships between physiographic watershed characteristics, for example: watershed area (A), river length (L), form factors (FF), watershed average slope (WAS), and river slope (RS). Even though previous research provides empirical equations, the generated process is required because physical-based parameters have diverse relations at various watersheds.
After the generalized parameters were adopted into the 22 watersheds, a selected event-based initial condition (Rsa) was optimized to regenerate the inflow values of the CJD out of two conditions (f1, Rsa). The results of sensitivity analysis shown in Figure 6 on event 04_1 show that cumulative saturated rainfall (Rsa) is important for streamflow generation. The parameters (K, P) related to the relation between storage and runoff depths were not considered since these are fixed for the regeneration process based on the propagation concept. We fixed the initial runoff ratio (f1) to 0.3 and selected the cumulative saturated rainfall (Rsa). Some studies have suggested that initial soil conditions (e.g., soil moisture values) are not critical for storm rainfall-runoff modeling (Brocca et al. 2009); however, others (e.g., Jothityangkoon & Sivapalan 2003) have shown that the status of the surface can considerably change the magnitude in extreme flood simulations. For the data regeneration process, initial conditions are important for streamflow simulations as Jothityangkoon & Sivapalan (2003) mentioned. During the optimization process, identical cumulative saturated rainfall was applied to the 22 watersheds, based on the assumption that the proximate small watersheds have an equivalent cumulative saturated rainfall for each flood event. Furthermore, we fixed the parameters for channel flow by using the empirical values of the Tone river basin. These parameters do not substantially influence peak streamflow generation.
Spatially propagated streamflow data for ungauged watersheds
At optimally generalized parameters (K, P, and Tl), event-based initial conditions (Rsa) were optimized using the gauged inflow values at CJD, with cross-validation at stations YW1 and YW2. CJD inflow data were used as the criteria for spatial propagation into the 22 small, ungauged watersheds. As a result, the CJD inflow records contain a maximum individual event NSE value of 0.95 (event 02_1), while the average is 0.83. In comparison, the maximum NSE of YW1 is higher (0.96 for event 02_1) than that of the CJD. For YW2, event 02_1 also has the highest NSE value (0.95), while the average is 0.80.
For the cross-validations of spatial propagation at YW1 and YW2, peak streamflow, time to peak, and streamflow volume were compared and are shown in Figure 7. With respect to the simulation validation criteria, the average NSE values exceed Moriasi et al. (2007) satisfaction criterion (NSE > 0.5). The NSE ranges for peak streamflow at CJD, YW1, and YW2 are 0.91–0.99. This result shows that using spatial propagation to regenerate peak streamflow is an effective method for setting up peak streamflow data for ungauged watersheds. Subsequently, these peak streamflow data can be used to design flood control countermeasures (Moretti & Montanari 2008). For the streamflow volume, the CJD has the highest NSE (0.98) followed by YW2 (0.96). For time to peaks, the spatially propagated results at YW1 show some outliers (Figure 7); however, these are not apparent in the streamflow peak and volume results. Since some events (e.g., 09_2) have similar magnitudes of dual streamflow peak and volume, the time to peak was selected for one of them based on location. Magnitude of rainfall at each location may influence this divergence.
At YC and PW, simulated data most closely matched observed data when streamflows are <10,000 m3/s and <1,000 m3/s, respectively (Figure 8). For YC, the coefficient of determination is 0.93, with 14% of simulations resulting in overestimated streamflow. For PW, the coefficient of determination is smaller (0.8), while 10% of the simulations underestimate streamflow. When streamflow is larger than 10,000 m3/s at YC, PW also has streamflow over 1,000 m3/s in the same events (e.g., 02_1 and 06_1) with different simulation patterns in terms of simulation rate (i.e., simulated value divided by observed value). The simulation rate at YC is 1.25 showing overestimated streamflow whereas PW has 0.62 streamflow rate. However, the sum of simulation rate at CJD is 1 which was PW and YC's combined simulation rate with areal size effects for both events. PW is located within a small watershed (220 km2), which is just 4% of the size of watershed in which YC is located. This difference in size means that the PW station may be more sensitive to environmental factors (e.g., rainfall and Rsa). In the propagation concept, Rsa is only one considerable parameter relying on streamflow regeneration because observed rainfall data is applied in this study. At streamflow values of <1,000 m3/s, some values fall along a linear trend (as denoted by the circled area in Figure 8). In the spatial propagation of streamflow for event 95_A, the finalized Rsa value, which was adjusted to match the gauged streamflow data at the CJD criteria point, is too large to accommodate more surface streamflow at the PW water gauge station because the same Rsa is applied to the PW. It might be possible that the model we used was too simple to produce a parsimonious structure as Wagener et al. (2004) suggested because the application of the same Rsa to the whole watershed possibly generated over- or under-estimations for small, ungauged watersheds. As Samaniego et al. (2010) tried, effective size of Rsa distribution can be applied to overcome this limitation.
Influences of rainfall data availability
During the streamflow propagation process, rainfall data was important for selecting the time to peak values for each event and for matching streamflow records at water gauge stations. We used the 02_1 storm event to test the sensitivity of the streamflow generation data to rainfall data availability. One rain gauge station per a rainfall-runoff simulation was added to the initial 11 rain gauge stations until the total 34 rain gauge stations were reached using the Thiessen polygon method. This method is selected because the arithmetic mean method and the isohyetal method require uniformly distributed gauges and an extensive gauge network, respectively, to get the average depth of rainfall (Bedient et al. 2008). Root mean square errors (RMSE) of the generated inflow data at CJD were calculated based on increasing rainfall data availability as the number of rain gauges increased. When data from just 11 rain gauge stations is utilized, RMSE values are 344.9 m3/s, while with 34 rain gauges, values are 40.79 m3/s (Figure 9). The most significant decrease in RMSE was observed when the rain gauge number increases from 25 to 27. The 26th and 27th rain gauges had a higher rainfall amount (16% more than average rainfall) for the 02_1 storm event, which was added at the locations where no rainfall gauge existed. Based on this result, 27 rain gauges could represent areal rainfall of the CJD watershed and the density of rain gauges in the CJD is 246 km2/gauge, which corresponds to the recommended range of the World Meteorological Organization (i.e., 250 km2/gauge for a non-recording gauge for hilly region). The spatial scale of rainfall is substantial for runoff estimation for large watersheds (>500 to 25,000 km2) in particular (Hay et al. 2002; Pinault & Allier 2007). Thus, the rainfall gauge locations could have rational effects on rainfall-runoff simulations (Jung et al. 2014). We conclude that the spatial propagation of streamflow data is highly dependent on rainfall data availability.
Sensitivity of streamflow generation to the availability of rainfall data.
Secondary effects of streamflow data propagation
During spatially distributed streamflow data generation, the quality of gauged streamflow data was checked to fill in missing data (Figure 10). For example, the JS2 water gauge station did not provide streamflow data earlier than 2004, and after 2004, several years of data were also missing. A number of studies (Perrin et al. 2007; Loukas & Vasiliades 2014) focused on the calibration problems with poorly gauged streamflow data and discussed the minimum amount of streamflow data based on the number of free parameters (Seibert & Beven 2009). During the process of spatial propagation, not only was fully simulated streamflow data obtained by following the trend of gauged streamflow data, but the quality of gauged streamflow data was examined (Figure 10(b)–10(e)). The results show that the gauged streamflow data at YW2 and YW1 are misgauged at approximately 31 and 51 hours after the start of flood events, respectively. In addition, YC gauge streamflow data are presented in Figure 10 for the same period as YW2 and YW1. The streamflow trends of YW2 in 2002 and YW1 in 2004 followed that of YC as expected. The trends of streamflow data will not be changed between upper and lower streamflow gauges except if physical properties in the upper and lower watersheds are changed or there is some highly localized rainfall. Through spatial propagation, these two misgauged hydrographs can be regenerated using the gauged streamflow data located downstream. Therefore, spatial propagation of streamflow could be a feasible method to evaluate the measured streamflow data.
CONCLUSIONS
Streamflow information is an important component needed to manage water resources and flood control. For small and ungauged watersheds in particular, there are some difficulties in accessing the records of flood events due to streamflow data deficiency. In this study, we performed a spatial propagation of streamflow data for upstream small, ungauged watersheds based on one downstream gauged streamflow without an extensive amount of gauged streamflow data for model calibration. The spatial propagation procedure used the SFM, with the physical-based parameters fixed using generalized equations, and the event-based initial condition optimized for selected events. The fixed parameters provide similar error scales, produced from optimization at the measured inflow station (CJD), to all watersheds. This parameter separation approach can provide better streamflow data
Average NSE values for peak, volume, and time to peak at cross-validation points were 0.88–0.99, which satisfies the model criterion (NSE >0.5). Furthermore, the hydrographs of other validation points show the applicability of the spatial data propagation concept for ungauged watersheds. However, the results also show some limitations of the application of this concept of spatial propagation as follows: (1) the availability of rainfall data and the selection of initial conditions is critical for the spatial propagation of streamflow data for smaller watersheds; and (2) the lumped conceptual model has limited applicability (i.e., limitation of the size of smaller watersheds and sensitivity to the initial surface conditions). In future work, a distributed and continuous model should be considered to minimize these limitations.
ACKNOWLEDGEMENTS
This paper was supported by Wonkwang University in 2018. We are truly grateful for the model support (COSFIM) by K-water in Korea.