## Abstract

Streamflow data are required for the effective management of flood damage; however, streamflow is rarely measured in small watersheds. In this study, a lumped conceptual model was adopted to generate streamflow values at small, ungauged watersheds using the spatial propagation concept. During the process of spatial propagation, parameters related to physical properties were fixed, while event-based initial conditions were optimized to ensure that error rates on the simulated streamflow data were comparable to those for measured data. Then the concept was validated using data from 21 flood events in the ChungJu Dam (CJD) watershed, Korea, which was divided into 22 small, ungauged watersheds. The propagated peak streamflow data at ungauged cross-validation points have average Nash-Sutcliffe efficiency (NSE) values of 0.91–0.99. Averages of NSE values for volume and time to peak streamflow are over 0.85 which satisfies the model criterion (NSE > 0.5). It is concluded that streamflow data for small, ungauged watersheds located upstream can be generated by one gauged downstream streamflow without an extensive amount of gauged streamflow data from other locations using the proposed concept. For accurate simulation, availability of rainfall data is essential for accurately modeling the spatial propagation of streamflow using a lumped conceptual model.

## NOTATION

*A*watershed area (km

^{2})*S*_{i}simulated streamflow (m

^{3}/s) at watershed*i**FF*()form factors

*f*_{1}the initial runoff ratio

*f*_{j}average inflow coefficient for channel flow

*I*_{j}inflow (m

^{3}/s)*K*and*P*constant parameters

*L*river length

*O*_{i}observed streamflow (m

^{3}/s) at watershed*i**Q*streamflow (m

^{3}/s)*Q*(*t*)streamflow (m

^{3}/s)*R*_{sa}cumulative saturated rainfall (mm/h)

*r*(*t*)unit inflow (mm/h)

*r*_{e}effective rainfall (mm/h)

*RS*river slope

*S*storage (m

^{3})*T*_{l}lag time (h)

*t*time (h)

*WAS*watershed average slope

error rate at watershed

*i*

## INTRODUCTION

Rainfall-runoff modeling is used for hydrologic research, streamflow prediction, and engineering applications (Linsely 1981). A number of recent studies have concentrated on hydrological data prediction for engineering applications. With regard to streamflow predictions for ungauged watersheds, regionalization has been used to predict streamflow (1) by transferring parameters from gauged (i.e., donor) to ungauged (i.e., target) watersheds (Bloschl & Sivapalan 1995; Zhang & Chiew 2009), (2) by calibrating models (e.g., a hillslope model; an isotope model) related to discharge values (McGuire & McDonnell 2007; Leibundgut *et al.* 2009; McDonnell & Beven 2014; Yamanaka & Ma 2017) or (3) by estimating model parameters from various prior measured databases (Duan *et al.* 2006).

In the transferring of parameters, representative parameters for target watersheds are regionalized based on an extensive amount of information on physically proximate donor watersheds, or other donor watersheds with similar geo- and hydrological characteristics (Merz & Bloschl 2004; McIntyre *et al.* 2005; Parajka *et al.* 2005; Bardossy 2007; Oudin *et al.* 2008). In addition, there are various approaches to specified parameter identification related to the characteristics such as univariate regression, correlated regression, generalized regression (e.g., neural networks), symbolic regression (e.g., Genetic Program: Heřmanovský *et al.* 2017; Klotz *et al.* 2017), univariate weighted regression (Wagener *et al.* 2004), and statistical approaches (Durocher *et al.* 2015; Wazneh *et al.* 2015; Asadi *et al.* 2018).

For calibrating models, regionalization regression methods characterize watershed responses to attributes of donor watersheds and apply them to ungauged watersheds (Croke & Norton 2004; Parajka *et al.* 2005; Wagener & Wheater 2006; Boughton & Chiew 2007). Atieh *et al.* (2017) and Pugliese *et al.* (2016) generated a flow duration curve using an artificial neural network, gene expression programming, and geostatistical techniques for ungauged watersheds. A spatially distributed hydrology model is also used in the regionalization but, in terms of model calibration, it requires huge calibration processes and may produce over-parameterization (Troy *et al.* 2008; Klotz *et al.* 2017).

Beside streamflow prediction, regionalization functions can be utilized in streamflow record propagation by using a rainfall-runoff model. For this process, there remain a number of limitations: (1) regionalization requires extensive streamflow records from donor watersheds or watersheds with similar hydro/meteorological attributes (Wagener *et al.* 2004; Yamanaka & Ma 2017); however, for many regions only rainfall records are available (Javeed & Apoorva 2015; Ibrahim *et al.* 2015); (2) streamflow data generated from regionalization are currently concentrated on streamflow predictions from continuous daily rainfall-runoff models, which lack hourly flood records; and (3) streamflow data is measured at one gauge location for a large watershed which is a deficiency of spatial information.

In this study, a spatial propagation concept is used to generate streamflow records for proximate small and ungauged upstream watersheds, based on observed streamflow records at one downstream streamflow gauge. This newly developed method uses a conceptual hourly rainfall-runoff model, with the initial streamflow conditions based on past records (Figure 1). The remainder of this paper is organized as follows: The second and third sections describe the conceptual rainfall-runoff model and the concept of spatial propagation of streamflow records, respectively. The fourth section describes the selected sites and storm events. The fifth section presents the results and discussion of regionalized parameters and spatially propagated streamflow data. Finally, our conclusions are summarized.

## HYDROLOGICAL MODEL

*K*and

*P*are constant parameters related to the physical characteristics of the watershed. In the process of obtaining runoff (

*Q*), Kimura (1961) used the modified Puls method in the continuity equation for watersheds (Equation (2)). Equation (2) presents that the storage change per unit time through the relationship between streamflow and effective rainfall (

*r*: mm/h) in the unit area: where

_{e}*Q*(

*t*) is streamflow (m

^{3}/s, cubic meter per second: cms) and

*A*is watershed area (km

^{2}). The effective rainfall is dependent on the relationship between cumulative saturated rainfall (

*R*: mm/h) and unit inflow (

_{sa}*r(t)*: mm/h) from rainfall at time

*t*(h), with consideration of lag time

*T*(h) included, as shown in Equation (3): Initial streamflow is highly related to the initial conditions presented in

_{l}*f*

_{1}(the initial runoff ratio) and

*R*. The minimum value of cumulative saturated rainfall is zero, which means that all rainfall can be accepted as streamflow without any infiltration. This event-based rainfall-runoff model must consider the baseflow, which has a substantial effect on streamflow simulations (Samuel

_{sa}*et al.*2012). In this study, the master depletion curve method is employed for separation between baseflow and streamflow using the recorded initial streamflow amount of the selected flood events. This baseflow consideration is in contrast to continuous model simulations which produce baseflow and antecedent runoff conditions continuously (Blazkova & Beven 2002).

## STREAMFLOW DATA GENERATION

### Spatial propagation

*et al.*2005): physical-based parameters (i.e.,

*K*,

*P*, and

*T*), which are strongly related to the watershed characteristics; and time variant initial conditions (i.e.,

_{l}*f*

_{1}and

*R*). This separation is clearly different from the previous studies because they included wetness and soil properties in a generalized model structure (Robson & Reed 1999). Physical-based parameters were fixed while time variant (event-based) initial conditions were optimized to decrease the differences between simulated and observed streamflow values for gauged watersheds. It would theoretically produce simulated streamflow data for ungauged watersheds with a similar magnitude of error. On this basis, the streamflow data from just one or two gauged sites would allow the generation of streamflow data at ungauged watersheds. The error rate of a given watershed is conceptually expressed by: where is the error rate at watershed

_{sa}*i*, and

*O*and

_{i}*S*are the observed and simulated streamflow values at watershed

_{i}*i*, respectively.

For the purposes of this study, the chosen ungauged watersheds are spatially proximate to gauged watersheds, allowing for better streamflow estimation (Young 2006). The spatially proximate method of regionalization performs better than physical similarity and regression methods (e.g., Merz & Bloschl 2004; Parajka *et al.* 2005).

### Procedure

In the procedure developed in this study, large watersheds (e.g., T; Figure 3) are divided into small watersheds (e.g., a, b, and c). Based on the observed streamflow values of the large watershed (e.g., T(a); Figure 3), streamflow values are generated for the small, ungauged watersheds using the concept of spatial propagation. If observed streamflow values are available for one of the small watersheds, these can be used for the streamflow data propagation instead of T(a).

During the process of spatial propagation of streamflow data (Figure 4), we first spatially positioned small watersheds within a large watershed using a lumped conceptual model. Parameters related to the physical characteristics of watersheds (i.e., *K*, *P*, and *T _{l}* in the SFM) were optimized iteratively for several years using measured streamflow values taken at the end of the large watershed (T). Generalized parameters for ungauged watersheds were fixed to regenerate streamflow data while event-based initial conditions (e.g.,

*f*

_{1},

*R*in the SFM) were optimized.

_{sa}*et al.*2007). NSE values range from to 1, where 1 indicates that simulated streamflow data are perfectly matched to observed values, and are expressed as:

## STUDY SITE

For the purposes of this study, data were obtained from the ChungJu Dam (CJD) watershed, located in the southern Han river basin, South Korea. The watershed extends across 6,648 km^{2} and the longest river is ∼385 km in length with a 0.35 average slope. The CJD watershed contains five streamflow gauging stations (Figure 5): PanWoon (PW); YoungChoon (YC); YoungWol 1 (YW1); YoungWol 2 (YW2); and JungSun 2 (JS2). The inflow of the CJD is from two tributaries, one from the north and the other from the east (Figure 5(b)), which host the PW and YW1 streamflow gauge stations, and the JS2 and YW2 gauge stations, respectively. The YC station is located at the merged point of the two tributaries. Based on streamflow data availability, we selected YW1 and YW2 as cross-validation points for the streamflow data propagation.

We utilized an SFM-based rainfall-runoff system, the Coordinated Operation System for Flood control In Multi-reservoirs (COSFIM), developed by K-water (MLIT 2004). The CJD watershed was divided into 22 small watersheds, which include the five streamflow gauging stations. As we propagated streamflow data for ungauged locations, the 22 streamflow gauges also received streamflow information. To perform the spatial propagation of the streamflow data, 21 flood events from 1993 to 2009 were selected from observed hourly inflow data at CJD. The criteria for flood event selection included: (1) more than two events per year; (2) peak inflow >3,000 m^{3}/s; and (3) high quality inflow data in terms of the similarity of hydrograph trends for the same watershed.

## RESULTS AND DISCUSSION

### Regionalized parameters

For the generalization of physical-based parameters, the Golden Section Search Method (GSSM; Kiefer 1953) was applied to the observed CJD inflow data for the 21 flood events. Initial values for the physical-based parameters using GSSM were taken from the empirical equations generated from the Tone river basin located in Japan (Kikkawa *et al.* 1975) as Duan *et al.* (2001) suggested for parameter estimation procedure. The initial values of physical-based parameters for the 22 watersheds were simultaneously optimized in terms of error rate change. The finalized parameter values (*K, P,* and *T _{l}*) for the 21 events were generalized specifically for the CJD watershed by searching relationships between physiographic watershed characteristics, for example: watershed area (

*A*), river length (

*L*), form factors (

*FF*), watershed average slope (

*WAS*), and river slope (

*RS*). Even though previous research provides empirical equations, the generated process is required because physical-based parameters have diverse relations at various watersheds.

*et al.*1980). The results show that river length (

*L*) is the major contributor for parameters

*K*(>58%) and

*P*(>56%) but WAS is not because the percentages of contribution are less than 3% for both parameters. In previous regionalization studies, the regression method performed badly because of cross-correlation between the parameters (Beven & Freer 2001). Therefore, we executed multicollinearity analysis to find that river length (

*L*) and watershed area (

*A*) are highly correlated. On the basis of this result, the form factor (

*FF*) was considered instead of these two characteristics. Generalized equations (Equations (6)–(8)) for physical-based parameters were obtained from stepwise regressions, and are expressed as: where

*T*does not have an

_{l}*R*value because the initial value is also from river length. From Equation (1), the value of

^{2}*P*is equal to 1 indicating that this model is a linear model like the HEC (Hydrologic Engineering Center) – HMS (Hydrologic Modeling System). The

*P*values range from 0.38 to 1 and 0.6 is empirically selected as a fixed value (Sugiyama

*et al.*1997).

*K*is related to the kinematic wave runoff model constant (

*k*) which is substantially dependent on watershed area. Therefore, Equation (7) reasonably presents the relation with watershed area and slope. The values of

*FF*and

*RS*could not be zero, thus the maximum value of

*K*is 28.096. The empirical range of

*K*was between 13.83 and 28.096; and these values rely on the size of watersheds and river length. In addition,

*K*is also related to the recession of the hydrograph. A long recession period means larger

*K*values, thus the watershed has larger

*FF*with smaller slope.

After the generalized parameters were adopted into the 22 watersheds, a selected event-based initial condition (*R _{sa}*) was optimized to regenerate the inflow values of the CJD out of two conditions (

*f*

_{1},

*R*). The results of sensitivity analysis shown in Figure 6 on event 04_1 show that cumulative saturated rainfall (

_{sa}*R*) is important for streamflow generation. The parameters (

_{sa}*K, P*) related to the relation between storage and runoff depths were not considered since these are fixed for the regeneration process based on the propagation concept. We fixed the initial runoff ratio (

*f*

_{1}) to 0.3 and selected the cumulative saturated rainfall (

*R*). Some studies have suggested that initial soil conditions (e.g., soil moisture values) are not critical for storm rainfall-runoff modeling (Brocca

_{sa}*et al.*2009); however, others (e.g., Jothityangkoon & Sivapalan 2003) have shown that the status of the surface can considerably change the magnitude in extreme flood simulations. For the data regeneration process, initial conditions are important for streamflow simulations as Jothityangkoon & Sivapalan (2003) mentioned. During the optimization process, identical cumulative saturated rainfall was applied to the 22 watersheds, based on the assumption that the proximate small watersheds have an equivalent cumulative saturated rainfall for each flood event. Furthermore, we fixed the parameters for channel flow by using the empirical values of the Tone river basin. These parameters do not substantially influence peak streamflow generation.

### Spatially propagated streamflow data for ungauged watersheds

At optimally generalized parameters (*K, P,* and *T _{l}*), event-based initial conditions (

*R*) were optimized using the gauged inflow values at CJD, with cross-validation at stations YW1 and YW2. CJD inflow data were used as the criteria for spatial propagation into the 22 small, ungauged watersheds. As a result, the CJD inflow records contain a maximum individual event NSE value of 0.95 (event 02_1), while the average is 0.83. In comparison, the maximum NSE of YW1 is higher (0.96 for event 02_1) than that of the CJD. For YW2, event 02_1 also has the highest NSE value (0.95), while the average is 0.80.

_{sa}For the cross-validations of spatial propagation at YW1 and YW2, peak streamflow, time to peak, and streamflow volume were compared and are shown in Figure 7. With respect to the simulation validation criteria, the average NSE values exceed Moriasi *et al.* (2007) satisfaction criterion (NSE > 0.5). The NSE ranges for peak streamflow at CJD, YW1, and YW2 are 0.91–0.99. This result shows that using spatial propagation to regenerate peak streamflow is an effective method for setting up peak streamflow data for ungauged watersheds. Subsequently, these peak streamflow data can be used to design flood control countermeasures (Moretti & Montanari 2008). For the streamflow volume, the CJD has the highest NSE (0.98) followed by YW2 (0.96). For time to peaks, the spatially propagated results at YW1 show some outliers (Figure 7); however, these are not apparent in the streamflow peak and volume results. Since some events (e.g., 09_2) have similar magnitudes of dual streamflow peak and volume, the time to peak was selected for one of them based on location. Magnitude of rainfall at each location may influence this divergence.

At YC and PW, simulated data most closely matched observed data when streamflows are <10,000 m^{3}/s and <1,000 m^{3}/s, respectively (Figure 8). For YC, the coefficient of determination is 0.93, with 14% of simulations resulting in overestimated streamflow. For PW, the coefficient of determination is smaller (0.8), while 10% of the simulations underestimate streamflow. When streamflow is larger than 10,000 m^{3}/s at YC, PW also has streamflow over 1,000 m^{3}/s in the same events (e.g., 02_1 and 06_1) with different simulation patterns in terms of simulation rate (i.e., simulated value divided by observed value). The simulation rate at YC is 1.25 showing overestimated streamflow whereas PW has 0.62 streamflow rate. However, the sum of simulation rate at CJD is 1 which was PW and YC's combined simulation rate with areal size effects for both events. PW is located within a small watershed (220 km^{2}), which is just 4% of the size of watershed in which YC is located. This difference in size means that the PW station may be more sensitive to environmental factors (e.g., rainfall and *R _{sa}*). In the propagation concept,

*R*is only one considerable parameter relying on streamflow regeneration because observed rainfall data is applied in this study. At streamflow values of <1,000 m

_{sa}^{3}/s, some values fall along a linear trend (as denoted by the circled area in Figure 8). In the spatial propagation of streamflow for event 95_A, the finalized

*R*value, which was adjusted to match the gauged streamflow data at the CJD criteria point, is too large to accommodate more surface streamflow at the PW water gauge station because the same

_{sa}*R*is applied to the PW. It might be possible that the model we used was too simple to produce a parsimonious structure as Wagener

_{sa}*et al.*(2004) suggested because the application of the same

*R*to the whole watershed possibly generated over- or under-estimations for small, ungauged watersheds. As Samaniego

_{sa}*et al.*(2010) tried, effective size of

*R*distribution can be applied to overcome this limitation.

_{sa}### Influences of rainfall data availability

During the streamflow propagation process, rainfall data was important for selecting the time to peak values for each event and for matching streamflow records at water gauge stations. We used the 02_1 storm event to test the sensitivity of the streamflow generation data to rainfall data availability. One rain gauge station per a rainfall-runoff simulation was added to the initial 11 rain gauge stations until the total 34 rain gauge stations were reached using the Thiessen polygon method. This method is selected because the arithmetic mean method and the isohyetal method require uniformly distributed gauges and an extensive gauge network, respectively, to get the average depth of rainfall (Bedient *et al.* 2008). Root mean square errors (RMSE) of the generated inflow data at CJD were calculated based on increasing rainfall data availability as the number of rain gauges increased. When data from just 11 rain gauge stations is utilized, RMSE values are 344.9 m^{3}/s, while with 34 rain gauges, values are 40.79 m^{3}/s (Figure 9). The most significant decrease in RMSE was observed when the rain gauge number increases from 25 to 27. The 26th and 27th rain gauges had a higher rainfall amount (16% more than average rainfall) for the 02_1 storm event, which was added at the locations where no rainfall gauge existed. Based on this result, 27 rain gauges could represent areal rainfall of the CJD watershed and the density of rain gauges in the CJD is 246 km^{2}/gauge, which corresponds to the recommended range of the World Meteorological Organization (i.e., 250 km^{2}/gauge for a non-recording gauge for hilly region). The spatial scale of rainfall is substantial for runoff estimation for large watersheds (>500 to 25,000 km^{2}) in particular (Hay *et al.* 2002; Pinault & Allier 2007). Thus, the rainfall gauge locations could have rational effects on rainfall-runoff simulations (Jung *et al.* 2014). We conclude that the spatial propagation of streamflow data is highly dependent on rainfall data availability.

### Secondary effects of streamflow data propagation

During spatially distributed streamflow data generation, the quality of gauged streamflow data was checked to fill in missing data (Figure 10). For example, the JS2 water gauge station did not provide streamflow data earlier than 2004, and after 2004, several years of data were also missing. A number of studies (Perrin *et al.* 2007; Loukas & Vasiliades 2014) focused on the calibration problems with poorly gauged streamflow data and discussed the minimum amount of streamflow data based on the number of free parameters (Seibert & Beven 2009). During the process of spatial propagation, not only was fully simulated streamflow data obtained by following the trend of gauged streamflow data, but the quality of gauged streamflow data was examined (Figure 10(b)–10(e)). The results show that the gauged streamflow data at YW2 and YW1 are misgauged at approximately 31 and 51 hours after the start of flood events, respectively. In addition, YC gauge streamflow data are presented in Figure 10 for the same period as YW2 and YW1. The streamflow trends of YW2 in 2002 and YW1 in 2004 followed that of YC as expected. The trends of streamflow data will not be changed between upper and lower streamflow gauges except if physical properties in the upper and lower watersheds are changed or there is some highly localized rainfall. Through spatial propagation, these two misgauged hydrographs can be regenerated using the gauged streamflow data located downstream. Therefore, spatial propagation of streamflow could be a feasible method to evaluate the measured streamflow data.

## CONCLUSIONS

Streamflow information is an important component needed to manage water resources and flood control. For small and ungauged watersheds in particular, there are some difficulties in accessing the records of flood events due to streamflow data deficiency. In this study, we performed a spatial propagation of streamflow data for upstream small, ungauged watersheds based on one downstream gauged streamflow without an extensive amount of gauged streamflow data for model calibration. The spatial propagation procedure used the SFM, with the physical-based parameters fixed using generalized equations, and the event-based initial condition optimized for selected events. The fixed parameters provide similar error scales, produced from optimization at the measured inflow station (CJD), to all watersheds. This parameter separation approach can provide better streamflow data

Average NSE values for peak, volume, and time to peak at cross-validation points were 0.88–0.99, which satisfies the model criterion (NSE >0.5). Furthermore, the hydrographs of other validation points show the applicability of the spatial data propagation concept for ungauged watersheds. However, the results also show some limitations of the application of this concept of spatial propagation as follows: (1) the availability of rainfall data and the selection of initial conditions is critical for the spatial propagation of streamflow data for smaller watersheds; and (2) the lumped conceptual model has limited applicability (i.e., limitation of the size of smaller watersheds and sensitivity to the initial surface conditions). In future work, a distributed and continuous model should be considered to minimize these limitations.

## ACKNOWLEDGEMENTS

This paper was supported by Wonkwang University in 2018. We are truly grateful for the model support (COSFIM) by K-water in Korea.