Evaluation of global forcing datasets for hydropower in ﬂ ow simulation in Nepal

Discharge over the Narayani river catchment of Nepal was simulated using Statkraft ’ s Hydrologic Forecasting Toolbox (Shyft) forced with observations and three global forcing datasets: (i) ERA-Interim (ERA-I), (ii) Water and Global Change (WATCH) Forcing Data ERA-I (WFDEI), and (iii) Coordinated Regional Climate Downscaling Experiment with the contributing institute Rossy Centre, Swedish Meteorological and Hydrological Institute (CORDEX-SMHI). Not only does this provide an opportunity to evaluate discharge variability and uncertainty resulting from different forcing data but also it demonstrates the capability and potential of using these global datasets in data-sparse regions. The ﬁ delity of discharge simulation is the greatest when using observations combined with the WFDEI forcing dataset (hybrid datasets). These results demonstrate the successful application of global forcing datasets for regional catchment-scale modeling in remote regions. The results were also promising to provide insight of the interannual variability in discharge. This study showed that while large biases in precipitation can be reduced by applying a precipitation correction factor (p_corr_factor), the best result is obtained using bias-corrected forcing data as input, i.e. the WFDEI outperformed other forcing datasets. Accordingly, the WFDEI forcing dataset holds great potential for improving our understanding of the hydrology of data-sparse Himalayan regions and providing the potential for prediction. The use of CORDEX-SMHI-and ERA-I-derived data requires further validation and bias correction, particularly over the high mountain regions.


INTRODUCTION
The Himalayan and adjacent Tibetan Plateau water supply is intricately linked to the livelihoods, economic and social, to millions of people (Bookhagen & Burbank ; to become the third largest freshwater delta (Chowdhury & Ward ).
Due to the high precipitation rates (1,500-2,500 mm/ year; Dahal & Hasegawa ) and steep elevation gradients, hydropower has significant potential in the region and, if well managed, could provide a resource for economic growth.Hydropower potential depends on climatic conditions, particularly on precipitation, evaporation, temperature, and snow/ice in the catchment (Edenhofer et al. ).However, the barriers for sustainable water resource management in this region are manifold.First, there are numerous socio-political issues related to transborder management issues (Biswas ).Despite efforts, a comprehensive framework to guide the development and international cooperation has yet to be ratified (Biswas ).Second, portions of the region are heavily glaciated and climate change is exacting an immediate and tangible impact (Immerzeel et al. ; Dehecq et al. ).
In this paper, we focus on a third significant barrier: providing robust and reliable analysis of the water resource potential and impacts from climate change.Data scarcity and the complexity of the terrain creating large spatial gradients and variability in weather and climate make such analyses a challenging task.The extremely heterogeneous topography in the region presents a great challenge for the accurate measurement of meteorological variables, giving rise to data scarcity (Pellicciotti et al. ).Although there are some meteorological stations, they are not well distributed in space.To overcome these challenges, regional climate models and reanalysis data offer gridded datasets of many meteorological variables, although with a rather coarse spatial resolution (Guo & Su ).The Coordinated Regional Climatic Downscaling Experiment (CORDEX) (Giorgi et al. ) provides extensively used regional datasets for past and future climate.For a historical perspective, the reanalysis Bharti & Singh () reported that the ERA-I precipitation is largely overestimated over the Indian Himalayan region.
To address the bias inherited in ERA-I, bias-and elevation correction-based monthly observations were carried out when creating the Water and Global Change Forcing Data ERA-I (WFDEI) datasets (Weedon et al. ; Kim et al. ).
Various factors make hydrological modeling in the region challenging, including a large spatial variability in hydrometeorological variables, steep gradients, marked seasonality driven by the Indian Monsoon, and contrasting moisture regimes between the high elevation Tibetan Plateau and regions in the vicinity of the Indian Ocean.
Changes in the climatic condition in the region may lead to changes in regional water balance components impacting hydrologic regimes (Koch et al. ).Moreover, snow and glacier storage and melt play an important role for the river discharge generation (Radic ´& Hock ; Li et al. , ).The effective assessment and implementation of hydropower projects and other water resource management projects depend on a thorough analysis of the discharge and hydrological storages (Bakken et al. ).

STUDY AREA
The Narayani river catchment lies in the central part of Nepal.About 13% of the total area lies in China (Figure 1 region).The Narayani river catchment lies also in the primary hydropower development region of Nepal, providing 44% of the total electric generation (Adhikari ).

Meteorological forcing data
Forcing variables used in the study include air temperature (T), precipitation (P), relative humidity (RH), wind speed (WS), and shortwave incoming solar radiation (S) at a daily resolution.Details of each dataset are described in the following sections.

Observed data
The DHM, GoN is responsible for collecting and disseminating hydrological and meteorological information for         The gridded forcing data have a coarser resolution than the simulation domain in the hydrological modeling framework of Shyft (Table 1).Accordingly, Shyft provides interpolation routines used for downscaling the forcing data-  2.

Hydrological model
The where P, Q, and E are the rate of precipitation (input from either snow melt or rain), discharge, and AE (from the snow-free area), respectively, in units of depth per time.
The idea behind this method is that the discharge sensitivity to changes in storage, i.e. g(Q), can be estimated from the time series of the discharge alone through fitting empirical functions to the data, such as the quadratic equation (Kirchner ), which is given by: where c1, c2, and c3 are the catchment-specific outlet parameters (hereafter called Kirchner parameters) obtained during the model calibration.There is no routing function in the model, but the Kirchner response function represents a delayed outflow from storage within the catchment.

Gamma snow
In Shyft, the Gamma snow routine, an energy balance approach (Equation (3)) for snow ablation, and the snow depletion curve following a gamma distribution are combined into a single routine.The energy balance approach in the Gamma snow routine is based on DeWalle & Rango () and also briefly explained in Hegdahl et al.
().The net energy flux (ΔE) at the surface available for snow ablation is expressed as follows: where S is the net shortwave radiation, L in and L out are the incoming and outgoing longwave radiations, H SE and H L represent sensible and latent heat fluxes, and E G is the net ground heat flux calculated using a bulk-transfer approach.
Two parameters defining the wind profile, intercept (wind constant), and slope (wind scale) are determined either by model calibration or as provided (Table 3).For a given time step (t), the snow albedo (α) at each cell depends on the minimum (α min ) and maximum albedo (α max ) as well as the albedo decay rate, temperature, and snowfall as described in Hegdahl et al. (): In Equation ( 4), FDR and SDR denote fast and slow snow cover decay rates, respectively.In this study, α max and α min are prescribed (refer Table 3).
Within the Gamma snow routine, precipitation falling in each cell is classified as solid or liquid depending on a threshold temperature (tx) and the actual cell temperature.
Snow distribution within each cell is estimated by using a three-parameter gamma probability distribution.The third parameter in the gamma probability distribution represents the bare ground fraction in the cell.Finally, snowmelt depth (mm/day) is calculated by multiplying ΔE (available energy) with the latent heat of fusion for water.
A temperature index model which does not require glacier ice albedo was used to calculate glacier melt (see Hock ).The glacier reservoir was assumed to be infinite, and the glacier area was assumed to be constant throughout the simulation periods.Within a glacier-covered cell, glacier melt only happens from the snow-free fraction.

Parameters and calibration
Hydrological simulation from distributed models such as Shyft generally requires the estimation of model parameters through calibration with measured data (Madsen ).In the PT_GS_K routine, there are 14 parameters, out of which eight parameters (Table 3: top eight parameters) have been found to be significantly more sensitive than the rest (Teweldebrhan et al. ) and were selected for calibration in this study.The remaining six parameters were prescribed (Table 3: lower six parameters).The precipitation correction factor (p_corr_factor), which is used to correct bias, was also set as a calibration parameter.
Manual calibration can be time-consuming and subjective; therefore, an automatic calibration was carried out.
Upper and lower limits for each parameter are shown in Typically, the procedure involves the selection of samples in the parameter space through the use of competitive evolution schemes, such as the simplex scheme, to reproduce better the observations.After several iterations, either due to convergence or when the maximum number of iteration is reached, the best set of parameter values based on the Nash-Sutcliffe coefficient of efficiency (NSE) is determined (Chu et al. ).
In this study, the model was calibrated and validated for each forcing dataset independently (hereafter named

Water balance estimation
A water balance analysis is a useful tool to describe the principal components of water in and out of a catchment (Rochester ), where the volume of water inflow should be balanced with water outflow, assuming no changes in storage (ΔS=Δt).The following water balance components: mean annual precipitation, discharge, and AE were calculated.Water balance components were calculated using calendar years, so that the change in storage is mainly a difference in storage between first and last days of simulation.
The annual change in storage was calculated according to the following equation: where P is precipitation, Q is the discharge, ET is actual evapotranspiration, G is the glacier melt, and the unit of measurement is mm/year.ΔS=Δt is the change in storage per time unit.

Model performance evaluation
To determine the agreements between observed and simu- Nash-Sutcliffe efficiency 1 Modified Kling-Gupta efficiency 1 Goodness of fit with respect to the benchmark series 1 n: total number of days in the evaluation period; Q sim and Q Obs : simulated and observed discharges; Q sim:std and Q obs:std : standard deviation for simulated and observed discharges; Q sim and Q obs : arithmetic mean for simulated and observed discharges; V sim and V obs : total discharge volume for simulated and observed discharges; Q bench : monthly long-term average observed discharge.

Meteorological forcing data analysis
As discussed in the section 'Reanalysis and regional climate model data', the resolutions of gridded datasets are different.
For the sake of comparison, Shyftinterpolated meteorological variables (for the period 2000-2009) for each forcing dataset were used.Temperature shows an overall agreement where all three datasets demonstrate similar seasonality and distinct cool or negative bias as compared to the observations during the winter and pre-monsoon seasons (Figure 3).The bias was the strongest in the CORDEX-SMHI data during the September-December period.Longterm spatially averaged temperature for July (Figure 4) shows that the CORDEX-SMHI was not able to capture higher daily average temperatures in the river valley.Daily average temperatures during July over the lower elevation region from ERA-I and CORDEX-SMHI were lower than the temperatures from the Observed and WFDEI interpolated datasets.What was most interesting is that negative catchment average temperatures, which were not at all captured by the interpolated observations (Figure 3), will have a significant impact on any simulation.We attribute the lack of negative catchment average temperatures to the lack of stations above 4,000 m a.s.l.combined with a too low-temperature lapse rate.This is a typical challenge when observations lack representativity to topography.
Precipitation varies notably among the forcing datasets.
WFDEI precipitation agrees well with the observation (Figures 3 and 4).The strongest positive bias was observed for CORDEX-SMHI and ERA-I datasets (Figure 5).() showed that the WFDEI is more representative for the observations.

Model parameters
Shyft was calibrated for each forcing dataset, and the calibrated parameters are shown in Table 5.Normal ranges for all calibrated parameters are given in Table 3.The   3.
Evaluation of discharge simulation using different forcing datasets   6).For NSE, NSE sqrt , and KGE, the ERA-I and WFDEI datasets were similar, but the volume differences during calibration were comparatively higher for ERA-I than WFDEI.Comparatively poorer model performance and higher volume differences (Table 6) were found  for the model calibrated with the CORDEX-SMHI dataset.
Goodness of fit with respect to the benchmark (G bench ) for ERA-I and CORDEX-SMHI during calibration periods was found negative, indicating poorer performance than the benchmark series.However, it should be noted the peak discharge was best simulated with CORDEX-SMHI data (Figure 7).
In Figure 8   Observed þ WFDEI forcing dataset resulted in discharge simulations with the best performance, it was not able to capture the highest observed discharge (simulated discharge points after 95% quantile).The simulations from CORDEX-SMHI were able to capture relatively high peak-discharge events (above the 99% quantile); however, they have large deviations from the observations between 90% and 95% quantiles.For the low flows (i.e.1-10% quantile), the simulation from the WFDEI forcing captures observation better than the Observed þ WFDEI forcing (Figure 9(c)).However, low-flow simulation from ERA-I until 5% quantile was better than the rest.
Similarly, in the validation period (Figure 9(b)), simulated discharge and observed daily discharge for values less than 90% quantile from all forcing datasets fit well to the observations.Above 95% quantile, most forcing datasets fail to capture these highest discharges.Interestingly, the CORDEX-SMHI, which generally provides lower performance (Table 6), notable in the lower flow range, manages to capture the highest flows best.However, these

Water balance analysis
Figure 10 compares the catchment average (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009) water balance components for the four different forcing datasets (the exact values are given in Table 7).Annual averages over the 10-year simulation period, 25.1%, 24.8%, 30.7%, and 17.8% of precipitation, were lost to evapotranspiration for Observed þ WFDEI, WFDEI, ERA-I, and CORDEX-SMHI datasets, respectively.Among all forcing datasets, Observed þ WFDEI shows lower runoff than the others.Higher runoff from CORDEX-SMHI forcing was observed, and it is associated with higher average precipitation during the monsoon period (Figure 3).The lowest (72.8 mm) total glacier melt during the simulation period was observed for the WFDEI dataset, while the highest  7.  5).An examination of the model parameters revealed that the calibrated values of the precipitation correction factor, wind scale, threshold temperature, and slow albedo decay rates differ a lot   The 'p_corr_factor' shown in Table 5 indicates that the CORDEX-SMHI was reduced by a factor of 0.4, and ERA-I precipitation was reduced by a factor of 0.41 during the calibration period to minimize bias in discharge estimation.

DISCUSSION
These numbers indicate that there were significant biases in these two forcing datasets and suggest that some kind of bias correction should be considered in any application.
We also notice that the precipitation correction factor for WFDEI was near one.This is reasonable and encouraging, as this dataset is already bias-corrected (using observed pre- Though we achieved high NSE, NSE sqrt , and KGE using WFDEI, ERA-I, and Observed þ WFDEI forcing datasets with reasonable D v (Table 6), the simulations were still not able to capture the highest discharges as seen in the QQplots in Figure 9.As compared to high discharge, low discharge events were well captured by all forcing datasets.
Local simulated peak discharge during the monsoon season may be attributed to less representativeness of the input precipitation data.Particularly, the lack of ability to capture intense localized precipitation events could be the main reason for an inability to simulate high discharge concurrent with the observations.The CORDEX-SMHI simulation generated relatively higher discharge peaks as compared with the other datasets but has overall higher positive biases during the pre-monsoon season (Figure 3).The CORDEX-SMHI forcing datasets could provide an opportunity for the peak discharge analysis in the Himalayan catchment.

Discussion on the water balance analysis
The catchment water balance component analysis (Table 7) revealed notable differences among the forcing datasets.The discrepancies can, to some degree, be attributed to the  The challenge using a model with many model parameters is that we might get a good model fit but a less robust model for prediction and forecasting due to over-fitting.
However, the uncertainty in the simulations will almost certainly be higher due to the increased uncertainty in the parameter values.In our study, the discharge was highly

SUPPLEMENTARY MATERIAL
The Supplementary Material for this paper is available online at https://dx.doi.org/10.2166/nh.2020.079.
dataset ERA-Interim (ERA-I) from the European Center for Medium-Range Weather Forecasts (ECMWF) (Dee et al. ) is widely used (Li et al. , ; Xu et al. ).
Hydrologic simulation of discharge and other water balance components (evapotranspiration, snow, and groundwater storage) (Bhattarai & Regmi ; Matt & Burkhart ; Li et al. ) is used for the analysis of available water resources both in the past and future analyses, but the quality of discharge simulation remains a challenge (Rochester ; Engeland et al. ; Kauffeldt et al. ).The choice of a suitable hydrological model and appropriate forcing data are critical for any analysis and will greatly affect the outcome (Kauffeldt et al. ).Currently, no standardized or community modeling framework within hydrology and prior studies of the water resources of the Himalayan region have been conducted using different types of hydrologic models ranging from simple conceptual models (e.g.Pradhananga et al. ; Bhattarai & Regmi ; Skaugen & Weltzien ; Bhattarai et al. ) to more advanced, distributed models (e.g.Pellicciotti et al. ; Jain et al. ).Hydrological modeling and water balance studies in Himalayan regions have taken a range of approaches and used different models to focus on topics such as glacier melt and retreat, water balance, flooding, and the impact of climate change (Bookhagen & Burbank ; Pellicciotti et al. ; Wortmann et al. ; Khanal et al. ).Recent studies have addressed the performance of large-scale forcing datasets, including the WFDEI and ERA-I, for discharge simulation in various regions across the world (e.g.Li et al. , ; Weedon et al. ; Essou et al. ; Nkiaka et al. ), but, to date, the use of these global forcing datasets to simulate discharge in a Himalayan catchment is limited.In this study, we use a distributed, conceptual hydrologic framework with an energy balance-based snow routine that has been demonstrated to perform well in the region (Xu et al. ; Hegdahl et al. ; Matt & Burkhart ).The main goal of this work is to evaluate the impact of four different forcing datasets on the simulated discharge and associated water balance assessment for the Narayani river catchment in Nepal.The forcing datasets include WFDEI, ERA-I, CORDEX, and a hybrid dataset referred to as Observed þ WFDEI (all datasets are described in the Data and Methods section).Our specific objectives are to evaluate: (i) the discharge sensitivity to the forcing dataset for hydrologic modeling inflow, (ii) model performance using bias-corrected versus non-bias-corrected forcing as input to a (calibrated) hydrological model, and (iii) the sensitivity to forcing data for water balance assessment.Furthermore, we assess the overall ability of the Statkraft Hydrological Forecasting Toolbox (Shyft) (Burkhart et al. ) to simulate discharge in this complex and data-sparse regions.
Figure 1 | Narayani river catchment with the location of the discharge station and meteorological stations (72 precipitation and 30 temperature) in the region (both within and outside the catchment boundary).
water resources, agriculture, energy, and other development activities in the country.Most of the measured data are based on the conventional manual observation.Measurements from the automatic weather station (AWS) are available only after the year 2000 (Karki ).The geographic location and installation date for each station are available from http://dhm.gov.np/meteorological-station/.Observed temperature and precipitation station data which are located inside or in the vicinity of the catchment (Figure 1) were collected from the DHM.Data from the DHM were manually plotted for each station and screened for data quality.Clearly, erroneous departures from the historical pattern were removed manually from the datasets.The highest temperature station elevation is 3,870 m a.s.l., located in Chhoser (DHM st.no 633), and the highest precipitation station elevation is 3,705 m a.s.l., located in Mustang (DHM st.no 612).To accommodate missing data, stations with less than 10 years of record or missing more than 15% of the observations were removed from the datasets.Ten precipitation stations (out of 82) did not meet this criterion and were removed from the datasets.Similarly, seven (out of 30) temperature stations were removed from datasets.This resulted in 72 precipitation stations and 23 temperature stations that were used for further analysis.Maximum numbers of stations with temperature and precipitation data were observed in the 2000-2009 period.Therefore, in this study, data for the period 2000-2009 were used.All the stations are located below 4,000 m a.s.l., covering only 60% of the total catchment area (Figure 2(a)).Generally, temperature and precipitation data from higher elevations are missing, which add uncertainty to the current study.Daily mean temperature and mean annual precipitation for each station are plotted versus station elevation in Figure 2(b).Temperature shows as expected a consistent decrease with elevation, whereas precipitation shows a more mixed picture, with no clear trend below approximately 2,000 m a.s.l., but a notable decrease in precipitation above, i.e. from about 2,000 to 3,500 m a.s.l.Normally precipitation increases with elevation (Daly et al. ) in the mountainous region due to the orographic effect, but in the Himalayan region, an opposite pattern has been reported after a certain elevation level (e.g.Nayava ; Kansakar et al. ), in agreement with what is observed for the Narayani river catchment.From the analysis, station average daily temperature and mean annual precipitation for the period 2000-2009 were 10.7 C and 1,292 mm/year, respectively.Discharge observed at the Narayanghat station is available from the year 2000-2009 with no missing values, and this is also the period used for the hydrological model calibration and validation.Daily average discharge for the period 2000-2009 was 1,482.5 m 3 /s.
averaged over the Narayani river catchment using the Shyft interpolation routines (see the section 'Spatial-temporal interpolation of forcing data') are shown in Figure 3.The catchment average mean temperature for the period (2000-2009) was 7.6 C, while catchment average annual precipitation was 4,660 mm/year.The WFDEI is a global forcing dataset obtained by downscaling and bias-correcting ERA-I data (Weedon et al. ; Raimonet et al. ).The temporal and horizontal resolution of the dataset is shown in Table 1.WFDEI has two sets of rainfall generated by using either Climate Research Unit (CRU) or Global Precipitation Climatology Centre (GPCC) precipitation correction methods (Weedon et al. ).In this study, GPCC-corrected data were preferred to CRU because of their higher resolution and data quality (Weedon et al. ; Raimonet et al. ).The WFDEI dataset is freely available online from ftp.iiasa.ac.at.Catchment average mean daily temperature and annual precipitation (for the period 2000-2009) were found to be 7.9 C and 1,764 mm/year, respectively.CORDEX is a program sponsored by the World Climate Research Programme (WCRP), to produce an improved generation of regional climate change projections (Giorgi et al. ).CORDEX has two datasets, referred to Evaluation and Historical.Evaluation is run within reanalysis and is used to 'mimic' observations (i.e.represent real weather), whereas Historical is run within a climate model, and the results can only be used in a climatological sense.Daily datasets of the Evaluation product over the study area are available for the South Asia CORDEX

Figure 2 |
Figure 2 | (a) Station altitude (temperature and precipitation) plotted against the hypsographic curve for the Narayani river catchment and (b) station average daily mean temperature and annual mean precipitation (for the period 2000-2009) plotted against altitude.Station name, number, and data availability are provided in the Supplementary Data.
domain (Region 6) from two institutional runs: Indian Institute of Tropical Meteorology (CORDEX-IITM) and Swedish Meteorological and Hydrological Institute (CORDEX-SMHI) (Sanjay et al. ).Based on a comparison of seasonal precipitation patterns over the study area, the CORDEX-SMHI was selected for this study.The horizontal resolution and available periods for the CORDEX-SMHI data are shown in Topographical and land cover datasetsIn this study, a digital elevation model (DEM) of 90 m spatial resolution from NASA's Shuttle Radar Topography Mission (NASA-SRTM) was used.The NASA-SRTM DEM is freely available for download from https://eros.usgs.gov/.The hydrology tool available in the software package, System for Automated Geoscientific Analysis (SAGA) (Conrad et al. ) under Qgis-2.18(QGIS Development Team ), was used for automatic catchment delineation.Catchment delineation was based on the gauge point at Narayanghat and NASA-SRTM DEM.Catchment slope
sets to the model grid cell scale.Data interpolation methods were selected based on prior studies by Sapkota (), Lombraña (), Matt et al. (), Matt & Burkhart (), and Teweldebrhan et al. ().Observed temperature and gridded data were extrapolated/interpolated by three-dimensional Kriging where elevation is the third dimension (Oliver & Webster ).Observed precipitation, gridded precipitation, global radiation, WS, and RH data were extrapolated/interpolated by inverse distance weighting (IDW) (Shepard ).A precipitation gradient of À0.07 mm/100 m (Table 2) and a temperature gradient of À0.6 C/100 m are used during interpolation.The precipitation gradient was calculated from the observed precipitation data.The different interpolation methods along with their parameter values are presented in Table Shyft modeling framework was used to simulate daily discharge from the catchment.The modeling framework has three main hydrologic modeling routines (https:// gitlab.com/shyft-os).These three models are different in the way to calculate evapotranspiration and snow estimation and melt.In this study, the PT_GS_K model was used for discharge simulation.The PT_GS_K model uses the Priestley-Taylor (PT) method (Priestley & Taylor ) for estimating potential evapotranspiration, the Gamma snow routine (GS; described in the Gamma Snow section) for snowmelt, sub-grid snow distribution and mass balance calculation, and a simple storage-discharge function (K) for catchment response calculation (Lambert ; Kirchner ).Actual evapotranspiration (AE) was assumed to take place from the snow-free area and was estimated as a function of potential evapotranspiration and a scaling factor.The catchment response function 'K' is based on the storage-discharge relationship concept described in Kirchner () and represents the sensitivity of discharge to storage changes as given by the following equation.Generally, discharge is nonlinear and varies by many orders of magnitude, and it is recommended to use log-transformed

'
independent calibration mode').The model calibration was based on observed discharge data from 2000 to 2004, and validation was based on data from 2005 to 2009.
lated discharges, NSE (Nash & Sutcliffe ), Kling-Gupta efficiency (KGE) (Gupta et al. ), square-roottransformed NSE (NSE sqrt ) (Seiller et al. ), benchmark series (G bench ) (Seibert ), and percentage volume difference (D v ) (Martinec et al. ) were used (Table 4).The NSE, NSE sqrt , and KGE efficiency measures are used to access the predictive power of the hydrological model and can range from À∞ to 1.An efficiency of 1 corresponds to a perfect match of simulated and observed discharges.To improve the estimation of the performance error, KGE considers three components: bias (α), variability (β), and linear correlation coefficient (r) to overcome the problems associated with NSE, i.e.NSE results in the underestimation of the streamflow variability and the runoff peaks (Gupta et al. ).NSE sqrt gives more emphasis to the overall agreement between observed and simulated discharges (Seiller et al. ; Peña-Arancibia et al. ) as compared to NSE, so it is also included as an evaluation criterion.Percentage volume difference (D v ) gives the percentage bias between the simulated and observed series and can range from À∞ to þ∞.D v equals to zero indicates a perfect agreement between simulated and observed discharges.Since the observed discharge shows strong seasonal patterns, we also used a benchmark series (G bench ) for model evaluation.The monthly average discharge was used as a benchmark series.G bench is negative if the model performance is poorer than the benchmark, zero if the model performs as well as the benchmark, and positive if the model is superior, with a highest value of one for a perfect fit.In this study, we aimed to achieve NSE > 0.7, KGE > 0.7, NSE sqrt > 0.7, G bench > 0.5, and D v within ±15%, during both the model calibration and validation periods.
Figure 4 | Distributed mean daily Shyft-interpolated temperature ( C) for July over the Narayani river catchment for each forcing dataset.

Figure 5 |
Figure 5 | Distributed daily mean Shyft-interpolated precipitation (mm) for July over the Narayani river catchment for each forcing dataset.

Figure 7
Figure 7 compares daily observed and simulated discharges using the different forcing datasets.Calibration and validation results in terms of error statistics are shown in

Figure 7 |
Figure 7 | Calibrated and simulated daily discharge for the Narayani river catchment for the period 2000-2009.
, the year 2004 was plotted to show in more details the representation of the annual cycle by the different forcing datasets.Overall, all datasets were able to simulate the cycle well; however, the CORDEX-SMHI simulation deviates somewhat from the rest and shows a higher peak flow early in the wet season.Quantile-Quantile (QQ) plots for the different forcing datasets are shown in Figure 9.The QQ-plot for the calibration period is shown in Figure 9(a), for the validation period in Figure 9(b).To highlight low flows, Figures 9(c) and 9(d) show the QQ-plot on log-scale.The QQ-plot is a graphical technique to determine if two datasets come from the same population with a common distribution (Renard et al. ).Departures from the 1:1 reference line indicate discrepancies between the simulated and the observed discharge.Figure9(a) shows that all simulations slightly overestimate discharge (with respect to the observed discharge) up to a value of around 4,000 m 3 /s.Between 4,000 and 8,000 m 3 /s, the CORDEX-SMHI simulations were higher than observed, whereas other simulations were lower than observed.After 8,000 m 3 /s, CORDEX-SMHI simulations were able to capture peaks where others simulations were lower than observed.Although the

Figure 8 |
Figure 8 | Daily observed and simulated discharges from different forcing models for the year 2004.

Figure 9 |
Figure 9 | QQ-plots of simulated versus observed discharge for the Narayani river catchment during model (a) calibration and (b) validation.Similarly, the QQ-plot for log-transformed observed and simulated discharges during model (c) calibration and (d) validation.The vertical lines denote the 1%, 5%, 10%, 90%, 95%, and 99% quantiles of the observed daily discharge.

(
297.3 mm) was observed from ERA-I.The change in storage (Equation (5)) shows that 73.1, 32.7, 68.7, and 29.2 mm of water were surplus (a positive change in storage) in the catchment for Observed þ WFDEI, CORDEX-SMHI, ERA-I, and WFDEI forcing data, respectively.The largest glacier melt contribution (19%) to total runoff was observed for ERA-I forcing, while the lowest was observed for WFDEI (4%).Different water balance components and their associated changes in storage are shown in Table Discussion on model parameters As with any hydrologic model, the model used in this study is sensitive to the parameters; particularly the Kirchner coefficients, precipitation correction factor, and the threshold temperature (Teweldebrhan et al. ).While calibrating the model, different values of the parameters were obtained for different forcing datasets (Table

Lakew
et al. () pointed out that the model simulations are highly sensitive to the precipitation correction factor, which helps adjusting systematic bias embedded in the precipitation forcing.The precipitation correction factor may account for both under catch and lack of representative stations (Lakew et al. ) and is directly related to the bias in precipitation data.Prior studies by Engeland et al. () and Teweldebrhan et al. () suggested that the Kirchner parameters are highly sensitive since they have a large influence on a given simulation.We found that the variance for Kirchner response parameters was less than 0.05.An explanation is that these parameters are not dependent on the forcing datasets but rather on the physical catchment characteristics and thereby the concentration time of the catchment.Kirchner response parameters are dependent on the physical catchment characteristics and thereby the concentration time of the catchment and are therefore less sensitive to the forcing.Potential factors controlling hydrological model efficiency during model calibration and validation Different forcing datasets lead to different levels of performance efficiency in terms of discharge simulations (Table 6).Explaining these differences is not easy, as each forcing dataset has different characteristics such as the spatial resolution and the assimilation methods.Figures 3-5 present the catchment average monthly mean forcing variables for the different forcing datasets.From the figures, we can see that in general, the spatio-temporal distribution of precipitation from WFDEI was quite similar to the interpolated observations.The extent of higher precipitation than observation during pre-monsoon and monsoon seasons was larger for ERA-I and CORDEX-SMHI.Overestimation of precipitation during the monsoon season identified for CORDEX-SMHI in the Himalayan region (Ghimire et al. ) and for the ERA-I over the Indus catchment (Dahri et al. ).Moulin et al. () and Zhu et al. () indicate that a poor representativeness of precipitation over the catchment is a major source of uncertainty in discharge simulation.Despite some of the biases in the forcing data, the performance of the model during calibration and validation showed that all datasets performed reasonably well.The good agreement between simulated and observed discharges during the model calibration and validation periods (Figure7) is to a large extent due to the application of the precipitation correction factor.The multiplicative correction factor to the precipitation removes a substantial proportion bias in both precipitation and simulated discharges.It should be clarified here that such an adjustment of bias in discharge (through the input forcing correction) is an 'ad hoc' procedure to overcome the quality of forcing data.
cipitation).The precipitation correction factor for observed datasets was 1.34, indicating that the areal precipitation is underestimated.This underestimation is likely caused by a combination of undercatch of the precipitation gauges and the lack of representative observation stations.The undercatch depends on the precipitation gauge, wind, and precipitation phase (Mekonnen et al. ; Zhao et al. ), and the non-representativeness is caused by the limited number of observation stations at high altitudes.Better performance in discharge simulation by the application of a correction factor to precipitation was also shown by previous studies Larson & Peck (), Lawrence et al. (), and Lakew et al. () and is a standard practice in the hydrologic analysis.The improvement of simulation from all forcing datasets versus prior results presented by Bhattarai et al. () using the HBV model (Bergström ) in the same catchment is likely attributed to the distributed nature of Shyft.
uncertainty inherent in each forcing dataset.As mentioned by Nkiaka et al. (), different precipitation in each forcing dataset can strongly influence the optimized parameters that control the rates and threshold of hydrological processes in the catchment.Although measured evapotranspiration data are not available for comparison, the percentage of evapotranspiration estimates from WFDEI (24.79%) was found to be similar to the previous study by Sakai et al. (), Giertz et al. (), Sintondji et al. (), and Ragettli et al. () in similar Himalayan catchments.The lowest evapotranspiration (399.5 mm/year) was found for the CORDEX-SMHI forcing dataset, which can be associated with high RH, and lower average shortwave radiation (Figure 3) than other forcing data.Lower evaporation (449.6 mm/year) and lower precipitation (1,771.1 mm/year) from the Observed þ WFDEI than WFDEI may partly result from precipitation under-catch in the catchment.Higher positive storage changes were found for the ERA-I and Observed þ WFDEI forcing datasets as compared to CORDEX-SMHI and WFDEI.As compared to Observed þ WFDEI forcing dataset, a lower storage change for ERA-I was observed and results from higher evapotranspiration in ERA-I.A change in storage was found smaller than the glacier melt for all forcing datasets.A smaller change in storage is due to the fact that it originates from snow that accumulates over the year and contributes to the glacier mass balance.Smaller changes in storage also indicate that the glacier mass balance was negative over the simulation period.Previous studies by Kulkarni (), Cogley et al. (), Bolch et al. (), Wagnon et al.
seasonal and particularly precipitation dependent.Inclusion of the precipitation correction factor as a calibrating parameter improves the predictability of the model at the expense of increases to the uncertainty in water balance components like evapotranspiration and snowmelt.However, only a single value of the precipitation correction factor (p_corr_factor) was applied.More complex precipitation correction could include seasonality and impact of orographic effects, though we feel this would result in greater uncertainty.Nonetheless, uncertainties remain.First, uncertainties in the observed data result from the uneven spatial distribution and few monitoring meteorological stations, which are mainly located in lower elevation regions (Figure 2).Furthermore, many observations are manual.This poses a unique source of uncertainty in measurements, not necessarily greater or less from automated.Furthermore, there are numerous sources of errors associated in the establishment of the river stage and parameterization of rating curves.During model calibration, we assumed that observed discharge is correct.Some of deviations between model simulations and observations might actually be explained by errors in the observations and not in the model or the forcings, to assess the uncertainty in streamflow observations with required detailed knowledge about local river profiles and data used to establish the rating curve and it is outside the scope of this paper.Second, uncertainties might originate from the estimation of AE and other water balance components, which again is influenced by the precipitation estimates and other forcing data used for driving the model.And thirdly, there are the uncertainties in the model formulation itself.The model in this study assumed that the discharge in the river depends on the amount of water stored in the catchment, and we did not consider the impact of industrial and households' water consumption or any other regulation.Despite this, the simulation performance achieved by the model presented here is quite good, though the results are limited to the one catchment in Nepal.A further examination should evaluate whether the selected forcing datasets could be applied to simulated discharge at other mountain regions with longer time periods.CONCLUSIONS We aimed to identify the quality of discharge simulation for the Narayani catchment of Nepal based on different forcing datasets.The forcing datasets WFDEI, CORDEX-SMHI, ERA-I, and ground-based observation combined with WFDEI (i.e.Observed þ WFDEI) were used to estimate the Naryani catchment discharge on a daily basis for the period of January 2000-December 2009.Because of the uniform coverage and data consistency, global forcing datasets from global and regional climate models were considered an important supplement to station data.In this study, the distributed Shyft hydrologic simulation platform was selected as a discharge simulation tool.The forcing data were interpolated using IDW for generating daily observed precipitation, while a gradient-based Kriging method was used for generating temperature fields matching the model resolution.In this study, Shyft was calibrated for the period of 2000-2004 and validation was done from 2005 to 2009.Our analysis showed that large differences exist between different forcing datasets particularly in the amount of precipitation.Precipitation from ERA-I and CORDEX-SMHI was unrealistic leading to poor model performance.To improve the existing bias in precipitation particularly over the high mountain regions, further validation and algorithm improvements are required.With the application of a precipitation correction factor (p_corr_factor), a significant amount of bias in precipitation could be mitigated.Still, the performance of the bias-corrected forcing data, i.e.WFDEI, was better than the climate model datasets.Comparing model performance during model calibration and validation periods, relatively higher NSE, NSE sqrt , and KGE with lower D v were found for the Observed þ WFDEI and WFDEI forcing datasets.The water balance analysis shows that higher evapotranspiration with large glacier melt is observed for the ERA-I forcing dataset, while the average evapotranspiration calculated from the WFDEI (24.79%) is similar to previous studies.Therefore, based on the different results from the Shyft, we conclude that, in the data-poor Himalayan catchment, the WFDEI forcing dataset may be the best choice for water resource planning and hydropower inflow calculations.Discharge simulations resulting from the WFDEI forcing data were particularly promising for hydropower estimation and water resource assessment in data-scarce or ungauged regions.CORDEX-SMHI data captured higher peaks and may be suitable for the peak discharge analysis.However, to use CORDEX-SMHI and ERA-I in ungauged catchments for the water balance analysis, bias correction is required.Further analysis by implementing Shyft in the different Himalayan catchments for different forcing datasets is a recommended further step to assess the regional extensibility of the current results.FUNDING This work is a contribution to the Strategic Research Initiative 'Land Atmosphere Interaction in Cold Environments' (LATICE) of the University of Oslo and partially supported through the Norwegian Research Council's INDOR program under the Hydrologic sensitivity to Cryosphere-Aerosol interaction in Mountain Processes (HyCAMP) project (NRF No. 222195) ACKNOWLEDGEMENT We are thankful to the Department of Hydrology and Meteorology, Government of Nepal for providing observed hydrological and meteorological datasets.We also gratefully acknowledge Special Issue Editor Dr. Kolbjørn Engeland and anonymous reviewer for their comments on the research and manuscript.

Table 1 |
Summary of selected forcing datasets (n refers to the number of grid cells in the Narayani river catchment)

Table 2 |
Parameter values used for IDW and Kriging interpolation

Table 3 .
CREST v.2.1, the Shuffle Complex Evolution Uni- versity of Arizona (SCE-UA)(Duan et al. ), is used as the kernel algorithm in the automatic calibration process.

Table 3 |
Model calibration parameters with upper and lower limits In the table, 'K' is the catchment response function; 'GS' is the Gamma snow; 'PT' is the Priestley and Taylor; and 'AE' is the actual evapotranspiration.

Table 5 |
Estimated parameter values from the independent calibration mode for the four

Table 6 |
Performance statistics for different forcing datasets

Table 7 |
Water balance components for different forcing (numbers in mm/year)