Improving real-time operational streamflow simulations using discharge data to update state variables of a distributed hydrological model

Reducing errors in streamflow simulations is one of the main issues for a reliable forecast system aimed to manage floods and water resources. Data assimilation is a powerful tool to reduce model errors. Unfortunately, its use in operational chains with distributed and physically based models is a challenging issue since many methodologies require computational times that are hardly compatible with operational needs. The implemented methodology corrects modelled water level in channels and root-zone soil moisture using real-time water level gauge stations. Model’s variables are corrected locally, then the updates are propagated upstream with a simple approach that accounts for sub-basins’ contributions. The overfitting issue, which arises when updating a spatially distributed model with sparse streamflow data, is hence here addressed in the context of a large-scale operational implementation working in real time thanks to the simplicity of the strategy. To test the method, a hindcast of daily simulations covering 18 months was performed on the Italian Tevere basin, and the modelling results with and without assimilation were compared. The setup was that currently in place in the operational framework in both cases. The analysis evidences a clear overall benefit of applying the proposed method even out of the assimilation time window.


INTRODUCTION
Hydrological models are widely used in early warning systems (EWS) and monitoring systems (MS) for their capability to simulate streamflow at both gauged and ungauged locations. However, despite the continuous improvements in the description of the physical processes and in the calibration strategy, uncertainties and errors are sometimes large in reproducing streamflow (e.g., Beven & Binley 1992;Vrugt et al. 2003;Bastola et al. 2011), especially in periods when the model diverges from observations. This condition can be due to several reasons: inaccurate description of meteorological forcings (rainfall, temperature, etc.), limits of the model to describe real world, model parameterization not optimized for that specific period, numerical approximations (e.g., Mantovan & Todini 2006;Silvestro et al. 2015;Sivapalan 2018). Moreover, in many cases the mismatch between observed and modelled streamflows lingers for long periods of time because of the large time scale of various physical processes' evolution, such as sub-flow in the root-zone and base flow formation. Accordingly, once the model diverges from observations, it might need a great deal of time to return to reliable results.
Assimilating external data is one of the approaches followed by hydrologists in the last two decades to face all the aforementioned issues. In many cases these data are retrieved from remote sensing and they are used in both assimilation schemes and calibration process. Brocca et al. (2010) and Laiolo et al. (2015) presented techniques to assimilate soil moisture satellite data in hydrological models and the impact on simulations. Campo et al. (2012) explored the possibility of using land surface temperature (LST) satellite observations to produce soil moisture maps. Jiang et al. (2020) used remote sensed evapotranspiration to calibrate a hydrological model. Dembélé et al. (2020) proposed a multivariate calibration framework based on multiple data sets.
Many of these works drove a general improvement in model states and fluxes such as soil moisture and evapotranspiration, but often the enhancement in terms of streamflow is not comparably stable. The performances of data assimilation depend on the season, the specific morpho-climatic conditions of the basin, and the dimension of the basin in comparison to the spatial resolution of the assimilated data.
Assimilating streamflow data is an appealing option, since discharge is one of the most important variables in a wide range of hydrological applications, such as flood forecasting and runoff volume estimation for water management and power production purposes. Assimilating streamflow observations poses two main challenges. The first one is the natural time lag issue (Li et al. 2013(Li et al. , 2015Meng et al. 2017) and the second one concerns the direct applicability to spatially distributed models. In fact, streamflow data are, contrary to remote sensed data such as soil humidity or evapotranspiration, available only at some sparse points of measurement and they need to be related to physical processes which have a strong spatial link and variability, such as runoff formation and water balance in soil medium. Nevertheless, several studies have proved the potential of focusing on streamflow data when the main goal is to improve the simulation of river discharge. Clark et al. (2008) assimilated streamflow observations with the ensemble Kalman filter approach, obtaining interesting results as well highlighting the difficulties in propagating the correction in ungauged portions of the basins. Wang et al. (2018) used data assimilation techniques by merging the modelled river discharges and the observations from the Global Runoff Data Centre to obtain optimized discharges in the study basin, moreover they accounted for 'model systematic errors' and 'human impacts' (dam operation, irrigation, etc.) through an optimization parameter (with annual variation), applied to correct model intermediate variable runoff and drainage in each sub-watershed. Castelli & Ercolani (2016) and Ercolani & Castelli (2017) showed a good improvement in flood events' reproduction by applying variational techniques to assimilate streamflow data at multiple locations in a distributed hydrological model. He et al. (2019) used an ensemble transform Kalman filter technique to assimilate streamflow and groundwater head measurements while, recently, Fisher et al. (2020) presented a method to propagate the observed information across all reachable parts of the river network (up/downstream from gauging point) and all reachable times (before/after observation time). Although employing different techniques, all the previous studies share the characteristic of having implemented assimilation strategies whose level of complexity ranges from medium to high, and this fact has triggered the present work. Our overall aim is, indeed, to develop a procedure for streamflow assimilation which is simple, robust and computationally fast enough to be actually included in an existing operational flood forecast modelling chain working at the national scale in Italy (Bruno et al. 2021) that uses a spatially distributed and continuous hydrological model (https://www.cimafoundation.org/foundations/research-development/flood-proofs.html). The flood forecast chain works in real time and it is available among the products used as support for civil protection purposes in the web GIS platform DEWETRA. DEWETRA is a system for real-time monitoring of natural hazards that allows to synthesize, integrate and compare data and models for instrumental monitoring, supervision and shared assessment of risk scenarios and their possible evolution (https://www.cimafoundation.org/foundations/projects/dpc.html). The hydrological model produces, in about 400 river sections around all of Italy, discharge forecast using as input meteorological observations and meteorological model forecast. In accordance with the characteristics of the modelling chain, the assimilation strategy needs to allow a straightforward identification of its impact on the simulated discharges, so that the forecasters can interpret the results of the modelling chain comprehensively and reject possible malfunctions promptly. Furthermore, given the spatially distributed nature of the hydrologic model combined with the extent of the domain and the average spatial resolution in the operational setup, the assimilation strategy must be also computationally fast in respect to the currently available resources. We thus present an approach which is operationally usable with a spatially distributed hydrological model although data assimilation is predominantly implemented in the operational practice with lumped codes (Seo et al. 2009;Chen et al. 2012). This means that our strategy might represent a simple way to deal with the issue of the high dimensionality of the inverse problem caused by the update of spatially distributed models on the basis of sparse observations of streamflow and eventually causing overfitting.
The method we propose updates both the modelled water level in river channels and the modelled soil moisture in rootzone. Specifically, the water level is corrected by comparing directly observed and simulated streamflows and finally inverting the motion equations. The approach, for water level, has its roots in the direct insertion method (Daley 1991), but, through further assumptions, it leads to an update of the streamflow and the water level on every point of the basin although on the basis of point data. In addition, it is designed to be robust in respect to instantaneous spurious values of the observed streamflow, which is a crucial characteristic for the operational usage. The second part of the method updates the soil moisture, which is corrected sequentially with an iterative procedure that minimizes the error between the observed and simulated streamflow on a predefined 'correction' time window. The approach was tested through a hindcast run of 18 months which uses a framework mimicking an operational implementation on the Tevere river basin (Central Italy), namely, updating the model states daily and adopting an hourly time step in the model. Results evidence systematic improvements in both the modelled surface flow volumes and the peak flow magnitude in respect to the open loop model configuration.

Study area and data
The Tevere river basin is in central Italy and is the second largest catchment of the country with a drainage area of about 17,000 km 2 . The head of the catchment is in the Apennine mountains between Toscana and Umbria regions, and the main river crosses the Italian peninsula from north to south for about 400 km. Four main tributaries increase the drainage area of the Tevere river: Topino, Paglia, Nera-Velino and Aniene, moving from north to south. The river flows into the Thyrrenian Sea at about 30 km out of the city of Rome after having crossed it. The Tevere river is the main source of hydraulic risk for the Italian capital. The mountainous part of the basin is mainly covered by forest and bushes while the flat areas and alluvial floodplains are often used for cultivation or pasture. Various cities and small inhabited areas reduce the natural permeability of some sub-areas, and cover about 3.7% of the catchment. Annual rainfall is about 800-900 mm but varies from a minimum of 600 mm up to 1,600 mm in the areas located at higher altitudes; the average annual temperature is around 15-16°C at altitudes lower than 300 m a.s.l. and drops to 6-8°C above 1,400 m a.s.l. The rainfall regime is sub-coastal and is influenced by the Apennine mountains, with two minimums of monthly precipitation in summer and winter and two maximums during autumn and spring.
Data used as input for the hydrological model are provided by the Italian national gauge network composed of the collection of the regional networks (Molini et al. 2009). Data with hourly time step are available for rainfall, air temperature, wind velocity, air humidity and solar radiation. Radar rainfall data from the National Radar Mosaic of Italy are used together with rain-gauge data in a merging algorithm that allows rainfall fields exploiting different sources of data to be produced (Sinclair & Pegram 2005;Silvestro et al. 2016;Bruno et al. 2021). This guarantees to use all the available and reliable sources of data for rainfall estimation. The other meteorological forcings are interpolated using the inverse distance weighted method, except temperature, that uses a linear regression with elevation.
Moreover, the present study uses observations from nine level gauges ( Figure 1) whose stage-discharge curves are sufficiently reliable. Each level gauge is associated with an influence area, defined as the drainage basin of the level gauge net of the drainage area of the upstream gauges. This information is used in the assimilation process. Figure 1 shows the study area, the geolocation of the level gauges and the corresponding influence areas.

Hydrological model
Continuum is the hydrological model operational in the forecasting chain and used for the experiment. It is described in Silvestro et al. (2013Silvestro et al. ( , 2015, and has been used for many research studies (e.g., Laiolo et al. 2015;Cenci et al. 2016;Corral et al. 2019;Poletti et al. 2019). The model's codes are available on github https://github.com/c-hydro/hmc-dev and an overview and the documentation of the Continuum model and its ancillary tools can be found at https://hmc.readthedocs.io/en/latest/.
Continuum is a continuous and fully distributed model that solves energy and mass balance equations on a regular grid, and simulates hillslope routing based on the flow directions extracted from a DEM (see Giannoni et al. 2005). The following subsections summarize the main aspects of the sub-surface flow, as well as the surface flow schemes implemented for the present application; both these processes are involved in the assimilation strategy.

Root-zone mass balance and sub-surface scheme
The mass balance and infiltration scheme for the root-zone is based on a modification of the Horton equation and it is applied assuming that the sub-surface flow follows the same paths of the surface flow (Gabellani et al. 2008). The full set of equations is widely discussed in Silvestro et al. (2013), while here we present only those needed to understand how the state variables are modified by the assimilation algorithm.

Hydrology Research Vol 52 No 6, 1241
The infiltration capacity is described by: where f 0 is the maximum infiltration rate when the soil is dry, c f is a parameter that describes the infiltration in saturation conditions (Mishra & Singh 2003), V max is the maximum infiltration storage, V(t) the state variable representing the water volume stored in the root-zone at the time t. The schematization in Figure 2 summarizes the physical processes, pointing out that this state variable has a great impact on runoff formation in the model and is the main driver of the surface flow volume feeding. Note that V(t) influences also the percolation flux, together with the parameter c t which is related to the field capacity of the soil.

Surface flow scheme
In the present work, the scheme of the surface flow routing was slightly modified in respect to the original one (Silvestro et al. 2013). The study area is represented by a grid of cells with the same size of the DEM, and each cell is supposed to include both a part with a channel behaviour and a part with a hillslope behaviour. The percentage of each portion is estimated with a morphologic formulation which depends on the drainage area (Liu et al. 2005;Caissie 2006;Coccia et al. 2009). As reported in Figure 3, the total dimension of the cell dX is the sum of the channel width b and the hillslope width L.
Similarly to Silvestro et al. (2013), the equation that describes the motion on the channel portion is: where QC is the discharge in the channel, b the width of the channel, u c the parameter of roughness, if the slope, and Hc the water level in the channel.
In the hillslope portion, the Cauchy equation is used: where QH is the discharge on the hillslope, L the width of the hillslope, u h the parameter of roughness, if the slope, Hh the water level on the hillslope. The runoff computed by the infiltration method and the mass balance equation of the root-zone is routed downhill through Equations (2) and (3), and QC is routed in the downstream cell as channel flow. Then, QH is decomposed into two components using an approach inspired by the downscaling of the topographic index (Pradhan et al. 2006). Accordingly, QHC  is the portion of QH which flows in the channel and is estimated as: where the parameter Phc depends on the channel width (b) and on the spatial resolution of the model (dX), and varies in the range [0, 1] according to: where 100 and 0.35 are two coefficients that have been chosen based on some synthetic experiments that allows to obtain similar model responses in terms of hydrograph varying the spatial resolution dX once fixed for a set of parameters (Arora et al. 2001). They are considered a characteristic of the model and they are not involved in the calibration process. QHH is the hillslope surface flow which flows in the downstream cell, and is estimated as: Implementation and calibration on the study area The Continuum model was implemented on the Tevere river basin using the Shuttle Radar Topographic Mission (SRTM) DEM; this latter was aggregated from 0.00098°to 0.005°, so that the final spatial resolution is about dX ¼ 480 m. The CORINE Land Cover (http://www.sinanet.isprambiente.it/it/progetti/corine-land-cover-1) was used to parametrize the sub-surface water storage (Silvestro et al. 2013). Parameters related to the energy balance are set similarly to Silvestro et al. (2018) and are based on Caparrini et al. (2004), as well as on the leaf area index (Silvestro et al. 2018). The other parameters were calibrated for each sub-catchment upstream each level gauge reported in Table 1, beginning at the upstream basins and going to downstream sections with a sequential approach. Once the head catchments are calibrated, the calibration process acts on the contributing area between the considered section and the upstream one only, as this allows spatial coherence in the parametrization of the model to be maintained. The calibration method is a simple brute force approach similar to the one used in Davolio et al. (2017). The target is minimizing a multi-objective function (Madsen 2000) that combines two skill scores presented in the Supplementary Material: the Nash-Sutcliffe efficiency and the relative error on peak flows. The calibration involved the parameters listed in Table 2, and was carried out for the period 2013-2014, while the validation was done on the period 2012-2017 to have a long validation time window but removing the period used for calibration.

Assimilation methods
The strategies for updating the state variables implemented and applied in this work are thought to be simple and usable in operational systems. Following such requirements, the first step in the methods is to select only gauging stations with highly reliable streamflow observations in all hydrologic regimes. Although completely offline and one-time performed, this step is as essential as the other parts of the procedure for a successful implementation of the methods, since it allows the assumption that there is no uncertainty in the observations during the updating. As evidenced in the following two subsections we accounted only for possible errors in the single measurement at a specific time, and this is done implicitly by considering predefined time windows in the updating procedures and not measurements that occurred on a single instant.
Assimilating streamflow observations to update the water level The basic core of this first assimilation scheme is the direct insertion method (Daley 1991). The basic idea is to correct the modelled value for all the cells corresponding to a level gauge through influence areas, based on the observed value at this level gauge. This simple procedure can be subdivided into two separate steps. First, the following ratio is calculated for each j-th of the J reference level gauges, at a specific time t 0 : where Q is the streamflow, and m and o stand for model and observation, respectively. Then, the modelled streamflow in the cell corresponding to the level gauge is multiplied by this ratio. As described in subsection 'Study area and data', an influence area was assigned to each level gauge. By applying the ratio of the j-th level gauge to each cell within its influence area, we extend the spatial domain in which it can be considered representative for the mismatch between modelled and observed discharge, and, in practice, we carry out a bi-dimensional correction of the streamflow: where Q ma,j (x,y,t 0 ) is the streamflow after the assimilation of all the cells belonging to the influence area j. The hypothesis is that all the pixels of the influence area j can be corrected with the same correction rate. As anticipated, the strategy we present includes also a modification on the temporal aspect of the calculation of the ratio. The value of the Rate,j to be used on the modelled streamflows at the time t 0 is indeed estimated considering a fixed time window around t 0 and not exclusively the instant t 0 . This is done to take into account several effects: the uncertainty of the model streamflow propagation time in respect to the observations, the uncertainty of the level gauge observations that affect the single streamflow estimation, as well as possible errors in level-gauge rating curves for specific range values (this latter applies when the observations vary significantly inside the considered time window). The value is hence estimated as: where N is the number of time steps around t 0 , and in this application is fixed to a value of 6 hours. This is an arbitrary value defined on the experience and considerations about the average drained area of the level gauges. A possible future refining could be to consider different values of N for each level gauge. Finally, Equation (2) is inverted in order to obtain the water level consistent with the updated streamflow: Note that this latter step is required because the state variable in the model is the water level and not the streamflow. The water level on the hillslope part of the cell is updated by applying the same correction rate of the channel level.

Assimilating streamflow observations to update the root-zone soil moisture
The assimilation technique is basically an iterative process that modifies the V(x,y,t) field minimizing a cost function. The cost function is the bias between observed and simulated streamflow volume over a predefined window of K simulation time steps. For each j-th of the J reference level gauges the cost function is defined as: The following term is then calculated as: where ɛ is a numerical parameter set to 0.2, and dDDcorr is as much larger as much as BIASV is large. The final corrective term is calculated considering the sign of BIASV.
If BIASV,0: If BIASV.0: The corrected status variable V(x, y, t 0 ) at the beginning of the simulation, on the influence area of j-th level gauge is estimated as: A new run of the model is performed and then the correction is iteratively applied M times. However, the iterative process is stopped on j-th influence area if the following condition is respected before the M iterations are completed: In the presented application M¼5, BiasT¼0.05.

Combining the assimilation schemes
The two described techniques were combined to explore the benefit of each approach; since they act on two different state variables, it is reasonable to expect that errors and uncertainties could be due to the poor description of one of the two related physical processes, depending on the specific period of the simulation. In some cases, the runoff propagation has a major role while in others, runoff formation related to the soil saturation prevails.  The two methods are executed sequentially as described by the following points: 1. Assimilation of streamflow data to correct water levels Hc(x,y,t) and Hh(x,y,t).
2. New run of the model with updated water level. 3. Iterative process to correct soil moisture in the root-zone V(x,y,t).
This approach should avoid forcing the model state variables to be corrected for compensating errors occurring in physical processes where they are not involved. For example, we could force the model to correct variables attributing the low performance to a poor soil moisture description while, in some cases, it could be due to a poor description of the propagation of the surface flow.
A model run for each day was done, for a total of 547 simulations (N H ), with the following characteristics: 1. The time of the simulation is set at 9:00 UTC of each day, this time is named t run . 2. The start time of the simulation is set exactly 2 days before t run , and this time is named t 0 .
3. The correction of the model state variables is done at time t 0 . 4. The time window of the analysis to correct the water level is set to N ¼ 6 hours. 5. The time window of the analysis to correct the soil moisture is set to K ¼ t run À t 0 ¼ 57 hours.
The hindcast is carried out with the following four configurations, and the respective results are compared: 1. Open loop (OL) configuration, i.e., no assimilation is performed. 2. Assimilation of streamflow data to correct the water level (DA WL) only. 3. Assimilation of streamflow data to correct the soil moisture (DA SM) only. 4. Combination of points 2 and 3, i.e., update of both the water level and the soil moisture (DA WLSM).
Note that in all the three assimilation configurations the update is applied to state variables already corrected at the previous step of the hindcast experiment, with a sequential approach; thus, for example, the assimilation at the day d is applied to variables obtained by the assimilation carried out at the day d À 1, not to the open loop simulation. This reduces the correction at each step of the assimilation experiment, increasing the coherence of state variables' patterns and assuring more numerical stability.

Calibration and validation
The results of the model calibration and validation are summarized in Table 3. The scores are quite satisfactory for most of the considered gauged stations, especially on the main course of the river Tevere (Ponte Felcino, Monte Molino) and on the Paglia tributary. Velino and Nera basins have slightly worse performances. Torreorsina and Terria evidence quite low values of NS and acceptable values of REP. In the case of Torreorsina this is probably due to the peculiar hydrology of the basin, which is not straightforward to model: there are large areas with karst behaviour and the presence of linear and punctual springs; Terria, instead, is strongly influenced by two dams and hydraulic plants which are considered in the model, but it is not always easy to simulate the real effect of water releases in terms of magnitude and timing, anyway REP is quite low. Topino river basin evidences a discrete performance even if slightly worse than Tevere and Paglia, although this basin has karst behaviour in some head sub-catchments.

Hindcast
Assimilating streamflow observations to update the water level (DA WL) First, the open loop results were compared with the assimilation strategy updating the water level. The comparison is done using the scores BIASV and REP, and for each gauge a set of N H values of the scores is available. Thus, we built a box plot in order to represent the median, 25 and 75 percentiles and the total range including outliers. Figures 4 and 5 report box plots for both the open loop (OL) and the assimilation (DA -WL) configurations, showing a general improvement at all the sections, even if Bettona and Ponte Rosciano show a small improvement, especially in terms of BIASV. Ripetta reduces the median REP to a value very close to 0, as well as Ponte Felcino and Terria. Terria is an exception in terms of BIASV, as in this case the score is better for the OL configuration. BIASV in the case of OL configuration evidences a general tendency of the model to overestimate the runoff volumes.
Assimilating streamflow observations to update the root-zone soil moisture (DA SM) The second experiment compared the OL results with the assimilation scheme that updates soil moisture only. Results are reported in Figures 6 and 7.  Vol 52 No 6, 1250 In this case, the improvement seems larger than the one obtained when correcting the water level for many stations: the median value of REP is in the range À10% to þ10% for all sub-basins except the Torreorsina station in Nera catchment, and the interquartile range is hugely reduced; the worst performance is obtained for Ponte Rosciano, even if an improvement in respect to the OL configuration was obtained.

Hydrology Research
Considering the BIASV, the improvement in respect to the OL is evident. An overestimation of the volumes is present at Ponte Rosciano, Torreorsina and Ripetta sections for various reasons; as already mentioned, Ponte Rosciano and Torreorsina are influenced by karst phenomena that lead to an influence on the surface flow which can be corrected only partially by acting on the water level and root-zone soil moisture. Ripetta, on one side is located downstream of Ponte Rosciano and Torre Orsina, on the other side has a large drainage area and very long channel paths. As a consequence, the uncertainty in surface routing propagation could affect results (see also results for WL updating).

Combining the assimilation schemes (DA WLSM)
Finally, we analysed the effects of using sequentially the two assimilation schemes. This is the most efficient solution; in fact, we already evidenced that DA-SM has a larger impact on simulations in respect to DA-WL, and combining them leads to a further improvement or results like DA-SM. This means that the two assimilation strategies act in the same direction and, in most of the cases, are not contrasting and do not cause negative effects on each other. Results are reported in Figures 8 and 9. Special comments are needed only for two stations: the first is Terria, where the improvements in terms of REP are small while the BIAS is slightly worse in the case of DA, but it is evident that the OL has quite good performances; the second is Ponte Rosciano where the improvement is evident, but the interquartile range, especially for BIASV, remains large.
As a final analysis, referring to the combined configuration, the verification about how the assimilation influences the simulation after the last assimilation step was carried out. To do this, we made a comparison in the time window from t run to t run þ 24 hours; in other words, we left the model running for 24 hours without applying any new assimilation process (i.e., 24 hours of open loop). Since in the presented framework we applied an assimilation cycle once a day at 9:00 UTC, the aforementioned experiment verifies the improvements in that time window where no new corrections to the state variables of the model are applied.
Results are presented in Figures 10 and 11, where we notice a positive overall effect. As expected, the performance are not so good as those obtained in the assimilation time window (Figures 8 and 9); however, the improvement in respect to OL configuration is generally high. Terria behaves slightly differently for two main reasons: first, the OL performance is very good, and a worsening of performances may be caused by the assimilation when the OL is already highly accurate (Ercolani & Castelli 2017) and second, it is strongly influenced by power plants releases, and, even if the model takes account of release data of power plants, it is possible that errors in timing/magnitude of release propagation along the river drive the assimilation in the wrong direction. Moreover, in general, the following can be claimed. An assimilation scheme always ascribes the mismatch between the modelled and the observed streamflow to a specific state or parameter (or even more than only one, but still a rigid set), which is updated accordingly. In other words, a scheme establishes the cause of the detected mismatch a priori (e.g., underestimated/overestimated soil moisture in the upstream basin, or too fast/slow routing in channels, etc). However, on some occasions, this assumption is not valid and the cause of the dissimilarity is from a physical process not connected to the set of the updated states or parameters (e.g., an underestimation of the streamflow actually due to a too slow channel routing but resulting in a drying of the upstream soil if the scheme is designed to update the saturation degree) or from an inaccurate description of the forcing fields. Namely, the benefits obtainable through an assimilation system depend not only on the assimilation scheme itself and quality of the data, but also on the model structure (Ercolani & Castelli 2017).
We did not show the verification for smaller time windows (as an example 3, 6, 12 hours after t run ) since results do not change very much. This is mainly due to the approach applied to correct the state variables, WL and SM, which are corrected at the time t 0 that is various hours before t run . WL assimilation has the main role to correctly drive the SM correction, avoiding assigning to these latter errors which are not presumably due to a bad description of root-zone processes. As a consequence, the benefits directly related to WL correction are generally weak at t ¼ t run while the combined method improves the performance even after t run . Moreover, the adopted methodology corrects state variables previously updated by other assimilation cycles done at t run À24 hours, t run -48 hours, and so on. This approach makes the performances less sensitive to the considered time windows, at least in the first 24 hours of simulation after t run .

Discussion on some events
In this section some results are presented in terms of hydrographs in order to better understand the effects of the different schemes of assimilation in the time window t 0 to t run and to have a look at possible differences on events' simulation. In the figures shown in this section, the time t ¼ 0 on the x axis coincides with t 0 while the last time (t ¼ 57) coincides with t run . Figure 12 shows the simulation for some of the considered sections for the event that occurred on 08/03/2018. At Orvieto Scalo, all the simulations are quite similar and the differences can be seen only for the first peak. At Ponte Felcino, SM and WL-SM configurations lead to a very good reproduction of observations. At Ponte Rosciano, WL configuration reduces the overestimation only for 10-12 hours while SM and WL-SM drive the model to better performances along all the simulation, and at Terria, SM and WL-SM correct the underestimation of the other two configurations. It is evident that in all the cases SM and WL-SM are quite similar, and WL modifies results only for the first hours in most of the events. Figure 13 shows the results for the event that occurred on 13/03/2018; even in this case the DA corrects over-and underestimations affecting the OL, especially if we refer to SM and WL-SM configurations.
In the case of the event that occurred on 04/02/2019 ( Figure 14) different considerations can be done: at Ponte Felcino and Monte Molino the optimal solution seems to be WL configuration, as well as OL, SM and WL-SM configurations leading to an underestimation of the peaks even if the simulation is quite good. At Bettona and Terria, again, SM and WL-SM configurations lead to better results.

CONCLUSIONS
This work explores the improvements of using an assimilation technique in order to exploit streamflow observations data in a distributed hydrological model referring to an operational context.
The presented technique is designed for operational purposes and it is based on the modification of two state variables of the hydrological model: the water level in the hydrographic network is corrected through an extended direct insertion method for the streamflow and then recovering the corresponding level by inverting the flow motion equations, while the soil moisture on root-zone is modified with an iterative procedure. Point corrections based on gauge observations are spatially extended on sub-catchments, named influence areas, which are assumed to be competent for each gauge.
A hindcast experiment was set up on the Tevere river basin in Italy, considering 18 months of daily simulations at hourly time step with or without the application of the data assimilation procedures.
State variables' corrections were applied separately on water level and soil moisture, and also combined in a unique sequential cascade.
The comparison with simulations carried out without assimilation (the open loop) evidenced a clear benefit of assimilating streamflow data. Figure 12 | Event on 08/03/2018. Hydrographs for the different model configurations compared with observations. OL is the model without assimilation, WL is the case of assimilation updating the water level, SM is the case of assimilation updating the soil moisture, WL-SM is the sequential combination of two approaches. Vol 52 No 6, 1255 Water level assimilation alone leads to a small but evident improvement, even if for some events it seems to help in producing good simulations and reducing over-and underestimation. To obtain further improvements acting only on this variable, we would probably need to make a more frequent assimilation.

Hydrology Research
Soil moisture correction leads to a more enduring benefit with better performances on both volumes and peak flows. The score to calibrate the correction is, in fact, estimated on many time steps given the response characteristic of this state variable, which is slower than surface water level.
The combination of the two assimilation approaches in a unique sequential process results in the best strategy, at least for the analysed case study: (i) the assimilation updating the water level is used to correct the starting time of the simulation; since the streamflow is directly related to this variable, the applied correction has a large impact on the streamflow pattern at the beginning of simulation; (ii) the assimilation updating the soil moisture in the root-zone is the state variable that mostly influences the runoff formation during all the simulation time window, thus, at the end, it impacts the forecasted water level for at least 24 hours beyond the time of the data assimilation.
The advantage is also that by acting on two state variables we reduce the risk of attributing too much importance to the single state variable, an occurrence that would lead to applying undue corrections because of compensation effects. Moreover, results highlight that we have considerable improvements by applying the assimilation once a day, with this strategy reducing the computational needs and the probability to generate numerical instabilities.
Two main criticalities that could arise are related to the spatial pattern of the variables involved in the assimilation and to the presence of unexpected unreliable streamflow observations. The simple approach adopted for the spatialization of the correction terms that modify water level and soil moisture could lead to unphysical spatial patterns. This is caused by the fact that the modifications requested for reproducing streamflow are sometimes contrasting even for contiguous sub-basins. The assimilation sometimes compensates the shortages of the model and of its parameterization, but this is done in some situations exaggerating the state variables' modifications. Consequently, some of these variables could be unsuitable for applications different from streamflow simulations, for example, the analysis of soil moisture patterns.
The unreliable streamflow observations are another possible source of critical issues. A single or few bad measurements cannot generate large mismatches in the simulations, but if bias or wrong observations persist for long periods of time (many hours or days), the assimilation process can be affected and cause the simulation results to deteriorate, with errors' propagation, especially in downstream sections when nested gauge stations are employed. As a consequence, in an operational real-time implementation it is important to carry out continuous and frequent check of the observation's reliability.
The experiment was conceived to be easily exported in an operational configuration, and for this reason we decided to execute cycles of assimilation every 24 hours, which was efficiently implemented with the available computational resources. We suppose that increasing the cycles of assimilation, as an example every 12 or 6 hours, results can be further improved, even though we did not investigate the case in this work because we were focused on a configuration that can be surely sustainable from a computational point of view. Moreover, the stability of the system increasing the frequency of the assimilation should be tested. It is in any case evident that the chance of increasing the number of assimilation cycles is constrained by various factors: the available computational resources, the dimension of the considered basin and the spatial resolution that one decides to adopt. The experiments evidence, in the opinion of the authors, the benefit of applying the presented assimilation technique in an operational context. Results are, overall, better than those obtained with the open loop configuration, even if, in some specific events, this latter could perform better.
Future improvements could be carried out following various developments. The first is related to studying more sophisticated ways to spatialize the correction of the state variables. Second, the effects of assimilating states at a t 0 closer to t run and the consequences of incrementing the assimilation frequency could be investigated. Finally, it might be interesting to compare the proposed approach with another method which is comparable in terms of applicability in an operational context using a distributed hydrological model.