Abstract
The task of real-time streamflow monitoring and forecasting is particularly challenging for ungauged or sparsely gauged river basins, and largely relies upon satellite-based estimates of precipitation. We present the design and implementation of a state-of-the-art real-time streamflow monitoring and forecasting platform that integrates information provided by cutting-edge satellite precipitation products (SPPs), numerical precipitation forecasts, and multiple hydrologic models, to generate probabilistic streamflow forecasts that have an effective lead time of 9 days. The modular design of the platform enables adding/removing any model/product as may be appropriate. The SPPs are bias-corrected in real-time, and the model-generated streamflow forecasts are further bias-corrected and merged, to produce probabilistic forecasts that are computed via several model averaging techniques. The platform is currently operational in multiple river basins in Africa, and can also be adapted to any new basin by incorporating some basin-specific changes and recalibration of the hydrologic models.
INTRODUCTION
Robust and accurate streamflow forecasts are needed for several water management applications, including water allocation, ecological management, and flood forecasting, in which they enable better management decisions. Such forecasts can be generated by forcing one or more hydrologic models with real-time hydrometeorological variables and/or their forecasts. Streamflow observations, when available, are used to adjust the model parameters through the process of calibration. If the basin of interest is ungauged or sparsely gauged (in terms of rainfall measurements), the task of generating streamflow forecasts (or any hydrological investigation as such) becomes considerably more challenging, requiring major breakthroughs in theoretical foundations (Sivapalan 2003; Sivapalan et al. 2003). Although the Prediction in Ungauged Basins (PUB) decade of the International Association of Hydrological Sciences (Hrachowitz et al. 2013) has helped to drive significant progress on this front, hydrologic modeling for ungauged or sparsely gauged basins remains a major challenge.
Streamflow forecasts are subject to several sources of uncertainty, including the forcing data, model structural inadequacies, improper model parameters, and initial and boundary conditions. Due to its ability to characterize forecast uncertainties, the method of ensemble streamflow forecasting has become popular; for an extensive review see Cloke & Pappenberger (2009) and Cloke et al. (2013). Ensemble streamflow forecasts are mainly produced in three different ways: (1) by forcing hydrologic models with an ensemble of precipitation time series to reflect uncertainties in system inputs (e.g., Thielen et al. 2008); (2) by using different sets of model parameters to reflect model calibration uncertainties (e.g., GLUE; Beven & Binley 1992); and (3) by using multiple models to reflect model structural uncertainties (e.g., Georgakakos et al. 2004; Ajami et al. 2007; Duan et al. 2007). While the first two approaches overlook the uncertainties arising from structural deficiencies within the model, the third approach has the potential to exploit the information provided by different model structural hypotheses, and thereby account for the uncertainty therein. Examples of operational streamflow/flood forecasting platforms include the European Flood Awareness System (EFAS; Thielen et al. 2008), NOAA's Advanced Hydrologic Prediction Service (McEnery et al. 2005), and Delft-FEWS (Werner et al. 2013). Recently, successful efforts have also been made towards developing integrated modeling platforms (combining multiple models) for land surface modeling. One such example is NASA's Land Information System (LIS; Kumar et al. 2006, 2008; Mohr et al. 2013).
With the advent of satellite-based remote sensing datasets, it is now becoming feasible to generate streamflow forecasts for sparsely gauged basins with a reasonable degree of accuracy (Serrat-Capdevila et al. 2014). Roy et al. (2017a) recently developed a multimodel and multiproduct real-time (MMSF-RT) streamflow forecasting platform that uses multiple hydrologic models to characterize structural uncertainty, while incorporating a suite of real-time satellite-based precipitation products (SPPs), to overcome the limitations of poor coverage of rain gauges and to also account for the uncertainty in the knowledge of rainfall inputs. The platform does not depend on forecasts created from a single hydrologic model; instead, it combines multiple models (i.e., structural hypotheses) to account for model structural inadequacies. In this technical note, we report on the design and implementation of MMSF-RT as a state-of-the-art, operational, real-time streamflow monitoring and forecasting platform for several sparsely gauged basins in Africa. We also discuss how the platform can be implemented for other river basins by making basin-specific changes, or on a computer system having different hardware and software configurations.
THE MMSF-RT PLATFORM
Methodology
MMSF-RT is a probabilistic streamflow monitoring and forecasting platform (see Figure 1) that currently integrates four different satellite precipitation products, namely TMPA-RT (Huffman et al. 2007), CMORPH (Joyce et al. 2004), PERSIANN-CCS (Hong et al. 2004), and CHIRPS (Funk et al. 2014), one numerical precipitation forecast (NPF; NCEP GFS forecasts) to increase the forecast lead time, and three structurally different hydrologic models, namely, semi-distributed VIC-3L (Liang et al. 1994, 1996a, 1996b), lumped HYMOD (Boyle et al. 2000), and lumped HBV-EDU (Aghakouchak & Habib 2010). The main idea behind building such a platform was to overcome the limitations of a single model or precipitation product; by combining multiple models and products we are able to better characterize model structural and data uncertainties that can affect the forecasts. The platform integrates the following operations: (1) bias correction of the SPPs using reference datasets; (2) calibration of the hydrologic models driven by bias-corrected SPPs; (3) bias correction of the model outputs to remove systematic errors in the forecasts; (4) creating probabilistic forecasts using corresponding historical error distributions; (5) probabilistic model merging to improve the characterization of uncertainty; and (6) optional final bias correction of the merged forecasts to reduce any remaining bias. The operational implementation of MMSF-RT also includes some additional features such as web visualization with daily updates and data downloading in different formats.
The Step-I bias correction procedure adjusts the long-term mean (first moment) of the gridded SPPs in an attempt to remove systematic errors. To derive the bias correction factors, we use CHIRPS as the reference satellite dataset since it assimilates information from multiple sources including rain gauge measurements. Because CHIRPS is not available in real-time, the SPP bias analysis is done using historical data based on their common time of availability (e.g., 17 years for CMORPH in Mara River basin, Africa), and the bias correction factors obtained thereby are used in the real-time correction of SPPs. When available, we use rain gauge measurements from the study areas to correct the long-term mean of the CHIRPS product in a lumped manner; the corrected CHIRPS is then used to correct the monthly means of other SPPs. Figure 2(a) presents the precipitation bias correction flow chart and Figure 2(b) an example of bias correction on PERSIANN-CCS.
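As a rough illustration of a monthly mean correction of this kind, the following Python sketch derives one factor per calendar month from historical data and applies it to a product series. The multiplicative form of the factor, the function names, and the neutral factor of 1.0 for dry months are assumptions made for this example; they are not taken from the MMSF-RT toolbox.

```python
from collections import defaultdict


def monthly_bias_factors(months, reference, product):
    """Return {month: factor}, where factor = mean(reference) / mean(product).

    `months`, `reference`, and `product` are equal-length daily series
    covering the common historical period of the two datasets.
    """
    ref_sum = defaultdict(float)
    prod_sum = defaultdict(float)
    for month, ref, prod in zip(months, reference, product):
        ref_sum[month] += ref
        prod_sum[month] += prod
    # Both sums span the same days, so the ratio of sums equals the ratio of means.
    # Assumption: a month with no product rainfall gets a neutral factor of 1.0.
    return {
        month: (ref_sum[month] / prod_sum[month] if prod_sum[month] > 0 else 1.0)
        for month in ref_sum
    }


def apply_bias_factors(months, product, factors):
    """Scale each daily value by the factor of its calendar month."""
    return [value * factors.get(month, 1.0) for month, value in zip(months, product)]


# Hypothetical toy data: two days in January and two days in February.
months = [1, 1, 2, 2]
reference = [4.0, 6.0, 1.0, 3.0]   # e.g., the reference dataset
product = [2.0, 3.0, 4.0, 4.0]     # e.g., a real-time SPP

factors = monthly_bias_factors(months, reference, product)
corrected = apply_bias_factors(months, product, factors)
```

After correction, the monthly means of the product match those of the reference over the historical period, while the day-to-day pattern of the product is preserved.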
Each of the hydrologic models included in the platform is independently calibrated for each of the four bias-corrected SPPs used as forcings; the SCE-UA optimization algorithm (Duan et al. 1992) is used for parameter optimization. For basins with discharge stations, the daily forecasts generated by each calibrated model are then bias-corrected using a non-parametric quantile mapping scheme (Roy et al. 2017a) to account for model structural errors reflected in the model outputs, as shown in Figure 3.
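The idea behind non-parametric quantile mapping can be sketched in a few lines of Python. This is a generic empirical version and not necessarily the scheme of Roy et al. (2017a): a simulated value is located within the sorted historical simulations, and the observed value at the same relative position is returned. The function names and the linear interpolation between ranks are assumptions of this example, and the historical simulated sample is assumed to contain at least two values.

```python
from bisect import bisect_right


def fractional_rank(sorted_sample, x):
    """Position of x in the sorted sample, scaled to [0, 1]."""
    n = len(sorted_sample)
    if x <= sorted_sample[0]:
        return 0.0
    if x >= sorted_sample[-1]:
        return 1.0
    i = bisect_right(sorted_sample, x) - 1      # sorted_sample[i] <= x < sorted_sample[i + 1]
    frac = (x - sorted_sample[i]) / (sorted_sample[i + 1] - sorted_sample[i])
    return (i + frac) / (n - 1)


def value_at_rank(sorted_sample, p):
    """Linearly interpolated sample value at relative position p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1.0 - frac) + sorted_sample[hi] * frac


def quantile_map(value, simulated_hist, observed_hist):
    """Map a simulated flow onto the observed distribution."""
    p = fractional_rank(sorted(simulated_hist), value)
    return value_at_rank(sorted(observed_hist), p)


# Hypothetical historical samples in which the model overestimates flows by a factor of two.
simulated_hist = [10.0, 20.0, 30.0, 40.0, 50.0]
observed_hist = [5.0, 10.0, 15.0, 20.0, 25.0]

corrected_flow = quantile_map(35.0, simulated_hist, observed_hist)
```

Values outside the historical simulated range are clamped to the smallest or largest observed value in this sketch; an operational scheme would need an explicit rule for such extremes.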
The bias-corrected forecasts are merged using three different probabilistic model averaging techniques, namely, uniform weight averaging (UWA), inverse variance averaging (IVA), and Bayesian model averaging (BMA) (Hoeting et al. 1999; Raftery et al. 2005). Figure 4 shows how the individual probabilistic forecasts (CHIRPS included) on a given day are merged (done for each day of the lead time), and Figure 5 shows an example of the final merged streamflow forecast, along with confidence interval estimates of the uncertainty. Note that for basins with observed streamflow records, the merged forecasts are based on historical error distributions, whereas for basins without discharge stations, we display 95% confidence intervals of the multimodel and multiproduct simulations based on the assumption of normal distribution of the daily values. The calculations are carried out on a transformed space that removes skewness (Roy et al. 2017a, 2017b). For more details on the methodology underlying the MMSF platform please refer to Roy et al. (2017a).
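The two simpler averaging schemes can be illustrated with a short Python sketch. Here the IVA weights are taken to be proportional to the inverse of the variance of each member's historical forecast errors; this definition, the function names, and the toy numbers are assumptions made for illustration. BMA is not shown, since its weights are usually estimated iteratively (e.g., Raftery et al. 2005).

```python
from statistics import pvariance


def uniform_weights(n_members):
    """UWA: every model-product combination gets the same weight."""
    return [1.0 / n_members] * n_members


def inverse_variance_weights(error_series):
    """IVA: weights proportional to 1 / variance of historical errors (assumed definition)."""
    inverse = [1.0 / pvariance(errors) for errors in error_series]
    total = sum(inverse)
    return [value / total for value in inverse]


def merge(forecasts, weights):
    """Weighted average of the member forecasts for one day of the lead time."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical historical errors of two members; the second is noisier.
errors_a = [1.0, -1.0, 1.0, -1.0]   # variance 1
errors_b = [2.0, -2.0, 2.0, -2.0]   # variance 4
forecasts = [100.0, 150.0]          # forecasts of the two members on a given day

iva = inverse_variance_weights([errors_a, errors_b])
merged_uwa = merge(forecasts, uniform_weights(len(forecasts)))
merged_iva = merge(forecasts, iva)
```

In this toy case the less noisy member receives a weight of 0.8, so the IVA forecast (110) lies closer to it than the UWA forecast (125).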
Structure and functions
The MMSF-RT platform (Figure 6) consists of eight main modules that perform the following tasks:
1. Initial setup
2. Precipitation downloading and processing
3. Precipitation bias correction
4. Hydrologic model simulation
5. Streamflow bias correction
6. Probabilistic forecasts representation
7. Probabilistic forecasts merging
8. Visualization and data publication.
In the first module (the initial setup), all of the necessary information to run the forecasting platform (e.g., starting date, basin co-ordinates, area, etc.) is loaded. The second module downloads and processes daily precipitation data products; the script connects to FTP servers at the data repositories (three SPPs and NCEP GFS Forecasts) and downloads the data. All of the precipitation products are processed to consistent resolutions (daily temporal and 0.05° spatial). In the third module, the processed SPPs are bias corrected using monthly bias factors computed from historical rain gauge measurements and CHIRPS estimates. Precipitation forecasts with 10-day lead time from NCEP GFS are then appended to the SPPs after adjusting for the lag in the local time (compared to GMT), which eventually results in a 9-day effective lead time.
The fourth module performs the task of hydrologic modeling. The bias-corrected SPPs, with GFS forecasts appended, are fed to the different hydrologic models (VIC, HYMOD, and HBV-EDU). The streamflow forecasts generated by each hydrologic model are further bias corrected in the fifth module of the MMSF-RT platform. In the sixth module, error distributions computed using the historical data are added to the bias-corrected forecasts (from the fifth module), in order to represent the probabilistic nature of the forecasts. The seventh module merges the probabilistic forecasts using several different model averaging techniques (e.g., BMA). Finally, the eighth module creates the outputs for web visualization and facilitates data downloading in different formats. For basins without streamflow observations, modules 5–7 are not applicable; instead, arithmetic means of the forecasts generated from different model–product combinations are reported as the final forecasts, and the confidence bounds are calculated assuming that the daily forecasts are normally distributed.
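For the ungauged case, the summary statistics can be sketched as follows in Python. The sketch takes the forecasts of all model–product combinations for one day and returns their arithmetic mean together with normal-theory bounds. Using the sample standard deviation of the members as the spread, and clipping the lower bound at zero, are assumptions of this example; the skewness-removing transformation used by the platform is not reproduced here.

```python
from statistics import NormalDist, mean, stdev


def ensemble_summary(members, level=0.95):
    """Return (mean, lower, upper) for one day's model-product forecasts."""
    # Two-sided standard normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = mean(members)
    spread = stdev(members)                      # sample standard deviation
    lower = max(0.0, centre - z * spread)        # assumption: flow cannot be negative
    upper = centre + z * spread
    return centre, lower, upper


# Hypothetical forecasts from three model-product combinations on one day.
centre, lower, upper = ensemble_summary([90.0, 100.0, 110.0])
```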
Running the platform
Modes of run
The platform can run in two different modes, as specified by the user in the first module:
1. Daily run
2. Data filling.
The daily run of the platform is automatic, as controlled by a scheduler (see description later). The data-filling mode is useful when the platform needs to be run for hindcasting (historical simulation). It also fills gaps in the daily datasets, which may be due to missing values resulting from past delays in the availability of input data, either for satellite estimates or NPFs. To initiate the data-filling mode, the user must specify the dates for which the platform should be run. Since the VIC model is computationally expensive, and its use in the data-filling mode is time-consuming, an option is available to opt out of VIC simulations while running the platform in either the data-filling or the daily-run mode.
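A list of dates for the data-filling mode could be assembled with a small helper such as the Python sketch below, which compares the dates already in storage against a requested period. The helper and its name are hypothetical and are not part of the MMSF-RT toolbox, in which the dates are specified by the user.

```python
from datetime import date, timedelta


def missing_dates(start, end, available):
    """Return the days in [start, end] that are absent from `available`."""
    have = set(available)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return [day for day in all_days if day not in have]


# Hypothetical archive with two days missing in the first week of March 2017.
stored = [date(2017, 3, d) for d in (1, 2, 4, 5, 7)]
gaps = missing_dates(date(2017, 3, 1), date(2017, 3, 7), stored)
```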
Time lag
There is invariably a lag between the actual time and the time when the daily data are updated on the corresponding servers. For example, the Mara River basin in Africa is 3 hours ahead of GMT and 10 hours ahead of Tucson, Arizona (where the forecasting platform is implemented). We run the script every day at 5 pm (Arizona time), since by that time all of the datasets for the previous day have become available. Thus, considering the local time in the basin, there can be a lag of almost a day between the last rainfall in the basin and the generation of the streamflow forecasts. However, since we are using 10-day ahead rainfall forecasts from NCEP GFS, the streamflow forecasts effectively provide a 9-day lead time, over and above the concentration time of the basin.
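The time-zone arithmetic can be checked with a short Python sketch. The fixed offsets follow the text (Tucson at GMT−7, Mara River basin at GMT+3); the run date and the day bookkeeping, in which the forecast days start on the day after the last complete data day and only days not yet past in the basin are counted, are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone

ARIZONA = timezone(timedelta(hours=-7))   # Tucson, Arizona (no daylight saving time)
BASIN = timezone(timedelta(hours=3))      # Mara River basin, 3 hours ahead of GMT
NPF_HORIZON_DAYS = 10                     # length of the NCEP GFS precipitation forecast


def effective_forecast_days(run_time):
    """Forecast days that are still in the future in basin local time."""
    # Assumption: the latest complete data day is the day before the run.
    last_data_day = run_time.date() - timedelta(days=1)
    forecast_days = [
        last_data_day + timedelta(days=i) for i in range(1, NPF_HORIZON_DAYS + 1)
    ]
    basin_today = run_time.astimezone(BASIN).date()
    return [day for day in forecast_days if day >= basin_today]


# Hypothetical run date; the platform runs daily at 5 pm Arizona time.
run_time = datetime(2017, 1, 15, 17, 0, tzinfo=ARIZONA)
run_time_basin = run_time.astimezone(BASIN)   # 03:00 on the next day in the basin
days = effective_forecast_days(run_time)
```

Under these assumptions, the first of the ten forecast days has already elapsed in the basin by the time the run starts, which leaves nine days of effective lead time.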
Scheduler
The MMSF-RT platform is automated using Crontab in Linux, which executes the given commands at a specified time of the day (5 pm Arizona time in our case). Crontab calls a shell script, which in turn launches the main forecasting file. Please refer to the Supplementary material for an example showing how the platform is automated using Crontab (available with the online version of this paper).
Data storage
We store all relevant input data and results for each river basin on a daily basis, which include: (1) distributed daily precipitation (raw and bias-corrected); (2) daily bias-corrected lumped precipitation series; (3) individual model forecasts; (4) arithmetic and probabilistic averages of model forecasts; (5) confidence bounds; and (6) forecasts with 9-day (effective) lead time. All these data can be freely downloaded from our website for research and academic purposes. Figure 7 shows the structure of the folder system to store daily data.
TOOLBOX AND TRANSFERABILITY
The MMSF-RT platform is written in MATLAB and can integrate executables. Thus, it can be used with a wide range of models written in other programming languages. For example, the current platform includes the VIC model and its routing component (Lohmann et al. 1996), compiled from C and FORTRAN, respectively. The toolbox is modular in nature, i.e., it consists of multiple MATLAB function files, each of which is assigned to some particular repetitive task. The toolbox files, and the main MATLAB script that calls all the associated functions, are discussed in the Supplementary material (Table S1, available with the online version of this paper). Due to its modular nature, the platform is flexible; accordingly, any model(s) or precipitation product(s) can be included or excluded from the daily simulations.
The platform is transferable in two different ways:
1. Transferring as MATLAB Toolbox: This requires MATLAB to be installed on the computer where the platform will run. The toolbox comes with all required scripts and data files within a single folder (size <100 MB).
2. Independent Executables: This option is useful when MATLAB is not installed on the new system. This version of the forecasting platform comes with an executable file and associated data files (no scripts) that are updated on a daily basis. The new system needs to have the MATLAB Compiler Runtime (MCR) installed to run the executable file. Additional information on this topic is provided in the Supplementary material.
The MMSF-RT platform can either be transferred to a new computer system or adapted for a new river basin. When transferred to a new computer system, the source codes for the VIC and routing models will need to be recompiled and the directory paths updated (in the text file ‘pathfile.asc’ provided within the toolbox). When adapting the toolbox to a new basin, some basin-specific tasks will need to be carried out offline before initiating the automated runs. For example, the bias factor files for precipitation bias correction will need to be updated for the new basin and precipitation products. The hydrologic models will need to be re-calibrated for the new basin, thereby producing updated model parameter files. The residual streamflow error distributions will need to be calculated from the historical simulations corresponding to each hydrologic model and precipitation product combination. Note that the last two steps are not applicable for basins without any streamflow observations.
VISUALIZATION MODULE
The daily results produced by the MMSF-RT platform are displayed on our research website www.swaat.arizona.edu. Each day, the website publishes lumped and distributed precipitation plots as well as an interactive streamflow forecasting plot that includes both individual and merged forecasts along with the confidence bounds.
ACKNOWLEDGEMENTS
This work was supported by the NASA-USAID SERVIR Program through award 11-SERVIR11-58, and by the IWR-USACE International Center for Integrated Water Resources Management (ICIWaRM-UNESCO) in developing the first concept prototype.