Abstract
Two software development hurdles to advancing real-world operationalization of satellite datasets for water management are addressed in this study. First, a simple, easy-to-build and open-source web portal connecting to a back-end complex model is developed for resource-constrained developing nations. Second, to enhance the skill of satellite-based predictions, an innovative and dynamic web analytics-based correction system is developed to reduce the uncertainty of satellite estimates. The correction system comprises dynamic precipitation bias correction and streamflow correction. Dynamically web crawled in-situ hydrologic data pertaining to the region are used to estimate satellite estimation bias. These corrected datasets are finally shared through the web portal. On average, these dynamic correction techniques reduced root mean squared error in streamflow by 80–90% for the case of South Asian river basins. The take-home message is that it is now possible to build cost-effective operational web portals based on satellite data and non-proprietary software.
HIGHLIGHTS
1. An easy-to-access and easy-to-build web interface for connecting complex back-end physical models with decision makers on the front end is developed using open-source tools.
2. This template is suited for water management in the developing world using satellite data.
3. A web-crawling system is developed to dynamically correct satellite data and improve the skill of hydrologic predictions.
4. The correction system can reduce RMSE of satellite-based prediction by about 80–90%.
INTRODUCTION
The water cycle can be described as a complex process comprising a number of highly interconnected water, energy and vegetation processes with variability in time and space. Estimation of all of the components of the water cycle is quite impossible by purely observational approaches due to the limited sampling they provide. Hydrological modelling driven by observations can be utilized as an alternative approach for better understanding of the physical processes of the water cycle (Bowden et al. 2003; Siddique-E-Akbor et al. 2014). By using mathematical modelling along with updated computational technology, one can overcome the sampling limitations of observations and realize routine simulation for better water resources management (Han et al. 2007; Siddique-E-Akbor et al. 2014).
However, some hurdles remain, particularly when it comes to the developing world. These are prohibitive costs for maintaining observations and computational technology (Shivakoti et al. 2011; Gebregiorgis & Hossain 2014), institutional issues (e.g. hydro-political issues – Akanda 2012; Hossain et al. 2014a, 2014b) and poor data quality. Such hurdles limit the capability and skill of hydrological models in the developing world where the river basins are international (or International River Basin (IRB); Bonnema et al. 2016; Maswood & Hossain 2016). According to Katiyar & Hossain (2007), about 33 countries situated farthest downstream in IRBs in the developing world are heavily dependent on hydrologic information from the upstream riparian nations and are challenged in basin-wide hydrologic modelling due to institutional and cost issues. Satellite observations today provide a platform for better understanding of hydrological processes by overcoming the traditional difficulties of in-situ measurements as well as the other hurdles highlighted above. Satellite observations can indirectly estimate several variables of the water cycle such as soil moisture, river height, stream flow and vegetation cover. These variables can be used to force, calibrate or validate hydrological models and allow decision making for water management in challenging situations such as IRBs in the developing world (Gebregiorgis & Hossain 2014; Musa et al. 2015). For over a decade, satellite observations have been used for various weather and climate prediction studies and applications at operational scales (Nijssen & Lettenmaier 2004; Gebregiorgis & Hossain 2011; Khan et al. 2012; Woldemichael et al. 2012; Kansakar & Hossain 2016). Several integrated hydrological and water resources modelling systems have been developed based on the satellite data products to enable hydro-meteorological studies and applications (e.g. Global Land Data Assimilation System (GLDAS) – Rodell et al. 2004; Brown et al. 2014).
Despite these advancements, challenges on scale, quality and integration remain. Quality of satellite data can often become unacceptable, resulting in simulations that are found limited in skill or useless for decision making. A good example is satellite precipitation estimation, where the uncertainties at smaller space-time scales are known to be complex and often the limiting factor to its operational use for hydrological applications (Hossain & Huffman 2008). The end result of such a data quality issue can be understood from Figure 1. This figure shows the stream flow simulation by a calibrated hydrologic model (Variable Infiltration Capacity (VIC); Liang et al. 1994) for the Brahmaputra Basin at a location called Bahadurabad. Precipitation data from the Global Precipitation Measurement (GPM) mission, known as the IMERG product (Hou et al. 2014; Huffman et al. 2015) was used. IMERG is a multi-sensor product dominated by passive sensors calibrated to the GPM's precipitation radar. Comparison with the observed (rated) stream flow shows significant bias to the extent that no end-user or water manager would have trust in using it for decision making. We attribute such issues to the often, if not always, poor estimation capability of low or high rain rates.
In addition to data quality issues, satellite observations also suffer from delayed transmission (i.e., latency) and various data formatting issues, the awareness of which is mostly limited to the scientific community but not to the application world. When these issues are considered in sum, the increasing observational capability of satellites will not have an equivalent impact on increasing societal applications until creative and cost-effective solutions are devised to improve the utility of satellite data for decision makers (Bulatewicz et al. 2014; Hossain 2012, 2015). Without such out-of-the-box solutions, stakeholder agencies with a mandate to provide decisions for water management (as an example) will remain institutionally dependent on third-party entities (such as scientific or the data producing community). These stakeholder agencies are unlikely to benefit from the true potential of satellite observations. Take for example, the Flood Forecasting and Warning Centre (FFWC) of the Bangladesh Government (www.ffwc.gov.bd). FFWC has made noticeable progress in adopting satellite and modelling platforms (such as GPM IMERG data, satellite altimeter, weather models) since 2011 (Hossain et al. 2013; 2014a, 2014b). Yet, FFWC remains heavily dependent on the scientific community for guidance on ways to handle data or satellite mission constellation changes. Such dependency is not uncommon in other water management agencies of the developing world (Hossain 2015; Kansakar & Hossain 2016).
At this stage, two critical solutions are needed to empower stakeholder agencies to become independent users of satellite data for operational water management. These are: (1) an open-source interface building framework that connects complex back-end models with front-end user needs (such a framework should be easy to follow and build using cost-effective solutions that are sustainable in the agency environment of developing nations); (2) an automated correction system that can harness in-situ data availability on the public domain to improve accuracy of satellite data; such a system should be able to take advantage of the power of the internet and avoid non-physical/unrealistic simulations (as shown in Figure 1) due to the satellite's indirect method of estimating water cycle variables.
Development of these two solutions is timely as information technology (IT) development has progressed significantly in the realm of the open-source/non-proprietary community (Gregersen et al. 2007). There are now powerful non-proprietary tools available to empower end users and stakeholder agencies in the developing world and bypass cost-prohibitive proprietary software that most developing nations cannot afford (Solomatine & Ostfeld 2008; Horsburgh et al. 2009; Castronova et al. 2013). Classic examples are the Linux operating system and Python. Python is a scripting language with a relatively simple, clean syntax and a full suite of object-oriented capabilities. It is now widely used for web and internet development, scientific and numeric computation and software development purposes. However, there is no consistent template or methodology for taking advantage of such an open-source interface building approach for operationalization of satellite data for water management.
The open-source community now needs to formalize a framework. Today there exists a vast amount of in-situ information on water cycle measurements (such as precipitation and streamflow) posted online in nowcast mode that remains heavily ‘untapped’ for dynamic adjustment of satellite data. For example, in South Asia, there are half a dozen agencies (see Appendix, available with the online version of this paper), to the best of our knowledge, that post only the most current day's measured rainfall on their website for several hundred locations. This online availability, although limited in record as being only a ‘nowcast’, provides an opportunity to pursue simple adjustment techniques on the fly, and explore if such publicly available data can improve the skill of operational satellite-based hydrologic simulation. In other words, can we take advantage of the internet as a level playing field through web crawling and pull as much in-situ data as possible through supervised search and improve the data quality of satellite observations of parameters such as precipitation and streamflow?
The key objectives of this study are two-fold and as follows:
(1) to develop an open-source web interface building system that is simple and easy to implement for agencies of the developing world as a ‘build-it-yourself’ template for water management;
(2) to explore the effectiveness of online and dynamic data quality improvement techniques that leverage the public domain in-situ data posted on the internet to correct satellite data on the fly through web-analytics (web crawling).
This paper is organized as follows. In the next section we discuss the data, model and open-source tools we have used to build the generic and open-source framework for objective 1 and the web-analytic correction system for objective 2. This is then followed by a detailed outline of the framework itself that we present as a modular and scalable template. Then we describe the performance of the framework and correction system followed by conclusions, lessons learned and recommended areas of future study.
MODEL, DATA AND TOOLS
VIC hydrological model
The hydrological model used in this study is the VIC model, which was developed by Liang et al. (1994). VIC is a macro-scale, semi-distributed hydrological model that can solve full water and energy balances. It is a research grade model and has been used widely for a variety of studies ranging from seasonal hydrological forecasting to climate change and water-energy budget analysis (Cherkauer et al. 2003; Zhu et al. 2009; Dan et al. 2012). There are several distinguishing features of the VIC model such as sub-grid heterogeneity, daily to sub-daily meteorological drivers, land-atmosphere fluxes and the water energy balances at land surface and independent simulation of each grid cell. Streamflow that results from runoff routing is calculated using a separate horizontal routing model developed by Lohmann et al. (1998).
The key outputs of VIC models are runoff, streamflow, base-flow, soil moisture and evapotranspiration that are considered key for enabling water management in developing countries. As an example in enhancing water management in South Asia, these outputs were rendered using our generic open-source framework through the South Asian Surface Water Modelling System (SASWMS; http://depts.washington.edu/saswe). Streamflow at different prominent locations of basins can be used to make a decision on water availability in the downstream. Reference evaporation is an important parameter for crop water management. Currently, it is successfully driving the irrigation advisory services in Pakistan (Hossain et al. 2017). Soil moisture drives agriculture as it works as a principal source for growing plants.
Satellite datasets
Four types of satellite-estimated datasets were used in this study to demonstrate the value of the framework and correction system. For precipitation, IMERG Early run datasets of GPM were used (Hou et al. 2014; Huffman et al. 2015). Daily maximum and minimum temperature and average wind speed datasets are collected from National Centers for Environmental Prediction Final (NCEP FNL) Operational Model Global Tropospheric Analyses (National Centers for Environmental Prediction/National Weather Service/National Oceanic and Atmospheric Administration (NOAA)/U.S. Department of Commerce 2000). The IMERG products are characterized by high temporal and spatial resolutions (half-hour and 0.1° × 0.1°).
Other datasets were derived from the NCEP Final server comprising temperature and wind speed. These NCEP Final Operational Global Analysis data are on 1-degree by 1-degree grids and prepared operationally every six hours. This product is from the Global Data Assimilation System (GDAS), which continuously collects observational data from the Global Telecommunications System (GTS), and other sources. The final products are prepared about one hour after the global forecasting datasets are initialized so that more observational data can be utilized.
All these datasets are resampled spatially and temporally to make them compatible with the spatial and temporal resolution of the hydrologic model. The simulation time step of the VIC model is daily and the spatial resolution is 0.125 degrees in the case of the Ganges Basin and 0.25 degrees in the case of the Brahmaputra Basin. Because the system is operational, we had to consider the constraints of the user agency environment, namely limited internet availability and computational power, which make sub-daily hydrological simulation impractical; we therefore limited the simulation to the daily scale.
Open-source scripts, and software-making tools
For objective 1 (i.e. development of open-source web interface for complex back-end models), several free and open-source software, programs and tools were used. For this objective, XAMPP (https://www.apachefriends.org/index.html) was used to set up a local development environment (localhost). XAMPP stands for Cross-Platform (X), Apache (A), MariaDB (M), PHP (P) and Perl (P). It is a completely free, easy-to-install Apache distribution released under the terms of the GNU General Public License. Its main use is to let developers create a local web server for testing and development purposes. Some JavaScript application programming interfaces (APIs) (Google Maps, HighCharts) are also used for open-source web interface development. The Google Maps JavaScript API is a powerful, popular mapping API that makes it simple to add maps to any website, or web or mobile application, and provides a wide range of services and utilities for data visualization, map manipulation, directions, and more (Wu et al. 2013). HighCharts (http://www.highcharts.com/), a charting library written in pure JavaScript, is also used. It offers an easy way of adding interactive charts (i.e. line, spline, area, area-spline, column, bar, pie, scatter etc.) to any web site or web application (ElTayeby et al. 2013).
In designing the SASWMS Web Data Crawling System for dynamic correction (objective 2), Microsoft Visual Studio Community Edition 2015 (C#) was used (described further in the ‘Results’ section). Several external libraries of C# (e.g. Html Agility Pack (HAP), iTextSharp and WinSCP) were used along with internal library files of Visual Studio. HAP is a .NET code library used to parse ‘out of the web’ HTML files when extracting information from different web portals. For reading PDF documents and extracting data, a .NET PDF library named iTextSharp (http://developers.itextpdf.com//) was used. For transfer of files between local PC and server PC, the WinSCP (https://winscp.net/eng/download.php) .NET assembly was used. For map preparation, Python along with the arcpy API of ArcGIS was used. Essentially all these tools and software are open-source, non-proprietary with many of them being identified through a web search for solving specific components of the framework building.
METHODOLOGY
Open-source web portal development
A very simple, well organized, easy to navigate and consistent web portal is developed with the facility for visualization and downloading. A fast loading and consistent layout-enabled template was downloaded, necessary html, CSS (Cascading Style Sheets) codes were modified, and JavaScript codes were added in the webpages to make it more dynamic. Google Maps JavaScript API was enabled to provide a Google map-enabled platform for showing results. Facilities are provided to visualize and download necessary observations and simulation results from the portal. Figure 2 illustrates how the portal is developed using free-of-cost online resources.
During the development of the portal (as part of objective 1), a free template was downloaded from https://www.templated.co/transit, and its CSS and html codes were customized according to user needs. JavaScript was used to facilitate visualization and download of datasets. The preliminary design of the portal was done in localhost by using XAMPP. During customization, JavaScript was used to link with model outputs (images, datasets) and other media files. Google Maps API was embedded to enable Google maps in the portal. On the Google maps, station locations, rivers and tributaries and basin boundary were added. A pop-up window was attached to each station location. HighCharts was enabled and linked with streamflow text files to visualize streamflow time-series of the stations in the pop-up windows of Google maps. Three-way interaction between the users and the portal was added (raster visualization, time-series visualization and dataset download). Users can visualize raster formatted maps of any datasets by selecting the basin, dataset, temporal resolution and date. They also can visualize streamflow at prominent locations by clicking on the station's icon. There are also download options for all the datasets via the selection query located in the Dataset Download page (Figure 3).
Development of web-analytic correction system for satellite data
As part of objective 2, a real-time web-based data crawler was developed that crawls the web each night and extracts ground measured rainfall data from bona fide government water management agencies. For the South Asian portal, these were countries of India, Nepal, Bangladesh and Bhutan. A list of the websites and the index or id of the html table of corresponding websites where rainfall information is posted was predefined in the WebCrawler. The crawler iterates through each of the items of the list and goes to that site to grab the rainfall data table according to the specified index number using the HAP library. The crawler also searches for rainfall date and time which is specified in the webpages. After extracting rainfall data from the table and acquiring the date of rainfall, it downloads datasets in text format where rainfall date, station name and rainfall amount (in mm) is saved.
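The table-by-index extraction described above can be illustrated with a short sketch. The crawler in this study was written in C# with the HAP library; the Python version below is only an analogue built on the standard library, and the class name `TableCollector`, the function `extract_rainfall`, the two-column table layout and the sample page are all hypothetical. The date is passed in as an argument here, whereas the actual crawler reads it from the webpage. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the text of every <table> on a page as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_rainfall(html, table_index, date):
    """Return (date, station, rainfall_mm) records from one table.

    Assumes column 0 holds the station name and column 1 the rainfall
    in mm; rows whose second column is not a number are skipped.
    """
    parser = TableCollector()
    parser.feed(html)
    records = []
    for row in parser.tables[table_index]:
        if len(row) < 2:
            continue
        try:
            rainfall_mm = float(row[1])
        except ValueError:
            continue  # header row or missing value
        records.append((date, row[0], rainfall_mm))
    return records


# Hypothetical page: the rainfall table is the second table (index 1).
SAMPLE_PAGE = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>Station</th><th>Rainfall (mm)</th></tr>
  <tr><td>Station A</td><td>12.5</td></tr>
  <tr><td>Station B</td><td>-</td></tr>
  <tr><td>Station C</td><td>0</td></tr>
</table>
"""

records = extract_rainfall(SAMPLE_PAGE, table_index=1, date="2016-07-21")
```

Keeping the table index in a per-site list, as the crawler does, means a change in a page layout only requires updating one number rather than the parsing code.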
For PDF (Portable Document Format) files shared in the agency websites, the program uses another library known as iTextSharp to extract station name, date of rainfall observation and amount of rainfall. For more dynamic webpages, like www.cwc.gov.in, the HTTP (Hyper Text Transfer Protocol) post web request method is used to send a station id, and then the response is captured to extract rainfall information of the corresponding station. Of the 14 websites, only two share water level data, which is also saved by the scheduled crawling. After completing download of all the web information, a quality check is done by calculating the number of stations, and maximum and average rainfall amount, and any unwanted information is excluded. To track the whole process, a log file is also generated where all the information (including quality checking information) about the web crawling event (i.e. no. of stations found, error message during extracting rainfall data, date-time missing in webpage, index number of html table not found, etc.) is saved.
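A minimal sketch of such a quality check is given below. The text specifies only that the number of stations and the maximum and average rainfall are computed and that unwanted information is excluded; the plausibility limit `max_plausible_mm`, the function name `quality_check` and the logger name are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("crawler_qc")  # hypothetical logger name


def quality_check(records, max_plausible_mm=1000.0):
    """Drop implausible values and summarise what is left.

    records: list of (date, station, rainfall_mm) tuples.
    max_plausible_mm: assumed upper limit for one day's rainfall;
    the study does not state the threshold it used.
    """
    kept = [r for r in records if 0.0 <= r[2] <= max_plausible_mm]
    values = [r[2] for r in kept]
    summary = {
        "stations": len(kept),
        "rejected": len(records) - len(kept),
        "max_mm": max(values) if values else None,
        "mean_mm": sum(values) / len(values) if values else None,
    }
    # In an operational setting this line would go to the crawl log file.
    logger.info("quality check: %s", summary)
    return kept, summary


# Hypothetical crawl result with one invalid (negative) value.
sample = [
    ("2016-07-21", "Station A", 10.0),
    ("2016-07-21", "Station B", -5.0),
    ("2016-07-21", "Station C", 30.0),
]
kept, summary = quality_check(sample)
```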
In this way, the crawler forages these sites every day, and crawls the latest (last 24 hours) precipitation and water level data. There are 913 stations currently included in the download program of SASWMS. Of them, on average 650–800 station data were found to be posted regularly by the agency websites as ‘nowcast’ for that day (Figure 4). Websites included in the online web crawler are listed in the Appendix (available with the online version of this paper). Hereafter, we shall call the SASWMS data correction system a SASWMS WebCrawler.
Two different correction systems were built for SASWMS. The first one is for precipitation bias correction and the second one is a streamflow correction system. The following flow diagram (Figure 5) shows how the correction system along with other components works in the SASWMS.
Precipitation bias correction system
We developed this system as one of the primary data quality issues associated with satellite precipitation data (such as IMERG) was related to excessive bias (Prakash et al. 2016) that often renders the data unusable or results in physically unrealistic simulation of water cycle variables (see Figure 1 for an example). There are four different methods of precipitation bias correction which are suitable for real-time satellite-estimated precipitation. They are mean bias correction (Seo et al. 1999), use of a regression equation (Immerzeel et al. 2009; Cheema & Bastiaanssen 2012), distribution transformation (Bouwer et al. 2004) and the spatial bias method (Cheema & Bastiaanssen 2012). As the stations included in the web crawler are not densely distributed, the variation of bias is heterogeneous and long-term observed rainfall is lacking, the spatial bias method was found to be the most suitable of these methods for applying real-time bias correction.
In the spatial bias correction method, the bias between observed and satellite-estimated precipitation is calculated at all the observed station locations of the basin. The daily bias at each station is then spatially interpolated using a suitable interpolation technique. In this study, two different methods of bias interpolation are used in all the basins, i.e. Inverse Distance Weighting (IDW) and spline interpolation techniques. Finally, this interpolated bias is applied to the IMERG satellite-estimated precipitation. After applying this bias, modest negative precipitation amounts were found in some grid cells; these were set to zero if that grid cell or the immediately neighbouring ones were zero according to in-situ station or uncorrected satellite data. The precipitation correction system flow chart is shown in Figure 6.
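The steps above can be sketched as follows, under stated assumptions. The bias is taken here as additive (satellite minus gauge) and subtracted from the satellite value; the text does not spell out the sign convention. The defaults `power=2` and `k=12` are borrowed from the IDW settings reported for interpolating the web crawled rainfall, and negative results are simply clamped to zero, which is a simplification of the neighbour-based rule described in the text. The function names and station values are hypothetical.

```python
import math


def idw(x, y, points, power=2.0, k=12):
    """Inverse-distance-weighted value at (x, y).

    points: list of (px, py, value) tuples; only the k nearest are used.
    """
    nearest = sorted(points, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
    num = den = 0.0
    for px, py, value in nearest:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return value  # the cell centre coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def station_biases(stations):
    """stations: list of (x, y, gauge_mm, satellite_mm) -> (x, y, bias)."""
    return [(x, y, sat - gauge) for x, y, gauge, sat in stations]


def correct_cell(x, y, satellite_mm, biases):
    """Subtract the interpolated bias; clamp negatives to zero (simplified)."""
    return max(satellite_mm - idw(x, y, biases), 0.0)


# Hypothetical stations: (x, y, gauge rainfall, satellite rainfall) in mm.
stations = [(0.0, 0.0, 10.0, 30.0), (2.0, 0.0, 5.0, 15.0)]
biases = station_biases(stations)                 # biases of 20 mm and 10 mm
midpoint = correct_cell(1.0, 0.0, 40.0, biases)   # interpolated bias is 15 mm
at_station = correct_cell(0.0, 0.0, 30.0, biases)
clamped = correct_cell(1.0, 0.0, 5.0, biases)
```

At a station location the corrected value reproduces the gauge reading, which is a quick sanity check on any implementation of this method.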
Streamflow correction system
In the streamflow correction system, the simulated streamflow was corrected by using the climatology of discharge (rated) and the upstream drainage area of each correction point. Climatological discharge was derived using data from 2002–2015 for Bahadurabad station in Brahmaputra and 1910–2015 for Hardinge Bridge station of the Ganges River. First a ‘no-correction’ envelope of streamflow was developed using these datasets for each Julian day. This range of streamflow for a given station and given Julian day pertains to the range that covers all recorded values between 25% higher than the climatologically minimum discharge and 25% lower than the climatologically maximum discharge. We considered this range as a ‘safe’ and physically realistic zone that would not trigger an automatic web-analytic based correction. However, when the simulated streamflow is outside this ‘safe and physical’ no-correction zone, the system crawls the in-situ discharge of that day derived from observed water level records and compares the values. If the simulated streamflow is lower than 75% of the public domain in-situ discharge or higher than 125% of the rated discharge, an automatic correction is triggered. This correction is based on the ratio of simulated streamflow to the in-situ discharge. This ratio is then applied at other streamflow locations by multiplying the ratio by flow and further adjusting by multiplying it by the ratio of the drainage areas of the two locations (reference and upstream/downstream). The streamflow correction system is illustrated in Figure 7.
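The trigger logic can be sketched as below. This is an illustration of the decision rule only: it returns the ratio of simulated to in-situ discharge at the reference station when a correction is triggered, and does not attempt the transfer to other locations via drainage-area ratios, since the exact formula is not given in the text. The function names and discharge values are hypothetical.

```python
def no_correction_envelope(clim_min, clim_max):
    """'Safe' range for one Julian day: 25% above the climatological
    minimum up to 25% below the climatological maximum."""
    return 1.25 * clim_min, 0.75 * clim_max


def correction_ratio(simulated, in_situ, clim_min, clim_max):
    """Return simulated/in-situ if a correction is triggered, else None.

    In the operational system the in-situ value is crawled only when the
    simulated flow leaves the envelope; it is passed in directly here.
    """
    lower, upper = no_correction_envelope(clim_min, clim_max)
    if lower <= simulated <= upper:
        return None  # inside the no-correction zone
    if 0.75 * in_situ <= simulated <= 1.25 * in_situ:
        return None  # outside the envelope but close to the observation
    return simulated / in_situ


# Hypothetical discharges in m3/s; the envelope is 5,000 to 60,000.
inside = correction_ratio(30000.0, 28000.0, 4000.0, 80000.0)
close = correction_ratio(90000.0, 80000.0, 4000.0, 80000.0)
triggered = correction_ratio(90000.0, 45000.0, 4000.0, 80000.0)
```

The two-stage test means the web crawl and comparison are needed only on days when the simulation looks climatologically implausible.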
RESULTS
Build-it-yourself portal
Several hydrological parameters (Precipitation, Reference Evapotranspiration, Streamflow, Runoff, Base flow, Soil Moisture and Evaporation) at multi-temporal scale (e.g. daily, weekly and monthly) were shared in the form of raster maps (GIF Image) and ESRI (Environmental Systems Research Institute) ASCII file. IMERG satellite-estimated precipitation was corrected using web crawling and then sent to the server in the daily and weekly map and ASCII file format. Every day, by using the corrected rainfall, VIC and Route model simulations were updated, and all the resulting datasets were uploaded to the UW hosted SASWMS Server. Using a predefined template, raster maps of .gif format are prepared from these datasets and uploaded in the same server. Besides raster maps and datasets, corrected streamflow time-series of all stations are also uploaded. The entire chain of processes is still on-going at the time of writing this manuscript and can be witnessed first-hand at http://depts.washington.edu/saswe.
Users can view the model results in the Visualization Tab. Raster maps of different datasets of each basin can be viewed in the Raster Gridded Surface Viewer and streamflow data can be found at Streamflow Time Series Viewer. In the streamflow page, streamflow locations and their corresponding streamflow time-series data of a particular basin can be viewed by selecting the basin and clicking on the view time series button. All the station's information (e.g. Latitude, Longitude, River Name and Basin Name) are also mentioned in the streamflow visualization page. In the Raster Gridded Surface Viewer page, Ganges, Brahmaputra, Meghna, Indus and Pakistan are included in the Basin Name tab. In the Datasets option, precipitation, runoff, base flow, soil moisture, evaporation and reference evapotranspiration (for only Pakistan) are included. Options in the temporal accumulation are daily, weekly and monthly. Firstly, the basin name needs to be selected to see the available datasets and also datasets must be selected to see their temporal accumulation type. After selecting all the available options and a particular date, the corresponding raster map can be viewed.
A user can also download precipitation, streamflow, evaporation, runoff, soil moisture and base flow of specified temporal scale of a specific basin from the Dataset Download page. By clicking on the download button, the portal downloads the required number of files according to the user's selection. The files are ESRI ASCII formatted text files. The design is kept as simple as possible so that the minimum amount of data transmission is required during navigation in the portal and visualization and download of datasets.
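For readers who want to work with the downloaded files, the sketch below parses an ESRI ASCII grid held in a string. It assumes the common header layout (`ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize` and an optional `NODATA_value`); the function name `read_esri_ascii` and the sample grid are hypothetical and do not come from the portal.

```python
def read_esri_ascii(text):
    """Parse an ESRI ASCII grid into (header, rows).

    header: dict of lower-cased keys to floats.
    rows: list of lists, top row first, with no-data cells set to None.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = {}
    i = 0
    # Header lines are 'keyword value' pairs whose keyword starts with a letter.
    while i < len(lines):
        parts = lines[i].split()
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
        i += 1
    nodata = header.get("nodata_value")
    rows = []
    for line in lines[i:]:
        values = [float(v) for v in line.split()]
        rows.append([None if v == nodata else v for v in values])
    # Check the grid against the declared dimensions.
    if len(rows) != int(header["nrows"]) or any(
        len(r) != int(header["ncols"]) for r in rows
    ):
        raise ValueError("grid size does not match header")
    return header, rows


# Hypothetical 3 x 2 grid at 0.25-degree resolution.
SAMPLE_GRID = """ncols 3
nrows 2
xllcorner 88.0
yllcorner 24.0
cellsize 0.25
NODATA_value -9999
1.0 2.5 -9999
0.0 4.0 3.5
"""

header, rows = read_esri_ascii(SAMPLE_GRID)
```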
Performance of web analytics based correction system
The web analytics based correction is applied in the Ganges, Brahmaputra and Indus River Basins. Precipitation correction is applied to all three of the basins whereas streamflow correction is applied to Brahmaputra and Ganges Basins only as no in-situ water level or discharge data of the Indus Basin is available, to the best of our knowledge. In this study, performance of only precipitation correction, only streamflow correction and combined correction of precipitation and streamflow is assessed for the Ganges and Brahmaputra Basins. In the case of precipitation correction, data used is from 1st January 2016 to 31st August 2016. As the public domains started sharing water level data from 27th March, streamflow performance is assessed from 27th March to 31st August 2016.
Performance of precipitation bias correction
To compare with satellite-estimated gridded rainfall, web crawled rainfall is interpolated over the whole basin using the IDW method with a power of 2 and 12 points in the search radius. In Figure 8, results from two types of precipitation bias correction in Brahmaputra Basin are shown as an example for 21st July 2016 (the rainiest day, when cell average precipitation of IMERG data was maximum). On that day, maximum and average precipitation in IMERG-RT (real time) data were 680 mm and 90.29 mm, respectively. By using the IDW method of bias correction, maximum precipitation decreased to 546.01 mm and average precipitation was 24.39 mm. By using the spline method of interpolation, maximum and average values found were 1253.06 mm and 76.17 mm, respectively. This shows that by using the IDW method, the pattern of satellite-estimated precipitation is preserved and magnitude is decreased. On the other hand, by using the spline method, maximum precipitation is increased but cell average precipitation is decreased, although the spatial pattern and magnitude of IMERG estimated precipitation change radically. In Figure 9, comparison of spatially averaged precipitation from web crawling, IMERG-RT data, corrected rainfall from the IDW method and corrected rainfall from the spline method are shown. This figure shows a continuous overestimation of precipitation over Brahmaputra Basin by IMERG-RT, especially during the monsoon season. The spline method of interpolation also behaves very poorly compared to the cell-averaged web crawled rainfall. The overestimation behaviour of IMERG-RT precipitation is further illustrated in Figure 10. The scatter plot of three types of rainfall with web crawled rainfall shows a decrease in rainfall after implementation of the spline method, but still a high amount of overestimation remains. Among the three plots, the IDW method of correction clearly improves the prediction capability of satellite-estimated precipitation over Brahmaputra Basin.
In Table 1, statistical metrics to quantify performance of the dynamic correction are described. The analysis shows an 85% reduction in precipitation root mean squared error (RMSE) due to use of the IDW method of bias interpolation, and a 12% reduction of RMSE due to use of the spline method of correction by using web crawled rainfall. Due to implementation of the web based correction system, average precipitation decreases from 17.51 mm/day to 5.63 mm/day (IDW method), indicating that IMERG-RT suffers mostly from overestimation. Mean error in precipitation, due to implementation of this correction system, also decreased from 12.48 mm/day to 0.59 mm/day (Table 1).
Metrics | IMERG-RT | Bias correction (IDW method) | Bias correction (spline method) |
---|---|---|---|
RMSE of cell average rainfall (mm/day) | 20.54 | 3.06 | 18.04 |
Correlation coefficient | 0.80 | 0.86 | 0.75 |
Average precipitation (mm/day) | 17.51 | 5.63 | 15.57 |
Mean bias error (mm/day) | 12.48 | 0.59 | 10.54 |
Mean absolute error (mm/day) | 13.25 | 0.82 | 12.75 |
Relative bias (%) | 71.27 | 10.48 | 67.69 |
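The metrics in this table can be computed from two daily series of basin-averaged rainfall, as in the sketch below. The function name and argument names are hypothetical. The formula for relative bias is not stated in the text; the tabulated values are consistent with dividing the mean bias error by the mean of the estimate (for example, 12.48 / 17.51 ≈ 71.27%), so that normalization is assumed here.

```python
import numpy as np

def precipitation_metrics(estimate, reference):
    """Skill metrics for an estimated rainfall series against a reference series.

    `estimate` could be IMERG-RT or a corrected product; `reference` the
    cell-averaged web-crawled rainfall (both in mm/day, same length).
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    err = est - ref
    mean_bias = err.mean()
    return {
        "rmse": float(np.sqrt((err ** 2).mean())),
        "correlation": float(np.corrcoef(est, ref)[0, 1]),
        "mean_estimate": float(est.mean()),
        "mean_bias_error": float(mean_bias),
        "mean_absolute_error": float(np.abs(err).mean()),
        # Assumed normalization: bias relative to the mean of the estimate.
        "relative_bias_pct": float(100.0 * mean_bias / est.mean()),
    }

# Toy series in which the estimate is exactly twice the reference.
print(precipitation_metrics([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```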
The effect of the precipitation correction system on streamflow prediction is assessed by running the VIC model with the corrected precipitation; the results are shown in Figure 11. The streamflow simulated with the uncorrected IMERG-RT dataset is clearly unrealistic. Both the IDW and spline methods reduce peak flows. However, in some cases the streamflow from the spline method exceeds that derived from the uncorrected IMERG-RT data. The IDW method captured the pattern of the rated discharge as well as reducing the high flows. Both correction methods improved the quality of the streamflow simulated from IMERG-RT precipitation, but the overall performance of the IDW method is superior to that of the spline method. The modest but systematic overestimation that remains in the streamflow prediction can be handled through agency-based adjustment factors.
The impact of precipitation correction was also studied in the Ganges Basin. The spatial distribution of the corrected and non-corrected precipitation for the rainiest day (1st July 2016) is shown in Figure 12. On that day, the maximum rainfall in the IMERG dataset was 630 mm and the average rainfall over the basin was 88.51 mm. With IDW correction, these decreased to 261.51 mm and 16.39 mm, respectively. As in the Brahmaputra Basin, the spline method increased the maximum rainfall but decreased the cell-averaged amount. Figure 13 shows the daily average precipitation from 1st January 2016 to 31st August 2016. Of the two methods of bias correction, the IDW method performed better in reducing the IMERG-estimated precipitation. On low-rainfall days, the spline-corrected rainfall exceeds the IMERG-RT rainfall, and on high-rainfall days the situation is reversed (Figures 13 and 14). Table 2 shows that RMSE is reduced by 90% with the IDW method of interpolation, whereas a 32% reduction is achieved with spline interpolation. The considerable reduction in mean error indicates the positive effect of implementing a dynamic precipitation correction system.
Metrics | IMERG-RT | Bias correction (IDW method) | Bias correction (spline method) |
---|---|---|---|
RMSE of cell average rainfall (mm/day) | 24.17 | 2.19 | 16.29 |
Correlation coefficient | 0.94 | 0.95 | 0.81 |
Average precipitation (mm/day) | 16.37 | 4.82 | 13.15 |
Mean bias error (mm/day) | 12.57 | 1.02 | 9.35 |
Mean absolute error (mm/day) | 13.45 | 1.04 | 9.85 |
Relative bias (%) | 76.79 | 21.16 | 71.10 |
Performance of streamflow correction system
Before implementing streamflow correction in the system, a discharge climatology was prepared from the observed water level records for each Julian day. Figure 15 shows the climatological minimum, maximum and average discharge of both basins, along with the safe zone in which no streamflow correction is triggered (the zone between 25% below the maximum discharge and 25% above the minimum discharge).
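The safe-zone test can be expressed in a few lines. The sketch below only decides whether a simulated discharge falls outside the no-correction zone for a given Julian day. The function names are hypothetical, the handling of an empty zone is an assumption, and the adjustment applied once correction is triggered is not reproduced here.

```python
def safe_zone(clim_min, clim_max, margin=0.25):
    """Return (lower, upper) bounds of the no-correction zone for one Julian day.

    lower = 25% above the climatological minimum discharge,
    upper = 25% below the climatological maximum discharge.
    """
    lower = clim_min * (1.0 + margin)
    upper = clim_max * (1.0 - margin)
    return lower, upper

def correction_triggered(simulated, clim_min, clim_max, margin=0.25):
    """True when the simulated discharge lies outside the safe zone."""
    lower, upper = safe_zone(clim_min, clim_max, margin)
    if lower > upper:
        # Assumption: with a very narrow climatological range the zone is
        # empty, so every simulated value is treated as needing correction.
        return True
    return not (lower <= simulated <= upper)

# Illustrative values in m3/s for a single day (not taken from the study).
print(safe_zone(10000.0, 80000.0))                      # (12500.0, 60000.0)
print(correction_triggered(50000.0, 10000.0, 80000.0))  # False
print(correction_triggered(70000.0, 10000.0, 80000.0))  # True
```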
Using this discharge climatology, streamflow is routinely corrected after each hydrological model simulation. Several hydrological error metrics are used to differentiate and compare the performance of the correction techniques. Root mean squared error, Nash–Sutcliffe efficiency, correlation coefficient, peak discharge and total runoff are assessed to measure the skill of the simulated streamflow.
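For reference, the streamflow metrics named above can be computed as in the following sketch. It assumes paired daily series of simulated and observed discharge in m³/s; the function name and the daily time step (`dt_seconds`) are assumptions made for illustration.

```python
import numpy as np

def streamflow_skill(simulated, observed, dt_seconds=86400.0):
    """Skill metrics for simulated against observed discharge (both in m3/s)."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    err = sim - obs

    # Nash-Sutcliffe efficiency: 1 minus the ratio of the squared simulation
    # error to the squared deviation of observations from their mean.
    nse = 1.0 - (err ** 2).sum() / ((obs - obs.mean()) ** 2).sum()

    peak_sim, peak_obs = sim.max(), obs.max()
    vol_sim = sim.sum() * dt_seconds  # total runoff volume in m3
    vol_obs = obs.sum() * dt_seconds

    return {
        "rmse": float(np.sqrt((err ** 2).mean())),
        "correlation": float(np.corrcoef(sim, obs)[0, 1]),
        "nse": float(nse),
        "peak_error": float(peak_sim - peak_obs),
        "peak_error_pct": float(100.0 * (peak_sim - peak_obs) / peak_obs),
        "peak_ratio": float(peak_sim / peak_obs),
        "runoff_ratio": float(vol_sim / vol_obs),
        "runoff_error_pct": float(100.0 * (vol_sim - vol_obs) / vol_obs),
    }

# Toy example: a simulation that doubles every observed flow.
print(streamflow_skill([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))
```

A Nash–Sutcliffe efficiency of 1 indicates a perfect match, while a negative value means the simulation predicts the observations less well than their own mean would.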
Figure 16 shows the streamflow simulated with corrected and non-corrected precipitation for the Brahmaputra Basin. Streamflow from the IMERG-RT product is plotted on the secondary axis because its values are very high. Table 3 summarizes the skill of the different combinations of correction and their effect on streamflow estimation for the Brahmaputra Basin at the Bahadurabad gauging station. The value of online, dynamic correction techniques in improving the skill of IMERG-RT is evident from this table.
Error metrics (in streamflow) | IMERG-RT | Precipitation bias correction (IDW method) | Precipitation bias correction (spline method) | Precipitation + streamflow correction |
---|---|---|---|---|
RMSE (m³/s) | 158,275 | 29,658 | 106,802 | 6,929 |
Correlation | 0.96 | 0.93 | 0.84 | 0.95 |
Nash–Sutcliffe efficiency | −65.36 | −1.33 | −29.21 | 0.87 |
Peak discharge (m³/s) | 433,269 | 133,318 | 301,176 | 90,693 |
Error in peak (m³/s) | 348,594 | 48,643 | 216,501 | 6,018 |
Percentage error in peak (relative to observed) | 412 | 57.5 | 255 | 7.11 |
Peak discharge ratio (simulated to observed) | 5.12 | 1.57 | 3.56 | 1.07 |
Total runoff volume (10⁹ m³) | 1,996 | 703 | 1,484 | 394 |
Runoff ratio with observed runoff | 5.43 | 1.91 | 4.04 | 1.07 |
Percentage error in runoff | 443 | 91 | 303 | 7.45 |
DISCUSSION
We have focused on the real-time bias correction of satellite-estimated precipitation using a novel approach that leverages the public-domain in-situ data posted by various agencies. Most bias correction schemes for modelled or estimated observations require long-term historical data. For instance, Tian et al. (2010) proposed a real-time bias correction scheme that uses Bayesian logic to establish a relationship between satellite estimates and gauge measurements from recent historical data. Chumchean et al. (2006) used Kalman filtering techniques, which require gauge rainfall measurements, to reduce bias in radar rainfall estimates. Lee et al. (2015), in a study on using satellite precipitation estimates for streamflow forecasting, proposed adjusting the mean field bias in the precipitation data and subsequently assimilating streamflow observations. However, obtaining a long-term dataset becomes a major hurdle for transboundary rivers when the upstream countries are unwilling to share data in real time with their downstream neighbours. Our approach is practically applicable in those circumstances, where the only way to obtain in-situ data is to use the datasets posted on the web by the respective agencies. Furthermore, this study has used the newest satellite precipitation product, IMERG, from the GPM mission launched in February 2014. To the best of our knowledge, this precipitation product has not been analysed extensively with the aim of minimizing its bias and increasing real-time predictability. For operational applications, real-time dynamic adjustment is important because any traditional bias correction scheme requires long-term agreement between ground-validated precipitation and the satellite product. Hence, the quality of the precipitation estimated from this product is improved when it is coupled with a bias correction scheme that draws on a diverse network of in-situ data sources.
CONCLUSION
Despite the plethora of satellite-based hydrologic data, hurdles remain, particularly when it comes to the developing world, that prevent water management agencies from benefitting directly from the vantage of space to improve their decision making. In this study we have targeted the resolution of two key hurdles: (1) the high cost and software complexities of building easy-to-access, easy-to-maintain web portal interfaces that connect physical models with the water managers; (2) the low degree of skill of satellite-based hydrologic prediction that often results in physically unrealistic and untrustworthy scenarios for water managers at operational timescales. By demonstrating how any agency can build cost-effective web interfaces using open-source and non-proprietary tools, we provided an open-source framework to overcome the prohibitive costs and software-making challenges. By developing a web-analytic procedure that takes advantage of public domain in-situ data posted by agencies, we also provide a simple-to-implement assimilation scheme to enhance and maintain the skill of satellite-based predictions of water fluxes for water managers. The figures and tables reported here show the clear benefit of applying a dynamic correction scheme based on web crawling and the ease with which the SASWMS was built. It should be mentioned that the SASWMS took 2 months (about 100 man-hours) to build from scratch as an end-to-end system by the first author.
We have demonstrated the solution to the two key hurdles through the development of the SASWMS for the region of South Asia as an example. Because SASWMS was kept very simple, there are avenues for improvement in further studies. Currently, for the dynamic correction, the spatial distribution of the stations is not homogeneous, so spatial interpolation techniques do not always work properly. We have occasionally observed the web-crawling correction worsening the quality of the precipitation data during no-rain or low-rain situations. Also, the list of stations that is crawled is ‘static’, which means that the user has to specify this list and is in charge of updating it. For this reason, any new stations that dynamically appear on the web beyond the specified 934 stations cannot be added to the interpolation scheme. Such an issue can be solved through more dynamic and intelligent search engine optimization.
Other methods for real-time bias correction of satellite estimates were not assessed in this application, namely the natural neighbour, Kriging and nearest neighbour algorithms. Similarly, streamflow correction could potentially benefit from satellite altimeters that cross rivers and provide a more realistic assessment of river height changes than satellite precipitation-based hydrologic models. It is well known that a reasonably long record of altimeter river heights can help develop a virtual rating curve (between model discharge and satellite heights) and an assimilation scheme to keep wayward simulations in check (Hossain et al. 2014a, 2014b). By applying altimeter-based streamflow correction techniques, streamflow at other locations may be corrected further.
Despite the areas for improvement, the take-home message we provide for readers is that the growth of open-source and non-proprietary tools has now made it possible for any resource-constrained water management agency in the developing world to build robust and cost-effective operational web portals using internal resources. Using easy-to-replicate frameworks and templates (as shown here), the value of satellite-based operational water management can soon be a reality for many living in regions of South and Southeast Asia.
ACKNOWLEDGEMENTS
The first author acknowledges the Ivanhoe Foundation for supporting part of his work. This work was also supported by NASA Water Grant NNX15AC63G S04 managed by the second author. For in-situ data sources, Bangladesh Water Development Board (BWDB) is acknowledged. Interested readers may apply for a detailed PowerPoint presentation on how to build a similar system to SASWMS, by request, from the second author.