The purpose of this research is to enable better understanding of current environmental conditions by relating environmental variables to the historical record. Our approach is to organize and visualize land surface model (LSM) outputs and statistics in a web application, using the latest technologies in geographic information systems (GISs), web services, and cloud computing. The North American Land Data Assimilation System (NLDAS-2) (http://ldas.gsfc.nasa.gov/nldas/; Documentation: ftp://hydro1.sci.gsfc.nasa.gov/data/s4pa/NLDAS/README.NLDAS2.pdf) drives four LSMs (e.g., Noah) (http://ldas.gsfc.nasa.gov/nldas/NLDAS2model.php) that simulate a suite of states and fluxes for central North America. The NLDAS-2 model output is accessible via multiple methods designed to handle the outputs as time-step arrays. To facilitate data access as time series, selected NLDAS-Noah variables have been replicated by NASA as point-location files. These time series files, or ‘data rods’, are accessible through web services. In this research, 35-year historical daily cumulative distribution functions (CDFs) are constructed using the data rods for the top-meter soil moisture variable. The statistical data are stored in and served from the cloud. The latest values in the Noah model are compared with the CDFs and displayed in a web application. Two case studies illustrate the utility of this approach: the 2011 Texas drought, and the 31 October 2013 flash flood in Austin, Texas.
ABBREVIATIONS
- API
application programming interface
- CDF
(statistical) cumulative distribution function
- CRWR
University of Texas Center for Research in Water Resources
- DEM
digital elevation model
- GCM
global circulation model
- GIS
geographic information system
- GLDAS
global land data assimilation system
- HPC
high-performance computing
- JSON
JavaScript Object Notation
- LDAS
Land Data Assimilation System
- LSM
land-surface model
- NASA
National Aeronautics and Space Administration
- NLDAS
North American Land Data Assimilation System
- Noah
acronym formed by the four institutions that developed it: National Centers for Environmental Prediction, Oregon State University, Air Force, and Hydrology Laboratory of the National Weather Service
- REST
REpresentational State Transfer
- SDK
software development kit
- TACC
(University of) Texas Advanced Computing Center
- VIC
Variable Infiltration Capacity model
INTRODUCTION
Hydrologic variables describe spatial-temporal processes that result from intricate interactions, from large-scale climate dynamics to local conditions. This complex system carries a large degree of uncertainty, so hydrologic variables are considered stochastic, i.e., variables with underlying distributions that vary in time and space. The Land Data Assimilation System (LDAS) is a modeling system developed, implemented, and shared by the National Aeronautics and Space Administration (NASA) to produce a thorough, consistent dataset of the heat and water fluxes between the atmosphere and the land-surface hydrology. LDAS variables are exposed through two sets of products: National (NLDAS) and Global (GLDAS). The NLDAS grid has 0.125 degree spatial resolution (approximately 14 km), with hourly and monthly estimates since 1979. The GLDAS grid has 0.25 degree resolution (approximately 28 km), with 3-hourly and monthly estimates since 2000. The LDAS model output carries information about the spatial-temporal nature of the hydrologic variables, and provides a consistent and comprehensive time-space dataset for the model variables represented. Through statistical analysis of the 35-year record of these simulations, it is possible to gain important insights into the seasonal patterns, spatial distribution, and frequency values of the model output.
Gourbesville (2009) describes a general context of how new technologies can create datasets with added value, emphasizing that current datasets are more extensive and dense than those of the past. The study concludes that future work must focus on data management, improving the ways that information is shared and presented. The latest soil moisture conditions in NLDAS can be compared to those in the past by using statistical distributions unique to each calendar day and spatial grid point.
Lakshmi (2004) identifies the challenges in the hydrologic sciences for the estimation of variables in ungauged basins. The research acknowledges the importance of satellite products and land-surface models for improving estimations, and emphasizes the need to study and map soil moisture for accurate water balances (Kothyari et al. 2010).
Kothyari et al. (2010) develop a geospatial application for the analysis of soil erosion rates, using the Garhwal region in the Himalayas as a case study. The study reinforces the fact that geospatial applications efficiently communicate information through maps displaying the geographic distribution of the variable of interest (soil erosion in that case). The research did not provide a web-based solution, which would be preferable, and significant effort is required for data preparation and post-processing. In the present research, an integration of historic data, statistical analysis, and latest results helps to create a more complete picture of the current state of a hydrologic variable at a spatial location. This integration can be automated through web applications and the latest developments in geoinformatics technologies to increase the added value of hydrologic data. The objective of the present research is to integrate the latest and historic NLDAS soil moisture data through a statistical analysis, and to expose the results in a web application. This approach has been extended to other NLDAS Noah variables (precipitation rate, snowfall rate, surface runoff, evapotranspiration, and surface temperature), but the visualization and analysis of these variables' time series model outputs is still in progress and is not presented here.
The North American Land Data Assimilation System
The North American Land Data Assimilation System (NLDAS) is a compilation of land-surface model (LSM) datasets of hydrologic variables in a time-space continuum with regular intervals (Mitchell 2004; Xia et al. 2012). The strength of NLDAS lies in its use of forcing parameters derived from observations to reduce the bias and error of geophysical models, rather than being coupled with atmospheric models. The advantages of LSMs in hydrology over global circulation models (GCMs) are described by Jiang et al. (2013). They show that GCMs have problems replicating extreme precipitation values, especially high precipitation values over short periods of time. The research states that these models are unsuitable for flood and drought assessments, and suggests that an LSM, uncoupled from atmospheric models and calibrated with in-situ measurements, gives improved results.
From the models in NLDAS, the Noah model (an acronym formed from the four institutions that developed it: National Centers for Environmental Prediction, Oregon State University, Air Force, and Hydrology Laboratory of the National Weather Service) is used in this research. The Noah LSM was derived from a less complex model developed at Oregon State University in the 1980s, which has been subject to revisions and improvements since then (Ek et al. 2003). The NLDAS-Noah dataset is indexed by space in the Grid Application Development Software (GrADS) (Mitchell 2004), which means that a file with the spatial coverage over the whole domain can be obtained for a single time-step (Berman et al. 2001). Similarly, the NLDAS-Noah data are also available in a web service indexed by time, called ‘data rods’ (Rui et al. 2013), which means that the time series for a single point in space can be obtained. The double indexing by time and space optimizes the process of accessing the data. Jones et al. (2000) propose a methodology to handle temporal data in hydrologic models or applications. This solution improves the performance of the web application prototype in the present research, specifically in the data access process using the ‘data rods’ and in displaying the data in the plots.
The research performed by Elshorbagy & El-Baroudy (2009) acknowledges the importance of soil moisture data as a key element for understanding the hydrologic cycle. It uses data-driven techniques to analyze the soil response to external forcing, and concludes that models of soil moisture are necessary in addition to in-situ measurements or satellite data, owing to the intricacy of the physical processes. NLDAS provides extensive validated hydrologic datasets with continuous historical information, which can be used as a deterministic model, as well as to identify trends, statistical distributions, and the strength of the relationships between variables. The NLDAS dataset is extensive, but it can be queried and accessed online with standard filters, such as location, time, and variable of interest, in an automated way. The study performed by Lakshmi et al. (2004) used soil moisture data from the Variable Infiltration Capacity (VIC) model as an indicator of hydrologic extremes (i.e., floods and droughts) in the upper Mississippi basin. The research shows that the analysis of long-term soil moisture and its anomalies can be used as a precise drought indicator. They identified mean and common ranges of soil moisture values during normal, flood, or drought conditions, but did not associate probabilities with these values.
Statistical analysis
The expectation of extreme events is one of the major applications of statistics in hydrology. The research of El Adlouni et al. (2008) examines the common distributions used in hydrology and classifies them according to their tail behavior, reinforcing the importance of the tail behavior for the estimation of extreme events. The study performed by Coles et al. (2003) uses rainfall data from coastal and central Venezuela and shows that extreme rainfall events, although exceptional, can be properly estimated. Hence, constructing the rainfall probability distributions can be relevant in the estimation of these extreme events.
The probability distribution of hydrologic variables can be estimated on the complete time series or on partial, minimum, and maximum series. Beguería (2005) focuses on modeling extreme rainfall events using annual maximum and partial-duration series. This method requires defining threshold values, which can be complicated and subjective to estimate; the percentiles of the distributions are affected by this threshold (Katz et al. 2002), which can lead to subjective results. If sufficient data are available, the complete time series can be used to estimate the probability distribution. The research performed by Mishra (2009) recognizes the importance of estimating the range of likely values and uncertainty. In the present research, the uncertainty is measured from the long-term historical probability distributions, and the latest values are compared to these distributions to evaluate whether extreme values are to be expected.
The study performed by Husak et al. (2007) sets the basis for computing cumulative distribution functions (CDFs) per grid cell and per time interval (monthly). It uses rainfall data from the collaborative historical African rainfall model (CHARM) (Funk et al. 2003). The CHARM data are coarser than NLDAS, with a half-degree cell size, and the research downscaled the data using an underlying digital elevation model, which could introduce additional errors.
Information technologies and data access
The paper presented by Beniston et al. (2012) is a thorough description of the data access challenges faced in water resources research. In summary, the challenges are: (1) sparse datasets; (2) gaps in data; (3) consistent, reliable availability; (4) charges involved in sharing datasets; (5) disparity between data availability in prosperous socio-economic areas vs. areas with limited resources; and (6) the inconsistent use of standards in storing and sharing information. The use of NLDAS and the data rods web services minimizes the first five challenges because: (1) the dataset is continuous in time and space; (2) there are no gaps of data in the model; (3) the data are available through web services; (4) the dataset is public; and (5) the cell size and time interval are the same for all estimations in the dataset. The remaining challenge (6) is to connect the data services with geographic information and ease the process of exposing the results in informative websites. The present project provides solutions for these challenges, including a web application prototype that uses international data exchange standards and cloud storage.
Choi et al. (2005) implement a web application for water management through a spatial decision support system. The research states the advantages of web-based geographic applications in easing the constraints of using large datasets, data transmission, and integration between models. That web application was implemented using a Common Gateway Interface method, instead of a more robust web framework (e.g., Django) as in the present research. The use of web application frameworks can facilitate the web application development process and the interconnectivity with data services.
The emerging field of CyberGIS (Liu et al. 2015) concerns the evolution of geographic information systems (GISs) on the web. It studies all the components required for spatial analysis on the web: data services, spatial services, geoprocessing services, online modeling and analysis, and infrastructure. CyberGIS focuses on interactive solutions that rely on large geospatial datasets and their integration with other networks. The web application presented in this research serves as a significant prototype of CyberGIS for the statistical and hydrologic fields.
To summarize, the methodology for this research builds on the latest technologies in high-performance computing (HPC), cloud storage and deployment, and GISs. The analysis is performed on a large dataset (NLDAS), using the soil moisture data of Texas and studying the soil moisture conditions during the 2011 drought and the ‘Halloween Flood’ of 31 October 2013 in Onion Creek, located south of Austin. The historic analysis of NLDAS soil moisture provides relevant information about extreme values and their probability distributions, yielding a thorough study of the model output. This approach can indicate which areas may be most vulnerable to extreme hydrologic events, visualizing key aspects of large scientific datasets in informative web applications.
STATISTICAL ANALYSIS
The statistical analysis is based on modeling the empirical CDFs on a daily basis, using the NLDAS-2 Noah soil moisture data obtained for Texas over a period of 35 years (1979–2013). The data retrieval is made through NASA's data rods web service, which provides the time series for a given NLDAS grid cell. The use of data rods improves the data access process for this analysis, because each cell can be processed independently (i.e., in parallel). The data retrieval process is automated and implemented using HPC resources at the Texas Advanced Computing Center (TACC).
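As an illustration, the per-cell retrieval can be scripted against the data rods service. The endpoint URL and parameter names below follow the publicly documented service but should be treated as assumptions and verified against NASA's current documentation; the parser assumes a plain-text, two-column 'date value' response.

```python
from urllib.parse import urlencode

# Assumed data rods time-series endpoint; the actual URL and parameter
# names may differ (see the NASA GES DISC data rods documentation).
DATA_RODS_URL = "https://hydro1.gesdisc.eosdis.nasa.gov/daac-bin/access/timeseries.cgi"

def build_request(variable, lon, lat, start, end):
    """Compose the query URL for one NLDAS grid-cell time series."""
    params = {
        "variable": variable,              # an NLDAS-2 Noah field identifier
        "location": f"GEOM:POINT({lon}, {lat})",
        "startDate": start,                # ISO dates, e.g. "1979-01-02"
        "endDate": end,
        "type": "asc2",                    # plain-text two-column output
    }
    return DATA_RODS_URL + "?" + urlencode(params)

def parse_series(text):
    """Parse a two-column 'date value' ASCII response into (date, float)
    pairs, skipping any header or metadata lines."""
    out = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            try:
                out.append((parts[0], float(parts[1])))
            except ValueError:
                continue  # non-numeric header line such as 'Date Value'
    return out
```

Because each request covers a single grid cell, many such requests can be issued in parallel on the TACC resources, one task per cell.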
Each grid cell in Texas has 365 CDFs, one for each calendar day d. Each CDF stores the soil moisture values at percentiles from 0 to 1 in steps of 0.05. The latest soil moisture results in NLDAS are compared with the CDF values given the location and the day of the year. A percentile value is calculated, where values close to zero indicate drier-than-historic conditions and values close to one indicate wetter-than-historic conditions.
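A minimal sketch of this per-day computation follows, assuming the 35-year historical sample for one calendar day and linear interpolation between order statistics (the interpolation rule is an assumption; the paper does not specify one).

```python
from bisect import bisect_right

def empirical_cdf(values, steps=21):
    """Empirical CDF of one calendar day's historical record, sampled at
    the 21 percentiles 0.0, 0.05, ..., 1.0 used in this research."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for i in range(steps):
        p = i / (steps - 1)
        k = p * (n - 1)                 # fractional order-statistic index
        lo = int(k)
        hi = min(lo + 1, n - 1)
        frac = k - lo
        out.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return out

def percentile_of(value, cdf_values):
    """Map a new soil moisture value onto the historical CDF:
    near 0 means drier than historic conditions, near 1 means wetter."""
    if value <= cdf_values[0]:
        return 0.0
    if value >= cdf_values[-1]:
        return 1.0
    j = bisect_right(cdf_values, value) - 1
    lo, hi = cdf_values[j], cdf_values[j + 1]
    frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return (j + frac) / (len(cdf_values) - 1)
```

For a given cell and day of year, the stored 21-value CDF is retrieved from cloud storage and the latest NLDAS value is converted to its percentile with `percentile_of`.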
WEB APPLICATION
Client-side
The client web application is deployed using a Python web framework known as Django (Django 2015). This web framework allows the deployment of complex applications that need access to databases and external servers and that provide an HTML interface for the user. In this scheme, the user interacts with the web application by clicking on the map; each click event sends the grid location identifier to Django via the ArcGIS API for JavaScript (Esri 2015). This grid identifier is used as an input for two Python modules: one that retrieves data from the cloud storage platform, and another that collects the soil moisture time series for the previous 30 days from the NASA data rods web service. The results from these two requests are encoded in JSON and returned to the web client for geospatial visualization and statistical charting. Geospatial content is loaded in the client web application using the ArcGIS API for JavaScript, which provides a light and fast way of embedding map layers in web applications. Statistical plots are created using the Highcharts JavaScript library (Highcharts Developing Team 2015).
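The JSON document returned to the browser can be sketched as below; the field names are illustrative assumptions rather than the application's actual schema, and in the Django application a view function would wrap this payload in an HTTP response for the charting code.

```python
import json

def soil_moisture_payload(grid_id, cdf, recent_series):
    """Assemble the JSON document answered to a map click.

    grid_id       -- the NLDAS grid-cell identifier sent by the client
    cdf           -- the day-of-year percentile list from cloud storage
    recent_series -- (date, value) pairs for the previous 30 days from
                     the data rods service
    All field names below are hypothetical.
    """
    latest = recent_series[-1][1] if recent_series else None
    return json.dumps({
        "gridId": grid_id,
        "cdf": cdf,                                  # 21 percentile values
        "recent": [{"date": d, "value": v} for d, v in recent_series],
        "latest": latest,                            # most recent model value
    })
```

The client-side JavaScript would decode this document and feed `cdf` and `recent` directly into the Highcharts series configuration.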
Server-side
Two servers support the web application: a geospatial server and a data server. The geospatial server is an instance of ArcGIS for Server running at CRWR, which provides the map layers used for the click-events interaction with the web application. The map layers on the geospatial server are web services exposed through a REST API that can be queried from the client-side to visualize them and provide user interaction. The map layers are used to identify the geographic locations, invoke the statistical analysis (through user interaction), and include the latest results for each cell. The maps also serve as an intermediate platform between the user and the cloud storage. The NASA data rods server provides the latest information and the time series data. It is accessed every time a user clicks on the map, and the real-time response allows the charts to be drawn instantaneously.
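A click-to-cell lookup against such a map layer can be expressed as a standard ArcGIS REST query. The service URL below is a placeholder, and the layer schema is an assumption; the query parameters themselves are standard fields of the ArcGIS REST API.

```python
from urllib.parse import urlencode

# Placeholder URL for the grid-cell layer on the CRWR ArcGIS for Server
# instance; the actual service path is not published in this paper.
LAYER_URL = "https://example.crwr.utexas.edu/arcgis/rest/services/NLDAS/MapServer/0"

def grid_cell_query(lon, lat):
    """Build an ArcGIS REST 'query' request returning the grid-cell
    feature (and hence its identifier) that contains a clicked point."""
    params = {
        "geometry": f"{lon},{lat}",
        "geometryType": "esriGeometryPoint",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",      # return all attributes of the cell feature
        "f": "json",           # JSON response for the JavaScript client
    }
    return LAYER_URL + "/query?" + urlencode(params)
```

In the deployed application the ArcGIS API for JavaScript issues this kind of query internally when the user clicks the map.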
Cloud
The cloud component of the web application is deployed using Microsoft Azure (Microsoft 2015) and includes two parts: web deployment and data storage. The web deployment component loads (1) the entire Django website, (2) the Python libraries, functions, and scripts, and (3) the JavaScript code. As a result, the web application is available on the Internet through the prototype website (http://texassoilmoisture.azurewebsites.net/) and can benefit from the main features of cloud computing, such as support for multiple simultaneous users and fast communication among all the components of the web application. The cloud data storage contains the results from the statistical analysis. The data are accessible through the Django application itself using the Python Azure SDK (Microsoft Azure 2015).
CASE STUDIES AND RESULTS
Case 1: Texas drought 2011
Case 2: Halloween flood, Onion Creek 2013
DISCUSSION
The statistical analysis of soil moisture provides insight into the pre-storm absolute water content in the soil. Nevertheless, more information is needed to properly describe an extreme event such as the 2013 Halloween flood in Austin. The estimation of percentage saturation from soil moisture would be ideal, but additional soil information would be required. The Halloween flood was a complex extreme event in which different factors converged to increase its impact: the timing, magnitude, and distribution of the storm; the characteristics of the watershed; and human development and the lack of flood control infrastructure. The statistical analysis of soil moisture indicates when the soil is wetter than historic conditions and prone to produce more runoff if a storm occurs. However, soil moisture data alone are insufficient for a flood risk characterization. A study expanding this work to more variables and linking the statistical analysis with real-time data would be necessary.
For the 2011 Texas drought case, the soil moisture percentiles are a good indicator for understanding the persistent dry conditions. The daily time-step is finer than necessary for this application, and the same results can be achieved using a monthly time-step, reducing the data needed and the processing time. Moreover, similar statistical analyses for other NLDAS variables, such as precipitation and runoff, and of their interrelations are needed.
FURTHER WORK
The web application prototype is a seamless integration of data and geographic services. It is a cloud-based solution for displaying a large dataset in its geographic context. It shares its core ideas with Tethys (Jones et al. 2014), a Django-based platform for developing web applications that eases the web application development process. The following updates could be implemented using the Tethys platform:
Improving the performance of drawing the grid (especially needed at the national scale).
Supporting selection of begin/end dates for time series (data rods explorer).
Supporting display of other LSM outputs, such as precipitation rate, surface and subsurface runoff, and surface temperature.
ACKNOWLEDGEMENTS
The research was funded by NASA under the ROSES NNH11ZDA001N-ACCESS project. The authors gratefully acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper (http://www.tacc.utexas.edu).