Abstract
Planning an event-based monitoring campaign on the regional scale is challenging: the timing and location of monitoring visits can dramatically affect monitoring efficacy and depend on the environmental conditions required by the measured parameters and on the overarching monitoring goal. We therefore developed a generic campaign planning approach utilizing interactive visualization methods and implemented it in the component-based web tool Tocap (Tool for Campaign Planning). As a case study, we determine the most suitable time and location for event-driven, ad-hoc monitoring in hydrology using soil moisture measurements as our target variable. Our approach supports: (1) data acquisition from various digital data sources, (2) identification of the most suitable locations for measurements, (3) identification of the most suitable time for measurements at the selected locations, and (4) planning an optimized monitoring route.
HIGHLIGHTS
Development of an open-source web-based environment to explore and process time-variant datasets.
Providing target variable computation by multiple linear regression and filter functions.
Determining monitoring locations by visualizing time series of hydro-meteorological forecast data.
Computation and export of a monitoring route using OpenStreetMap data.
Demonstration of the framework for planning a soil moisture monitoring campaign.
INTRODUCTION
A deeper understanding of the Earth system as a whole and its interacting sub-systems depends on both accurate mathematical approximations of the physical processes and the availability of environmental data across temporal and spatial scales. Advanced numerical simulations (de Graaf et al. 2017; Zaherpour et al. 2018) and satellite-based remote sensing (Butler 2014; Simmons et al. 2016; Chen & Wang 2018), in conjunction with sophisticated algorithms such as machine learning tools (e.g. Mao et al. 2019; Sarafanov et al. 2020), can provide 4D environmental datasets with unprecedented resolution, coverage, and accuracy.
However, local and mesoscale monitoring continues to be the backbone of many disciplines such as population ecology (Rockwood 2015; Smith et al. 2017) and hydrology (Li et al. 2020). In general, a monitoring design comprises the number of locations and the monitoring frequency at each location. Both depend on factors such as the monitoring purpose, the available equipment, budgetary and quality requirements, the spatio-temporal characteristics of the study site, and the heterogeneity of the parameters impacting the target variable (for an overview see, e.g., Harmancioglu et al. 1999; De Gruijter et al. 2006; Brus & Knotters 2008). Furthermore, existing continuous monitoring networks are often designed for objectives other than the research interest at hand, and establishing new stations is associated with high financial burdens and needs to be well prepared (Zhou et al. 2013; Bhat et al. 2015).
On the local scale, monitoring a single location may be sufficient if the spatial characteristics of the relevant parameters can be assumed to be homogeneous. If this is not the case, environmental parameters can be monitored on a grid, taking observations at regular intervals across the study site (e.g. Joseph & Possingham 2008). However, on larger scales, monitoring on a grid fine enough to resolve the spatial heterogeneities is often limited by the availability of human and technical resources (Huston 1999). As an alternative, random, transect-based, and stratified monitoring strategies can be applied; the latter tries to capture the range of variability across the parameters of interest, which must be available with sufficient spatio-temporal coverage and resolution (Li et al. 2017; Selmoni et al. 2020).
The selection of a suitable monitoring frequency is influenced by technological constraints, such as the incomplete automation of in-situ parameter estimation procedures, as well as by the temporal characteristics of time-variant environmental factors such as habitat quality, temperature, and precipitation (Bonneau et al. 2018). This is even more important for ad-hoc monitoring campaigns of distinct dynamic events such as heavy precipitation events, floods, and droughts, which require a design that can evolve throughout the event and with the available data.
This calls for an integrated campaign planning approach that addresses the following main functional requirements:
(a) Identify the most suitable location based on the state of environmental parameters, available historical measurement records as well as the target parameter and chosen measurement technique.
(b) Identify the most suitable time to initialize the campaign depending on projected environmental conditions and synchronize with complementary measurements.
(c) Optimize the route to reach all measurement areas within the campaign schedule.
(d) Reprocess the workflow based on changing environmental conditions and retrieved datasets during an ongoing campaign.
(e) Integrate monitoring results and environmental database updates into the data exploration framework without the need for any programming skills.
Most of the requirements above could be addressed with computational or visual methods. However, since we aim at a generic solution that is as independent of the application domain and target parameter as possible, and since most existing computational methods and models for analyzing spatio-temporal data require substantial tailoring to individual settings (Atluri et al. 2018), our approach relies heavily on the user's domain expertise and only applies basic computational methods in the background to support the expert's assessment.
Classical Geographic Information System (GIS) tools are one helpful set of tools to address these requirements, given the spatial nature of the data typically involved. However, they often lack built-in capabilities for analyzing spatio-temporal data or time series. For the visualization of gridded spatio-temporal data, specialized visualization software exists, such as the Panoply data viewer (National Aeronautics & Space Administration 2021), which is cross-platform and capable of producing two-dimensional plots of geographically referenced data on a background map or time series curves of zonal averages, as well as exporting the output as images or comma-separated values. Although well suited to visualizing complex geospatial data stored in arrays, these viewers do not provide the functionality to manipulate and process the visualized datasets. On the other hand, data visualization and analytics platforms such as Tableau (Hoelscher & Mortimer 2018) and Grafana (Grafana Labs 2018) are well suited to analyzing large table-like datasets but lack built-in support for geospatial raster data. More generally, versatile programming languages such as Python and R provide pre-packaged sets of functions that give the user maximum flexibility to process and analyze datasets in various formats; however, a certain level of programming skill is required.
Depending on the specific objectives, previous initiatives and approaches for scientific data visualization in environmental science exist (for an overview see Lin et al. 2013; Chen et al. 2015). For example, Rink et al. (2013, 2020) developed 4D virtual geographic environment systems focusing on the animation of complex environmental datasets in a predefined static geographical setting employing the Unity engine. In contrast to such tailor-made software implementations, Hunter et al. (2016) and Miao et al. (2017) present web-based solutions for their specific domains. They demonstrate the major advantages of web-based implementations, such as simple accessibility via web browsers, cross-platform capability, and compatibility with common data formats and web services. Xu et al. (2019) developed a web-based geospatial platform for sedimentation at culverts in Iowa that includes mapping and analysis functionality but focuses on this particular use case.
However, approaches addressing the specific requirements of event-driven analysis are less common. Goharian & Burian (2018) and Nikoloudi et al. (2021) developed interactive visual-analytical near-real-time decision-making tools for urban water management that aim to optimize predefined target functions (e.g. operating costs) through manual interaction. In contrast, the web-based information system for landslides developed by Pumo et al. (2016), which integrates data from deployed sensors and rainfall forecast data, provides its product, a warning map, via an automated algorithm without user interaction. Zhai et al. (2016) and Ahamed & Bolten (2017) present solutions to visualize active floods and hydrological disasters using web service and sensor-web technologies but provide limited visual-analytical functionality.
On the other hand, web-distributed integrated environmental simulation and modeling systems allow environmental models to be linked, collaboratively executed, and shared, but familiarization with these systems and interpreting the results of heterogeneous models are demanding (Chen et al. 2020; Bayer et al. 2021; Qiao et al. 2021).
The dependence of the required datasets on both the type of event and the target parameter of the campaign raises significant data access and management challenges. Although the standardization of data and metadata in environmental science is progressing, led by initiatives such as the Open Geospatial Consortium and the GO FAIR Initiative, datasets from different disciplines and organizations are stored in disparate databases in heterogeneous formats and coordinate systems, making a generic database access and integration approach difficult. Once an event occurs and develops, the challenge is to quickly evolve monitoring strategies in response, based on the existing heterogeneous datasets.
The proposed approach comprises a visual-analytical tool that provides an interactive web-based environment for the suitable representation and analysis of geographical datasets in the context of ad-hoc campaign planning. The remainder of the article is organized as follows. Section 2 describes our general approach, while section 3 outlines its web-based implementation, including the software architecture, the dataset and database integration, and the visual user interface. Section 4 provides an overview of potential application areas as well as a step-by-step application example from the field of event-driven hydrological monitoring to demonstrate the benefits of the proposed approach. Concluding remarks and plans for future extensions of Tocap are given in section 5.
CONCEPT & METHODS
We propose a visual-analytical approach to address the requirements defined in section 1. A UML (Object Management Group 2007) activity diagram (Figure 1) depicts the main steps and tasks of our approach. First, the relevant spatio-temporal data is acquired. Second, based on the imported data, the most suitable location for measurements is identified. Subsequently, the most suitable time for conducting the measurements in that area is identified. If viable, a route can be planned for the identified measurement locations. The final outputs of the approach are measurement locations that satisfy the measurement preconditions, a time-optimized route to reach the proposed measurement locations, and the estimated driving time. Although the step-by-step order of the workflow components implies a closed loop for simplicity's sake, it should be possible to switch between the individual processes at any time. Aiming to facilitate the planning and execution of terrestrial monitoring campaigns, the platform- and device-independent approach of Tocap gives both scientists and field technicians the flexibility to plan a campaign at the desktop computer first and to refine it later in the field using mobile devices. Thanks to the visual-analytical approach with an app-like frontend visualization in a web browser, the required user skills are reduced to basic IT competence that can be acquired quickly. The following subsections discuss each step and task and provide details on how the individual requirements are addressed.
Data acquisition and visualisation
Our technical concept considers the integration of data from three major sources into Tocap (Figure 2): data from the user's hard drive; data from standardized services such as the Web Map Service (WMS), Web Coverage Service (WCS), and Web Feature Service (WFS); and data sent by the backend module of Tocap (see section 3.3), which provides tailor-made solutions for data retrieval and processing from common environmental databases. The imported spatio-temporal data is immediately visualized to provide a first overview. This first step addresses, in particular, requirement (e) ‘Integrate monitoring results and environmental database updates into the data exploration framework without the need for any programming skills’. Its implementation is discussed in detail in section 3.3.
Identification of the most suitable location for measurements
Based on the acquired data, the most suitable location for the intended measurements can be identified in this step; it directly addresses requirement (a) ‘Identify the most suitable location based on the state of environmental parameters, available historical measurement records as well as the target parameter and chosen measurement technique’. A generic way to support this is the visual exploration of the spatial distribution of relevant parameters combined with expert judgment. In many cases, there will be multiple parameters of differing importance depending on the application. Visually deriving a region where all parameters are optimal can be challenging, especially when many parameters are relevant. We therefore propose to identify the most suitable location through an assessment of the spatial distribution of the individual parameters relevant for the intended monitoring campaign as well as their combinations.
This task involves three phases (Figure 3). First, cells containing relevant values can be selected from each dataset by applying a filter, known in image processing as a ‘binary threshold filter’, over the entire spatio-temporal extent of the data layer. The result is treated as a binary raster, meaning that a bit mask is derived in which all pixels satisfying the constraints of the binary threshold filter are set to 1 and all other pixels are set to 0. The threshold itself can be determined manually or by applying different statistical functions (the implemented options are described in section 3.2.2).
Second, the filtered cells of the selected data layers are merged into a single data product using linear superposition with user-defined weights for each data layer. Since the input data might use different resolutions and projections, it is reprojected and re-rasterized into a common raster with a unified projection (e.g. WGS 84/Pseudo-Mercator) and resolution. The spatial resolution of the common raster equals the highest available resolution of all selected input data layers. Once the data is reprocessed into a common grid, the individual grid cells/pixels are merged by treating each involved layer as a binary raster. Along with user-defined weights, the generated bit masks are merged into a single result. Assuming that all weights sum up to one, the resulting raster will contain a quasi-discrete set of values in the range from zero to one, with one indicating that this pixel satisfies all constraints and zero indicating that none of the constraints is satisfied. All values in between hint at locations where some constraints are satisfied but others are not. For time-dependent data, the linear superposition function uses the currently selected time step of each data layer as input data.
Finally, the resulting gridded dataset represents the combined distribution of all parameters, considering their respective relevance via the user-defined weights. Visualizing this distribution highlights areas with a high convergence of relevant parameters and thereby supports the user in identifying the most suitable location for conducting the measurements. The filtering and weighting tasks both include a visual assessment of their respective outputs to assert that the generated results are reasonable. If, for instance, contradictory or misleading outputs have been generated, the user can interactively refine the filter parameters and layer weights.
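The following minimal sketch illustrates the filtering and weighting steps described above using NumPy and xarray. Layer names, thresholds, and weights are illustrative assumptions, not Tocap's actual backend code.

```python
# Minimal sketch of binary threshold filtering and weighted linear superposition.
# Layer names, thresholds, and weights are illustrative only.
import numpy as np
import xarray as xr

def binary_threshold(da: xr.DataArray, lower=None, upper=None) -> xr.DataArray:
    """Return a 0/1 mask marking the cells that satisfy the given bounds."""
    mask = xr.ones_like(da, dtype=float)
    if lower is not None:
        mask = mask * (da >= lower)
    if upper is not None:
        mask = mask * (da <= upper)
    return mask

def percentile_threshold(da: xr.DataArray, q: float) -> float:
    """Derive a threshold from the parameter distribution (e.g. the 75th percentile)."""
    return float(da.quantile(q / 100.0))

def weighted_superposition(masks, weights) -> xr.DataArray:
    """Linear superposition of binary masks; weights are normalised to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * mi for wi, mi in zip(w, masks))  # values in [0, 1]

# Two synthetic layers on a common grid (in Tocap the layers are re-gridded first)
lat = np.linspace(50.7, 50.9, 50)
lon = np.linspace(13.6, 13.9, 60)
soil_moisture = xr.DataArray(np.random.rand(50, 60), dims=("lat", "lon"),
                             coords={"lat": lat, "lon": lon})
precip = xr.DataArray(10 * np.random.rand(50, 60), dims=("lat", "lon"),
                      coords={"lat": lat, "lon": lon})

m1 = binary_threshold(soil_moisture, lower=percentile_threshold(soil_moisture, 75))
m2 = binary_threshold(precip, lower=5.0)
suitability = weighted_superposition([m1, m2], weights=[0.6, 0.4])
# suitability == 1 where all constraints are met, 0 where none are met
```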
Identification of the most suitable time for measurements in a region
After selecting the relevant regions for measurements, the next task is to identify the most suitable time for conducting the measurements (Figure 4), which addresses requirement (b) ‘Identify the most suitable time to initialize the campaign depending on projected environmental conditions and synchronize with complementary measurements’. A meaningful time frame for the upcoming campaign can be defined by the user when requesting the available distributed forecast data from the data catalog. The selection of the time frame depends on the forecast range of the dataset, but also on the monitoring objective, organizational constraints, and the dynamics of the event that induces the monitoring campaign. The user selects the relevant parameters from the available datasets; the corresponding forecast data for the selected region and time frame is then acquired and visualized. Based on the time series visualization of the chosen parameters, the user can determine a suitable time frame for the upcoming measurements.
Route planning for the measurement area
Following the identification of the relevant measurement locations (see section 2.2), Tocap allows the user to request a route to reach these areas (Figure 5). Since the route calculation tries to determine the best route to reach all measurement locations in succession, the user has to provide a start location. Once the route is created, the user can visually explore it or export it for further use. This addresses requirement (c) ‘Optimize the route to reach all measurement areas within the campaign schedule’. Since route planning relies only on spatial data, its execution is independent of the identification of the most suitable time for measurements. For more information on the route calculation, see section 3.3.
IMPLEMENTATION
Application architecture
The campaign planning tool Tocap is implemented (Figure 6) based on the Data Analytics Software Framework (DASF) (Eggert & Dransch 2021). DASF allows applications with a strong component-based design to be built that operate in highly distributed environments, such as multiple collaborating research centers. It is available under the Apache-2.0 license at https://git.geomar.de/digital-earth/dasf. A DASF-based application consists of three main components: a web application frontend to visualize data, a backend module to acquire and process the data, and a middleware connecting both. Tocap's backend (Eggert & Nixdorf 2022) is freely available under the 3-Clause BSD License in a public repository (https://git.geomar.de/digital-earth/de-smart-monitoring-backend-module) to enable wide usage and collaborative software development, as well as to provide users with documentation of the problems tackled during the development process.
Tocap's web frontend (Eggert & Nixdorf 2022) uses the components provided by the DASF framework, which are developed in TypeScript and Vue.js (Eggert et al. 2021). It is freely available under the Apache-2.0 License in a public repository (https://git.geomar.de/digital-earth/flood-event-explorer/fee-smart-monitoring-workflow).
The framework's middleware is built around a publish-subscribe system with a central message broker instance connecting all separate components. We use Apache Pulsar (https://pulsar.apache.org/) as the message broker implementation.
The backend module provides functionalities such as the retrieval and processing of datasets from external environmental data sources and the application of statistical functions and geometric operations on user-selected datasets.
Frontend
The frontend represents an integral and important component of all interactive visualization approaches. We decided to implement our approach as a web-based, platform-agnostic tool to lower the technological and computer skill requirements on the user side.
As a technological basis, we used the web application template and components provided by the DASF framework. Since all steps of the approach involve the visual assessment of spatio-temporal data, a map view based on OpenLayers (OpenLayers 2021) is used as the main visual component. A collapsible interaction panel provides the required user functions (Figure 7). Furthermore, additional time series data is visualized as time series charts based on D3 (Teller 2013).
Frontend for implementing ‘data acquisition and overview’
The frontend provides access to local data via drag and drop onto the map component to support the data acquisition step (see section 2.1). Vector data needs to be provided in GeoJSON format describing polygon geometries, and grid-based input data as NETCDF3 files. Aside from drag-and-drop capabilities, external databases can be accessed in two ways. First, standard protocols developed by the Open Geospatial Consortium (OGC) (Reed 2011) for serving georeferenced maps (WMS), grids (WCS), and vector features (WFS) are supported. In this context, two open-access WebGIS projects (Haas et al. 2016) have been set up using the geodata infrastructure of the Alfred Wegener Institute (AWI) Bremerhaven. Focusing on relevant datasets for applications in terrestrial hydrology, the datasets stored in a PostgreSQL database include raster and vector products from state authorities, previous monitoring campaigns, and results of numerical models, as well as environmental datasets that are freely available via online data repositories. All data is published as WebGIS services relying on OGC-standardized WMS and WFS technologies and is accessible at https://maps.awi.de.
Apart from data access via standardized interfaces, the utilization of the backend module provides the flexibility to integrate a large number of additional data formats and data sources (see section 3.3). Selected datasets from the ‘Layer from Catalog’ panel are retrieved from the relevant server and homogenized automatically before being exported to the frontend for data visualization and exploration (Figure 7). To provide a first overview all imported data sets are rendered as map layers on top of a generic base map.
Frontend for implementing ‘identification of the most suitable location for measurements’
The identification of the most suitable location demands certain filtering functionalities for each data layer as well as functionality to create a weighted combination of multiple input parameters (see section 2.2). The framework provides a basic filtering mechanism (Figure 8 top). Filter thresholds can either be set manually using a slider or derived from the percentile statistics of the parameter distribution. For time-dependent spatial data, a checkbox defines whether the statistic is derived from the parameter distribution of the selected time step only. Using a slider avoids free-text entry and thus prevents input errors. Additional spatio-temporal subsetting functions are added as a layer plugin exposed via the layer context menu; these functions are provided by the backend module (for details see section 3.3).
Since defining the weight for each layer in the individual layer components lacks the needed global overview, we provide an additional tab component (Figure 8 bottom). In this component, all visible layers are listed, while the user can define individual weights for each layer via an interactive slider component.
Following the fine-tuning of the weights, the weighted data merging is initialized via the corresponding button. The generated result is visualized as a heatmap on the base map. For a comprehensive assessment of the visualized result, the user can apply the default pan & zoom operations, change the color scale of the heatmap, and even use the previously discussed filter functions. Finally, the result can be compared with the originally imported data layers by toggling the display of each layer. If the assessed result reveals major shortcomings, the user can either refine the filter thresholds or change the weight distribution. Once the result is deemed acceptable, the area comprising all suitable measurement locations is visualized. For further processing, an empty layer can be added via the layer group context menu, and the edit & draw feature can be used to create a polygon representing the derived region of interest.
Frontend for implementing ‘identification of the most suitable time for measurements’
After promoting the layer containing the drawn polygon to the application-wide region of interest via the layer context menu, temporal forecast data can be requested in the ‘Temporal’ tab. The user can select forecast parameters relevant to the intended measurement campaign, which are provided by the backend module; details on the provided data sources are given in section 3.3. The requested parameter, e.g. precipitation, as well as the visualization type (bar, line, area), can be selected from additional dropdown menus. Once the requested data is received, it is displayed in an interactive time series chart (Figure 9). The forecast data is explored and assessed via the interactive chart, e.g. by zooming in and out on the temporal axis, to derive the most suitable time frame for the intended measurement campaign.
Frontend for implementing ‘plan route for the measurement area’
The route planning is based on an input raster layer, e.g. generated by the implemented weighting algorithm. Since the route planning is bound to a specific map layer, we decided to place the option to trigger the planning in the layer context menu. In addition to the raster input layer, a starting point is needed, which can be entered in three different ways: first, the geographical coordinates can be typed in manually; second, the device location can be requested via the web browser's geolocation API; third, the user can click a point on the map to define the starting point. After receiving the calculated measurement route, it is displayed as a map layer and can be explored by the user. Finally, the user can download the route via the layer's context menu.
Backend
The backend of Tocap comprises different modules embedded in a framework to address data layer and data logic operations. The backend modules are written in a platform-independent manner using the Python 3 scripting language (Van Rossum & Drake 1995). Python is used due to its clear and readable syntax, its flexible data structures, the extensive standard library, and the many freely usable and distributable third-party libraries. Furthermore, its integration in many GIS software packages and its wide application in spatial data science provide high usability and simple extensibility.
The backend module uses a Python wrapper (Eggert et al. 2021) to generate results that serve as content for the frontend visualization. In its current implementation, it consists of eight sub-modules that are partly coupled with each other and with third-party Python packages (Figure 10):
The downloader.py module comprises all functions for initializing a class for data transfer between the communication module and the subordinate Python modules that handle the communication with external databases. It determines the spatial and temporal boundaries of the requested datasets by analyzing the user-defined region of interest (ROI) and date, respectively.
The satellites.py module draws a rectangular bounding box around the ROI and requests information on upcoming satellite overpasses for more than 20 environmental earth observation satellites. Both the swath widths of the different sensors deployed on the satellites and partial overpasses are considered in the calculation. The module creates a GeoJSON file containing information on partial and complete overpasses and sends it back to the frontend for visualization.
The rasterprocessing.py module offers functionality and interfaces to external geospatial libraries to retrieve statistical metrics on raster datasets over space and time and to convert between different gridded data formats. It computes parameter statistics for each time step and filters the time step where a condition is met.
The rasterrouter.py module contains functionality to calculate a time-optimized sampling route between the identified locations of interest within the ROI. It transforms the relevant areas defined by the frontend to polygon patches and assigns their location to the nearest nodes of the road network. The third-party Python library OSMnx (Boeing 2017) is used to interact with OpenStreetMap's API to access and download the road network data from the spatial database. A time-optimized monitoring route between all identified nodes and user-selected start and end positions is computed by employing the third-party Python library mlrose (Hayes 2019) which solves the Traveling Salesman Problem using genetic algorithms.
The weatherprediction.py module provides functions to retrieve the most recent numerical weather model forecast runs from the FTP webserver of the German Weather Service (Deutscher Wetterdienst, DWD). This includes extracting the binary data from compressed archives for each requested parameter and time step and converting the data into xarray datasets covering the entire run period using the xarray library (Hoyer & Hamman 2017).
Similarly, the precipitation.py module accesses the DWD FTP webserver to retrieve a recent set of radar precipitation data covering the area of Germany in either hourly or daily resolution. Binary data of a user-defined period is extracted from single compressed archives and converted to xarray datasets employing the wradlib library (Heistermann et al. 2013). For areas outside of Germany, gridded data of global hourly precipitation is available after login by accessing the ERA5-Land database using the Python API provided by the European Centre for Medium-Range Weather Forecasts (ECMWF 2020).
The soilmoisture.py module retrieves soil moisture datasets from two different databases. First, a file server hosted at the Helmholtz Centre for Environmental Research (UFZ) provides data for Germany generated from hydrological simulations on a river basin scale. Users need to request a secure shell key before getting access to the file server. Second, soil moisture estimates based on multiple satellite observations with global coverage (excluding the Polar Regions) can be retrieved from the NASA-hosted Earthdata database after login.
The modis.py module allows retrieving MODIS data products available at a webserver maintained by NASA. After log-in with credentials, communication with the server is conducted by a modified version of the downmodis.py script provided by the PyModis library (Delucchi & Neteler 2013). The module downloads the most recent dataset available from the database, converts the data structure from MODIS HDF-EOS2 format to xarray datasets, and homogenizes the data using the data product-dependent fill and scale values.
The utilization of the backend module provides the flexibility to integrate a large number of additional data formats and data sources (Table 1). All retrieved grid-based datasets are converted into an in-memory representation of a NETCDF3 file with variables, coordinates, and attributes, which together form a self-describing dataset. The in-memory NETCDF3 file has a uniform structure, describing each data variable in the three dimensions latitude, longitude, and time, together with metadata stored as attributes. All requested data is re-projected to the World Geodetic System 1984 (WGS 84) geographic reference system and clipped to the region of interest defined in the frontend, using Python libraries for geospatial datasets such as GDAL (Warmerdam 2008) and Rasterio (Gillies et al. 2021).
Data type | Data source | Type | Spatial coverage/resolution | Temporal resolution
---|---|---|---|---
Precipitation | DWD RADOLAN Radar Network | Grid | Germany/1,000 m | 1 hour
Precipitation | ECMWF ERA 5 Reanalysis data | Grid | Global/9 km | 1 hour
Weather Forecast | DWD ICON-d2 | Grid | Central Europe/2,100 m | 3 hours
Weather Forecast | DWD ICON-EU | Grid | Europe/6,500 m | 3 hours
Soil Moisture | UFZ Drought Monitor | Grid | Germany/4,000 m | 1 day
Soil Moisture | SPL2SMAP product | Grid | Global (60°S–60°N, 180°E–180°W)/3 km | 6–12 days
Hydrological Forecast | UFZ mHM model | Grid | Mueglitz Basin/100 m | 1 day
Selected MODIS Datasets | NASA | Grid | Global/500–1,000 m | Daily–weekly
Satellite Position Forecast | DLR/Heavens-Above | Vector | Satellite dependent | Satellite dependent
Street Network | OSM | Vector | – | –
Base Maps | OSM/ESRI | Grid | Scalable | –
Abbreviations are explained in the running text.
We focus on integrating two types of hydro-climatological datasets into Tocap. First, for the applications presented in this paper, we implemented functionality to access databases containing gridded datasets of the target parameters covering Germany. Where these datasets do not cover areas outside of Germany, we provide access to alternative datasets at the European or global scale. In addition, access to further datasets can be realized manually by adding the appropriate function to the downloader.py module in the backend architecture.
Gridded information on hourly precipitation in Germany with a 1-km resolution can be retrieved from the RADOLAN Radar Network (Winterrath et al. 2012) provided by the DWD. Data is downloaded from the FTP server, re-projected to the World Geodetic System 1984, stacked, and sent to the frontend. For regions outside of Germany, hourly precipitation estimates at a coarser resolution of 0.1° are provided by the ERA 5 reanalysis dataset (Hersbach et al. 2020). Similarly, all climatological variables provided by the numerical weather prediction models for Germany (ICON-d2) and Europe (ICON-EU) (Reinert et al. 2015) can be integrated into the frontend via selection in the layer catalog.
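As an illustration of this retrieval and homogenization step, the following sketch reads hourly RADOLAN composites with wradlib and stacks them into an xarray dataset with latitude/longitude coordinates. It is a hedged sketch under the assumption of the wradlib 1.x API, not the verbatim precipitation.py implementation; file names are placeholders.

```python
# Hedged sketch (not the verbatim precipitation.py): read hourly RADOLAN RW
# composites with wradlib and stack them into a (time, y, x) xarray Dataset.
# File names are placeholders; wradlib 1.x API is assumed.
import numpy as np
import pandas as pd
import wradlib as wrl
import xarray as xr

def read_radolan_stack(files):
    frames, times = [], []
    for path in files:
        data, attrs = wrl.io.read_radolan_composite(path)      # 900x900 grid, mm/h
        data = np.where(data == attrs["nodataflag"], np.nan, data)
        frames.append(data)
        times.append(pd.Timestamp(attrs["datetime"]))
    # RADOLAN grid cell centres expressed as WGS84 longitude/latitude
    lonlat = wrl.georef.get_radolan_grid(900, 900, wgs84=True)
    return xr.Dataset(
        {"precipitation": (("time", "y", "x"), np.stack(frames))},
        coords={
            "time": times,
            "lon": (("y", "x"), lonlat[..., 0]),
            "lat": (("y", "x"), lonlat[..., 1]),
        },
        attrs={"units": "mm/h", "source": "DWD RADOLAN RW"},
    )

# Usage with placeholder file names downloaded from the DWD FTP server:
# ds = read_radolan_stack(["radolan_rw_file_1", "radolan_rw_file_2"])
```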
Information on near-real-time soil moisture dynamics is provided by the German Drought Monitor (Zink et al. 2016) hosted at the UFZ. Soil moisture, expressed as a percentage of the field capacity of the uppermost 25 cm of soil, is categorized into 12 distinct classes before being displayed in the frontend. The visual-analytical approach also supports the integration of data provided by upcoming hydrological forecasting initiatives at the UFZ; for example, the backend module provides access to the hydrological forecast parameters computed for the Mueglitz River Basin in Saxony. Furthermore, near-surface soil moisture on a 3 km grid is obtained from the SMAP/Sentinel-1 L2 soil moisture product SPL2SMAP (Das et al. 2019, 2020) by selecting and downloading the relevant tiles for each day available within the selected observation time. The files, provided in the Hierarchical Data Format, are converted into an in-memory NETCDF file before being stacked and sent to the frontend.
The backend module allows the integration of remote sensing data products from the MODIS earth observation program (Justice et al. 1998), which provides gridded information on environmental parameters such as evapotranspiration, emissivity, NDVI, and land surface temperature. After log-in at the NASA data server, the most recent raster of the selected dataset is downloaded using the downmodis module (Delucchi & Neteler 2013) before being processed and sent to the frontend. Other remote sensing data products such as Sentinel-2 (Drusch et al. 2012) can be integrated directly via WMS/WCS interfaces (see the section above) if available. In addition to data from remote sensing sources, information on location-dependent satellite swath coverage within the next two weeks is provided for all major earth observation satellites (Peat 2021). This allows practitioners to synchronize their measurement time and location with datasets that will become available from these satellites.
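The fill- and scale-value homogenization mentioned for MODIS products can be sketched as follows. The attribute names and the MOD11-style example values are common conventions and are given here as assumptions; the exact values differ per MODIS product.

```python
# Hedged sketch of the fill/scale homogenisation step: raw MODIS integers are
# masked with the product-specific fill value and converted to physical units.
# Attribute names and example values follow common conventions and may differ
# per MODIS product.
import numpy as np
import xarray as xr

def apply_fill_and_scale(raw: xr.DataArray,
                         fill_value: float,
                         scale_factor: float = 1.0,
                         add_offset: float = 0.0) -> xr.DataArray:
    data = raw.where(raw != fill_value)          # fill values become NaN
    return data * scale_factor + add_offset      # convert digital numbers to physical units

# Example: a land-surface-temperature-like product with scale factor 0.02 and fill value 0
raw = xr.DataArray(np.array([[0, 14500], [15000, 15500]], dtype=np.int32))
lst_kelvin = apply_fill_and_scale(raw, fill_value=0, scale_factor=0.02)
```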
Information on the existing global road network together with a topographic base map is embedded in the frontend visualization using existing interfaces to the OpenStreetMap database (OpenStreetMap contributors 2020). Alternatively, a topographic base map from Esri can be used as the foundation layer of the map layer.
APPLICATION
Areas of potential application
Due to its generic approach, Tocap has the potential to be applicable in multiple disciplines of terrestrial environmental science where the planning of monitoring and sampling campaigns depends on the spatio-temporal distribution of landscape parameters. Reflecting the dynamics in water catchments, short rainfall events are responsible for exporting the majority of nutrients (Haraldsen & Stålnacke 2006; Puczko & Jekatierynczuk-Rudczyk 2020). Considering that the technical options to measure many water quality properties with field-based continuous sensors are limited, field hydrologists need to rely on the short-term deployment of automatic samplers or manual grab sampling during storm events. If the objective is the correct estimation of pollutant loads, sampling activities have to continue during the rising and falling limbs of a storm event (Marsh & Waters 2009). In this context, Tocap can be used to map the upcoming rainfall distribution and to plot its temporal evolution. This information can be combined with user-defined datasets on catchment-specific features and the location of surface water bodies to find optimal locations for the short-term deployment of samplers. Another application area is the detection of areas at risk of landslides in response to rainfall events. Aside from topographic characteristics like slope and soil texture, the occurrence of landslides is controlled by rainfall features (duration and intensity) as well as by the initial soil moisture conditions (Comegna et al. 2016; De Vita et al. 2018). The filtering and weighting functionality of Tocap can be applied to the required datasets to generate a map indicating areas with higher landslide risk for an upcoming rain event. Hence, a field campaign to determine the initial state of soil moisture in these areas can be initiated prior to an upcoming rainfall event, whose timing can be determined using the temporal data assessment functionality of Tocap.
Another potential application is species sampling. For instance, Dusek et al. (2018) developed a decision tree based on 1,428 soil samples, depicting simple implementable rules for the prevalence of Escherichia coli within soils depending on land cover, site characteristics, and soil chemistry. After uploading all feature datasets into Tocap, its threshold and weighting functionality allows each branch of the decision tree to be rebuilt by applying the successive conditional statements, transferring the knowledge encoded in the decision tree to the user's region of interest (a minimal sketch of this idea is given below). If target features of the species to be sampled depend on dynamic conditions, e.g. precipitation events impacting the hygienic quality of blue mussels (Tryland et al. 2014), these data can easily be incorporated using the data catalog of Tocap to determine the optimal species sampling time.
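The sketch below shows how one branch of such a decision tree can be expressed as chained binary threshold filters on gridded feature layers. The conditions and class codes are invented for illustration; they are not the rules reported by Dusek et al. (2018).

```python
# Illustrative sketch only: translating one branch of a decision tree into
# chained binary threshold filters, as supported by Tocap. The conditions and
# class codes below are invented and are not the rules from Dusek et al. (2018).
import numpy as np
import xarray as xr

def branch_mask(land_cover, soil_ph, soil_moisture):
    """Cells where all conditional statements of one hypothetical tree branch hold."""
    c1 = land_cover == 1                       # e.g. a 'pasture' land-cover class
    c2 = (soil_ph >= 5.5) & (soil_ph <= 7.0)   # intermediate pH range
    c3 = soil_moisture > 0.25                  # moist soils
    return (c1 & c2 & c3).astype(float)        # 1 = branch conditions satisfied

shape = (100, 120)
mask = branch_mask(
    land_cover=xr.DataArray(np.random.randint(0, 4, shape)),
    soil_ph=xr.DataArray(4 + 4 * np.random.rand(*shape)),
    soil_moisture=xr.DataArray(np.random.rand(*shape)),
)
```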
Case study: planning a basin-scale ad-hoc cosmic ray rover campaign to monitor soil moisture with Tocap
To demonstrate the capabilities of our visual-analytical tool Tocap, a typical data processing and analysis workflow is illustrated from the practitioner's viewpoint to support the design of an ad-hoc campaign to measure soil moisture at the catchment scale. The demonstration area is the Mueglitz River, which drains an approximately 210 km² catchment from the Eastern Ore Mountains in Germany to the Elbe River. The course of the Mueglitz River is partly characterized by narrow flood plains within steep-walled valleys and plateaus that are used either for forestry or for agriculture. The shallow soils in the basin, the basin shape, and the occurrence of very intense precipitation events occasionally cause severe floods which, like the flood in August 2002, result in high financial losses (Walther & Pohl 2004). An efficient flood management strategy is therefore of high importance in the Mueglitz River Basin; among other things, it requires a comprehensive understanding of the hydrological processes leading to the formation of flash floods in the basin. In this context, intensive monitoring and modeling efforts are conducted within the MOSES initiative (https://www.ufz.de/moses/). MOSES is being developed to monitor the evolution of such weather extremes and their impacts on the affected regions by combining event-oriented measurements with stationary integrative monitoring programs and observatories. In the Mueglitz River Basin, it comprises mobile and modular sensor systems aiming to improve the understanding of the interaction between short-term storm events and the dynamics of the terrestrial water cycle.
One key state variable that controls run-off generation and flood hydrograph properties is soil moisture. However, soil moisture is highly variable in space and time, making its observation challenging (Ochsner et al. 2013). Among the various existing soil moisture monitoring techniques, vehicle-mounted Cosmic Ray Neutron Sensors (CRNS) allow soil moisture to be monitored noninvasively by surveying larger regions within a reasonable time (Zreda et al. 2008; Schrön et al. 2018). CRNS-based soil moisture measurements can be validated with soil moisture estimates derived from space-borne sensors (Montzka et al. 2017), e.g. using data products of the sensors deployed on the pair of Sentinel-1 satellites (Paloscia et al. 2013; Bauer-Marschallinger et al. 2018).
In the following, the showcase task for Tocap is to determine the optimal date and route to drive a CRNS-equipped vehicle through the Mueglitz River Basin to the areas where the highest soil moisture values are expected within the next seven days. Statistical investigations of parameter sensitivity (e.g. Martõ et al. 2001; Gill et al. 2006; Gwak & Kim 2017; Cai et al. 2019) have shown that initial soil moisture (SM), expected precipitation (P), topographic slope (SL), and clay content (CL) are among the most important variables for explaining spatial and temporal trends of soil moisture distribution in a catchment. The first two parameters are functions of space and time, whereas slope and clay content vary in space only; the latter can be derived from public databases such as SoilGrids (Hengl et al. 2017) and SRTM (Rabus et al. 2003). Tocap supports the CRNS campaign planning as follows. After manual format conversion, the prepared NETCDF files (slope and clay content) can be added by drag-and-drop (Figure 11). For initial soil moisture and expected precipitation, data from the UFZ soil moisture product and from the ICON-EU precipitation forecast are added using the data catalog functionality. Information on past monitoring activities in the Mueglitz River Basin and on topographic and hydrological features can be integrated from the AWI WebGIS project ‘MOSES/Digital Earth Mueglitz Campaign’ by adding the WMS server address to the ‘Add Layer from URL’ function.
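Equations (1) and (2), referenced in the following paragraphs, are not reproduced in this section. As an illustration only, and following the weighted linear superposition described in section 2.2, the combined suitability score for this case study could take a form such as

```latex
% Illustrative form only; the actual equations (1) and (2) are not reproduced here.
S(x, y) = w_{SM}\, B_{SM}(x, y) + w_{P}\, B_{P}(x, y)
        + w_{SL}\, B_{SL}(x, y) + w_{CL}\, B_{CL}(x, y),
\qquad \sum_i w_i = 1,
```

where the B_i ∈ {0, 1} denote the binary threshold masks of initial soil moisture, accumulated forecast precipitation, slope, and clay content at pixel (x, y), and the weights w_i are defined by the user.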
In contrast to soil moisture, for which the data of the most recent time step should be used in Equation (1), more than one significant precipitation event may occur within the upcoming seven days. In addition, the roving campaign should take place with some time lag after a storm event because raindrops and water accumulated in puddles adversely affect the CRNS measurements. The temporal analysis functionality assists in determining the optimal date for the CRNS rover campaign, which is scheduled to take place within 7 days after the 9th of April 2021. Adding both datasets to a time series chart with two y-axes, the 15th of April 2021 is determined to be the optimal date for the campaign, taking into account the conditions listed above (Figure 12). Subsequently, a new precipitation dataset comprising the accumulated future precipitation until the selected date is created by applying the ‘accumulate to date’ filter in the layer panel.
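A minimal sketch of such an ‘accumulate to date’ operation with xarray is given below; the dataset and variable names are illustrative assumptions rather than the backend's actual identifiers.

```python
# Hedged sketch of an 'accumulate to date' operation: forecast precipitation is
# summed over all time steps up to the selected campaign date. Dataset and
# variable names are illustrative.
import pandas as pd
import xarray as xr

def accumulate_to_date(forecast: xr.Dataset, variable: str, date: str) -> xr.DataArray:
    """Sum a forecast variable from the first available step up to `date`."""
    end = pd.Timestamp(date)
    subset = forecast[variable].sel(time=slice(None, end))
    accumulated = subset.sum(dim="time", skipna=True)
    accumulated.attrs["description"] = f"accumulated {variable} until {end.date()}"
    return accumulated

# Example: cumulative hourly precipitation until the selected campaign date
# accumulated_p = accumulate_to_date(forecast_ds, "precipitation", "2021-04-15")
```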
The application of Equation (2) reveals four areas where comparatively high soil moisture estimates are expected (Figure 13). Finally, the ‘calculate path’ function computes the optimal driving route from the user's base to all identified regions and back, and provides an estimated travel time as supporting information on whether the campaign can be conducted with one CRNS rover within one working day. The suggested driving route is visualized using connected polylines and can be downloaded in GeoJSON format, which is readable by state-of-the-art automotive navigation systems.
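The route computation concept behind rasterrouter.py can be sketched as follows with OSMnx and mlrose: download the road network, snap the start and target areas to the nearest network nodes, and order the visits by solving a Traveling Salesman Problem with a genetic algorithm. This is a hedged sketch under the assumption of the OSMnx 1.x and mlrose APIs, not the backend's verbatim implementation; all coordinates are placeholders.

```python
# Hedged sketch of the routing step: road network via OSMnx, visiting order via
# a genetic-algorithm TSP solver from mlrose. Coordinates are placeholders;
# OSMnx 1.x and mlrose APIs are assumed.
import networkx as nx
import osmnx as ox
import mlrose

start = (50.87, 13.77)                                      # lat, lon of the campaign base
targets = [(50.82, 13.72), (50.79, 13.80), (50.84, 13.86)]  # placeholder target areas

# Drivable road network around the region of interest
G = ox.graph_from_point(start, dist=15000, network_type="drive")
points = [start] + targets
nodes = [ox.distance.nearest_nodes(G, X=lon, Y=lat) for lat, lon in points]

# Pairwise shortest-path distances between all snapped nodes
dist = []
for i in range(len(nodes)):
    for j in range(i + 1, len(nodes)):
        d = nx.shortest_path_length(G, nodes[i], nodes[j], weight="length")
        dist.append((i, j, d))

# Solve the TSP on the distance list with a genetic algorithm
fitness = mlrose.TravellingSales(distances=dist)
problem = mlrose.TSPOpt(length=len(nodes), fitness_fn=fitness, maximize=False)
order, total_length = mlrose.genetic_alg(problem, random_state=42)

# Expand the visiting order into an actual driving route (sequence of node ids)
route = []
ordered = list(order) + [order[0]]                          # return to the start
for a, b in zip(ordered[:-1], ordered[1:]):
    route += nx.shortest_path(G, nodes[a], nodes[b], weight="length")[:-1]
route.append(nodes[ordered[-1]])
```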
Evaluation of results
The performance of Tocap in suggesting areas in space and time where target parameter values are in the range of interest depends on multiple factors. A first factor is the quality and relevance of the input datasets selected by the user. For datasets accessible from the data catalog, information on data quality and areas of application is available from the references. Moreover, the chosen filters and the layers included in the weighted linear combination affect the results significantly. Tocap allows the outcome to be evaluated visually and the workflow parameters to be adapted quickly if required. Although Tocap provides the functionality, the quality of the outcome relies heavily on the user's expertise in selecting datasets and setting filters and weights that best represent the target parameter dependencies.
The potential of using Tocap to improve campaign planning is demonstrated by applying Equation (2) and comparing the results with 29,229 CRNS-based soil moisture records obtained during 25 measurement campaigns in the Mueglitz River Basin between February 2019 and July 2020. Since the comparison covers a period in the past, legacy data of hourly ICON-EU forecasts with a horizon of +72 hours, provided by the Fraunhofer Institute for Energy Economics and Energy System Technology, was used instead of the DWD-based ICON-EU forecasts. Similarly, daily soil moisture estimates from the UFZ Drought Monitor are only available for the last 21 days. Therefore, this data source was replaced with daily soil moisture data for 2019–2020 provided by a hydrological model of the Müglitz River Basin (Hannemann 2020). For the comparison, all ICON-EU forecast dates that predict more than 5 mm of daily rainfall for at least one day within the forecast horizon are selected. Tocap then sums up the forecast precipitation up to the date of the rain event and applies the algorithm described by Equation (2) to detect locations where higher soil moisture levels are expected. Subsequently, the distribution of CRNS-based soil moisture records available within each forecast period and located nearest to a suggested measurement location, within a maximum spatial distance of 1 km, is compared with the distribution of all CRNS-based soil moisture measurements (Figure 14). The results show that measured soil moisture in the areas suggested by Tocap was on average 0.42 ± 0.19 kg/m³, higher than the average of all measured records on the selected campaign dates (0.32 ± 0.16 kg/m³). The significant difference between the two distributions is confirmed by a Kruskal-Wallis H test (Kruskal & Wallis 1952), which results in a very low p-value of 2.05 × 10⁻¹³. In summary, even with the default weighting scheme and only four input datasets, this comparison with measurement data demonstrates that Tocap can prioritize locations where target parameter values are in the range of interest.
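The statistical comparison step can be sketched as follows; the two soil moisture samples here are synthetic placeholders drawn around the reported means, and only the Kruskal-Wallis H test call mirrors the evaluation described above.

```python
# Hedged sketch of the statistical comparison: synthetic placeholder samples,
# only the Kruskal-Wallis H test call mirrors the evaluation step.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
sm_near_suggested = rng.normal(0.42, 0.19, size=500)    # records near Tocap-suggested locations
sm_all_records = rng.normal(0.32, 0.16, size=5000)      # all CRNS records on campaign dates

statistic, p_value = kruskal(sm_near_suggested, sm_all_records)
print(f"H = {statistic:.2f}, p = {p_value:.2e}")        # small p -> distributions differ
```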
CONCLUSIONS AND OUTLOOK
This paper introduces the web-based visual-analytical tool Tocap to support ad-hoc campaign planning in terrestrial hydrology. It presents a comprehensive design for the fast visualization and analysis of gridded environmental datasets. In contrast to standard GIS tools, it is platform- and device-independent and tailored to the needs of processing time-variant distributed datasets by providing convenient statistical functions and filters.
The developed tool includes innovative features aiming to facilitate and optimize measurement and sampling location design for event-driven monitoring campaigns, such as drag-and-drop import of datasets and the one-click integration of data from selected public sources without further user interaction. Furthermore, both user-provided and catalog-accessed datasets are visualized together in space and time for a user-defined region of interest (e.g. a river basin). Threshold and statistical filters as well as weighting functions give users the flexibility to identify regions of interest for their particular purposes. Time series analysis graphs further assist in identifying the most suitable time to measure and in synchronizing the measurement plan with the transit times of relevant earth observation satellites. Lastly, the workflow allows a time-optimized sampling route to be computed between the identified regions of interest and the user location by accessing the OpenStreetMap road network. Both the open-source approach and the use of the Python programming language in the backend module allow future users to adapt the visual analytics tool to their needs.
Although the presented case study demonstrates the ability of Tocap at its current development state to support soil moisture monitoring campaigns, future work will focus on improving the capabilities of the system and applying it to other fields. Future releases may assist the user by including an automated sensitivity and uncertainty analysis for the selected datasets and weighting schemes. One promising area of future application is population ecology, where human resources often limit species sampling. In this context, Tocap may assist in identifying potential species habitats and optimizing sampling routes.
ACKNOWLEDGEMENTS
We acknowledge funding from the Initiative and Networking Fund of the Helmholtz Association through the project ‘Digital Earth’. We would like to thank Gloria Kwok for her technical support in developing the satellites.py backend module. Many thanks also to the reviewers for their comments, which improved the quality of the article.
DISCLOSURE STATEMENT
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
FUNDING
This work was supported by the Initiative and Networking Fund of the Helmholtz Association through the project ‘Digital Earth’ (funding code ZT-0025).
DATA AVAILABILITY STATEMENT
All relevant data are available from the online repositories listed in section 3.1.