ABSTRACT
The Upper Mississippi Information System (UMIS) is a cyberinfrastructure framework designed to support large-scale real-time water-quality data integration, analysis, and visualization for the Upper Mississippi River Basin (UMRB). UMIS is intended to directly address three of the Grand Challenges for Engineering including (1) understanding access to clean drinking water, (2) management of the nitrogen cycle, and (3) engineering the tools of scientific discovery. The UMIS is designed to provide significant immediate and long-term impacts including a central platform for data access, integration, discovery, and adoption of cyberinfrastructure tools and services. The UMIS demonstrates that public data aggregators and central repositories can provide important services to anyone interested in water-quality research or education. In addition, working across multiple scales (e.g., state, region, county, or watershed) allows researchers to understand broad and narrow effects of water-quality strategies. Exploration of data across these scales encourages the development of problem-based research questions that can eventually provide feedback to public policies.
HIGHLIGHTS
A web-based cyberinfrastructure developed for water-quality research and operations.
Intuitive and interactive visualizations for community-oriented data analytics are provided.
Big data access to nutrient and hydrological information is enabled.
INTRODUCTION
Mobilization and delivery of nutrients (nitrogen and phosphorus) from point sources and farmed fields to the UMRB stream network is a decades-long problem (e.g., Turner et al. 2008; David et al. 2010; Rabotyagov et al. 2014). In particular, seasonal Gulf of Mexico hypoxia caused by nutrient pollution delivered via the Mississippi and Atchafalaya rivers and their tributaries is a problem that seemingly defies solution (EPA 2008). Consequences include eutrophication of local and regional water resources (Turner & Rabalais 1994; Mueller & Helsel 1996; Dodds & Welch 2000) and drinking water impairment (Weyer et al. 2001; Jones et al. 2016). Pollutant loading from agriculture and other sources, together with its transformation through runoff and streamflow in the region, has had national consequences (Yildirim & Demir 2022). Reduction of the hypoxic area in the Gulf of Mexico has been a national priority for over 20 years.
The Mississippi River – Gulf of Mexico Watershed Nutrient Task Force was formed in 1997 to coordinate an effort to understand and mitigate Gulf hypoxia (EPA 2008). The task force released an Action Plan in 2001 to serve as a strategy for hypoxic area reduction in the Gulf. Twelve states within the Mississippi River Basin continue to implement a revised plan, released in 2008. The task force's long-term goal, at that point, was to reduce the hypoxic area to 5,000 km² by 2015. Because the 5-year average size of the hypoxic area has remained largely unchanged since 1994, the goal was extended to 2035. Stemming the loss of nitrate-nitrogen from row crop areas has been an especially difficult problem (Feyereisen et al. 2022).
The 2001 Action Plan estimated that nitrogen loads would need to be reduced by 30% to reach the hypoxic area objective; later research showed nitrogen reductions as high as 45% may be necessary (Scavia et al. 2003). Because NOx-N delivery to streams comes from a myriad of widely dispersed sources, including farm-field drainage pipes (tiles) and shallow groundwater (Baker et al. 1975; Burkart & James 1999), regulations governing its release to the environment are nearly non-existent. As a result, reductions in NOx-N loads have relied on educating farmers, offering financial incentives, and encouraging voluntary actions in the region, as highlighted by Rabotyagov et al. (2014). This approach has not demonstrated reduced NOx-N loading to the Mississippi River stream network (Sprague et al. 2011; Jones et al. 2018a, 2018b, 2018c). In fact, the 2017 hypoxic area is reported to be the largest ever (Rabalais & Turner 2019).
In response to this lack of progress, several states in the UMRB have instituted nutrient loss reduction programs of their own (Iowa State University 2013; Illinois Nutrient Loss Reduction Strategy 2014; Anderson et al. 2016). By embracing strategies with specific targets, such as a 45% reduction, states have inherently integrated accountability into the process essential for utilizing public funds. It is crucial to quantify and monitor alterations in nutrient discharge to the watershed's stream network in order to quantify policy-driven changes in a credible way (Schilling et al. 2017).
Strategic and scientifically credible monitoring is the best way to track progress toward water-quality objectives and support watershed management (D) and water infrastructure (Beck et al. 2010). The quantity of nitrate leaving Iowa is particularly well documented, as Iowa has a state-wide network of about 75 real-time, continuous nitrate sensors co-located with river discharge measurements. Data from these sensors are transmitted to the Iowa Water Quality Information System (IWQIS), which is the established mechanism for tracking nitrate loads in Iowa (Jones et al. 2018a). The IWQIS visualization platform gives the public immediate access to credible water-quality data. Expansion of this platform to the entire UMRB will provide multiple benefits to scientists, policymakers, producers and land managers, municipal governments, agencies, and others seeking solutions to these difficult water-quality challenges. By defining and implementing data and semantics specifications as well as data service application programming interfaces (APIs), the expansion will be interoperable with other data systems used by partner organizations.
Web technologies and platforms have revolutionized the way information is collected, analyzed, and shared in various disciplines, including environmental science (Yeşlköy et al. 2023), watershed management (Demir & Beck 2009), water quality and infrastructure challenges (Xu et al. 2019), and related fields. These technologies provide an efficient and accessible means of gathering data from multiple sources, such as remote sensing satellites (Li & Demir 2023), weather stations, sensor networks, and predictive models (Krajewski et al. 2017). With the help of web-based tools and platforms, researchers can collaborate and analyze vast amounts of complex data in real-time (Sit et al. 2021), leading to better decision-making and more effective management strategies (Li & Demir 2022). Furthermore, web technologies enable the creation of online communities where scientists, policymakers, and the public can exchange information, knowledge, and experiences. This enhances transparency, encourages public participation, and facilitates the dissemination of valuable research findings, thereby promoting awareness and understanding of environmental issues.
We aimed to develop a cyberinfrastructure framework to support large-scale water-quality data integration, analysis, and visualization in the UMRB in real time using data-enabled information technologies. The system originated from a multi-institution project with researchers at IIHR-Hydroscience and Engineering at the University of Iowa, the Great Lakes to Gulf Virtual Observatory (GLTG), the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign, Iowa State University, and the National Great Rivers Research and Education Center at Lewis and Clark Community College.
Seamless integration of existing real-time and ad hoc water-quality data streams with continuous modeling in the context of relevant data resources is a major challenge in the big data domain (Demir et al. 2022). Undertaking a project of this scale within the UMRB is only achievable by establishing a comprehensive big data ecosystem. This endeavor calls for a profound understanding of water-quality data collection from a wide array of sources, including academic institutions, government agencies, and non-governmental organizations spanning multiple states. It also involves the seamless integration of data that may vary in quality, format, and duration into a unified, user-friendly system. Additionally, active collaboration with partners and stakeholders is essential to gain insights into the diverse ways in which the data can be optimally accessed and utilized. Finally, access to substantial computing resources is crucial to support the management and analysis of this extensive dataset.
The UMIS platform embraces many of the calls for interoperability from previous research into hydrology-based information systems (Brodaric & Piasecki 1996). For example, UMIS serves as a data aggregator utilizing existing APIs from federal and regional agencies with automated standardization of ingested data. Standardization creates an environment where data can be ingested, analyzed, transformed, and disseminated in well-known data formats. In addition, the APIs that power UMIS can also be used in external applications for interoperability with other systems. Spatial and aspatial representations of the site and observational data are non-proprietary as well, so they can be used to query, transform, and disseminate hydrological information using spatial, temporal, or aspatial methods. Although we consume data using modeling languages such as WaterML and exchanges using WQX, and UMIS uses standards established by OGC (2024) and W3C, we did not seek to extend semantic and interoperability practices such as those in OGC, W3C, and RDF. Many of those standards are nevertheless implicit in the open-source software we use. The purpose of UMIS is to provide an intuitive, yet powerful, information system for accessing hydrological data for the UMRB. UMIS focuses more on user operation than on cutting-edge approaches for data interoperability.
The UMIS platform can be used in a variety of ways. We consider the following to be important features and potential uses for UMIS: (1) education, (2) exploration, (3) research, and (4) as a decision-assisting tool. UMIS was designed to be used across the spectrum of skills, knowledge, and abilities so it would fit well in classrooms ranging from grade school levels through college. For example, the UI was designed to encourage the exploration of sites and sensors across the entire UMRB through graphical selection and text-based queries. Graphing and charting allow users to select and compare observations across various spatial and temporal scales. These capabilities are useful in teaching the fundamentals of hydrological processes and water-quality issues. The system could be easily used in educational systems because only a web browser is required to use the platform. Researchers can use the platform to visualize trends in data across multiple decades of observations.
One of the major contributions of UMIS can be seen in capabilities that provide scale-adjustable views of massive amounts of water-quality information. This provides users with a unique opportunity to examine trends across temporal and spatial scales. Given these capabilities, UMIS can provide evidence of trends that can be addressed in policy and management. As a decision-assisting tool, policymakers can discover, monitor, and export evidence of perceived trends in water-quality across spatiotemporal scales. UMIS provides an agnostic portal to water-quality information that is outside political affiliation or vested interests. In addition, as UMIS brings together water-quality information, water-quality models (e.g., SWAT), remotely sensed imagery, and in situ measurements, users can explore trends across corresponding datasets to build a more comprehensive understanding of environmental responses to systemic effects including flooding and nitrate loss.
Currently, UMIS is focused on the UMRB. Although this region is one of the largest contributors to water-quality issues downstream, UMIS is designed to be expanded to continental-scale extents. Currently, the computational resources of UMIS are extremely modest. It is based on a single virtual machine with limited memory and storage. UMIS can easily be expanded in scope and scale in terms of aggregated data (e.g., additional data sources including citizen scientist contributions and other governmental agencies), spatial extent, temporal coverage, the inclusion of additional models, and more remotely sensed imagery. UMIS builds upon existing information systems including those created by federal and state agencies (e.g., USGS, EPA, NOAA, and DNR), universities (e.g., Jones et al. 2018a; Krajewski et al. 2017), institutional (Zeng et al. 2012), and those jointly developed (e.g., Abdelkader et al. 2024).
The system is responsive even though there are billions of observation records and thousands of sites contained in the platform. This was a primary goal during development as similar platforms are often bottlenecked by database activity. We sought to make the system intuitive, yet powerful, and with a focus on user experience. This study is organized as follows: (1) Section 2 discusses the methods used to create the cyberinfrastructure framework for the information system, (2) Section 3 presents the functionality of UMIS with an emphasis on the back-end data services and user interface capabilities, and (3) Section 4 presents the overall results and conclusions of the project.
METHODS
The UMIS framework can help address important issues around water quality by providing unfettered access to data that can be difficult to obtain and use. Although the data incorporated into UMIS are publicly available, integrating environmental observations into easily accessible formats requires working with multiple federal and state-level data repositories (e.g., United States Geological Survey – USGS; Environmental Protection Agency – EPA), parameter codes, and data handling methods. These datasets can provide insights into the movement of nutrients, especially nitrogen and phosphorus, through stream networks.
UMIS programmatically acquires, aggregates, and adds analytical capabilities to water-quality data from existing repositories including USGS NWIS, EPA STORET, and IWQIS (IQWIS 2023). Additional ingestion sources can be added to include data from other federal, state, regional, and local organizations or individuals or research groups collecting their own data. Currently, all data are ingested automatically at defined intervals; however, one-off data collections can also be added and exposed in UMIS.
The UMIS framework can serve as a central platform for water-quality data access, integration, and knowledge discovery and provide a focal point for water-quality research, education, and collaboration efforts.
Cyberinfrastructure development
As part of the UMIS framework, a comprehensive web-based cyberinfrastructure is designed with an emphasis on efficient high-dimensional spatiotemporal water-quality-related data consumption and effective resource utilization.
System architecture
Back-end
PostgreSQL is a powerful free and open-source database that has gained popularity over the past 25 years. PostgreSQL is an object-relational database that supports many of the SQL standards while providing a framework that can be extended by developers and users. For example, PostGIS is an extension that provides support for creating, storing, and modifying spatial data, geometric and geographic analytical methods, data transformation, and data export. PostgreSQL provides the central storage location for most of the data in UMIS. Nginx is a high-performance open-source web server, load balancer, proxy, and gateway. It is also non-blocking and capable of high concurrency. In UMIS, Nginx serves regular webpages and provides routing to the gateway API. uWSGI, an application server that implements the Web Server Gateway Interface (WSGI), works in conjunction with Nginx to provide functionality in UMIS. Any HTTP requests that include the API route are passed from Nginx to uWSGI.
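As a rough illustration of the hand-off from Nginx to the application server, the sketch below shows a minimal WSGI callable of the kind that a server such as uWSGI can host. It is not the UMIS code: the `/api/` route prefix, the JSON payload, and the helper `call_app` are assumptions made for this example only.

```python
import json
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable; the /api/ prefix and payload are assumptions."""
    path = environ.get("PATH_INFO", "")
    if path.startswith("/api/"):
        body = json.dumps({"route": path, "status": "ok"}).encode("utf-8")
        status = "200 OK"
    else:
        body = json.dumps({"error": "not found"}).encode("utf-8")
        status = "404 Not Found"
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call_app(path):
    """Invoke the callable in-process (no server) and return status and JSON."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a valid default WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)


status, payload = call_app("/api/sites")  # hypothetical route
```

In a deployment of this kind, the web server would forward only requests matching the API route to the application server, while static pages are served directly.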
Front-end
We employed a component-based software architecture and encapsulated guidelines for maintainability and adaptability. A web application for intuitive client-side interaction, presentation, and data/service integration was developed and deployed. UMIS front-end is implemented on top of the React framework with Material-UI design library in accordance with best user interface and user experience (UI/UX) practices. Data visualization and analytics capabilities are served via a map-oriented interface (Google Maps API) for interactive raster, polygon, and point data with geospatial filtering as well as dynamic plotting for sensor data exploration.
From an optimization and quality assurance standpoint, a generalized server communication mechanism was established with error handling for reliable data acquisition and service provision from a variety of sources. All external data and service requests are handled asynchronously to avoid throttling, and promise-based chained operations are used to keep the client event queue ordered and the flow of actions correct. The state-based and modular design of the platform allows partial rendering when triggered by user interaction or a server-side update, and hence provides a responsive experience while minimizing the computational load on both the server and the client. The software is implemented following the SOLID and DRY development practices to ensure long-term sustainability (Cabezas et al. 2020). Furthermore, polymorphic sensor provider classes and template-based data retrieval and service endpoints introduce flexibility to account for potential future changes in types, providers, and schemas of external data resources.
Data resources
There are three basic types of data used in the UMIS framework. Most spatial data in UMIS use the World Geodetic System 1984 geographic coordinate system (WGS 84, EPSG:4326). However, some imagery is overlaid in the map interface to fit within bounding coordinates. In these cases, latitude and longitude measurements provide the bounding box that Google Maps uses to compute the placement of images as overlays on the map. Table 1 shows the types of data available in UMIS and how they are used in the system.
Data type | Usage in the framework |
---|---|
Vector spatial data | Spatial selection; relational joins with aspatial data |
Aspatial data | Time series data storage and retrieval; informal metadata; relational joins with spatial data; temporally based aggregation statistics |
Raster data | Map overlays |
Vector data
The first type of data used in UMIS is vector spatial data. Vector data are composed of geometries based on points, lines, or polygons. Zero-dimensional data are represented as points, one-dimensional data as lines, and two-dimensional data as polygons. These data are generalizations of real-world phenomena and can be characterized in a variety of ways. For example, although cities are three-dimensional phenomena, they can be represented as points or polygons on the map. The choice often depends on the view scale, but the important aspect is that maps are generalizations of phenomena. Vector data are stored in spatial tables in PostgreSQL or generated on the fly. Since these data are spatially explicit, they appear in the correct locations on the maps.
Aspatial data
UMIS also collects and stores aspatial data. These kinds of data are not spatially explicit but can be linked to spatially explicit data based on a common id. For example, water-quality information may not contain information about locations of stream gages, but these data can be joined to gage locations based on a gage id. Most of the data collected and stored in UMIS are considered aspatial data but all these data can be joined back to vector spatial data for representation. Examples of these kinds of data include time series observations about nutrients, streamflow, or temperature. UMIS uses aspatial data for map symbolization, graphs and charts, and animations.
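The join between aspatial observations and site locations can be sketched in a few lines. The example below is illustrative only: the site identifiers, coordinates, and values are made up, and in UMIS such joins would be performed in the database rather than in application code.

```python
from collections import defaultdict

# Hypothetical records: sites carry coordinates, observations carry only an id.
sites = {
    "gage-001": {"lat": 41.66, "lon": -91.53},
    "gage-002": {"lat": 42.03, "lon": -93.62},
}
observations = [
    {"site_id": "gage-001", "parameter": "nitrate", "value": 6.2},
    {"site_id": "gage-002", "parameter": "nitrate", "value": 9.8},
    {"site_id": "gage-001", "parameter": "nitrate", "value": 7.1},
]


def join_to_sites(observations, sites):
    """Attach site coordinates to each observation via the shared site id."""
    joined = []
    for obs in observations:
        site = sites.get(obs["site_id"])
        if site is None:
            continue  # an observation without a known location is skipped
        joined.append({**obs, **site})
    return joined


def mean_by_site(observations):
    """Average observed values per site id, e.g., for map symbolization."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs["site_id"]].append(obs["value"])
    return {sid: sum(vals) / len(vals) for sid, vals in groups.items()}


joined = join_to_sites(observations, sites)
means = mean_by_site(observations)
```

Once joined, each observation carries the coordinates needed to place it on the map, and the per-site summary can drive symbol size or color.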
Raster data
The final class of data is raster data formats. Raster datasets are cell-based representations of continuous phenomena such as precipitation, temperature, or soil moisture. They are space-filling in that there generally is a value for all locations within the enumeration area. Cells, in this sense, represent a tessellation of the area within the bounding coordinates of the layer. Generally, all cells are the same size and orientation within the raster. In UMIS, raster data are only used for visual data exploration using map overlays.
Sites
In UMIS, the most common spatial feature is ‘site’. A site is a physical location where sensors are installed, and environmental conditions are recorded. Sites are represented as 0-dimensional features with a coordinate pair describing their location and metadata storing aspatial attributes of the site. Ingestion of sites into the database is through scheduled scripts (i.e., cronjobs) connecting to APIs on external servers. Every month, UMIS sites are checked against all available sites within the UMRB for each contributing agency. Sites that are not present in the site table in the database are automatically added.
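The monthly site check can be pictured as a comparison between an agency's site list and the local site table. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the table layout, column names, and site records are assumptions for illustration and do not reflect the actual UMIS schema.

```python
import sqlite3


def sync_sites(conn, agency_sites):
    """Insert agency sites missing from the local table; return the new ids."""
    known = {row[0] for row in conn.execute("SELECT site_id FROM sites")}
    new_sites = [s for s in agency_sites if s["site_id"] not in known]
    conn.executemany(
        "INSERT INTO sites (site_id, agency, lat, lon) VALUES (?, ?, ?, ?)",
        [(s["site_id"], s["agency"], s["lat"], s["lon"]) for s in new_sites],
    )
    conn.commit()
    return [s["site_id"] for s in new_sites]


# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites ("
    "site_id TEXT PRIMARY KEY, agency TEXT, lat REAL, lon REAL)"
)
conn.execute("INSERT INTO sites VALUES ('A-1', 'USGS', 41.0, -91.0)")

agency_sites = [
    {"site_id": "A-1", "agency": "USGS", "lat": 41.0, "lon": -91.0},
    {"site_id": "A-2", "agency": "USGS", "lat": 42.0, "lon": -92.0},
]
added = sync_sites(conn, agency_sites)
```

Running the same synchronization a second time adds nothing, which is the behavior a scheduled job of this kind needs.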
Metadata are aspatial attributes of sites including site id, elevation, agency, dates of activity, and descriptions of the site. Other site attributes are derived through spatial joins between sites and areal geometries including states, counties, urban areas, and hydrological unit codes (HUC) used by the USGS. These joins are geometric intersections between sites and other areal geometries. During the join process, attributes from the intersecting geometries are added to each site so queries to sites are based on attributes instead of geometries. The computational requirements for queries based on attributes are significantly lower than queries using spatial relationships. A series of spatial joins between sites and the other bounding geometries are used to transfer aspatial attributes from polygons to points (i.e., sites).
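The idea of transferring polygon attributes to points once, so that later queries can filter on attributes alone, can be sketched as follows. This is a simplified pure-Python stand-in: UMIS relies on the database's spatial functions, whereas this example uses a basic ray-casting test, and the polygons, county names, and sites are invented for illustration.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def spatial_join(sites, polygons):
    """Copy attributes of the first containing polygon onto each site."""
    joined = []
    for site in sites:
        record = dict(site)
        for poly in polygons:
            if point_in_polygon(site["lon"], site["lat"], poly["ring"]):
                record.update(poly["attributes"])
                break
        joined.append(record)
    return joined


# Hypothetical bounding geometries and sites.
polygons = [
    {"ring": [(-92, 41), (-91, 41), (-91, 42), (-92, 42)],
     "attributes": {"county": "County A"}},
    {"ring": [(-91, 41), (-90, 41), (-90, 42), (-91, 42)],
     "attributes": {"county": "County B"}},
]
sites = [
    {"site_id": "S1", "lon": -91.5, "lat": 41.5},
    {"site_id": "S2", "lon": -90.5, "lat": 41.5},
    {"site_id": "S3", "lon": -95.0, "lat": 45.0},
]
joined = spatial_join(sites, polygons)
county_a_sites = [s["site_id"] for s in joined if s.get("county") == "County A"]
```

After the join, selecting all sites in a county is a plain attribute filter, which is the cost saving the paragraph above describes.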
New sites
Bounding geometries
Sensors
Sensors are devices that measure environmental parameters at a certain frequency. There are many types of sensors but all measure and record physical observations in situ. The type of phenomenon being measured is referred to as a parameter. In UMIS, we focus on the following parameters from different sources (Table 2). The data sources are discussed in detail later in this section.
IWQIS | NWIS DV/IV | STORET |
---|---|---|
– | Air temperature (°C) | Air temperature (°C) |
Discharge | Discharge | – |
Dissolved oxygen conc | Dissolved oxygen conc | – |
– | Dissolved oxygen sat | – |
Load | – | – |
Nitrate | Nitrate | Nitrate |
pH | pH | pH |
– | – | Phosphorus |
Specific conductance | Specific conductance | Specific conductance |
Turbidity | Turbidity | Turbidity |
Yield | – | – |
Water temperature (°C) | Water temperature (°C) | Water temperature (°C) |
Data acquisition
Data ingestion is automated using Linux-based scheduled scripts (i.e., cronjobs). These are automated system processes that occur at set frequencies on the server. Most of our cronjobs fall into two basic categories including processes that connect to external resources such as application programming interfaces (APIs), and processes that run locally and provide server housekeeping services and local data handling.
The first type of cronjob can be viewed as a type of ingestion or collection service. These run at various times based on the type of data being collected. Some of these applications connect to external APIs using formal query parameters while others connect to open filesystems available through HTTP(S) queries. Currently, UMIS collects data using explicit API queries from external sites including IWQIS, the National Water Information System (NWIS), the EPA STOrage and RETrieval data warehouse (STORET), and weather data from the National Weather Service (NWS) provided by Iowa State University's (ISU) Mesonet services.
IWQIS is an information system that offers real-time nutrient levels and other water quality and quantity information (e.g., streamflow and soil moisture) for the State of Iowa (Weber et al. 2018). Currently, IWQIS monitors over 100 environmental sensors placed along Iowa rivers and watersheds. The platform is open to everyone so users can see real-time state-wide trends in water-quality and stream conditions or drill down to specific sites to look at historical information.
NWIS data are provided by the USGS (2016) through a formal API that allows external access to real-time and historical stream data for the entire United States. Queries are shaped to explicitly retrieve desired data using a variety of spatial parameters including state or territory, hydrologic unit code or watershed, spatial bounding box, or county. Other aspatial query parameters include site name, date ranges, providing agency, status, altitude, and parameter types. A combination of spatial and aspatial attributes is provided as query parameters to tailor requests to exactly those sites of interest without the need to download all the data and exclude non-essential values. Data can also be returned in a variety of formats based on need. A single query to a well-designed API can return the desired data if the query is properly formatted. UMIS pulls daily values (DV) data and instantaneous values (IV) data from the NWIS API. DV data are collected every day at 2 am and rainfall data layers (Stage IV) are collected every hour using cronjobs. UMIS also collects site data from the NWIS platform to add new sites to the site table. In this way, UMIS stays current with the USGS gauge locations. This is updated monthly, and new sites will automatically be available once updated.
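To make the shape of such a request concrete, the sketch below assembles a query URL for the daily values or instantaneous values service without sending it. The base URL and parameter names (`sites`, `stateCd`, `parameterCd`, `startDT`, `endDT`, `format`) reflect our understanding of the public USGS water services interface and should be verified against current USGS documentation. The function `build_nwis_url` and the example site number are hypothetical and are not part of UMIS.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL of the USGS water services; verify before use.
NWIS_BASE = "https://waterservices.usgs.gov/nwis"


def build_nwis_url(service, sites=None, state=None, parameter_codes=(),
                   start=None, end=None, fmt="json"):
    """Assemble a request URL for 'dv' (daily) or 'iv' (instantaneous) data."""
    if service not in ("dv", "iv"):
        raise ValueError("service must be 'dv' or 'iv'")
    if bool(sites) == bool(state):
        raise ValueError("give exactly one of sites or state")
    params = {"format": fmt}
    if sites:
        params["sites"] = ",".join(sites)
    else:
        params["stateCd"] = state
    if parameter_codes:
        params["parameterCd"] = ",".join(parameter_codes)
    if start:
        params["startDT"] = start
    if end:
        params["endDT"] = end
    return f"{NWIS_BASE}/{service}/?{urlencode(params)}"


# Example: one month of daily discharge (parameter code 00060) for one site.
url = build_nwis_url(
    "dv",
    sites=["05454500"],  # example site number, used here only for illustration
    parameter_codes=["00060"],
    start="2023-01-01",
    end="2023-01-31",
)
query = parse_qs(urlsplit(url).query)
```

Narrowing the request by site, parameter, and date range in this way is what keeps each ingestion call small, as opposed to downloading everything and filtering locally.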
STORET data are collected by federal, state, and tribal organizations, as well as other groups and individuals, to monitor water-quality conditions across the US. Over 900 partners have collected and shared their water-quality data through the EPA Water Quality Portal (WQP). As there is a wide variety of agencies and individuals posting water-quality data to the portal, data can be sparse with large gaps in collection dates. There are many collection sites and a very large number of parameters available in the WQP. Paring down the parameters that may be of interest to UMIS users was difficult, so we tried to match the parameters available from the other systems we query.
Mesonet data are requested on-demand when a user selects data to view. The ISU Mesonet services provide access to important weather information such as GOES-R (16) visible imagery, precipitation, radar imagery, storm reports and weather condition data, and road conditions. Appropriate data are updated every 15 min, so the current imagery is displayed. We currently do not collect these data independently but add requested data as map overlays on the interface. Users can show these to visually help them understand the relationships between weather events and stream information. UMIS also ingests data from other sites which are basically exposed filesystems containing data including radar-rainfall datasets and water-quality model outputs.
MRMS is an automated system that integrates data from multiple radars, surface observations, weather detection systems, environmental models, and satellite feeds (Zhang et al. 2016). This system was developed by the Cooperative Institute for Severe and High-Impact Weather Research and Operations (CIWRO, formerly CIMMS) and the National Severe Storms Laboratory (NSSL) of NOAA. A wide variety of weather and other environmental data can be obtained from MRMS including precipitation rates, precipitation type, soil moisture, composite reflectivity, and surface temperature. Data are updated at given frequencies and images are overwritten every 24 h.
SWAT is a watershed-/river basin-scale model that can be applied on a daily or sub-daily time step to simulate stream system hydrology and pollutant transport (Arnold et al. 2012a). A watershed is configured in SWAT by overlaying soil, land use, topographic, management, stream network, and climate data within subwatersheds, which are further delineated with smaller homogeneous hydrologic response units (HRUs). The model has been used to analyze an extensive array of water resource problems worldwide for study areas ranging from less than 1 km² to multi-national transboundary river systems as documented in existing SWAT literature. This includes dozens of applications for the UMRB; over 40 of those studies were tabulated in a concise review by Chen et al. (2020).
Data ingestion process
Data ingestion is an event-driven set of Python processes that make HTTP requests to external APIs (i.e., USGS, EPA, and IWQIS) to retrieve new data from each web service. These events are triggered by scheduled cronjobs on the server side at regular intervals. The code checks the data already in the database before requesting any new data and limits the request to data that have become available since the last ingestion. Any returned data are processed, checked for consistency or errors, and then inserted into the database.
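The check-then-request pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the UMIS ingestion code: SQLite stands in for PostgreSQL, `fake_fetch` stands in for an HTTP request to an external API, and the table layout and validation rules are hypothetical.

```python
import sqlite3
from datetime import datetime


def latest_timestamp(conn, site_id):
    """Return the most recent stored timestamp for a site, or None."""
    row = conn.execute(
        "SELECT MAX(observed_at) FROM observations WHERE site_id = ?",
        (site_id,),
    ).fetchone()
    return row[0]


def is_valid(record):
    """Basic consistency checks: parseable timestamp and numeric value."""
    try:
        datetime.fromisoformat(record["observed_at"])
        float(record["value"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def ingest(conn, site_id, fetch):
    """Request only records newer than those stored, validate, and insert.

    Returns the number of records that passed validation.
    """
    since = latest_timestamp(conn, site_id)
    records = [r for r in fetch(site_id, since) if is_valid(r)]
    conn.executemany(
        "INSERT OR IGNORE INTO observations (site_id, observed_at, value) "
        "VALUES (?, ?, ?)",
        [(site_id, r["observed_at"], float(r["value"])) for r in records],
    )
    conn.commit()
    return len(records)


# Hypothetical remote feed; the last record fails validation.
FAKE_FEED = [
    {"observed_at": "2024-05-01T00:00:00", "value": "5.1"},
    {"observed_at": "2024-05-02T00:00:00", "value": "5.4"},
    {"observed_at": "2024-05-03T00:00:00", "value": "n/a"},
]


def fake_fetch(site_id, since):
    """Stand-in for an API call; returns records newer than `since`."""
    return [r for r in FAKE_FEED if since is None or r["observed_at"] > since]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations ("
    "site_id TEXT, observed_at TEXT, value REAL, "
    "PRIMARY KEY (site_id, observed_at))"
)
first_run = ingest(conn, "S1", fake_fetch)
second_run = ingest(conn, "S1", fake_fetch)
```

The second run asks only for data after the latest stored timestamp, so it neither re-downloads nor re-inserts the records from the first run.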
Derived data
Title | Description |
---|---|
MRMS_MultiSensor_QPE_01H_Pass2 | Multi-sensor accumulation 1-h (2-h latency) |
MRMS_PrecipRate | Radar-derived precipitation rate |
MRMS_PrecipFlag | Surface precipitation type (convective, stratiform, tropical, hail, snow) |
MRMS_FLASH_SAC_MAXSOILSAT | FLASH QPE-SAC soil saturation |
RESULTS AND DISCUSSIONS
Data web service and APIs
In UMIS, requests are made to endpoints that define required parameter inputs from the requestor. Endpoints are essentially URLs on which the API listens for requests. Endpoints provide isolation between user requests and the database, enforce rules regarding the information required to make unambiguous queries, provide resilient and common access protocols, and tailor results to what the user or system requested. The API sits between users and data and operates independently of external data aggregation and processing. Because of this, UMIS can continue to operate on existing data in the system even if there are issues with USGS servers, for example.
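One way an endpoint can enforce required inputs before any database query is issued is sketched below. The parameter names (`site_id`, `parameter`, `start`, `end`), the `BadRequest` exception, and the `parse_request` function are hypothetical and are shown only to illustrate the idea of rejecting ambiguous requests early.

```python
from datetime import date

# Hypothetical endpoint contract: parameter name -> converter.
REQUIRED = {
    "site_id": str,
    "parameter": str,
    "start": date.fromisoformat,
    "end": date.fromisoformat,
}


class BadRequest(Exception):
    """Raised when a request cannot be turned into an unambiguous query."""


def parse_request(query):
    """Validate raw query values and return typed arguments."""
    missing = sorted(name for name in REQUIRED if name not in query)
    if missing:
        raise BadRequest("missing parameters: " + ", ".join(missing))
    parsed = {}
    for name, convert in REQUIRED.items():
        try:
            parsed[name] = convert(query[name])
        except (TypeError, ValueError) as exc:
            raise BadRequest(f"invalid value for {name}") from exc
    if parsed["start"] > parsed["end"]:
        raise BadRequest("start must not be after end")
    return parsed


args = parse_request({
    "site_id": "S1",
    "parameter": "nitrate",
    "start": "2024-01-01",
    "end": "2024-01-31",
})
```

Because validation happens before the database is touched, malformed requests never reach the query layer, which is part of the isolation the paragraph above describes.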
Platform functionality
Data discovery
Sensor view
Viewing observations
Multi-sensor comparison
Raster-based info-layers
In addition to the point-level sensor data and vector geometries, the UMIS platform further provides raster-format information layers to assist in conveying the spatiotemporal relationships and correlations. The user can enable different layers simultaneously, including SWAT Model outputs for each pertinent variable, MRMS rainfall information, different precipitation temporal resolutions, and Stage IV overlays. For raster data with a temporal variability, such as SWAT and MRMS data, the platform offers a play and pause interface to move through acquired data at different dates and times as well as to automatically play to observe progression.
Big data challenges and opportunities
Working with vast repositories of time series data, extensive spatial datasets, and imagery highlighted the need for technical proficiency. Handling tables containing billions of rows proved to be a non-trivial task that required attention to best practices for data manipulation. Seemingly minor factors, such as how frequently queries are committed to the database, became significant when processing extensive datasets. Code and methodologies often had to be adjusted to speed up essential operations. We also gained a detailed understanding of software idiosyncrasies, which allowed us to navigate behaviors that had been unencountered or insignificant in smaller-scale operations.
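The effect of commit frequency can be illustrated with a small sketch that inserts rows in batches and commits once per batch rather than once per row. It uses Python's built-in `sqlite3` module only for portability; the database engine behind UMIS is not assumed here, and the `observations` table, its columns, and the default batch size are hypothetical.

```python
import sqlite3
from itertools import islice


def create_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical observations table for the demonstration."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations (site_id TEXT, ts TEXT, value REAL)"
    )
    conn.commit()


def insert_in_batches(conn: sqlite3.Connection, rows, batch_size: int = 10_000) -> int:
    """Insert rows with one commit per batch; return the number of rows written."""
    iterator = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows so memory use stays bounded.
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO observations (site_id, ts, value) VALUES (?, ?, ?)", batch
        )
        # One commit per batch amortizes transaction overhead across many rows.
        conn.commit()
        total += len(batch)
    return total
```

The batch size is a tuning parameter: larger batches mean fewer commits but more work lost and repeated if a batch fails, so a suitable value depends on the database and workload.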
Given the role of UMIS as a data aggregator, it became evident that fault-tolerant ingestion methods were required. Interactions with external data repositories via APIs introduced challenges in data ingestion and subsequent management. Initially, we presumed that API calls would remain stable. However, after a series of cascading failures within UMIS subsystems, we recognized the need to fortify the ingestion process against faults. We therefore implemented a system that, in the event of a failure, records the point of failure and reattempts the process at a later time.
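The record-and-retry approach described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the UMIS ingestion code: the `ingest` and `retry_failed` functions, the JSON failure log, and the `fetch_and_store` callable are hypothetical names introduced for the example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def ingest(
    items: Iterable[str],
    fetch_and_store: Callable[[str], None],
    failure_log: Path,
) -> list:
    """Ingest each item; record failures instead of aborting the whole run."""
    failed = []
    for item in items:
        try:
            fetch_and_store(item)
        except Exception as exc:
            # Broad on purpose: any upstream fault is recorded, not propagated,
            # so one bad response cannot cascade into other items.
            failed.append({"item": item, "error": repr(exc)})
    # Persist the points of failure so a later run can pick them up.
    failure_log.write_text(json.dumps(failed))
    return [entry["item"] for entry in failed]


def retry_failed(fetch_and_store: Callable[[str], None], failure_log: Path) -> list:
    """Reattempt only the items recorded as failed by a previous run."""
    if not failure_log.exists():
        return []
    items = [entry["item"] for entry in json.loads(failure_log.read_text())]
    return ingest(items, fetch_and_store, failure_log)
```

A scheduler could call `retry_failed` periodically; each call rewrites the log with whatever still fails, so items leave the log once they succeed.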
While UMIS currently offers a wide range of features and capabilities, there is still substantial room for improvement and growth in future studies. The implementation of requested functionalities presents a substantial avenue for further development. Feedback from our user community is highly valued, as it will help guide the future enhancements of UMIS, making it an even more robust tool for water-quality research and analysis.
Additionally, we welcome contributions from federal, state, regional, local, and individual sources to expand the scope of data ingestion within UMIS. Although we already collect data published in federal water-quality portals, providing the option for other researchers to directly share their data with UMIS offers an alternative method of data acquisition. This collaborative approach will further enrich the data ecosystem of UMIS, ultimately benefiting the entire water-quality research and education community. As we continue to evolve and refine UMIS, we look forward to the collaborative efforts and feedback of our diverse user base and contributors in shaping the system's future.
CONCLUSION
In conclusion, UMIS is a comprehensive one-stop information system, accessible at https://umissis.org, that aggregates and enhances water-quality data from major contributors. The system exposes billions of records of nutrient data and streamflow characteristics through an intuitive interface that accommodates users of various skill levels and facilitates exploration of the extensive UMIS data repository. Users can readily select and compare observation data from numerous major data repositories, enhancing their research and analysis capabilities.
The potential benefits of UMIS extend far beyond its current capabilities, with significant implications for the realms of water-quality management, research, education, and policymaking. First and foremost, UMIS serves as a vital tool for data-driven decision-making in water-quality management. Its ability to aggregate and enhance data from diverse sources enables stakeholders to gain a comprehensive understanding of the UMRB's water quality, facilitating the identification of critical areas and trends. This, in turn, can inform targeted interventions and strategies to improve water quality and mitigate issues such as nutrient pollution and eutrophication. Furthermore, UMIS fosters collaborative research endeavors by providing a centralized platform for data access and integration, enabling scientists to tackle complex, cross-scale questions related to water quality. This, in turn, supports innovation and the development of sustainable solutions.
For educational purposes, UMIS offers a valuable resource for students, educators, and researchers. It provides a real-world, dynamic dataset for educational institutions, enabling the integration of practical, hands-on experiences into curricula. Students can explore and analyze water-quality data, gaining insights into the environmental challenges faced by the region. Moreover, UMIS can serve as a catalyst for future water-quality research by inspiring students and researchers to pursue innovative inquiries and projects.
From a policy perspective, UMIS contributes to evidence-based decision-making. Policymakers and regulators can use the platform to access reliable, up-to-date data, supporting the formulation of more effective and targeted policies to address water-quality issues. As UMIS continues to grow and evolve, it has the potential to become a cornerstone in shaping public policies related to water quality, enabling data-backed regulations and interventions that safeguard the environment and public health.
In sum, UMIS holds the promise of playing a pivotal role in advancing water-quality management, fostering groundbreaking research, enriching educational experiences, and informing sound policymaking, all contributing to the sustainable stewardship of water resources in the UMRB. The potential also exists to extend the UMIS system beyond the UMRB to the entire Mississippi-Atchafalaya River Basin (MARB), to support broader MARB-focused initiatives including the implementation of natural (green) infrastructure practices (Gassman et al. 2022; Schilling et al. 2023a, 2023b).
ACKNOWLEDGEMENTS
This paper is based upon work supported by the National Science Foundation under Grant No. 1761887.
AUTHOR CONTRIBUTIONS STATEMENT
J. M. conceptualized the whole article, rendered support in formal analysis, validated the data, rendered support in data curation, developed the methodology, arranged the software, and wrote the original draft. Y. S. conceptualized the whole article, rendered support in formal analysis, developed the methodology, arranged the software, validated the data, visualized the process, and wrote the original draft. C. S. J. conceptualized the whole article, rendered support in funding acquisition, and wrote the review and edited the article. K. E. S. conceptualized the whole article, rendered support in funding acquisition, and wrote the review and edited the article. P. W. G. conceptualized the whole article, rendered support in funding acquisition, and wrote the review and edited the article. L. J. W. conceptualized the whole article, rendered support in funding acquisition, and wrote the review and edited the article. W. F. K. conceptualized the whole article, rendered support in funding acquisition, and wrote the review and edited the article. I. D. conceptualized the whole article, rendered support in funding acquisition, investigated the work, rendered support in project administration, arranged the resources, supervised the work, validated the data, and wrote the review and edited the article.
DECLARATION OF COMPETING INTEREST
Each of the contributing authors declares that they have no known financial interests or personal interactions with others that have influenced the research presented in this paper.
DATA AVAILABILITY STATEMENT
The system interface can be found here: https://umissis.org/umisapp/dev/.
CONFLICT OF INTEREST
The authors declare there is no conflict.