Water data networks are increasingly being integrated to answer complex scientific questions that often span large geographical areas and cross political borders. Data heterogeneity is a major obstacle that impedes interoperability within and between such networks. It is resolved here for groundwater data at five levels of interoperability, within a Spatial Data Infrastructure architecture. The result is a pair of distinct national groundwater data networks for the United States and Canada, and a combined data network in which they are interoperable. This combined data network enables, for the first time, transparent public access to harmonized groundwater data from both sides of the shared international border.
API: Application Programming Interface
GeoSciML: Geoscience Markup Language
GEOSS: Global Earth Observation System of Systems
GIN: Groundwater Information Network
GML: Geography Markup Language
GWML: GroundWater Markup Language
HTML: HyperText Markup Language
HTTP: Hypertext Transfer Protocol
INSPIRE: Infrastructure for Spatial Information in the European Community
LOD: Linked Open Data
NGWMN: National Ground-Water Monitoring Network
O & M: Observations and Measurements
OGC: Open Geospatial Consortium
OWL: Web Ontology Language
RDF: Resource Description Framework
REST: Representational State Transfer
RiverML: River Markup Language
SDI: Spatial Data Infrastructure
SKOS: Simple Knowledge Organization System
SOS: Sensor Observation Service
Turtle: Terse RDF Triple Language
URI: Uniform Resource Identifier
VVG: Visualizing Victoria's Groundwater
WaterML: Water Markup Language
WFS: Web Feature Service
WMS: Web Map Service
XML: Extensible Markup Language
XSLT: EXtensible Stylesheet Language Transformation
Canada and the United States share the world's longest international border. It is crossed by many aquifers that supply water to both sides, which makes cross-border groundwater management a significant issue. Addressing this issue, as well as other cross-border science issues related to climate, energy, public health and safety, can be greatly facilitated by the efficient sharing of groundwater data. Such sharing is generally complex, because the data are segmented among multiple data providers at the watershed, State, Provincial, and Federal levels, on each side of the border. This segmentation greatly amplifies the overall heterogeneity of the data, making cross-border groundwater data difficult to find and use. To ease data access, and overcome segmentation and heterogeneity, Federal agencies in Canada and the United States are developing data networks that provide a united view over the distributed data sources: the Canadian Groundwater Information Network (GIN) and the US National Ground-Water Monitoring Network (NGWMN). After mutual experimentation (Brodaric & Booth 2010; Brodaric et al. 2013), these two data networks have become increasingly interoperable. As a result, the data holdings of one network can be seen as an extension of the other network, such that data are exchanged seamlessly and automatically between networks, and any of the total pool of data can potentially be accessed from either network. Such interoperability is achieved through the implementation of a variety of international open geospatial data standards, and the development of a novel solution for North American groundwater data at five levels of interoperability: systems, syntax, structure, semantics, and pragmatics (Sheth 1999; Brodaric 2007).
While the technical approaches adopted by the data networks at each level are generally well known, architectures for hydro interoperability have not yet been framed using the five levels, and the depth of implementation at each level is also new for hydro interoperability. GIN and NGWMN also represent the first national and international data networks for groundwater data. This work therefore makes the following original contributions: (1) it establishes a five-level architecture for hydro data interoperability in which each level is developed extensively; (2) it links and harmonizes national groundwater data from the United States and Canada, thereby representing a significant new online resource for the data.
The paper is organized into sections that describe: (1) two examples of data heterogeneity encountered by the data networks; (2) background material on data interoperability, including prevailing levels, architectures, infrastructures, and networks; (3) related work on geospatial and hydrologic data networks; (4) the solution adopted by GIN and NGWMN for groundwater data; (5) implementation examples; and (6) a summary and brief indication of possible future directions. To avoid disruption of the text, common technical abbreviations are expanded at the beginning of the paper.
GROUNDWATER DATA HETEROGENEITY
While groundwater data interoperability exhibits significant similarities to interoperability in other hydro and geoscientific domains, notable differences are also evident. The similarities suggest surface and subsurface water data networks might adopt common systems and syntax, and the differences in key entities suggest a disparity in at least data structure, semantics, and pragmatics. The similarities include a common need to interconnect massive observational networks with the features being observed, such as the water bodies in rivers, lakes, or aquifers. The types of observations are also similar across the surface and subsurface water domains, and include properties such as water level or temperature, as well as a vast array of chemical, biological, and material constituents. Despite these significant resemblances, differences can exist in the methods used to make measurements, as well as in the selection of properties and constituents to measure. Another major difference involves the features that contain a water body. For example, inland surface water containers include water channels and their various parts, as well as features such as watersheds and catchments. In contrast, related groundwater containers include aquifers and other geological bodies, such as geological formations. Features related to sampling are another difference, as water wells are unique to the groundwater domain.
An interoperability scenario suggested by the data heterogeneity in Figures 1 and 2, and motivated by cross-border aquifer management through co-location of the data within the same cross-border aquifer, requires an application to retrieve the well and groundwater level data in a common format. Specifically, because the GIN web portal only displays data that conform to certain international data standards, the heterogeneous Alberta and Montana data must be transformed to these standards so that the groundwater information can be viewed in the GIN web portal during aquifer assessment. This scenario is resolved in the section below on demonstration of results.
Interoperability has been studied widely and diversely, with mature approaches demonstrated in many domains by major research initiatives and operational systems. Interoperability is also defined variously although, in general, it refers to the working together of diverse autonomous entities to achieve a common goal, requiring the entities to possess the ability to exchange information about system function and domain content (Manso et al. 2009). Data interoperability then specifically refers to a collaboration among data providers in which their goal is to exchange, deliver, or use data in a coordinated way. This involves the ability to send and receive messages that are mutually understood (Brodeur et al. 2003), including requests for data as well as responses that contain either retrieved data or control messages such as error descriptions. Such messages must typically be transformed at each interoperability level, either by the sender or receiver, to a construct that can be readily consumed and thus understood by the receiver – this process is often referred to as alignment. The components involved in attaining alignment can be arranged in various ways. Thus, in addition to describing levels of transformation, we also distinguish here between architectures, infrastructures, and networks to progressively describe the arrangement of the components that enable message alignment, hence interoperability, between data providers.
Systems: refers to hardware or software elements required for core functions such as message passing or data manipulation, and largely involves platform aspects such as operating systems, transmission protocols (e.g., Hypertext Transfer Protocol (HTTP)), or particular database limits (e.g., maximum record size or maximum number of points in a polygon). Systems interoperability involves overcoming heterogeneity in such aspects.
Syntax: refers to the language used to encode a message, including requests for data as well as the actual data content. Syntax includes a language's alphabet, words, and grammar, and it can be abstract or concrete: for example, the Resource Description Framework (RDF) language has an abstract syntax composed of nested triples, as well as several concrete encodings such as RDF/XML or Turtle. Syntax interoperability thus involves overcoming differences in abstract or concrete syntaxes. It is achieved in Open Geospatial Consortium (OGC) geospatial web service standards primarily through the common deployment of the Geography Markup Language standard (GML; Portele 2012) for the encoding of geographical features; alternatively, a standard RDF option has also lately emerged via the GeoSPARQL standard (Perry & Herring 2012), alongside other non-OGC syntaxes such as GeoJSON (Butler et al. 2008).
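To make the notion of differing concrete syntaxes tangible, the following minimal Python sketch encodes one hypothetical well location in two syntaxes, GeoJSON and a GML 3.2 point, and reads each back to the same content. The well identifier, coordinates, and function names are illustrative assumptions and are not taken from GIN or NGWMN; only the GML namespace and the GeoJSON member names follow the respective specifications.

```python
import json
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"  # GML 3.2 namespace
ET.register_namespace("gml", GML_NS)

# Hypothetical well location; the identifier and coordinates are illustrative.
WELL = {"id": "well-001", "lon": -110.5, "lat": 49.0}


def to_geojson(well):
    """Encode the location as a GeoJSON Feature (coordinate order: lon, lat)."""
    return json.dumps({
        "type": "Feature",
        "id": well["id"],
        "geometry": {"type": "Point", "coordinates": [well["lon"], well["lat"]]},
        "properties": {},
    })


def from_geojson(text):
    """Decode a GeoJSON Feature back to the neutral dictionary."""
    feature = json.loads(text)
    lon, lat = feature["geometry"]["coordinates"]
    return {"id": feature["id"], "lon": lon, "lat": lat}


def to_gml(well):
    """Encode the same location as a gml:Point (order lat, lon for this srsName)."""
    point = ET.Element(f"{{{GML_NS}}}Point", {
        f"{{{GML_NS}}}id": well["id"],
        "srsName": "urn:ogc:def:crs:EPSG::4326",
    })
    ET.SubElement(point, f"{{{GML_NS}}}pos").text = f"{well['lat']} {well['lon']}"
    return ET.tostring(point, encoding="unicode")


def from_gml(text):
    """Decode a gml:Point back to the neutral dictionary."""
    point = ET.fromstring(text)
    lat, lon = (float(v) for v in point.find(f"{{{GML_NS}}}pos").text.split())
    return {"id": point.get(f"{{{GML_NS}}}id"), "lon": lon, "lat": lat}
```

Both encodings carry the same content, yet a consumer written for one cannot read the other without a syntax mapping. The differing coordinate order is one small example of the detail such a mapping must handle.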
Structure: refers to the arrangement of the parts of a message, such as representing something as one entity or many (e.g., the diverse structures for well intervals in Figure 1). Structures are typically manifest as schemas that provide a pattern for arranging the message. They are realized in the OGC suite of standards as domain-specific schema that apply OGC languages such as GML and GeoSPARQL. Hydro-related schema are primarily GML applications such as WaterML for water measurements (WaterML2; Taylor 2012), GroundWaterML for subsurface features like wells or aquifers (GWML1; Boisvert & Brodaric 2012), HY_Features for a broad suite of hydrologic features including river networks (Dornblut & Atkinson 2014), and RiverML for river channel descriptions (Jackson et al. 2014). Structure interoperability is also manifest by standard OGC web service structures, which provide a canonical interface to geospatial operations, including many for data access. Structural interoperability involves overcoming diverse structures for data or related web services, via alignment of associated schemas.
Semantics: refers to the meaning inherent in some component of a schema or data content. Such meanings are typically represented as digital definitions, and their form can vary from highly unstructured, e.g., free-form text, to highly structured, e.g., inter-related logic statements. The latter forms the basis for machine-readable ontologies, while vocabularies (e.g., glossaries, thesauri) are considered here to be semi-structured and thus positioned somewhere in the middle of this range. An ontology element, often referred to as a concept, consists of a label, a logic-based definition, and its connection to other concepts via formal relations such as IS-A or PART-OF. In contrast, a vocabulary element, here called a term, might consist of a label, a text definition, and its connection to other terms via simple linguistic relations such as SYNONYM. For data interoperability, both ontologies and vocabularies are often used to overcome semantic heterogeneity, primarily by aligning concepts and/or terms to deal with meaning differences. For example, synonymy occurs when multiple words refer to the same meaning, and polysemy occurs when the same word has multiple meanings. A typical solution then maps the words to a canonical ontology or vocabulary containing distinct concepts or terms, respectively, for each meaning. OGC provides minimal support for semantics in relation to GML data delivery. This occurs primarily through feature catalogs, which have moderately expressive techniques for defining geospatial feature types (i.e., concepts), such as ‘river’ or ‘road’, and through standards for deriving, storing, and using ontologies developed from schemas for data or web services (ISO/TS 19150-1 2012). However, these efforts are not tightly integrated with GML for encoding and transmitting data, requiring integration of GML with ontologies and vocabularies to occur via ad hoc techniques often normalized within specific communities.
In contrast, the Semantic Web enables geospatial feature types and data to be tightly coupled via languages such as RDF/OWL, which is leveraged by the OGC GeoSPARQL standard. Integration of RDF/OWL approaches with GML-based standards and related systems is ongoing and represents a fertile research domain (Schade & Smits 2012; Harvey et al. 2014).
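The mapping of words to a canonical vocabulary, as described above for synonymy and polysemy, can be sketched in a few lines of Python. The term identifiers, provider words, and field names below are invented for illustration and do not come from any GIN or NGWMN vocabulary. Polysemy is resolved here by using the source field as context, which is only one of several possible strategies.

```python
# Hypothetical canonical vocabulary: term identifier -> label.
CANONICAL_TERMS = {
    "gw:waterWell": "water well",
    "gw:spring": "spring (groundwater outlet)",
    "time:spring": "spring (season)",
}

# Mapping from (provider word, source field) to a canonical term.
# Synonymy: several words share one term.
# Polysemy: one word maps to different terms depending on its context.
WORD_TO_TERM = {
    ("well", "feature_type"): "gw:waterWell",
    ("water well", "feature_type"): "gw:waterWell",
    ("puits", "feature_type"): "gw:waterWell",
    ("spring", "feature_type"): "gw:spring",
    ("spring", "sampling_season"): "time:spring",
}


def align(word, field):
    """Return the canonical term for a provider word, or None if unmapped."""
    return WORD_TO_TERM.get((word.strip().lower(), field))
```

For instance, `align("Puits", "feature_type")` and `align("well", "feature_type")` resolve to the same term, while `align("spring", "feature_type")` and `align("spring", "sampling_season")` resolve to different ones.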
Pragmatics: refers to contextual factors. While such factors can be quite diverse, including legal, organizational, economic, and other such factors, pragmatic interoperability frequently involves information about data provenance as well as guidelines about data use including best practices – how the data have been created and how they should be used or accessed (Brodaric 2007). Such factors are important because interoperability can be impeded even if all remaining levels are aligned. For example, the same type of data (semantically) could be collected using a variety of scientific methods that might be incompatible, due to application of discordant instruments or procedures, thus prohibiting data integration and the unified use of the data. OGC supports pragmatics through best practice specifications that further constrain how web services are to be deployed within specific domains or communities, typically by placing restrictions on aspects of a web service interface (e.g., GEOWOW Consortium 2014). Although such specifications could document how data are to be used, they typically describe how data should be accessed within a particularly constrained environment.
A conceptual interoperability architecture describes how functional components are arranged to enable transformation at each level. Such architectures are orthogonal to the interoperability levels, as any such component can provide functions that cross levels; for example, a component might carry out syntactic, structure, and semantic translation, or it might provide infrastructure support for other core functions such as data discovery. Such architectures, in general, can be described in terms of the type of functional components, connectivity, or layers (Heitmann et al. 2014). Functional interoperability components include those for the storage of data, metadata, and knowledge artifacts (e.g., schemas, ontologies, vocabularies, queries, and mappings), as well as for their access, caching, cache update, discovery, translation, integration, distribution, orchestration, mediation (i.e., via a mediator), and display. Connectivity approaches describe how functional components are organized, and include configuration aspects such as: (1) centralized or distributed, to describe the logical and physical location of components within a data network; (2) monistic or pluralistic, to describe the use of single or multiple knowledge artifacts; and (3) static or dynamic, to describe the degree of change among components, data, or knowledge artifacts. Unlike component or connectivity approaches, layer-based architectures focus on the role of a component within the data network, and prominent examples include client–server, service-oriented, and web-oriented architectures.
Client–server architecture (CSA): is a two-layer approach to distributed computing in which tasks are segmented between resource requesters (clients) and providers (servers). In interoperable data networks, data management components (e.g., storage, cache, access, and update) are typically server responsibilities, display is a client responsibility, and the rest are deployed variously. As clients and servers are tightly coupled, components tend to be platform-specific, resulting in high system heterogeneity.
Service-oriented architecture (SOA): is a three-layer and loosely coupled approach in which platform-neutral and often standardized middleware is introduced between servers and clients. For example, the OGC standard web services represent canonical middleware primarily for accessing geospatial data, metadata, and executable codes. Component interactions follow a publish-find-bind strategy whereby server resources, such as data, are published via metadata descriptions and discovered by clients who request them from servers. The use of SOA primarily reduces system heterogeneity, due to the focus on platform independence.
Web-oriented architecture (WOA): extends SOA, mainly by adopting principles for web service interfaces from the web-oriented Representational State Transfer (REST) approach, such as standard web functions (HTTP GET, POST, etc.), identifiers (Uniform Resource Identifiers, URIs), and transport protocols (HTTP). Obtaining a web resource, such as a data set, then amounts to issuing a web request using a standard web function and identifier. WOA is thus resource-centric in contrast to the function-centric approach of SOA. WOA can be seen as further reducing structure heterogeneity via implementation of simpler web service interfaces, and by advancing pragmatic interoperability through increased ease-of-use of web services.
Geospatial interoperability infrastructures
Interoperability infrastructures refer to particular configurations of levels and architectures, and effectively constitute technical paradigms for data interoperability. Geospatial interoperability infrastructures target geospatial data and are most prominently exemplified by Spatial Data Infrastructures (SDI; Masser 2010) and Linked Open Data (LOD; Kuhn et al. 2014).
Spatial Data Infrastructures: an SDI is a network of online geospatial data resources, and affiliated people, policies, agreements, standards, and technological approaches that aim to share geospatial data and executable code over the web (Masser 2010). The vast majority of SDI implement SOA-based OGC standards at various levels of interoperability, causing SDI to be largely aligned with SOA-based techniques despite ongoing efforts to incorporate WOA approaches (Harvey et al. 2014). As a result, SDI can be seen to refer to two things, somewhat ambiguously: (1) a suite of technological and social approaches weighted towards SOA, which constitute an infrastructure; and (2) an interlinked collection of specific data sets and codes that implement these approaches, which constitute a network. In this paper we distinguish where required between SDI as an infrastructure vs a network. Functional components in SDI-based infrastructures are focused primarily on data and metadata access, and schemas are the main knowledge artifact. Data components tend to be distributed, while metadata components tend to be centralized. Monistic schemas often exist for various domains, networks can be static or dynamic, and a wide array of data sources are supported, such as those for maps, objects, fields, and sensors.
Linked Open Data: LOD also aims to connect data on the web, but primarily via WOA. LOD refines WOA principles by distinguishing between (a) web-resolvable identifiers (URIs) that return useful structured information about data, (b) a web page, and (c) its type definition, and also by promoting connectivity within and between these resources (Kuhn et al. 2014). LOD also adopts the Semantic Web stack, and thus supports all interoperability levels, but explicitly omits schemas inasmuch as structure constraints can be implicitly handled at the semantic and pragmatic levels. Semantics is tightly integrated, with intrinsic support for ontologies via RDF, Web Ontology Language (OWL), and related languages. LOD architectures are migrating from centralized to distributed approaches, which are strongly pluralistic and dynamic, particularly for knowledge artifacts such as ontologies. As a result, some aspects of these artifacts can often be discovered on-the-fly without adherence to preset and fixed content standards. Integration of LOD and SDI approaches is an ongoing concern (Schade & Smits 2012; Harvey et al. 2014).
Geospatial data interoperability networks
A geospatial data interoperability network is a group of geospatial data providers that interact using a particular implementation of an infrastructure. For example, major national multi-theme SDI data networks that utilize SDI infrastructures exist in Canada (e.g., GIN), the United States (e.g., NGWMN), and Europe (INSPIRE 2008), as well as globally (e.g., Global Earth Observation System of Systems (GEOSS); Nativi et al. 2014). For the most part, such SDI data networks are self-contained with limited connectivity between them – a counter-example is GEOSS, which explicitly aims for inter-connectivity. LOD networks are somewhat different, as LOD aims to create one open interconnected data network, the so-called LOD cloud, which is essentially part of the greater world-wide web. However, despite this aim, many data sets within this cloud are relatively isolated, essentially forming a distinct quasi-network, because connectivity is greater within than without; the number of external links, to or from the data set, is significantly less than the number of internal links within the data set, although this situation is highly dynamic and changing (Hogan et al. 2012).
Data interoperability has been widely researched theoretically with diverse applications in many disciplines. Of relevance here is the interoperability of geospatial, hydrological, and groundwater data, and the evolution and transference of approaches from the geospatial to hydrological to groundwater domains.
Geospatial data interoperability: interoperability levels for geospatial data (Bishr 1998; Sheth 1999; Brodaric 2007; Manso et al. 2009) are addressed via architectural approaches that have evolved from SOA to SDI to LOD (Wache et al. 2001; Manso et al. 2009; Nativi et al. 2014; Kuhn et al. 2014). Associated data networks span the variety of disciplines concerned with geospatial data, including the geosciences such as geology, land cover and use, soils, oceans, vegetation, and others (Brodaric & Gahegan 2006; INSPIRE 2008; Durbha et al. 2009; Lutz et al. 2009), but have not been created to date for the groundwater domain.
Surface water data interoperability: early interoperability data networks in the hydro domain focused on surface water data, rather than groundwater data, initially through the adoption of SOA and proprietary web services (e.g., Tarboton et al. 2011), including approaches to semantics (Beran & Piasecki 2009; Duce & Janowicz 2010), soon followed by SDI architectures and standard web services (Bermudez & Arctur 2011; Yu & Di 2014). Complementing this shift to SDI approaches is ongoing work on standard schemas for various aspects of surface water (Taylor 2012; Dornblut & Atkinson 2014; Jackson et al. 2014). Other recent developments, primarily among academic researchers, include LOD approaches to surface water to aid the creation of relations between data, particularly to aid decision-making (Curry et al. 2014; Anzaldi & Wu 2014). While many of these architectural aspects are transferable to groundwater data, groundwater features remain largely a secondary concern and are not represented comprehensively, and semantics support is focused primarily on data discovery rather than data delivery.
Groundwater data interoperability: until recently the construction and linkage of groundwater data networks has been largely absent outside the GIN and NGWMN projects. Early SDI efforts in groundwater include the specification of schemas for data exchange in North America and Europe (Boisvert & Brodaric 2012; INSPIRE 2013). These serve as a foundation for the initiation and ongoing development of international standards for groundwater features (Lucido & Booth 2014), as well as the construction of SDI-based groundwater data networks such as the Visualizing Victoria's Groundwater project for the Australian state of Victoria (Dahlhaus et al. 2012), and emerging efforts in New Zealand (Klug & Kmoch 2014). However, these groundwater data networks have not (yet) been explicitly linked with other such networks, in contrast to GIN and NGWMN.
RESULTS: GROUNDWATER DATA INTEROPERABILITY
In relation to the GIN and NGWMN data networks, data interoperability refers to the ability of each network to receive requests for data, and return appropriate data unified from multiple distributed, autonomous, and heterogeneous data sources. Both GIN and NGWMN can, in theory, include the other network as a data source – in essence, each can operate as a virtual unified online database that encompasses the other network's data holdings, although to date only GIN has enabled access to NGWMN. The cumulative holdings are massive, comprising millions of features, thousands of sensors, and billions of data values.
GIN and NGWMN architectures
Both GIN and NGWMN implement an SDI architecture. To enhance data network performance, primarily to optimize responses to data requests, both maintain centralized data caches, but they differ in cache extent, contents, usage, and completeness. NGWMN centrally pools all data in its US network, such that all data requests are answered from this cached copy. GIN, on the other hand, pools only some Canadian data, obtaining and translating the remainder dynamically upon request. In terms of cache contents, GIN caches original data and translates it on-the-fly to standard outputs, such as GWML1 (for features) or WaterML2 (for observations), in response to requests. NGWMN instead translates the data first and caches the standard representations directly, which avoids on-the-fly translation but limits the cache to those specific representations. NGWMN maintains a complete cache of all observational data at all times, intended for all uses, while GIN maintains two caches of partial data for specific uses: a dynamic and temporary cache of recently requested observational data strictly for presentation functions, and a permanent cache of observation and other data for other functions such as download. Cached data are harvested periodically by both data networks via the OGC standard web services.
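The trade-off between the two cache-content strategies described above can be illustrated with a small Python sketch. It is a simplified model and not the implementation of either network: the class names, the record, and the two encoders are hypothetical, and the encoders merely stand in for real GWML1 or WaterML2 serializers.

```python
# Hypothetical encoders; stand-ins for real GWML1 or WaterML2 serializers.
def encode_full(record):
    return f"<full id='{record['id']}' level='{record['level_m']}'/>"


def encode_brief(record):
    return f"<brief id='{record['id']}'/>"


class TranslateOnReadCache:
    """Stores original records and translates them when a request arrives."""

    def __init__(self, encoders):
        self.encoders = dict(encoders)
        self.originals = {}

    def harvest(self, record):
        self.originals[record["id"]] = record

    def get(self, record_id, fmt):
        return self.encoders[fmt](self.originals[record_id])


class TranslateOnWriteCache:
    """Translates at harvest time and stores only the chosen representation."""

    def __init__(self, fmt, encoder):
        self.fmt = fmt
        self.encoder = encoder
        self.translated = {}

    def harvest(self, record):
        self.translated[record["id"]] = self.encoder(record)

    def get(self, record_id, fmt):
        if fmt != self.fmt:
            raise KeyError(f"format {fmt!r} was not cached")
        return self.translated[record_id]


record = {"id": "obs-1", "level_m": 12.3}
on_read = TranslateOnReadCache({"full": encode_full, "brief": encode_brief})
on_write = TranslateOnWriteCache("full", encode_full)
for cache in (on_read, on_write):
    cache.harvest(record)
```

In this model the translate-on-read cache can answer a request in any registered representation at the cost of translating per request, whereas the translate-on-write cache answers without translation but only in the representation chosen at harvest time.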
Both GIN and NGWMN are primarily monistic, as they support a selected suite of schemas and vocabularies for data requests and responses, although some features can be returned in multiple syntaxes, e.g., GML or JSON. They are also fundamentally static as the data sources are permanently connected to each data network, and while the amount of content might increase, the overall structure, semantics, and pragmatics of the data are relatively stable and do not fluctuate greatly. Another similarity is the adoption by both data networks of a typical three-tier SDI configuration for functional components, as shown in Figure 4: the bottom tier consists of distributed data components wrapped by data access components; the middle tier consists of centralized middleware components such as a mediator, data caches, metadata, and semantic repositories; and the top tier consists of web portals or web applications that use the middle tier to access data from the data networks.
GIN and NGWMN interoperability workflow
Addition of a data source to either data network involves registration of source details in the network's metadata catalog. This involves registration of artifacts at all five levels of interoperability, including web service locations (systems) and profiles for their invocation (pragmatics), as well as a mapping between source data and the network for each language-specific encoding (syntax) of a source schema (structure) and vocabulary (semantics). Once these are registered, the data networks can interoperate with the data sources.
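As a rough illustration of registration across the five levels, the Python sketch below models a catalog entry with one field per level and rejects incomplete registrations. The class names, field names, URL, and file names are assumptions made for this example; the actual catalog structures of GIN and NGWMN are not described here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SourceRegistration:
    """One hypothetical catalog entry, with one artifact per interoperability level."""
    source_id: str
    service_url: str          # systems: where the web service is located
    encoding: str             # syntax: language-specific encoding of the data
    schema_mapping: str       # structure: mapping from the source schema
    vocabulary_mapping: str   # semantics: mapping from the source vocabulary
    invocation_profile: str   # pragmatics: how the service is to be invoked


class MetadataCatalog:
    def __init__(self):
        self._sources = {}

    def register(self, registration):
        """Accept a registration only if every level is covered."""
        missing = [f.name for f in fields(registration)
                   if not getattr(registration, f.name)]
        if missing:
            raise ValueError(f"missing artifacts: {', '.join(missing)}")
        self._sources[registration.source_id] = registration

    def is_registered(self, source_id):
        return source_id in self._sources


catalog = MetadataCatalog()
catalog.register(SourceRegistration(
    source_id="provider-a",
    service_url="https://example.org/wfs",      # placeholder URL
    encoding="GML",
    schema_mapping="provider-a-to-canonical.xsl",   # hypothetical file name
    vocabulary_mapping="provider-a-terms.ttl",      # hypothetical file name
    invocation_profile="wfs-basic",                 # hypothetical profile name
))
```

Requiring all five artifacts before a source becomes usable mirrors the statement that the networks can interoperate with a source only once these are registered.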
GIN's centralized mediator is the core functional component for such interoperability, as it receives and responds to data requests, and in doing so it carries out distribution and translation functions. Requests for data originate from either human-driven web clients (e.g., the GIN portal) or machine-driven applications (e.g., external applications or its own data harvester). A request is made by calling an external web service, which triggers the mediator. The mediator determines which data sources, or cache, should receive the request by using data discovery components tied to metadata catalogs, and it translates the original request at each level to a request appropriate for each data source, finally sending the suitably translated requests to the data sources. Next, the mediator receives and translates the data responses at each level to a canonical standard (e.g., GWML1 for groundwater features, WaterML2 for observations), using the previously established data mappings and associated technologies, such as RDF repositories for vocabularies. The specific translations at each level are discussed in the next section. Lastly, it integrates the translated responses from each data source into a single result and returns this result to the requester. Where translation is not required at a specific step or level, GIN simply passes the request or response on to the next portion of the workflow.
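The mediation steps described above (discover sources, translate the request, distribute it, translate the responses, and integrate them) can be sketched as follows. This is a toy in-memory model under stated assumptions, not GIN's mediator: the two sources, their field names, their records, and the canonical dictionary structure are all invented, and the `fetch` callables stand in for calls to remote web services.

```python
from dataclasses import dataclass
from typing import Callable, List

FT_TO_M = 0.3048  # feet to metres


@dataclass
class Source:
    """A hypothetical data source plus its registered request/response mappings."""
    name: str
    feature_types: frozenset
    translate_request: Callable[[dict], dict]
    fetch: Callable[[dict], List[dict]]
    translate_response: Callable[[dict], dict]


class Mediator:
    def __init__(self, sources):
        self.sources = list(sources)

    def discover(self, request):
        """Select the sources that serve the requested feature type."""
        return [s for s in self.sources
                if request["feature_type"] in s.feature_types]

    def handle(self, request):
        """Translate and distribute the request, then translate and merge responses."""
        merged = []
        for source in self.discover(request):
            native_request = source.translate_request(request)
            for record in source.fetch(native_request):
                merged.append(source.translate_response(record))
        return merged


# Two invented sources with different native structures and units.
source_a = Source(
    name="source_a",
    feature_types=frozenset({"well"}),
    translate_request=lambda r: {"TYPE": r["feature_type"].upper()},
    fetch=lambda q: ([{"WELL_ID": "A-1", "DEPTH_FT": 100.0}]
                     if q == {"TYPE": "WELL"} else []),
    translate_response=lambda rec: {
        "well_id": rec["WELL_ID"],
        "depth_m": round(rec["DEPTH_FT"] * FT_TO_M, 2),
    },
)
source_b = Source(
    name="source_b",
    feature_types=frozenset({"well", "aquifer"}),
    translate_request=lambda r: {"kind": r["feature_type"]},
    fetch=lambda q: ([{"id": "B-7", "depth_m": 42.0}]
                     if q["kind"] == "well" else []),
    translate_response=lambda rec: {"well_id": rec["id"], "depth_m": rec["depth_m"]},
)

mediator = Mediator([source_a, source_b])
wells = mediator.handle({"feature_type": "well"})
```

After the call, `wells` holds one record per source in the same canonical structure and unit, regardless of how each source names its fields or measures depth.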
NGWMN provides the same external web services as GIN. However, unlike GIN, NGWMN's mediator is not invoked upon data requests to these external services, as the data are simply retrieved from the central pre-translated cache without translation or distribution. Mediation plays a role only in the harvesting process, during population of the cache: the mediator initiates requests for external data, customized to source query needs and using data discovery components, and translates the responses to its internal database structure for caching.
GIN and NGWMN levels of interoperability
The translation component within an interoperability architecture requires messages, primarily data requests and responses, to be transformed at each interoperability level. To compare interoperability strategies at each level, it is then useful to delineate between inter-network and intra-network interoperability: inter-network strategies consider interoperability approaches solely between GIN and NGWMN, while intra-network strategies are focused on approaches between a data network and its source data from Provincial and State agencies.
For inter-network interoperability, heterogeneity is largely eliminated between the data networks due to the adoption of common standards at each level. Consequently, many common knowledge artifacts, such as queries or schema, need not be transformed between the data networks; transformation is only required for a few disparities, such as some diverse vocabularies. Common aspects are restricted to certain core datatypes and functionality, and these represent a subset of the resources available for each data network. For example, NGWMN includes water quality data and some REST-conformant interfaces, whereas GIN does not. Inter-network interoperability is thus targeted at key overlapping data types and functions, and translation of knowledge artifacts is largely focused on the semantic level.
The same does not hold for intra-network interoperability, as heterogeneity prevails between the data sources and the common standards used by the data networks. Transformations are then required at each level to ensure intra-network interoperability. GIN and NGWMN utilize various strategies at each level to achieve intra-network interoperability.
In greater detail, the inter-network and intra-network interoperability approaches utilized at each level involve the following.
GIN and NGWMN system interoperability: is achieved through the implementation of technologies that enable the deployment of platform-independent OGC web service standards, which eliminate platform-specific system heterogeneity. These web services represent the canonical interfaces for all machine interactions not only between the GIN and NGWMN networks, but also to all other users of the data, and thus denote a common strategy for both inter-network and intra-network interoperability. They include WMS for accessing map images that depict the location of groundwater features such as wells and aquifers, WFS for accessing well-structured descriptions of such features, and SOS for accessing related groundwater measurements. Together, these services and underlying technologies provide a platform-neutral, common approach to accessing vast amounts of data.
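Because these services are invoked over plain HTTP, a request to any of them can be expressed as a URL with key-value parameters. The Python sketch below assembles such URLs for WMS GetMap, WFS GetFeature, and SOS GetObservation without sending them. The endpoint, layer name, feature type name, offering, and observed property are placeholders and do not describe the actual GIN or NGWMN services; the parameter names are meant to follow the key-value encodings of WMS 1.3.0, WFS 2.0, and SOS 2.0, and should be checked against the capabilities document of a given server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://example.org/ows"  # placeholder endpoint, not a real service


def kvp_url(base, params):
    """Build a key-value-pair GET request URL."""
    return f"{base}?{urlencode(params)}"


def wms_get_map(base, layer, bbox, width=512, height=256):
    """Map image request; bbox is (min_lon, min_lat, max_lon, max_lat) in CRS:84."""
    return kvp_url(base, {
        "service": "WMS", "version": "1.3.0", "request": "GetMap",
        "layers": layer, "styles": "", "crs": "CRS:84",
        "bbox": ",".join(str(v) for v in bbox),
        "width": width, "height": height, "format": "image/png",
    })


def wfs_get_feature(base, type_name, count=10):
    """Feature description request, limited to `count` features."""
    return kvp_url(base, {
        "service": "WFS", "version": "2.0.0", "request": "GetFeature",
        "typeNames": type_name, "count": count,
    })


def sos_get_observation(base, offering, observed_property):
    """Measurement request for one offering and one observed property."""
    return kvp_url(base, {
        "service": "SOS", "version": "2.0.0", "request": "GetObservation",
        "offering": offering, "observedProperty": observed_property,
    })


# Hypothetical names used purely for illustration.
map_url = wms_get_map(BASE_URL, "wells", (-115, 48, -105, 50))
feature_url = wfs_get_feature(BASE_URL, "gwml:WaterWell")
observation_url = sos_get_observation(BASE_URL, "well-001", "water_level")
```

Any client able to issue an HTTP GET can use such URLs regardless of its platform, which is the sense in which these interfaces remove system heterogeneity.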
Intra-network syntax interoperability varies for the data networks: GIN utilizes GML as a canonical syntax between data sources and the data network, and hence eliminates syntax heterogeneity for data, whereas NGWMN accepts a variety of non-standard syntaxes, e.g., unconstrained XML for water quality data, which requires source-specific syntax mappings between the data network and its syntactically variant data sources.
GIN and NGWMN structure interoperability is achieved for both inter- and intra-network interoperability, except where noted, via the deployment of domain-specific data schemas that extend GML. These include WaterML2 for water time series accessed via SOS, Observations & Measurements (O & M; Cox 2013) for other types of measurements accessed via SOS, GWML1 for groundwater features accessed via WFS, and GeoSciML (Sen & Duffy 2005) for geological features accessed via WFS.
The common adoption of these schemas significantly enables inter-network interoperability, as specific queries and resulting data can be passed between the GIN and NGWMN networks without further alteration of structure. Minor exceptions include schemas for rock logs from water wells, which are structured heterogeneously and require translation between data networks.
Intra-network structure interoperability is more complex, due to the prevalence of structure heterogeneity between the data network and its sources of data from State or Provincial agencies. Overcoming such heterogeneity requires establishment of unidirectional mappings between each source and a canonical schema. GIN uses GWML1 as a canonical data structure, whereas NGWMN's internal database structure serves as its canonical schema for consuming data from sources. Both GIN and NGWMN use the OGC web service interfaces as their canonical query structures. GIN's data mappings are expressed using EXtensible Stylesheet Language Transformation (XSLT), an XML transformation language, in combination with a declarative specification that equates XML schema components. XSLT plays a functional role, transforming a data structure from one schema to another using the equivalences between schema components expressed in the declarative specification. GIN query mappings are expressed in custom code, also drawing upon these equivalences. Schema interoperability within NGWMN is two-fold: individual data sources or clusters either support the canonical standards directly without need for translation, or source-specific mappings are constructed for each source schema.
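The two-part design described for GIN, a declarative list of schema equivalences plus a functional transformation that applies them, can be sketched as follows. This is a Python stand-in for illustration only: GIN expresses the functional part in XSLT, and every element name below is hypothetical rather than taken from GWML1 or from any source schema.

```python
import xml.etree.ElementTree as ET

# Declarative part: equivalences between source and canonical schema components.
# All element names are hypothetical.
SCHEMA_EQUIVALENCES = {
    "WELL": "WaterWell",
    "WELL_ID": "identifier",
    "DEPTH_M": "wellDepth",
}


def to_canonical(source: ET.Element, equivalences: dict[str, str]) -> ET.Element:
    """Functional part: rebuild the tree using the canonical element names.

    Child elements without an equivalence are dropped; the root element
    must have one, otherwise a KeyError is raised.
    """
    target = ET.Element(equivalences[source.tag])
    if source.text and source.text.strip():
        target.text = source.text.strip()
    for child in source:
        if child.tag in equivalences:
            target.append(to_canonical(child, equivalences))
    return target


source_doc = ET.fromstring(
    "<WELL><WELL_ID>AB-0042</WELL_ID><DEPTH_M>61.5</DEPTH_M><OWNER>n/a</OWNER></WELL>"
)
canonical = to_canonical(source_doc, SCHEMA_EQUIVALENCES)
print(ET.tostring(canonical, encoding="unicode"))
```

Keeping the equivalences separate from the transformation means a new data source can be onboarded by supplying a new table, leaving the transformation code unchanged.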
GIN and NGWMN semantic interoperability is achieved through the translation of data values to terms defined in canonical vocabularies and expressed in RDF or its Simple Knowledge Organization System (SKOS) application. However, in contrast to previous levels, GIN and NGWMN do not adopt the same canonical vocabularies, as each implements its own distinct internal standards. These are used to achieve both inter- and intra-network semantic interoperability, as data values from all sources, including the other data network, are mapped to the internal vocabularies. Both data networks deploy a limited number of such vocabularies for key data types; consequently, each network outputs semantically homogeneous data for those data mapped to the vocabularies, and heterogeneous data for the rest. For example, GIN implements a rock type vocabulary into which are mapped rock type values from all source data, including NGWMN, while NGWMN implements internal vocabularies for rock types, units-of-measure, water quality constituents, and methods. As with the schema mapping, GIN's semantic mapping consists of two parts: a declarative component that maintains a list of equivalences between source and network vocabulary terms, and a functional component, expressed in XSLT and linked code, that carries out the translation.
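A minimal sketch of such a vocabulary mapping is given below, again in Python rather than the XSLT and linked code used by GIN. The term URIs and source values are hypothetical; the sketch only illustrates how mapped values become homogeneous while unmapped values pass through unchanged.

```python
# Hypothetical base URI for a network rock type vocabulary.
VOCAB = "http://example.org/vocab/rocktype/"

# Declarative part: source values and the network terms they are equivalent to.
ROCK_TYPE_EQUIVALENCES = {
    "ss": VOCAB + "sandstone",
    "sandstone": VOCAB + "sandstone",
    "lmst": VOCAB + "limestone",
    "limestone": VOCAB + "limestone",
}


def translate(value: str, equivalences: dict[str, str]) -> tuple[str, bool]:
    """Functional part: return (term, mapped) for a source data value.

    Values with no equivalence are returned unchanged and flagged as
    unmapped, so the output stays heterogeneous for those values.
    """
    key = value.strip().lower()
    if key in equivalences:
        return equivalences[key], True
    return value, False


for raw in ["SS", "Limestone", "glacial till"]:
    term, mapped = translate(raw, ROCK_TYPE_EQUIVALENCES)
    print(f"{raw!r} -> {term} ({'mapped' if mapped else 'unmapped'})")
```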
GIN and NGWMN pragmatic interoperability is achieved via usage agreements, or profiles, developed between the data network and its data sources, including the other data network, thus facilitating both inter- and intra-network interoperability. For example, GIN and NGWMN have developed profiles for the common deployment of WMS, WFS, and SOS web services for inter-network interoperability. This is particularly significant for the SOS service where, for instance, some optional aspects were designated as mandatory, e.g., a listing of measured parameters, and other general aspects were made more specific, e.g., WaterML2 as the sole output data structure. Indeed, realization of the broader need for such an SOS profile in the water domain has prompted its recent and ongoing development within OGC (GEOWOW Consortium 2014). Intra-network pragmatic interoperability is similarly enforced by NGWMN through the development of distinct profiles for its data sources, and through screening for incompatible methods of water quality data collection, to avoid incorrect usage of the data.
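The two SOS profile rules mentioned above can be expressed as a simple conformance check, sketched here under stated assumptions: the dictionary layout used to describe a service is hypothetical and is not the structure of an actual SOS capabilities document, and the actual GIN and NGWMN profiles contain more rules than these two.

```python
WATERML2 = "http://www.opengis.net/waterml/2.0"


def check_service(description: dict) -> list[str]:
    """Return the profile violations found in a simplified service description."""
    problems = []
    # Optional in the base standard, mandatory in the profile.
    if not description.get("observed_properties"):
        problems.append("measured parameters must be listed")
    # General in the base standard, narrowed in the profile.
    if set(description.get("response_formats", [])) != {WATERML2}:
        problems.append("WaterML2 must be the sole output data structure")
    return problems


conforming = {
    "observed_properties": ["groundwater-level"],
    "response_formats": [WATERML2],
}
non_conforming = {
    "observed_properties": [],
    "response_formats": [WATERML2, "text/csv"],
}

print(check_service(conforming))
print(check_service(non_conforming))
```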
DEMONSTRATION OF RESULTS
Figure 6 illustrates inter- and intra-network interoperability of observational data, through retrieval of groundwater levels from the sites shown in Figure 2, which are located on either side of the international border. Inter-network interoperability is shown via GIN querying NGWMN for the Montana data, while intra-network interoperability is shown via GIN directly querying the Alberta data source. System and pragmatic interoperability are demonstrated via profiled use of SOS, syntax and structure interoperability are achieved via the adoption of WaterML2, and semantic interoperability is achieved through adoption of some common terms for measured properties, most notably for groundwater level. Observed response times, including the time to construct and display the graph, can vary from 1–5 seconds to 20–30 seconds, depending on the number of measurements and their cached status.
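Because both responses share the WaterML2 structure, a client can read them with a single routine, as the following minimal sketch suggests. The XML fragments are heavily simplified stand-ins for full WaterML2 documents, and the times and values are made-up sample data, not measurements from the Alberta or Montana sites.

```python
import xml.etree.ElementTree as ET

WML2 = "http://www.opengis.net/waterml/2.0"
NS = {"wml2": WML2}


def read_levels(waterml2_text: str) -> list[tuple[str, float]]:
    """Extract (time, value) pairs from the time-value points of a document."""
    root = ET.fromstring(waterml2_text)
    levels = []
    for tvp in root.iter(f"{{{WML2}}}MeasurementTVP"):
        time = tvp.findtext("wml2:time", namespaces=NS)
        value = tvp.findtext("wml2:value", namespaces=NS)
        if time is not None and value is not None:
            levels.append((time, float(value)))
    return levels


def make_sample(points: list[tuple[str, float]]) -> str:
    """Build a simplified WaterML2-like fragment from made-up sample points."""
    body = "".join(
        f"<wml2:point><wml2:MeasurementTVP><wml2:time>{t}</wml2:time>"
        f"<wml2:value>{v}</wml2:value></wml2:MeasurementTVP></wml2:point>"
        for t, v in points
    )
    return (
        f'<wml2:MeasurementTimeseries xmlns:wml2="{WML2}">{body}'
        "</wml2:MeasurementTimeseries>"
    )


# One response per retrieval path; the same parser handles both.
responses = {
    "Alberta (direct source)": make_sample([("2013-06-01T00:00:00Z", 12.4)]),
    "Montana (via NGWMN)": make_sample(
        [("2013-06-01T00:00:00Z", 8.9), ("2013-06-02T00:00:00Z", 9.1)]
    ),
}
combined = {site: read_levels(text) for site, text in responses.items()}
print(combined)
```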
CONCLUSIONS AND FUTURE DIRECTIONS
Accompanying the rapid growth of online water data networks is an interoperability need: the faster and larger such networks grow, the greater the need for them to function in unison. All comprehensive solutions to this interoperability problem must align data at five levels (system, syntax, structure, semantics, and pragmatics), regardless of the adoption of any particular interoperability architecture. The GIN and NGWMN data networks exemplify how conformance to international standards and SDI architectures, local agreements about their implementation, and the development of shared vocabularies can make large groundwater data networks interoperable at each level. An outstanding challenge remains at the semantic level, where very few re-usable vocabularies have been established both within and between respective data networks, as well as in the international community overall. Another significant challenge involves architectural expansion, to complement the existing SDI architecture with LOD approaches. A specific concern with LOD in the context of water data interoperability is the granularity problem: which resolution of data should be exposed as a web-indexed entity? For example, should each of the millions of sensors be exposed, or each of the billions of readings taken by those sensors? Another concern is the connectivity and scalability of data: how best to make linkages between data sources without centralizing the data into a common material (vs. virtual) RDF repository? This is a problem in the groundwater domain especially, because relevant information is often distributed among various databases maintained by different agencies: for example, information about a water well itself, as well as about related features such as sensors or aquifers, is rarely available from one database or agency, yet it is often essential to have the full suite of information at hand for key activities such as resource management.
Despite these anticipated challenges, the current state of North American groundwater interoperability represents a significant achievement: the first edition of an interoperable USA–Canada groundwater data network.
The authors gratefully thank the various individuals and agencies that contributed to GIN and NGWMN, particularly the owners of the data illustrated herein: Alberta Environment and Sustainable Resource Development, as well as the Montana Bureau of Mines and Geology and specifically Luke Buckley. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.