Ageing infrastructure, increasing frequency and intensity of extreme events due to climate change, and increasing population demand have created various stresses on wastewater and stormwater infrastructure, which has led to frequent cases of combined and sanitary sewer overflows (CSOs and SSOs), among other issues. This has exacerbated the impact of sewershed management on society and the environment. With the advent of efficient sensory technologies, higher processing power and accessibility of advanced mathematical modeling techniques, it has become possible to create intelligent digital systems for sewersheds in the United States. This study proposes a holistic framework with best practices to improve data governance and model development at a sewershed scale and support the digital transformation of wastewater utilities in the United States. This study also presents the results from the questionnaire sent out to wastewater utilities in the United States to verify and evaluate the framework and the recommended steps.

  • Guidelines for the digital transformation of wastewater utilities are proposed.

  • A sewershed-scale framework for intelligent water systems is presented.

  • Data governance practices for intelligent water systems are discussed.

  • Modeling and data analytics for intelligent water systems are discussed.

  • The requirement for a holistic system-of-systems perspective for wastewater management is highlighted.

Wastewater infrastructure, a cornerstone of maintaining essential hygiene standards for a healthy society, plays a pivotal role in fostering economic growth, societal progress, and environmental improvements within communities. However, these systems are faced with challenges stemming from natural factors such as ageing infrastructure and extreme events, resulting in an uptick in incidents involving the overflow of both combined and sanitary sewers, known as combined sewer overflows (CSOs) and sanitary sewer overflows (SSOs), which can lead to surface and groundwater contamination (Liggett et al. 2018). To address these challenges, it is imperative to enhance the operational and managerial aspects of wastewater infrastructure to mitigate the adverse impacts of CSOs and SSOs on society and the environment. In the United States, wastewater utilities are increasingly embracing digital transformation initiatives to bolster the efficiency and reliability of their wastewater systems (Kapelan et al. 2020). Cutting-edge modeling techniques, increased computational capabilities, and the use of big data can be harnessed to innovate and revamp the oversight of sewershed systems, ushering in new approaches to sewershed management.

Wastewater utilities require a well-structured blueprint with clear, step-by-step directives to ensure the reliable implementation of available technologies. An intelligent water system framework, titled ‘Intelligent Water Infrastructure Systems Engineering (iWISE)’, has been developed as part of a project funded by The Water Research Foundation, through a collaborative effort involving Jacobs Engineering and the Sustainable Water Infrastructure Management (SWIM) Lab at Virginia Tech.

This paper concentrates on two key building blocks of the iWISE framework that are central to the development of an intelligent water system: data governance, and modeling and analytics. Data governance and robust modeling practices are fundamental to developing effective models for wastewater management. Data governance ensures the accuracy, consistency, and accessibility of data, which are essential for constructing reliable models (Abdallah & Rosenberg 2019).

These building blocks provide the foundation for the digital transformation of wastewater utilities. The framework was developed based on a comprehensive literature and practice review. Through its data governance and its modeling and analytics components, it provides wastewater utilities with the means to improve the efficiency of their operations at the sewershed scale.

To understand the need for a ‘Smart’ or ‘Intelligent’ ecosystem for wastewater infrastructure, it is important to introduce the concepts of Smart Cities, Smart Electric Grids, and Intelligent Transportation Systems. The concept of ‘Smart’ or ‘Intelligent’ approaches to managing infrastructure may have its origins in the Smart Growth movement of the late 1990s, which advocated new policies for urban planning. The term ‘Smart City’ is defined as a city in which Information and Communication Technology (ICT) is merged with traditional infrastructures, coordinated, and integrated using new digital technologies (Batty et al. 2012). The term ‘Smart’ has been adopted since 2005 by several technology companies for the application of complex information systems to integrate the operation of urban infrastructure and services such as buildings, transportation, electrical and water distribution, and public safety. It has since evolved to mean almost any form of technology-based innovation in the planning, development, and operation of infrastructure. There have been significant developments in the application of similar frameworks to other areas of infrastructure like electricity and transportation services. The two-way flow of electricity and data that is the essential characteristic of a smart grid enables information and data to be fed to the various stakeholders in the power sector, where they can be analyzed to optimize the grid, foresee potential issues, react faster when challenges arise, and build new capacities and services as the power landscape changes (Ali & Choi 2020). Intelligent Transportation Systems are advanced applications that aim to provide innovative services relating to different modes of transport and traffic management, enabling users to be better informed and to make safer, more coordinated, and ‘smarter’ use of transport networks. 
From each of these applications, we can understand that intelligence is not just the application of technologies to get an output, but rather it leverages the use of these technologies to achieve a greater goal (Mathew 2020).

Intelligent water systems

Various types of water, including drinking, waste, storm, industrial, agricultural, and environmental water, along with water for energy and agriculture, are currently managed separately, despite their reliance on the natural water cycle. Stakeholders in natural, built, and social environments are now advocating for more coordinated and less isolated water management and governance. If we allow isolated approaches to persist in the governance of water systems, unaddressed vulnerabilities will increasingly threaten both local and global water security. The system-of-systems (SoS) approach for water governance integrates complex, interdependent water management problems across the natural, built, and social environments of water. A SoS is an assemblage of components that can individually be regarded as a system. SoS is more than the sum of constituents as it possesses emergent properties that stem from interactions between its component systems and dynamic environments.

The ‘Anthropocene’ represents the era where human influence significantly impacts Earth's systems. Coping with its intricate socio-environmental challenges becomes progressively more difficult, as it is hard to foresee unintended consequences and align goals across interconnected systems while self-governing in this complex epoch (Little et al. 2019). Incorporating a social-ecological-technical systems (SETS) perspective into the adaptive management process requires a conceptualization of coupled human and natural systems and an assessment of underlying interdependencies among social drivers, institutions, and accrued benefits (McPhearson et al. 2022). Water SoSs can be categorized into natural, built, and social sub-systems, where the natural sub-system comprises all naturally occurring resources within the sewershed like water, land, and climate, the built sub-system comprises all man-made infrastructure assets managed by wastewater utilities, and the social sub-system comprises the factors influencing societal and economic dimensions of the sewershed. These sub-systems have individual components that are highly interdependent. For example, extreme climate events in the natural environment can impose stresses on the built environment, and the service demands that come from urbanization from the built and socio-economic environments of a sewershed can, in turn, impose stresses on the natural environment. Socio-environmental challenges are highly complex, necessitating a holistic modeling approach that considers the interconnected nature of each sub-system.

For each of these sub-systems, data collection is a crucial component of sewershed management for understanding the characteristics and complexity of the various interdependent components and developing models that can provide meaningful insights for decision-making. It is important to have a comprehensive understanding of all the parameters that have an impact on the insights derived through data analysis and modeling of a sewershed component (Angkasuwansiri & Sinha 2014). Identifying the critical parameters required for all components of the sewershed system also helps to identify the sources. Data sources for sewershed-related components can include utility instrumentation data (like CCTV, infrared, temperature, or flow sensors, among others), utility operational data (including GIS, SCADA, and inventory records, among others) and external data (from EPA, USGS, NOAA, among others). Data quality plays a crucial role in determining data reliability and directly impacts the performance and accuracy of modeling. In the context of sewer asset management, the quality of data has a direct impact on assessing the condition of assets and their stocks, and inaccuracies in data can impede effective asset management and can lead to erroneous assessments. Precision and comprehensiveness of data directly influence the efficacy and dependability of asset management models. Moreover, adjusting model parameters can help mitigate data limitations, thereby enhancing overall asset management results, and underscoring the interconnectedness between data quality and model parameters (Ahmadi 2014).

Wastewater utilities frequently employ mathematical models as a tool to analyse and derive insights for wastewater treatment and management. These models serve several functions, including forecasting performance, design and optimization of treatment plants, risk assessments, and renewal optimization. Various types of models, including physical, empirical, statistical, and simulation models, have been used to make predictions about future performance and optimize the operation of wastewater systems. The majority of the models examined in this research can be generally classified as either deterministic or probabilistic models. Deterministic models are generally used where relationships between components are certain, and they produce a definite output; a common challenge is that their applicability is limited to specific locations. Probabilistic models, in contrast, produce a range of outputs and account for uncertainty, which widens their range of applicability. However, probabilistic models require extensive data to be able to effectively predict the probability of an event occurring (St. Clair & Sinha et al. 2012). The modeling methods used in the specific domains of failure prediction, performance estimation, and risk evaluation for wastewater infrastructure assets can be categorized as stochastic (Koo & Ariaratnam 2006; Robles-Velasco et al. 2021). Mathematical models, simulation modeling, and neural networks have been used to predict contaminant flow and pollutant fate and transport in rivers to manage downstream water quality (Kachiashvili et al. 2007; Parsaie & Haghiabi 2017). Various studies have used artificial intelligence (AI), machine learning (ML), and mathematical and statistical models for the modeling of sewer and stormwater pipes. AI techniques like fuzzy logic have been used for the performance prediction of gravity pipes and force mains while dealing with uncertainty in data (Yan & Vairavamoorthy 2003). 
Neural networks have also been widely used for the condition assessment of sewer pipes and assessing the importance of certain parameters on structural performance (Khan et al. 2010). Fuzzy logic and neural networks have also been used for the optimal scheduling time problem for pumping stations (Ostojin et al. 2011). Lifecycle assessments have also been employed to assess the environmental impacts of wastewater treatment systems and pumps. Such models are an efficient approach to quantifying the environmental impacts of different assets in the built environment of sewersheds (Jocanovic et al. 2019). Contemporary computing methods have found application in the field of treatment plant cost modeling, aiding in the evaluation of cost frameworks and the formulation of cost-saving strategies. ML methods have been used to construct an effective cost model for wastewater treatment facilities, considering energy consumption and water quality indicators (Torregrossa et al. 2018). Statistical modeling has also been utilized to create a cost function for sewage sludge and waste management, enhancing comprehension of cost structures and identifying potential savings by reducing sludge production (Molinos-Senante et al. 2013). The study identified a gap in the existing body of knowledge regarding the impact of socio-economic factors on the characteristics and management of sewersheds. Previous research has mainly centered on the consequences of societal actions and community infrastructure (Morton & Padgitt 2005). Nevertheless, the attention given to government policies and the repercussions of regulations on the evolution of sewersheds has been limited. Model evaluation, verification, and validation are crucial steps to ensure the reliability of the models developed. Sensitivity analysis has been extensively used in literature to identify the impact of input parameters on the model output (Saltelli 1999; Marrel et al. 2011).
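As a minimal illustration of how sensitivity analysis relates input parameters to model output, the sketch below applies a simple one-at-a-time (OAT) perturbation, which is a local method and not the variance-based global methods discussed in the cited literature. The toy overflow model, its parameter names, and the baseline values are hypothetical and serve only to make the example runnable.

```python
from typing import Callable, Dict


def overflow_volume(rainfall_mm: float, imperviousness: float, capacity: float) -> float:
    """Hypothetical toy model: runoff in excess of system capacity (arbitrary units)."""
    runoff = rainfall_mm * imperviousness * 10.0
    return max(0.0, runoff - capacity)


def oat_sensitivity(model: Callable[..., float],
                    baseline: Dict[str, float],
                    delta: float = 0.10) -> Dict[str, float]:
    """Perturb each input by +delta (relative), one at a time.

    Returns the relative change in model output for each input.
    """
    base_out = model(**baseline)
    if base_out == 0:
        raise ValueError("baseline output is zero; relative change is undefined")
    changes = {}
    for name, value in baseline.items():
        perturbed = dict(baseline)          # copy so other inputs stay at baseline
        perturbed[name] = value * (1 + delta)
        changes[name] = (model(**perturbed) - base_out) / base_out
    return changes


# Assumed baseline values, for illustration only.
baseline = {"rainfall_mm": 50.0, "imperviousness": 0.6, "capacity": 200.0}
sensitivity = oat_sensitivity(overflow_volume, baseline)
for name, change in sensitivity.items():
    print(f"{name}: {change:+.2f}")
```

In this toy case a 10% increase in rainfall or imperviousness raises the output by about 30%, while a 10% increase in capacity lowers it by about 20%, which shows how such an analysis ranks inputs by influence.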

Through the literature review, prominent data governance and modeling applications for the sewershed system were identified. The aim of the practice review is to capture the data governance and modeling practices of wastewater utilities. Interviews were conducted and case studies were developed to capture a comprehensive understanding of the real-world practices of utilities, including Houston Water, the New York City Department of Environmental Protection, and Hampton Roads Sanitation District (HRSD) in Southeast Virginia.

Houston Water

Houston Water's implementation plans for intelligent water practices involve innovative approaches like collaboration with academia and other utilities, progressive operations, and in-house planning and analytics, among others, to address challenges including ageing infrastructure, meeting stakeholder expectations, regulatory compliance, and climate change. They have a holistic data collection methodology that enables them to collect critical parameters for components from the natural, built, and social sub-systems. They have also established a data pipeline for both manual and automated data preparation that enables data collection, quality checks, and data analysis. Their modeling practices include (i) rating SSO risk levels for optimizing preventative cleaning; (ii) optimized sensor placement; (iii) predictive analytics for SSO detection; (iv) preventative risk-based asset management; and (v) defect detection in pipes using AI. The risk-based asset management approach was applied to force main renewal prioritization. Their methodology followed clearly outlined steps, starting with identifying individual assets for data collection. The factors impacting renewal were identified and their relative importance was determined based on a weighted score. Risk scores were determined for individual criteria and the asset ranking was used to prioritize renewal plans.
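The weighted scoring and ranking steps described above can be sketched as follows. This is an illustrative sketch only and not Houston Water's actual method: the criteria, weights, asset IDs, and the 1 to 5 scoring scale are all hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical renewal criteria and their relative importance (weights).
WEIGHTS: Dict[str, float] = {"condition": 0.40, "consequence": 0.35, "age": 0.25}


def risk_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores (weights are normalized)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank_assets(assets: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (asset_id, score) pairs sorted from highest to lowest risk."""
    scored = [(asset_id, risk_score(s, weights)) for asset_id, s in assets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical force mains scored 1 (low) to 5 (high) on each criterion.
force_mains = {
    "FM-001": {"condition": 4, "consequence": 5, "age": 3},
    "FM-002": {"condition": 2, "consequence": 3, "age": 5},
    "FM-003": {"condition": 5, "consequence": 2, "age": 4},
}

for asset_id, score in rank_assets(force_mains, WEIGHTS):
    print(f"{asset_id}: {score:.2f}")
```

The resulting order gives a renewal priority list; in practice the criteria and weights would come from the utility's own assessment of the factors impacting renewal.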

New York City Department of Environmental Protection (NYDEP)

The NYDEP's Bureau of Water Supply (BWS) Data Governance Program addresses various issues regarding data collection, data quality, quantity, storage, and regulatory requirements. NYDEP defines data governance as an organizing strategy for the creation of policies and for managing data within an organization. These frameworks and defined protocols for data governance are crucial to ensure data quality and reduce risk associated with data. At present, data management within the BWS operates independently within each work unit or directorate, frequently in coordination with either the Bureau of Business Information Technology or at the Agency level. The introduction of a governance program at the Bureau level seeks to build upon existing effective practices within directorates while strengthening areas in need of enhancement. The swift technological progress within the Bureau has resulted in the emergence of scattered data repositories and, in certain instances, duplicate data across various operational domains. Introducing an effective data governance structure is imperative to set higher benchmarks for data generation, utilization, and management. Unlike certain sectors that adopt elaborate data governance frameworks driven by regulations, BWS has devised a streamlined model that provides adequate governance to improve data accessibility, functionality, and reliability. The primary objectives of this data governance project are to provide a good understanding of the collected data, improve the documentation of data utilization, improve data access and reliability, and ultimately advance BWS' readiness for cloud architecture adoption to centralize and standardize data governance across the enterprise.

Hampton Roads Sanitation District

HRSD has been actively working on several projects toward developing a smart water system. Their Sustainable Water Initiative for Tomorrow (SWIFT) program, which puts highly treated water through additional rounds of advanced water treatment before adding it to the Potomac aquifer, integrates building information modeling (BIM) with geographic information systems (GIS) to create a detailed digital model, or digital twin. This technology enables operational intelligence with 3D data for newly constructed facilities, and the team is presently collecting and visualizing real-time sensor data.

HRSD uses sensors for condition assessment of irrigation wells to measure water quality changes. The collected data is integrated with a GIS map, enabling continuous monitoring. They also use sensors for monitoring pump performance and pressure levels, and utility professionals use GIS to inspect each asset. The generated data contributes to an operational performance indicator (PI) dashboard, integrating a map, sensor data, and graphical representations to monitor variables like flow, pressure, and rainfall.

HRSD uses the MIKE URBAN software to improve their regional hydraulic model (RHM), which facilitates system capacity assessment, facility dimension determination, and flow routing decisions.

With the advent of advanced data governance and modeling techniques and applications for wastewater management, as highlighted in the literature review, many water utilities are implementing such techniques to support their digital transformation. However, through this practice review, it can be highlighted that these efforts are limited from a SoS perspective, failing to account for the interdependencies between subsystems.

Many definitions of intelligent water systems exist in the literature; they focus on the technologies and techniques used and rarely consider the human aspect in the application of these technologies. The iWISE framework proposes a new definition of intelligent water systems based on the comprehensive literature and practice review. The definition is as follows – ‘An Intelligent Water System integrates and derives information from a cyber-space, physical-space and social-space based on improved water system-of-systems understanding at sewershed scale implementing data collection, database management, modeling techniques, decision support paradigms, and intelligent workforce skills to support risk-based decision making and optimize lifecycle management of one water (drinking water, wastewater, stormwater and clean water) that are equitable, affordable, efficient, reliable, sustainable, and resilient for healthy and thriving communities.’ This framework relies on data generated throughout its lifecycle by people, process, and technology. The application of intelligence to water systems is not limited to just sensing technologies and advanced modeling techniques. Rather, the application of intelligence should follow the entirety of the data lifecycle from the point of data collection to the point where it becomes useful knowledge for decision support with humans in the loop. There are many challenges associated with implementing intelligent water systems. The iWISE framework, like other disruptive technologies, requires a cultural shift in the utility for better adoption of the utility-wide changes, more focus on enhancing resiliency of all digital systems, building trust in the new proposed methods and technologies through decision support and visualization tools, and a diverse workforce with the necessary digital skills for solving complex sewershed issues by leveraging advanced computational methods (Thompson et al. 2025).

The proposed iWISE Framework for implementing Intelligent Water Systems at the sewershed scale is based on building blocks and provides recommended approaches to help implement each building block and support the transition from regular operations to iWISE at the sewershed scale. The structure of the framework takes a systems approach that considers the complex nature of a sewershed system and is inspired by the SETS framework (McPhearson et al. 2022), which highlights the importance of coordinating natural, built, and social sub-systems within the sewershed system and understanding their interactions and the factors that affect ecosystem services. The concept of considering the interactions between the SETS lays a foundation for the holistic systems approach that is used to develop the iWISE framework at the sewershed scale (Sinha et al. 2023).

The framework has been developed for implementation at the sewershed scale, and this system is divided into three sub-systems – natural, built, and social. The natural sub-system consists of all naturally occurring resources within the sewershed like water, land, and climate. The built sub-system includes components like wastewater, stormwater, and household, commercial, and industrial infrastructure. The social subsystem is categorized into communities, policies and regulations, and finance and economics.

The building blocks of the overall framework are shown in Figure 1 and are categorized into the foundational layer, technical layer, and organizational layer of iWISE, which provide recommended approaches for each of these aspects for wastewater utilities to transition to iWISE at a sewershed scale. The foundational layer includes the Intelligent Water Systems Understanding building block. The technical layer includes the upper-level and lower-level building blocks, where the upper level consists of the Physical Systems, Digital Systems, and Planning & Implementation building blocks and the lower level consists of the Data Governance, Database Management, Data Analytics, and Decision Support building blocks. The organizational layer includes the Workforce, Diversity and Inclusion, Environmental, Social and Governance, Stakeholder Engagement, and Innovation Ecosystem building blocks. The building blocks have been structured in a circular fashion such that utilities can implement the various building blocks in any order and may customize their implementation plans in accordance with their goals, available resources, and the current stage of their digital transformation journey.
Figure 1

Building blocks of iWISE Framework.

This paper presents the proposed methods for the development of the following building blocks of the iWISE framework:

  • Data Governance

  • Data Analytics

Data governance building block

The complex nature of sewershed SoS comes from the interdependent characteristics of the subsystems and their associated components. These complexities are captured by having a structured approach to the collection of accurate data from each of these subsystems. The collection of accurate data results in an improved ability to measure, analyze, and manage the status quo. Therefore, a structured approach to data collection, identification of data sources, and data quality-checking protocols are essential for reliable system-level modeling and data analytics. These analytics can help capture interdependent characteristics and build efficient and resilient models for implementing iWISE tasks (Rinaldi et al. 2001). The essential tasks to be followed for effective data governance are detailed as follows.
  1. Identify data parameters: A comprehensive list of data parameters from each of the sub-systems enables a better understanding of the sewershed and results in a list of parameters required to perform analysis on different sub-systems. Table 1 describes the main categories from which parameters must be collected within the natural, built, and social subsystems. Specific data parameters and their units within each of these categories must be collected to then be utilized for developing the models as described in the data analytics building block (Thompson et al. 2025).

  2. Collect data parameters: The iWISE framework recommends the characterization of data sources based on the point of origin as it helps track the data for verification, set privacy levels, develop metadata, compare performance, and set standards for data generation systems in terms of data quality. The categories of data source and the corresponding metadata information are shown in Figure 2. This figure shows how data from different sources in a sewershed can be categorized and what type of metadata should be collected for each type of source.

  3. Check data parameters: Utilities should develop data quality protocols (DQPs) to evaluate the accuracy and precision of collected data. The iWISE framework proposes the steps shown in Figure 3 for developing holistic DQPs that can be applied to different sub-system data management tasks in Intelligent Water Systems at a sewershed scale.
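To illustrate the source characterization recommended in step 2, the following sketch records each data source with its point of origin and a few metadata fields. The three origin categories follow the data source types named earlier in this paper; the metadata fields (`update_frequency`, `privacy_level`) and the example entries are hypothetical and do not reproduce the contents of Figure 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class SourceOrigin(Enum):
    """Point of origin of a sewershed data source."""
    UTILITY_INSTRUMENTATION = "utility instrumentation"  # e.g. CCTV, flow sensors
    UTILITY_OPERATIONAL = "utility operational"          # e.g. GIS, SCADA
    EXTERNAL = "external"                                # e.g. EPA, USGS, NOAA


@dataclass(frozen=True)
class DataSourceRecord:
    name: str
    origin: SourceOrigin
    sub_system: str                  # "natural", "built", or "social"
    update_frequency: str            # hypothetical metadata field
    privacy_level: str = "internal"  # hypothetical metadata field

    def __post_init__(self) -> None:
        if self.sub_system not in {"natural", "built", "social"}:
            raise ValueError(f"unknown sub-system: {self.sub_system}")


def group_by_origin(records: List[DataSourceRecord]) -> Dict[SourceOrigin, List[DataSourceRecord]]:
    """Group source records by point of origin for tracking and verification."""
    grouped: Dict[SourceOrigin, List[DataSourceRecord]] = defaultdict(list)
    for record in records:
        grouped[record.origin].append(record)
    return dict(grouped)


# Hypothetical example entries.
sources = [
    DataSourceRecord("flow sensor", SourceOrigin.UTILITY_INSTRUMENTATION, "built", "hourly"),
    DataSourceRecord("SCADA", SourceOrigin.UTILITY_OPERATIONAL, "built", "continuous"),
    DataSourceRecord("NOAA rainfall", SourceOrigin.EXTERNAL, "natural", "daily", privacy_level="public"),
]
grouped = group_by_origin(sources)
```

Keeping the origin as an explicit field makes it straightforward to apply different privacy levels or quality standards to each category of source.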

Table 1

Data parameter categories for sewershed sub-system components

| Sub-system | Category | Data parameter category |
| --- | --- | --- |
| Natural | Water | Contamination of surface water, groundwater by discharge, and runoff water |
| | | Inflow and infiltration |
| | Land | Contamination of surface and groundwater due to runoff from landfills and agricultural activities |
| | Climate | Frequency and duration of extreme events in the sewershed |
| Built | Household/commercial/industrial, wastewater, and stormwater infrastructure | Asset inventory: Asset ID, geospatial location, and other identification data of all managed assets (manholes, gravity pipes, network structures, etc.) |
| | | Asset performance: Asset current and historical condition and functional adequacy data used to support renewal decisions |
| | | Asset renewal: Historical information on renewal activities (including forensic data) to identify trends and patterns to support renewal decisions |
| | | Asset risk: Consequence of failure parameters for risk-based renewal prioritization (likelihood of failure parameters are covered in asset performance) |
| | | Asset finance: Costs involved in every stage (design, construction, installation, operations and maintenance (O&M), renewal, and disposal) of the lifecycle of an asset to support comparative analysis of replacement alternatives |
| Social | Community | Modeling of system demand and social conditions of the residential, commercial, and industrial population of a sewershed |
| | Laws and Policies | Analysis of management practices, organizational structure, and regulatory impacts on water utility management |
| | Finance and Economic | Operational and local criteria to support models such as life cycle assessments to quantify the environmental impacts of a product across each stage of its life cycle, and analysis of sources of funding (community willingness to pay, private and government funding) |
 
 
Figure 2

Sewershed data sources and corresponding metadata.

Figure 3

DQPs for managing sewershed data.

The DQPs shown in Figure 3 can be broadly classified into three main categories – project planning, data checking, and review and improvement of overall DQPs.

The project planning step comprises tasks like establishing a data quality-checking team along with identifying data quality objectives, any training requirements, or documentation protocols.

The data-checking step covers data quality, data quantity, and data preprocessing. The quality of data can be determined based on five major categories – data source, data integrity, data timeliness, data relevance, and data reliability. Data source concerns the reliability of the source from which the data originates, on which the quality of the data depends. Data integrity refers to the accuracy and completeness of data maintained across different formats. Data timeliness refers to how recent the data is; if the data collected is not recent, it can lead to inaccuracies in the insights derived from data analysis. Data relevance refers to how applicable the collected data is to the use case. To check for data reliability, data should be classified into three main categories, ranked from most to least reliable – direct measurement, derived indirectly, and educated guesses.
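A few of these quality dimensions can be expressed as simple automated checks. The sketch below is a minimal example under stated assumptions: the record layout, the required fields, the one-year age limit, and the minimum acceptable reliability level are all hypothetical choices, and only integrity (as completeness), timeliness, and reliability are covered.

```python
from datetime import date
from enum import IntEnum
from typing import Any, Dict, List


class Reliability(IntEnum):
    """Three reliability categories; a higher value means more reliable."""
    EDUCATED_GUESS = 1
    DERIVED_INDIRECTLY = 2
    DIRECT_MEASUREMENT = 3


def completeness(record: Dict[str, Any], required_fields: List[str]) -> float:
    """Fraction of required fields that are present and not None."""
    present = sum(1 for f in required_fields if record.get(f) is not None)
    return present / len(required_fields)


def is_timely(collected_on: date, as_of: date, max_age_days: int) -> bool:
    """True if the data was collected within max_age_days of as_of."""
    return (as_of - collected_on).days <= max_age_days


def quality_report(record: Dict[str, Any],
                   required_fields: List[str],
                   as_of: date,
                   max_age_days: int = 365,
                   min_reliability: Reliability = Reliability.DERIVED_INDIRECTLY) -> Dict[str, Any]:
    """Summarize integrity, timeliness, and reliability for one record."""
    return {
        "completeness": completeness(record, required_fields),
        "timely": is_timely(record["collected_on"], as_of, max_age_days),
        "reliable": record["reliability"] >= min_reliability,
    }


# Hypothetical manhole record with one missing attribute.
record = {
    "asset_id": "MH-17",
    "material": None,
    "diameter_mm": 300,
    "collected_on": date(2023, 1, 15),
    "reliability": Reliability.DIRECT_MEASUREMENT,
}
report = quality_report(record, ["asset_id", "material", "diameter_mm"], as_of=date(2024, 6, 1))
print(report)
```

Here the record is reliable (a direct measurement) but incomplete and older than the assumed one-year limit, so it would be flagged for review before being used in a model.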

Models developed by utilities generally perform better when they are trained on large amounts of data. The quantity check of collected data should consider the 5 Vs of big data: volume, velocity, variety, veracity, and value. Volume refers to having enough samples and features for the model to learn patterns in the data and perform well. Velocity matters because a faster rate of data collection allows more timely model training. Variety ensures that the dataset is representative of all the types of data involved. Veracity refers to the inconsistency and uncertainty observed in data. Value refers to the usefulness of the data and the insights it will ultimately provide, which should be considered while developing models.

Data preprocessing makes sure the data is ready before model development. It involves steps like data cleaning, data transformation, data balancing, and data normalization. Data cleaning handles missing values with techniques like removing rows or columns with missing data or filling gaps with the mean or median. Data transformation refers to the conversion of categorical variables into numerical format. Data balancing uses techniques like undersampling or oversampling to correct imbalances in the number of samples in each class of a dataset. Data normalization is used to organize and standardize data collected from multiple sources that may have inconsistent formats.
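The four preprocessing steps can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the procedure used by any utility in this study: the function names are hypothetical, median imputation and random oversampling are just one choice among the techniques mentioned, and normalization is shown as converting flow values reported in different units to a single unit.

```python
import random
from statistics import median

GPM_TO_LPS = 3.785411784 / 60  # US gallons per minute -> litres per second


def fill_missing_with_median(values):
    """Data cleaning: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Data transformation: encode categorical labels as 0/1 indicator vectors."""
    levels = sorted(set(categories))
    encoded = [[1 if c == level else 0 for level in levels] for c in categories]
    return encoded, levels


def oversample(rows, labels, seed=0):
    """Data balancing: randomly duplicate minority-class rows until classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def to_litres_per_second(value, unit):
    """Data normalization: bring flows reported in different units to one unit."""
    factors = {"L/s": 1.0, "gpm": GPM_TO_LPS}
    return value * factors[unit]
```

In practice these steps would be applied in sequence to each dataset before it is passed to model development, with the chosen imputation and balancing techniques documented as part of the DQPs.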

The ‘Review and Improvement’ steps in the DQPs involve periodically assessing and improving the DQPs based on feedback, changing regulations, and advancements in technology.

Data analytics building block

Modeling and data analytics for intelligent sewershed management is a highly complex task. It requires understanding the various dimensions of modeling involved, knowing how to maintain the models, and ensuring the models are robust through effective verification and validation (V&V) of model outputs. Any model is an abstraction of the real world, and reducing real-world complexities to simpler mathematical formats may lead to inaccuracies. Robust V&V protocols can help increase the reliability of these models. The following are the main tasks to perform for iWISE data analytics:
  • 1. Identify systems modeling dimensions: Different modeling methods are used to analyse sewershed operations and use the output to facilitate informed decision-making. These models also encapsulate the interdependencies among distinct components through model parameters, weights, and regulations. The typical modeling areas, shown in Table 2, should be identified for all three sub-systems (natural, built, and social) to effectively model the entire sewershed.

  • 2. Identify systems modeling techniques: Modeling can be performed in a variety of ways depending on the problem at hand, and each use case can be represented using different techniques. Figure 4 demonstrates that a modeling strategy must start with identifying the various techniques available across different dimensions. These dimensions include categories of insights offered, level of analysis, system component categories, types of decisions, mathematical techniques used, and analysis techniques used. Mathematical techniques used for modeling in wastewater management include deterministic, probabilistic, and AI models. Deterministic techniques are formulaic models in which the final output is completely determined by the input parameters, whereas probabilistic models produce a range of outputs and account for uncertainty.

Table 2

Modeling focus areas for sewershed sub-system components

| Sub-system | Category | Modeling focus area |
| --- | --- | --- |
| Natural | Water | Modeling of contamination of surface water, groundwater, and runoff water and the risk it presents |
| Natural | Land | Modeling of contamination of surface and groundwater due to runoff from landfills and agricultural activities and risks presented by leachate |
| Natural | Climate | Modeling of rainfall–runoff and risk analysis of CSO/SSO events, risks presented by extreme events |
| Built | Household/commercial/industrial, wastewater, and stormwater infrastructure | Asset performance: Asset current and historical condition and functional adequacy analysis used to support renewal decisions |
| | | Asset renewal: Modeling using historical information of renewal activities (including forensic data) to identify trends and patterns to support renewal decisions |
| | | Asset risk: Consequence of Failure prediction modeling for risk-based renewal prioritization |
| | | Asset finance: Cost modeling (involving costs involved in every stage of the lifecycle of an asset) to support comparative analysis of replacement alternatives |
| Social | Community | Demand forecasting and public perception understanding |
| Social | Laws and policies | Impact of organizational structure on asset management practices, economic impacts of laws and regulations |
| Social | Finance and economic | Life cycle assessment modeling, willingness to pay, asset management |

Figure 4

Modeling techniques and dimensions.

The modeling categories describe the type of insight the model output provides. For example, structural performance models like performance prediction or leak detection fall under the engineering category, whereas renewal prioritization or economic predictions fall under the management category. The strategic, tactical, and operational levels describe how the model output is used for decision-making. Strategic models support long-term decisions, tactical models help identify specific problem areas, and operational models are used for asset-level analysis.
  • 3. Identify V&V strategies: The V&V protocols are used to systematically test and understand the model's internal logic, quantify uncertainties, and ensure model reliability and robustness. Verification is done by checking the model logic and parametric influence on the model output. Validation is the process of evaluating the model performance through ground truth comparison, competing methods, and statistical and graphical tests. The effectiveness of models relies heavily on the quality of the data and knowledge used to construct them. A model cannot be more precise than the errors in its input and observed data allow. Also, since the measurements used to describe and assess a model vary depending on the specific problem, there is no single universally accepted statistic or test to determine whether a model is validated. Typically, validation requires a mix of methods like expert review, visual comparisons, and statistical tests, using both quantitative and qualitative measures. This approach is also known as the ‘weight-of-evidence’ approach. The steps for model V&V are shown in Figure 5.

Figure 5

Modeling V&V approach.
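To illustrate the quantitative side of V&V, the sketch below computes two statistics that could be used for ground truth comparison, together with a simple one-at-a-time check of parametric influence. The choice of root mean square error and Nash–Sutcliffe efficiency is an assumption made here for illustration; the framework does not prescribe specific statistics, and all function names are hypothetical.

```python
from math import sqrt


def rmse(observed, simulated):
    """Validation: root mean square error between observations and model output."""
    squared = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return sqrt(sum(squared) / len(squared))


def nash_sutcliffe(observed, simulated):
    """Validation: 1 is a perfect fit; 0 means no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


def one_at_a_time_sensitivity(model, baseline, delta=0.10):
    """Verification aid: relative output change when each parameter rises by delta.

    `model` is any function taking the baseline parameters as keyword arguments.
    """
    base_out = model(**baseline)
    effects = {}
    for name, value in baseline.items():
        perturbed = dict(baseline, **{name: value * (1 + delta)})
        effects[name] = (model(**perturbed) - base_out) / base_out
    return effects
```

Consistent with the weight-of-evidence approach, such statistics would be read alongside expert review and visual comparisons rather than used as a single pass/fail test.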


iWISE framework verification and validation

The iWISE framework building blocks were verified through a series of workshops, brainstorming sessions, and feedback from technical experts, and then validated through a set of questionnaires shared with utilities across the United States to gain insight into their current practices and assess their readiness for digital transformation.

Verification

The verification of the proposed framework was performed by involving and consulting with utilities and domain experts from the water sector across various organizations. This helped verify the content of the framework and ensure that the building blocks are appropriate and can help guide utilities toward digital transformation. This process was done in three steps:

  • 1. Workshops with large utilities: A series of two-hour workshops with large-scale utilities were conducted to get critical feedback on the structure of the framework, individual building blocks, and the content within each building block.

  • 2. Brainstorming session with Jacobs Consulting: The building blocks were reviewed and revised based on feedback from a two-day brainstorming session with Jacobs Engineering, a large consulting firm in the water sector.

  • 3. Comments and feedback from the iWIN Committee: The Intelligent Water Infrastructure Network (iWIN) committee comprises technical domain experts from Oak Ridge National Lab (ORNL), Jacobs, Arcadis, DC Water, City of Houston, Metropolitan Water Reclamation District of Greater Chicago, Clean Water Services, and Virginia Tech from the United States, Anglian Water from the United Kingdom, and Metro Vancouver from Canada. We collected input from the committee and incorporated it into the development of the building blocks.

Validation

To validate the framework, its layers, building blocks, and the individual steps proposed, questionnaires were prepared following the survey methodology adopted by the Water Research Foundation for assessing the current and future states of the use of advanced sensors in urban sewersheds (Liggett et al. 2018). The questions addressed each step of every iWISE building block and explored the current intelligent water practices of large, medium, and small utilities and their willingness to implement the proposed building blocks. The questionnaires were sent to 100+ utilities across the United States, Canada, and the United Kingdom. The responses showed that some small, medium, and large utilities are already applying intelligent water practices at various stages of implementation, and that collectively they follow steps similar to those proposed in the iWISE framework.

The questionnaires were developed based on the overall framework for assessing water and wastewater utilities' current level of digital maturity and willingness to adopt intelligent water practices. This section discusses the results from questionnaire and interview responses on the data governance and data analytics building blocks. The discussion is presented as a mix of the responses received from the questionnaire and the information captured during the workshop-style interviews.

To determine the extent of iWISE implementation, utilities were classified into three groups: small, medium, and large utilities. Water and wastewater systems in the United States exhibit a range of characteristics and issues, stemming from variations in customer base, wastewater volume, infrastructure complexity, and regulatory supervision.
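The grouping can be expressed as a simple rule on population served, using the thresholds given for each group in the paragraphs of this section. The sketch below is illustrative only; the function name is hypothetical, and treating exactly 50,000 and exactly 250,000 as medium is an interpretation of the stated ranges.

```python
def classify_utility(population_served: int) -> str:
    """Group a utility as small, medium, or large by population served.

    Thresholds follow this section: fewer than 50,000 is small,
    50,000 to 250,000 is medium, and more than 250,000 is large.
    """
    if population_served < 0:
        raise ValueError("population_served must be non-negative")
    if population_served < 50_000:
        return "small"
    if population_served <= 250_000:
        return "medium"
    return "large"
```

For example, `classify_utility(120_000)` returns `"medium"`.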

Small water utilities typically serve populations of less than 50,000 and are often situated in rural areas. They tend to have simpler treatment and distribution systems, featuring fewer wells, pumps, and storage tanks. Distinct challenges for small utilities stem from factors like their geographic location, workforce availability, and financial funding mechanisms. These utilities can be managed by local governments, private companies, or cooperatives, and they often encounter difficulties in securing funding and maintaining their infrastructure. Small utilities see themselves in the basic to preliminary stage of the ‘digital transformation’ journey where they are exploring options and developing strategies for implementing some aspects of data governance and modeling. Regarding data collection, they focus solely on gathering process-specific data mandated by regulations. They primarily collect data from the systems within their service area, and any data collected outside is project-specific. Their analysis and data collection adhere to daily regulatory demands, lacking long-term applications, and are limited to analyzing individual built components to inform decisions regarding specific assets. While some respondents from small utilities do not have established DQPs, they are implementing preliminary methods to ensure data quality, such as identifying data quality objectives, instrument calibration, regular testing and maintenance, and field and lab-generated data quality control. Although smaller utilities have limited resources, the questionnaire findings show that preliminary DQPs can still be achieved. The questionnaire showed similar results for medium utilities.

Medium-scale utilities generally cater to populations ranging from 50,000 to 250,000 and are situated in suburban areas or smaller cities. These utilities possess a more extensive array of water and wastewater infrastructure compared to smaller counterparts, encompassing multiple water sources, treatment plants, and storage facilities. Typically managed by public entities like municipal or county governments, medium-sized utilities are subject to regulatory supervision and obligatory reporting.

Medium-scale water utilities, often located in suburban or smaller city areas, may have numerous small utilities nearby in towns and rural regions. Consequently, they understand the significance of gathering data related to external natural systems, particularly for tasks such as monitoring water contamination levels to safeguard downstream areas. Their data collection is relatively preliminary, focusing on what's necessary for analyzing assets within their service area. Their databases typically operate in isolation and lack automation; data integration is a manual process needed for specific applications. They lack automated data lifecycles and interoperability within their information systems. Their decision-making supports both short-term and long-term needs, providing insights into sewershed performance and individual assets, and aiding strategic planning for daily operations and long-term sustainability. They acknowledge the importance of systematic technology utilization, albeit without the economies of scale enjoyed by larger utilities, which can pose challenges in financing capital improvements and iWISE projects.

Large water utilities typically serve a population of more than 250,000 and are in larger cities or metropolitan areas. They have an extensive network of water and wastewater assets, including multiple sources of water supply, treatment plants, and storage facilities. Most large utilities have a holistic data collection process as they collect all the types of data within each sub-system as described in the iWISE framework. Figure 6 shows that most utilities are collecting a wide range of data from all three subsystems as defined by this framework for the sewershed scale (natural, built, and social).
Figure 6

Data collected from the natural, built, and social sub-system components by large utilities.

Because large utilities collect large amounts of data, this data comes from a variety of sources, including utility instrumentation, utility operational data, and external data. From the interviews and questionnaires with large utilities, it was found that the majority of utilities use utility instrumentation and utility operational data, while half of the respondents also collect external data depending on the use case. After collection, the data is verified and goes through quality-checking protocols. While most of the respondent utilities have some level of data quality-checking protocols, there were a few exceptions. The data quality-checking steps are carried out as part of their regular procedures, and they do not have a formal or dedicated project planning setup for quality assessment. Their responses regarding the steps they follow in their DQPs are shown in Figure 7.
Figure 7

DQPs followed by large utilities.

Large utilities develop models that offer insights for various focus areas for the components of the natural, built, and social subsystems of their respective sewersheds. The findings, as shown in Figure 8, indicate that all the respondent utilities engage in modeling to evaluate their asset performance, and develop a well-rounded set of models for asset renewal and asset risk. However, there is potential for enhancement in the realm of modeling related to asset finance. In the social sub-system, the primary modeling emphasis centers around demand forecasting and assessing the lifespan of constructed assets. The responses from major utilities suggest that there's an opportunity to enhance the models, the data collected, and consequently, the decision-making process for elements within the social sub-system.
Figure 8

Models developed for different sub-system focus areas by large utilities.


The utilities were also asked about their various dimensions of modeling, mathematical techniques used, subsystems considered, and types of insights their models offer. Large utilities use several types of analytical models to understand their natural, built, and social subsystems and support their decision-making process. Most large utilities carry out component-level analysis, which supports decisions to manage specific problems. The analysis conducted by most utilities aids in tactical decision-making, involving the identification of trouble spots and areas of concern to discern patterns in datasets. It also supports operational decision-making, which pertains to day-to-day choices made at the asset level. Most of their models are probabilistic, indicating that their analyses facilitate predictive and dynamic modeling, which in turn can assist in making real-time decisions.
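The difference between deterministic and probabilistic models can be made concrete with a small sketch. The rational method for peak runoff is used here purely as an illustrative stand-in and is not taken from the utilities' responses; the input distributions, parameter values, and function names are assumptions. The deterministic function returns one value for a given set of inputs, whereas the Monte Carlo version propagates input uncertainty and returns a range of outputs.

```python
import random


def peak_runoff_m3s(runoff_coeff, intensity_mm_per_h, area_ha):
    """Deterministic: rational method, Q = C * i * A / 360 in SI units (m^3/s)."""
    return runoff_coeff * intensity_mm_per_h * area_ha / 360.0


def peak_runoff_distribution(coeff_range, intensity_mean, intensity_sd, area_ha,
                             n_samples=10_000, seed=42):
    """Probabilistic: Monte Carlo over uncertain inputs; returns sorted samples.

    The uniform and normal input distributions are assumptions for illustration.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        c = rng.uniform(*coeff_range)
        i = max(0.0, rng.gauss(intensity_mean, intensity_sd))  # no negative rainfall
        samples.append(peak_runoff_m3s(c, i, area_ha))
    return sorted(samples)


def percentile(sorted_samples, p):
    """Nearest-rank style percentile of an already sorted list, p in [0, 100]."""
    index = min(len(sorted_samples) - 1, int(p / 100 * len(sorted_samples)))
    return sorted_samples[index]
```

Reporting, for instance, the 5th, 50th, and 95th percentiles of the sampled peak runoff gives decision-makers a range rather than a single figure, which is the property of probabilistic models described above.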

This study developed specific steps and guidelines for the data governance, modeling, and analytics aspects of wastewater management at the sewershed scale for utilities moving toward digital transformation. The iWISE framework as a whole offers guidelines for intelligent water system implementation based on a system-level understanding of sewersheds and of the challenges utilities face when transitioning toward holistic, digitally driven management practices. The framework was developed, and data governance and modeling practices were outlined, based on a review of the literature and of current intelligent water practices. The following conclusions were drawn from the questionnaire responses and the interviews conducted with domain experts:

  • The proposed data governance and modeling building blocks can be utilized by any wastewater utility and should begin with understanding the definition of iWISE.

  • Data governance and modeling practices for intelligent water systems require a SoS approach to account for interactions between the natural, built, and social sub-systems as well as their interactions across sewersheds.

  • Having the right parameters is essential to develop effective models that provide accurate analysis of sewershed operations, and data needs to be collected from all components and sub-systems within the sewershed to ensure holistic data collection.

  • Knowledge about data sources is crucial to ensure reliability, ease of understanding, identifying relationality among datasets, and developing data integration and preprocessing rules.

  • Modeling for iWISE at the sewershed scale requires understanding the various dimensions of modeling involved and utilities must have a range of modeling techniques that can be used to derive insights for all components and processes managed.

  • V&V of models can be considered a feedback loop that continuously calibrates and improves the model and ensures robust analytics.

The authors would like to express their gratitude to The Water Research Foundation for funding this study, Jacobs for feedback and review of the study material, the members of the Sustainable Water Infrastructure Management (SWIM) lab at Virginia Tech for their support and guidance, and the participating water utilities for their valuable input and response to this study.

This study was funded by the Water Research Foundation (WRF project 4797).

R.D. and S.S. conceptualized the study, and wrote, reviewed, and edited the article.

All relevant data are available from an online repository or repositories: https://docs.google.com/spreadsheets/d/1kc4aB2xrrg3Fn-B2gmBS2-X048_gDsvEobcp8RCly3M/edit?usp=sharing.

The authors declare there is no conflict.

Abdallah A. M. & Rosenberg D. E. (2019) A data model to manage data for water resources systems modeling, Environmental Modelling & Software, 115, 113–127. https://doi.org/10.1016/j.envsoft.2019.02.005.
Ahmadi M. (2014) Sewer Asset Management: Impact of Data Quality and Models’ Parameters on Condition Assessment of Assets and Asset Stocks. Doctoral dissertation, Villeurbanne, France: INSA de Lyon.
Ali S. S. & Choi B. J. (2020) State-of-the-art artificial intelligence techniques for distributed smart grids: a review, Electronics, 9 (6), 1030. https://doi.org/10.3390/electronics9061030.
Angkasuwansiri T. & Sinha S. K. (2014) Development of wastewater pipe performance index and performance prediction model, International Journal of Sustainable Materials and Structural Systems, 1 (3), 244. doi:10.1504/ijsmss.2014.062767.
Batty M., Axhausen K. W., Giannotti F., Pozdnoukhov A., Bazzani A., Wachowicz M., Ouzounis G. & Portugali Y. (2012) Smart cities of the future, The European Physical Journal Special Topics, 214 (1), 481–518.
Jocanovic M., Agarski B., Karanovic V., Orosnjak M., Ilic Micunovic M., Ostojic G. & Stankovski S. (2019) LCA/LCC model for evaluation of pump units in water distribution systems, Symmetry, 11 (9), 1181.
Kachiashvili K., Gordeziani D., Lazarov R. & Melikdzhanian D. (2007) Modeling and simulation of pollutants transport in rivers, Applied Mathematical Modelling, 31 (7), 1371–1396.
Kapelan Z., Weisbord E. & Babovic V. (2020) Explained: Artificial Intelligence Solutions for the Water Sector. London, UK: International Water Association (IWA). Retrieved July 1, 2023, Available at: https://iwa-network.org/wp-content/uploads/2020/08/IWA_2020_Artificial_Intelligence_SCREEN.pdf (Accessed: 6 November 2024).
Khan Z., Zayed T. & Moselhi O. (2010) Structural condition assessment of sewer pipelines, Journal of Performance of Constructed Facilities, 24 (2), 170–179.
Koo D. H. & Ariaratnam S. T. (2006) Innovative method for assessment of underground sewer pipe condition, Automation in Construction, 15 (4), 479–488.
Liggett J., Macintosh C. & Thompson K. (2018) Designing Sensor Networks and Locations on an Urban Sewershed Scale. Project 4835. Denver, CO, USA: The Water Research Foundation.
Little J. C., Hester E. T., Elsawah S., Filz G. M., Sandu A., Carey C. C., Iwanaga T. & Jakeman A. J. (2019) A tiered, system-of-systems modeling framework for resolving complex socio-environmental policy issues, Environmental Modelling & Software, 112, 82–94.
Marrel A., Iooss B., Jullien M., Laurent B. & Volkova E. (2011) Global sensitivity analysis for models with spatially dependent outputs, Environmetrics, 22 (3), 383–397.
Mathew E. (2020) Swarm Intelligence for Intelligent Transport Systems: Opportunities and Challenges. Ch. 7, pp. 131–145. Amsterdam, The Netherlands: Elsevier. https://doi.org/10.1016/B978-0-12-818287-1.00013-9.
Mcphearson T., Cook E. M., Berbés-Blázquez M., Cheng C., Grimm N. B., Andersson E., Barbosa O., Chandler D. G., Chang H., Chester M. V., Childers D. L., Elser S. R., Frantzeskaki N., Grabowski Z., Groffman P., Hale R. L., Iwaniec D. M., Kabisch N., Kennedy C., Markolf S. A., Matsler A. M., Mcphillips L. E., Miller T. R., Muñoz-Erickson T. A., Rosi E. & Troxler T. G. (2022) A social-ecological-technological systems framework for urban ecosystem services, One Earth, 5 (5), 505–518.
Molinos-Senante M., Hernandez-Sancho F. & Sala-Garrido R. (2013) Cost modeling for sludge and waste management from wastewater treatment plants: an empirical approach for Spain, Desalination and Water Treatment, 51 (28–30), 5414–5420.
Morton L. W. & Padgitt S. (2005) Selecting socio-economic metrics for watershed management, Environmental Monitoring and Assessment, 103, 83–98.
Ostojin S., Mounce S. R. & Boxall J. B. (2011) An artificial intelligence approach for optimizing pumping in sewer systems, Journal of Hydroinformatics, 13 (3), 295–306. https://doi.org/10.2166/HYDRO.2011.059.
Parsaie A. & Haghiabi A. H. (2017) Computational modeling of pollution transmission in rivers, Applied Water Science, 7, 1213–1222.
Rinaldi S. M., Peerenboom J. P. & Kelly T. K. (2001) Identifying, understanding, and analyzing critical infrastructure interdependencies, IEEE Control Systems, 21 (6), 11–25.
Robles-Velasco A., Cortés P., Muñuzuri J. & Onieva L. (2021) Estimation of a logistic regression model by a genetic algorithm to predict pipe failures in sewer networks, OR Spectrum, 43 (3), 759–776.
Saltelli A. (1999) Sensitivity analysis: could better methods be used?, Journal of Geophysical Research: Atmospheres, 104 (D3), 3789–3793.
Sinha S. K., Davis C., Gardoni P., Babbar-Sebens M., Stuhr M., Huston D., Cauffman S., Williams W. D., Alanis L. G., Anand H. & Vishwakarma A. (2023) Water sector infrastructure systems resilience: a social–ecological–technical system-of-systems and whole-life approach, Cambridge Prisms: Water, 1, e4.
St. Clair A. M. & Sinha S. (2012) State-of-the-technology review on water pipe condition, deterioration and failure rate prediction models, Urban Water Journal, 9 (2), 85–112. https://doi.org/10.1080/1573062X.2011.644566.
Thompson K. A., Dermody C., Sinha S., Dadiala R. & Vishwakarma A. (2025) Phase II demonstration: designing sensor networks and locations on an urban water sewershed scale with big data management and analytics (RFP 4797), Water Research Foundation. Available at: https://www.waterrf.org/resource/designing-sensor-networks-and-locations-urban-water-sewershed-scale-big-data-management.
Torregrossa D., Leopold U., Hernández-Sancho F. & Hansen J. (2018) Machine learning for energy cost modelling in wastewater treatment plants, Journal of Environmental Management, 223, 1061–1067.
Yan J. M. & Vairavamoorthy K. (2003) Fuzzy approach for pipe condition assessment. In: Najafi, M. (ed.) New Pipeline Technologies, Security, and Safety, Reston, VA, USA: ASCE, pp. 466–476. Available at: https://doi.org/10.1061/40690(2003)11.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).