ABSTRACT
Ageing infrastructure, increasing frequency and intensity of extreme events due to climate change, and increasing population demand have created various stresses on wastewater and stormwater infrastructure, which has led to frequent cases of combined and sanitary sewer overflows (CSOs and SSOs), among other issues. This has exacerbated the impact of sewershed management on society and the environment. With the advent of efficient sensing technologies, higher processing power, and accessible advanced mathematical modeling techniques, it has become possible to create intelligent digital systems for sewersheds in the United States. This study proposes a holistic framework with best practices to improve data governance and model development at a sewershed scale and support the digital transformation of wastewater utilities in the United States. This study also presents the results from the questionnaire sent out to wastewater utilities in the United States to verify and evaluate the framework and the recommended steps.
HIGHLIGHTS
Guidelines for the digital transformation of wastewater utilities are proposed.
A sewershed-scale framework for intelligent water systems is presented.
Data governance practices for intelligent water systems are discussed.
Modeling and data analytics for intelligent water systems are discussed.
The requirement for a holistic system-of-systems perspective for wastewater management is highlighted.
INTRODUCTION
Wastewater infrastructure, a cornerstone of maintaining essential hygiene standards for a healthy society, plays a pivotal role in fostering economic growth, societal progress, and environmental improvements within communities. However, these systems face challenges stemming from factors such as ageing infrastructure and extreme events, resulting in an uptick in incidents involving the overflow of both combined and sanitary sewers, known as combined sewer overflows (CSOs) and sanitary sewer overflows (SSOs), which can lead to surface and groundwater contamination (Liggett et al. 2018). To address these challenges, it is imperative to enhance the operational and managerial aspects of wastewater infrastructure to mitigate the adverse impacts of CSOs and SSOs on society and the environment. In the United States, wastewater utilities are increasingly embracing digital transformation initiatives to bolster the efficiency and reliability of their wastewater systems (Kapelan et al. 2020). Cutting-edge modeling techniques, increased computational capabilities, and the use of big data can be harnessed to innovate and revamp the oversight of sewershed systems, ushering in new approaches to sewershed management.
Wastewater utilities require a well-structured blueprint with clear, step-by-step directives to ensure the reliable implementation of available technologies. An intelligent water system framework, titled ‘Intelligent Water Infrastructure Systems Engineering (iWISE)’, has been developed as part of a project funded by The Water Research Foundation through a collaborative effort involving Jacobs Engineering and the Sustainable Water Infrastructure Management (SWIM) Lab at Virginia Tech.
This paper concentrates on two key building blocks of the iWISE framework that are of paramount importance in the development of an intelligent water system: data governance, and modeling and analytics. Data governance and robust modeling practices are fundamental to developing effective models for wastewater management. Data governance ensures the accuracy, consistency, and accessibility of data, which are essential for constructing reliable models (Abdallah & Rosenberg 2019).
These building blocks provide the foundation for the digital transformation of wastewater utilities. The framework was developed based on a comprehensive literature and practice review. It provides wastewater utilities with the means to improve the efficiency of their operations at the sewershed scale through its data governance and modeling and analytics components.
LITERATURE REVIEW
To understand the need for a ‘Smart’ or ‘Intelligent’ ecosystem for wastewater infrastructure, it is important to introduce the concepts of Smart Cities, Smart Electric Grids, and Intelligent Transportation Systems. The concept of ‘Smart’ or ‘Intelligent’ approaches to managing infrastructure may have its origins in the Smart Growth movement of the late 1990s, which advocated new policies for urban planning. The term ‘Smart City’ is defined as a city in which Information and Communication Technology (ICT) is merged with traditional infrastructures, coordinated, and integrated using new digital technologies (Batty et al. 2012). The phrase ‘Smart’ has been adopted since 2005 by several technology companies for the application of complex information systems to integrate the operation of urban infrastructure and services such as buildings, transportation, electrical and water distribution, and public safety. It has since evolved to mean almost any form of technology-based innovation in the planning, development, and operation of infrastructure. There have been significant developments in the application of similar frameworks to other areas of infrastructure like electricity and transportation services. The two-way flow of electricity and data that is the essential characteristic of a smart grid feeds information to the various stakeholders in the power sector, who can analyze it to optimize the grid, foresee potential issues, react faster when challenges arise, and build new capacities and services as the power landscape changes (Ali & Choi 2020). Intelligent Transportation Systems are advanced applications that aim to provide innovative services relating to different modes of transport and traffic management, enabling users to be better informed and to make safer, more coordinated, and ‘smarter’ use of transport networks.
From each of these applications, we can understand that intelligence is not just the application of technologies to get an output, but rather it leverages the use of these technologies to achieve a greater goal (Mathew 2020).
Intelligent water systems
Various types of water, including drinking, waste, storm, industrial, agricultural, and environmental water, along with water for energy and agriculture, are currently managed separately, despite their reliance on the natural water cycle. Stakeholders in natural, built, and social environments are now advocating for more coordinated and less isolated water management and governance. If we allow isolated approaches to persist in the governance of water systems, unaddressed vulnerabilities will increasingly threaten both local and global water security. The system-of-systems (SoS) approach for water governance integrates complex, interdependent water management problems across the natural, built, and social environments of water. A SoS is an assemblage of components that can individually be regarded as a system. SoS is more than the sum of constituents as it possesses emergent properties that stem from interactions between its component systems and dynamic environments.
The ‘Anthropocene’ represents the era where human influence significantly impacts Earth's systems. Coping with its intricate socio-environmental challenges becomes progressively harder, as it is difficult to foresee unintended consequences and align goals across interconnected systems while self-governing in this complex epoch (Little et al. 2019). Incorporating a social-ecological-technical systems (SETS) perspective into the adaptive management process requires a conceptualization of coupled human and natural systems and an assessment of underlying interdependencies among social drivers, institutions, and accrued benefits (McPhearson et al. 2022). Water SoSs can be categorized into natural, built, and social sub-systems, where the natural sub-system comprises all naturally occurring resources within the sewershed like water, land, and climate, the built sub-system comprises all man-made infrastructure assets managed by wastewater utilities, and the social sub-system comprises the factors influencing societal and economic dimensions of the sewershed. These sub-systems have individual components that are highly interdependent. For example, extreme climate events in the natural environment can impose stresses on the built environment, and the service demands that come from urbanization from the built and socio-economic environments of a sewershed can, in turn, impose stresses on the natural environment. Socio-environmental challenges are highly complex, necessitating a holistic modeling approach that considers the interconnected nature of each sub-system.
For each of these sub-systems, data collection is a crucial component of sewershed management for understanding the characteristics and complexity of the various interdependent components and developing models that can provide meaningful insights for decision-making. It is important to have a comprehensive understanding of all the parameters that have an impact on the insights derived through data analysis and modeling of a sewershed component (Angkasuwansiri & Sinha 2014). Identifying the critical parameters required for all components of the sewershed system also helps to identify the sources. Data sources for sewershed-related components can include utility instrumentation data (like CCTV, infrared, temperature, or flow sensors, among others), utility operational data (including GIS, SCADA, and inventory records, among others) and external data (from EPA, USGS, NOAA, among others). Data quality plays a crucial role in determining data reliability and directly impacts the performance and accuracy of modeling. In the context of sewer asset management, the quality of data has a direct impact on assessing the condition of assets and their stocks, and inaccuracies in data can impede effective asset management and can lead to erroneous assessments. Precision and comprehensiveness of data directly influence the efficacy and dependability of asset management models. Moreover, adjusting model parameters can help mitigate data limitations, thereby enhancing overall asset management results, and underscoring the interconnectedness between data quality and model parameters (Ahmadi 2014).
Wastewater utilities frequently employ mathematical models as a tool to analyse and derive insights for wastewater treatment and management. These models serve several functions, including forecasting performance, design and optimization of treatment plants, risk assessments, and renewal optimization. Various types of models, including physical, empirical, statistical, and simulation models, have been used to make predictions about future performance and optimize the operation of wastewater systems. The majority of the models examined in this research can be generally classified as either deterministic or probabilistic models. Deterministic models are generally used where relationships between components are certain, and they produce a definite output; a common challenge is that their applicability is limited to specific locations. Probabilistic models, by contrast, produce a range of outputs and account for uncertainty, which widens their range of applicability. However, probabilistic models require extensive data to be able to effectively predict the probability of an event occurring (St. Clair & Sinha 2012). The modeling methods used in the specific domains of failure prediction, performance estimation, and risk evaluation for wastewater infrastructure assets can be categorized as stochastic (Koo & Ariaratnam 2006; Robles-Velasco et al. 2021). Mathematical models, simulation modeling, and neural networks have been used to predict contaminant flow and pollutant fate and transport in rivers to manage downstream water quality (Kachiashvili et al. 2007; Parsaie & Haghiabi 2017). Various studies have used artificial intelligence (AI), machine learning (ML), and mathematical and statistical models for the modeling of sewer and stormwater pipes. AI techniques like fuzzy logic have been used for the performance prediction of gravity pipes and force mains while dealing with uncertainty in data (Yan & Vairavamoorthy 2003).
Neural networks have also been widely used for the condition assessment of sewer pipes and assessing the importance of certain parameters on structural performance (Khan et al. 2010). Fuzzy logic and neural networks have also been used for the optimal scheduling time problem for pumping stations (Ostojin et al. 2011). Lifecycle assessments have also been employed to assess the environmental impacts of wastewater treatment systems and pumps. Such models are an efficient approach to quantifying the environmental impacts of different assets in the built environment of sewersheds (Jocanovic et al. 2019). Contemporary computing methods have found application in the field of treatment plant cost modeling, aiding in the evaluation of cost frameworks and the formulation of cost-saving strategies. ML methods have been used to construct an effective cost model for wastewater treatment facilities, considering energy consumption and water quality indicators (Torregrossa et al. 2018). Statistical modeling has also been utilized to create a cost function for sewage sludge and waste management, enhancing comprehension of cost structures and identifying potential savings by reducing sludge production (Molinos-Senante et al. 2013). The study identified a gap in the existing body of knowledge regarding the impact of socio-economic factors on the characteristics and management of sewersheds. Previous research has mainly centered on the consequences of societal actions and community infrastructure (Morton & Padgitt 2005). Nevertheless, the attention given to government policies and the repercussions of regulations on the evolution of sewersheds has been limited. Model evaluation, verification, and validation are crucial steps to ensure the reliability of the models developed. Sensitivity analysis has been extensively used in literature to identify the impact of input parameters on the model output (Saltelli 1999; Marrel et al. 2011).
PRACTICE REVIEW
Through the literature review, prominent data governance and modeling applications for the sewershed system were identified. The aim of the practice review is to capture the data governance and modeling practices of wastewater utilities. Interviews were conducted and case studies were developed to capture a comprehensive understanding of the real-world practices of utilities, including Houston Water, the New York City Department of Environmental Protection, and Hampton Roads Sanitation District (HRSD) in Southeast Virginia.
Houston Water
Houston Water's implementation plans for intelligent water practices involve innovative approaches such as collaboration with academia and other utilities, progressive operations, and in-house planning and analytics, among others, to solve challenges including ageing infrastructure, meeting stakeholder expectations, regulatory compliance, and climate change. They have a holistic data collection methodology that enables them to collect critical parameters for components from the natural, built, and social sub-systems. They have also established a data pipeline for both manual and automated data preparation that enables data collection, quality checks, and data analysis. Their modeling practices include (i) rating SSO risk levels for optimizing preventative cleaning; (ii) optimized sensor placement; (iii) predictive analytics for SSO detection; (iv) preventative risk-based asset management; and (v) defect detection in pipes using AI. The risk-based asset management approach was applied to force main renewal prioritization. Their methodology followed clearly outlined steps, starting with identifying individual assets for data collection. The factors impacting renewal were identified and their relative importance was determined based on a weighted score. Risk scores were determined for individual criteria and the asset ranking was used to prioritize renewal plans.
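The weighted-score prioritization described above can be illustrated with a minimal sketch. The criteria names, weights, 1 to 5 scoring scale, and asset identifiers below are hypothetical placeholders chosen for illustration; they are not Houston Water's actual criteria or values.

```python
# Hypothetical criteria and weights (assumed for illustration; weights sum to 1).
WEIGHTS = {"age": 0.40, "break_history": 0.35, "criticality": 0.25}


def risk_score(scores, weights=WEIGHTS):
    """Weighted sum of per-criterion scores (each assumed to be on a 1-5 scale)."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_assets(assets, weights=WEIGHTS):
    """Return asset IDs ordered from highest to lowest weighted risk score."""
    return sorted(assets, key=lambda a: risk_score(assets[a], weights), reverse=True)


# Made-up force mains with made-up criterion scores.
force_mains = {
    "FM-01": {"age": 5, "break_history": 2, "criticality": 3},
    "FM-02": {"age": 2, "break_history": 5, "criticality": 5},
    "FM-03": {"age": 1, "break_history": 1, "criticality": 2},
}

ranking = rank_assets(force_mains)
for asset_id in ranking:
    print(asset_id, round(risk_score(force_mains[asset_id]), 2))
```

In this sketch the asset at the top of `ranking` would be the first candidate for renewal; a utility would substitute its own criteria and calibrate the weights with domain experts.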
New York City Department of Environmental Protection (NYDEP)
The NYDEP's Bureau of Water Supply (BWS) Data Governance Program addresses various issues regarding data collection, data quality, quantity, storage, and regulatory requirements. NYDEP defines data governance as an organizing strategy for the creation of policies and for managing data within an organization. These frameworks and defined protocols for data governance are crucial to ensure data quality and reduce risk associated with data. At present, data management within the BWS operates independently within each work unit or directorate, frequently in coordination with either the Bureau of Business Information Technology or at the Agency level. The introduction of a governance program at the Bureau level seeks to build upon existing effective practices within directorates while strengthening areas in need of enhancement. The swift technological progress within the Bureau has resulted in the emergence of scattered data repositories and, in certain instances, duplicate data across various operational domains. Introducing an effective data governance structure is imperative to set higher benchmarks for data generation, utilization, and management. Unlike certain sectors that adopt elaborate data governance frameworks driven by regulations, BWS has devised a streamlined model that provides adequate governance to improve data accessibility, functionality, and reliability. The objectives of this data governance project are primarily to provide a good understanding of the collected data, document data utilization better, improve data access and reliability, and ultimately advance BWS' readiness for cloud architecture adoption to centralize and standardize data governance across the enterprise.
Hampton Roads Sanitation District (HRSD)
HRSD has been actively working on several projects toward developing a smart water system. Their Sustainable Water Initiative for Tomorrow (SWIFT) program, which involves putting highly treated water through additional rounds of advanced water treatment before adding it to the Potomac aquifer, integrates building information modeling (BIM) with geographic information systems (GIS) to create a detailed digital model or digital twin. This technology enables operational intelligence with 3D data for newly constructed facilities, and the team is presently collecting and visualizing real-time sensor data.
HRSD uses sensors for condition assessment of irrigation wells to measure water quality changes. The collected data are integrated with a GIS map, enabling continuous monitoring. They also use sensors for monitoring pump performance and pressure levels, and utility professionals use GIS to inspect each asset. The generated data contribute to an operational performance indicator (PI) dashboard, which integrates a map, sensor data, and graphical representations to monitor variables like flow, pressure, and rainfall.
HRSD uses the MIKE URBAN software to improve their regional hydraulic model (RHM), which facilitates system capacity assessment, facility dimension determination, and flow routing decisions.
With the advent of advanced data governance and modeling techniques and applications for wastewater management, as highlighted in the literature review, many water utilities are implementing such techniques to support their digital transformation. However, through this practice review, it can be highlighted that these efforts are limited from a SoS perspective, failing to account for the interdependencies between subsystems.
iWISE FRAMEWORK
Many definitions of intelligent water systems exist in the literature, but they focus on the technologies and techniques used and rarely consider the human aspect in the application of these technologies. The iWISE framework proposes a new definition of intelligent water systems based on the comprehensive literature and practice review. The definition is as follows: ‘An Intelligent Water System integrates and derives information from a cyber-space, physical-space and social-space based on improved water system-of-systems understanding at sewershed scale implementing data collection, database management, modeling techniques, decision support paradigms, and intelligent workforce skills to support risk-based decision making and optimize lifecycle management of one water (drinking water, wastewater, stormwater and clean water) that are equitable, affordable, efficient, reliable, sustainable, and resilient for healthy and thriving communities.’ This framework relies on data generated throughout its lifecycle by people, process, and technology. The application of intelligence to water systems is not limited to just sensing technologies and advanced modeling techniques. Rather, the application of intelligence should follow the entirety of the data lifecycle from the point of data collection to the point where it becomes useful knowledge for decision support with humans in the loop. There are many challenges associated with implementing intelligent water systems. Like other disruptive technologies, the iWISE framework requires a cultural shift in the utility for better adoption of utility-wide changes, more focus on enhancing the resiliency of all digital systems, trust in the newly proposed methods and technologies built through decision support and visualization tools, and a diverse workforce with the necessary digital skills for solving complex sewershed issues by leveraging advanced computational methods (Thompson et al. 2025).
The proposed iWISE Framework for implementing Intelligent Water Systems at the sewershed scale is based on building blocks and provides recommended approaches to help implement each building block and support the transition from regular operations to iWISE at the sewershed scale. The structure of the framework takes a systems approach that considers the complex nature of a sewershed system and is inspired by the SETS framework (McPhearson et al. 2022), which highlights the importance of coordinating natural, built, and social sub-systems within the sewershed system and understanding their interactions and the factors that affect ecosystem services. The concept of considering the interactions between the SETS lays a foundation for the holistic systems approach that is used to develop the iWISE framework at the sewershed scale (Sinha et al. 2023).
The framework has been developed for implementation at the sewershed scale, and this system is divided into three sub-systems – natural, built, and social. The natural sub-system consists of all naturally occurring resources within the sewershed like water, land, and climate. The built sub-system includes components like wastewater, stormwater, and household, commercial, and industrial infrastructure. The social subsystem is categorized into communities, policies and regulations, and finance and economics.
This paper presents the proposed methods for the development of the following building blocks of the iWISE framework:
Data Governance
Data Analytics
Data governance building block
1. Identify data parameters: A comprehensive list of data parameters from each of the sub-systems enables a better understanding of the sewershed and results in a list of parameters required to perform analysis on different sub-systems. Table 1 describes the main categories from which parameters must be collected within the natural, built, and social sub-systems. Specific data parameters and their units within each of these categories must be collected and then used to develop the models described in the data analytics building block (Thompson et al. 2025).
2. Collect data parameters: The iWISE framework recommends characterizing data sources by their point of origin, as this helps track the data for verification, set privacy levels, develop metadata, compare performance, and set data quality standards for data generation systems. Figure 2 shows how data from different sources in a sewershed can be categorized and what type of metadata should be collected for each type of source.
3. Check data parameters: Utilities should develop data quality protocols (DQPs) to evaluate the accuracy and precision of collected data. The iWISE framework proposes the steps shown in Figure 3 for developing holistic DQPs that can be applied to different sub-system data management tasks in Intelligent Water Systems at a sewershed scale.
Table 1. Data parameter categories for sewershed sub-system components
Sub-system | Category | Data parameter category |
---|---|---|
Natural | Water | |
Natural | Land | |
Natural | Climate | |
Built | Household/commercial/industrial, wastewater, and stormwater infrastructure | |
Social | Community | |
Social | Laws and Policies | |
Social | Finance and Economic | |
The DQPs shown in Figure 3 can be broadly classified into three main categories – project planning, data checking, and review and improvement of overall DQPs.
The project planning step comprises tasks like establishing a data quality-checking team along with identifying data quality objectives, any training requirements, or documentation protocols.
The data-checking step consists of aspects related to checking data quality, data quantity, and data preprocessing. The quality of data can be determined based on five major categories – data source, data integrity, data timeliness, data relevance, and data reliability. The quality of data depends on the reliability of the data source. Data integrity refers to the accuracy and completeness of data maintained across different formats. If data collected is not recent, it can lead to inaccuracies in the insights derived from data analysis. Data relevance refers to how accurate and relevant the data being collected is to the use case. To check for data reliability, data should be categorized into three main categories ranked in order of reliability – direct measurement, derived indirectly, and educated guesses.
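As a rough illustration of the data-checking step, the sketch below flags integrity (completeness), timeliness, and reliability for a single record. The field names, the one-year freshness threshold, and the record layout are assumptions made for this example; only the three reliability categories and their ordering come from the text above.

```python
from datetime import date

# Reliability categories from the text, ranked from most (1) to least (3) reliable.
RELIABILITY_RANK = {
    "direct_measurement": 1,
    "derived_indirectly": 2,
    "educated_guess": 3,
}


def check_record(record, required_fields, today, max_age_days=365):
    """Return simple quality flags for one data record.

    `max_age_days` is an assumed freshness threshold, not a recommended value.
    """
    complete = all(record.get(field) is not None for field in required_fields)
    age_days = (today - record["collected_on"]).days
    return {
        "integrity": complete,                       # no required value is missing
        "timeliness": age_days <= max_age_days,      # collected recently enough
        "reliability_rank": RELIABILITY_RANK[record["origin"]],
    }


# Hypothetical pipe inventory records.
fresh = {"pipe_id": "P-1", "diameter_mm": 300,
         "collected_on": date(2023, 1, 15), "origin": "direct_measurement"}
stale = {"pipe_id": "P-2", "diameter_mm": None,
         "collected_on": date(2020, 1, 1), "origin": "educated_guess"}

today = date(2023, 6, 1)
fresh_flags = check_record(fresh, ["pipe_id", "diameter_mm"], today)
stale_flags = check_record(stale, ["pipe_id", "diameter_mm"], today)
```

Checks for data source and data relevance depend on the use case and are left out of this sketch.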
Models developed by utilities perform well when they are trained with large amounts of data. The 5 Vs of big data should be associated with the quantity check of collected data – volume, velocity, variety, veracity, and value. Volume refers to having enough samples, features, and quantity that will help the model learn patterns in the data better and perform more efficiently. Velocity is an important factor as a faster rate of data collection leads to more timely model training. A variety of data is crucial to ensure that the dataset is representative of all types of data. Veracity refers to the inconsistency and uncertainty observed in data. The value of the data and the insights it will ultimately provide should be considered while developing models.
Data preprocessing prepares the data for model development. It involves steps like data cleaning, data transformation, data balancing, and data normalization. Data cleaning handles missing values, for example by removing rows or columns with missing data or by filling in missing values with the mean or median. Data transformation refers to the conversion of categorical variables into numerical format. Data balancing refers to the use of techniques like undersampling or oversampling to even out the number of samples in each class of a dataset. Data normalization is used to organize and standardize data collected from multiple sources that may have inconsistent formats.
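The four preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: median imputation, integer category codes, random oversampling, and unit harmonization are one possible choice for each step, and the pipe attributes used in the usage lines are hypothetical.

```python
import random
from statistics import median


def fill_missing(values):
    """Data cleaning: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def encode_categories(labels):
    """Data transformation: map each category to an integer code."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes


def oversample(rows, label_key, seed=0):
    """Data balancing: randomly duplicate minority-class rows until classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def to_millimetres(value, unit):
    """Data normalization: express diameters from mixed sources in one unit."""
    factors = {"mm": 1.0, "in": 25.4}
    return value * factors[unit]


# Hypothetical usage on made-up pipe data.
ages = fill_missing([10, None, 30, 50])
materials, material_codes = encode_categories(["pvc", "clay", "pvc"])
pipes = [{"id": 1, "failed": 0}, {"id": 2, "failed": 0},
         {"id": 3, "failed": 0}, {"id": 4, "failed": 1}]
balanced = oversample(pipes, "failed")
diameter_mm = to_millimetres(12, "in")
```

Oversampling duplicates existing rows, so it should be applied only to training data; otherwise duplicated rows can leak into the evaluation set.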
The ‘Review and Improvement’ steps in the DQPs involve periodically assessing and improving the DQPs based on feedback, changing regulations, and advancements in technology.
Data analytics building block
1. Identify systems modeling dimensions: Different modeling methods are used to analyse sewershed operations, and their output is used to facilitate informed decision-making. These models also encapsulate the interdependencies among distinct components through model parameters, weights, and regulations. The typical modeling areas should be identified for all three sub-systems (natural, built, and social) to effectively model the entire sewershed; they are shown in Table 2.
2. Identify systems modeling techniques: Modeling can be performed in a variety of ways depending on the problem at hand, and each use case can be represented with different techniques. Figure 4 demonstrates that a modeling strategy must start with identifying the various techniques available across different dimensions. These dimensions include categories of insights offered, level of analysis, system component categories, types of decisions, mathematical techniques used, and analysis techniques used. Mathematical techniques used for modeling in wastewater management include deterministic, probabilistic, and AI models. Deterministic techniques are formulaic models in which the final output is completely determined by the input parameters, whereas probabilistic models produce a range of outputs and account for uncertainty.
Table 2. Modeling focus areas for sewershed sub-system components
Sub-system | Category | Modeling focus area |
---|---|---|
Natural | Water | |
Natural | Land | |
Natural | Climate | |
Built | Household/commercial/industrial, wastewater, and stormwater infrastructure | |
Social | Community | |
Social | Laws and policies | |
Social | Finance and economic | |
3. Identify verification and validation (V&V) strategies: V&V protocols are used to systematically test and understand the model's internal logic, quantify uncertainties, and ensure model reliability and robustness. Verification is done by checking the model logic and parametric influence on the model output. Validation is the process of evaluating the model performance through ground truth comparison, competing methods, and statistical and graphical tests. The effectiveness of models relies heavily on the quality of the data and knowledge used to construct them: a model cannot be more precise than the errors present in the input and observed data allow. Also, since the measurements used to describe and assess a model vary depending on the specific problem, there is no universally accepted statistic or test to determine whether a model is validated. Typically, validation requires a mix of methods like expert review, visual comparisons, and statistical tests, using both quantitative and qualitative measures. This approach is also known as the ‘weight-of-evidence’ approach. The steps for model V&V are shown in Figure 5.
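The verification and validation ideas in step 3 can be illustrated with a short sketch. The choice of metrics is an assumption of this example rather than something prescribed by the framework: root-mean-square error and the Nash-Sutcliffe efficiency stand in for the statistical tests used in validation, and a one-at-a-time perturbation stands in for checking parametric influence during verification. The toy model and all numbers are hypothetical.

```python
from math import sqrt


def rmse(observed, simulated):
    """Root-mean-square error between observed and simulated values."""
    n = len(observed)
    return sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def one_at_a_time_sensitivity(model, baseline, delta=0.10):
    """Relative change in model output when each input is raised by `delta`."""
    base_out = model(**baseline)
    result = {}
    for name, value in baseline.items():
        perturbed = dict(baseline, **{name: value * (1 + delta)})
        result[name] = (model(**perturbed) - base_out) / base_out
    return result


def toy_peak_flow(runoff_coeff, intensity, area):
    """Toy stand-in model of the form Q = C * i * A, for illustration only."""
    return runoff_coeff * intensity * area


# Validation against made-up observations.
observed = [2.0, 4.0, 6.0]
simulated = [2.0, 5.0, 6.0]
error = rmse(observed, simulated)
efficiency = nash_sutcliffe(observed, simulated)

# Verification of parametric influence on the toy model.
sensitivity = one_at_a_time_sensitivity(
    toy_peak_flow, {"runoff_coeff": 0.5, "intensity": 20.0, "area": 3.0}
)
```

Consistent with the weight-of-evidence approach, such statistics would be read alongside expert review and visual comparison rather than used as a single pass or fail criterion.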
iWISE framework verification and validation
The iWISE framework building blocks were verified through a series of workshops, brainstorming sessions, and feedback from technical experts, and then validated through a set of questionnaires shared with utilities across the United States to gain insight into their current practices and assess their readiness for digital transformation.
Verification
The verification of the proposed framework was performed by involving and consulting with utilities and domain experts from the water sector across various organizations. This helped verify the content of the framework and ensure that the building blocks are appropriate and can help guide utilities toward digital transformation. This process was done in three steps:
1. Workshops with large utilities: A series of two-hour workshops with large-scale utilities were conducted to get critical feedback on the structure of the framework, individual building blocks, and the content within each building block.
2. Brainstorming session with Jacobs Engineering: The building blocks were reviewed and revised based on feedback from a two-day brainstorming session with Jacobs Engineering, a large consulting firm in the water sector.
3. Comments and feedback from the iWIN Committee: The Intelligent Water Infrastructure Network (iWIN) committee comprises technical domain experts from Oak Ridge National Lab (ORNL), Jacobs, Arcadis, DC Water, City of Houston, Metropolitan Water Reclamation District of Greater Chicago, Clean Water Services, and Virginia Tech from the United States, Anglian Water from the United Kingdom, and Metro Vancouver from Canada. We collected input from the committee and incorporated it into the development of the building blocks.
Validation
To validate the framework, its layers, its building blocks, and the individual steps proposed, a set of questionnaires was prepared following the survey methodology adopted by the Water Research Foundation for assessing the current and future states of the use of advanced sensors in urban sewersheds (Liggett et al. 2018). The questions address each of the steps discussed in each iWISE building block and explore the current intelligent water practices of large, medium, and small utilities and their willingness to implement the proposed building blocks. The questionnaires were sent to 100+ utilities across the United States, Canada, and the United Kingdom. The responses showed that some small, medium, and large utilities are already applying intelligent water practices, are at various stages of implementation, and collectively follow steps similar to those proposed in the iWISE framework.
QUESTIONNAIRE AND INTERVIEW RESULTS AND DISCUSSION
The questionnaires were developed based on the overall framework for assessing water and wastewater utilities' current level of digital maturity and willingness to adopt intelligent water practices. This section discusses the results from questionnaire and interview responses on the data governance and data analytics building blocks. The discussion is presented as a mix of the responses received from the questionnaire and the information captured during the workshop-style interviews.
To determine the extent of iWISE implementation, utilities were classified into three groups: small, medium, and large utilities. Water and wastewater systems in the United States exhibit a range of characteristics and issues, stemming from variations in customer base, wastewater volume, infrastructure complexity, and regulatory supervision.
Small water utilities typically serve populations of fewer than 50,000 and are often situated in rural areas. They tend to have simpler treatment and distribution systems, featuring fewer wells, pumps, and storage tanks. Distinct challenges for small utilities stem from factors like their geographic location, workforce availability, and funding mechanisms. These utilities can be managed by local governments, private companies, or cooperatives, and they often encounter difficulties in securing funding and maintaining their infrastructure. Small utilities see themselves in the basic to preliminary stage of the ‘digital transformation’ journey, where they are exploring options and developing strategies for implementing some aspects of data governance and modeling. Regarding data collection, they focus solely on gathering process-specific data mandated by regulations. They primarily collect data from the systems within their service area, and any data collected outside it is project-specific. Their data collection and analysis serve daily regulatory demands rather than long-term applications, and are limited to analyzing individual built components to inform decisions regarding specific assets. While some respondents from small utilities do not have established DQPs, they are implementing preliminary methods to ensure data quality, such as identifying data quality objectives, instrument calibration, regular testing and maintenance, and quality control of field and lab-generated data. Although smaller utilities have limited resources, the questionnaire findings show that preliminary DQPs can still be achieved. The questionnaire showed similar results for medium utilities.
Medium-scale utilities generally cater to populations ranging from 50,000 to 250,000 and are situated in suburban areas or smaller cities. These utilities possess a more extensive array of water and wastewater infrastructure compared to smaller counterparts, encompassing multiple water sources, treatment plants, and storage facilities. Typically managed by public entities like municipal or county governments, medium-sized utilities are subject to regulatory supervision and obligatory reporting.
Medium-scale water utilities, often located in suburban or smaller city areas, may have numerous small utilities nearby in towns and rural regions. Consequently, they understand the significance of gathering data related to external natural systems, particularly for tasks such as monitoring water contamination levels to safeguard downstream areas. Their data collection is relatively preliminary, focusing on what is necessary for analyzing assets within their service area. Their databases typically operate in isolation and lack automation; data are integrated manually when a specific application requires it. They lack automated data lifecycles and interoperability within their information systems. Their decision-making supports both short-term and long-term needs, providing insights into sewershed performance and individual assets, and aiding strategic planning for daily operations and long-term sustainability. They acknowledge the importance of systematic technology utilization, albeit without the economies of scale enjoyed by larger utilities, which can pose challenges in financing capital improvements and iWISE projects.
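The population thresholds used to group respondents can be written as a simple rule. The sketch below is a minimal illustration: the function name and constants are hypothetical, and the "large" class is assumed to cover any population above the 250,000 upper bound given for medium-scale utilities, since no explicit threshold for large utilities is stated here.

```python
SMALL_UPPER_LIMIT = 50_000     # small utilities serve fewer than this
MEDIUM_UPPER_LIMIT = 250_000   # medium utilities serve up to this


def classify_utility(population_served):
    """Return 'small', 'medium', or 'large' from the population served.

    The 'large' class is an assumption: any population above the
    medium upper limit.
    """
    if population_served < 0:
        raise ValueError("population_served must be non-negative")
    if population_served < SMALL_UPPER_LIMIT:
        return "small"
    if population_served <= MEDIUM_UPPER_LIMIT:
        return "medium"
    return "large"


# Hypothetical respondents (illustrative populations only).
respondents = {"Utility A": 12_000, "Utility B": 50_000, "Utility C": 900_000}
size_classes = {name: classify_utility(pop) for name, pop in respondents.items()}
```

Population is only one of the distinguishing characteristics discussed above; infrastructure complexity and regulatory supervision also vary between the groups and are not captured by this rule.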
Data collected from the natural, built, and social sub-system components by large utilities.
Models developed for different sub-system focus areas by large utilities.
The utilities were also asked about the dimensions of their modeling, the mathematical techniques used, the subsystems considered, and the types of insights their models offer. Large utilities use several types of analytical models to understand their natural, built, and social subsystems and support their decision-making process. Most large utilities carry out component-level analysis, which supports decisions to manage specific problems. The analysis conducted by most utilities aids tactical decision-making, which involves discerning patterns in datasets to identify trouble spots and areas of concern. It also supports operational decision-making, which pertains to day-to-day choices made at the asset level. Most of their models are probabilistic, indicating that their analyses facilitate predictive and dynamic modeling, which in turn can assist in making real-time decisions.
CONCLUSIONS AND RECOMMENDATIONS
This study developed specific steps and guidelines for the data governance, modeling, and analytics aspects of wastewater management at the sewershed scale for utilities moving toward digital transformation. The iWISE framework as a whole offers guidelines for intelligent water system implementation based on a system understanding of sewersheds and of the challenges utilities face when making the transition toward holistic and digitally driven management practices. The framework was developed, and the data governance and modeling practices were outlined, from a review of the literature and of current intelligent water practices. Based on the questionnaire responses and the interviews conducted with domain experts, the following conclusions were drawn.
The proposed data governance and modeling building blocks can be utilized by any wastewater utility and should begin with understanding the definition of iWISE.
Data governance and modeling practices for intelligent water systems require a SoS approach to account for interactions between the natural, built, and social sub-systems as well as their interactions across sewersheds.
Having the right parameters is essential to develop effective models that provide accurate analysis of sewershed operations, and data needs to be collected from all components and sub-systems within the sewershed to ensure holistic data collection.
Knowledge about data sources is crucial to ensure reliability, ease of understanding, identifying relationality among datasets, and developing data integration and preprocessing rules.
Modeling for iWISE at the sewershed scale requires understanding the various dimensions of modeling involved, and utilities must have a range of modeling techniques that can be used to derive insights for all components and processes managed.
V&V of models can be considered a feedback loop that continuously calibrates and improves the model and ensures robust analytics.
ACKNOWLEDGEMENTS
The authors would like to express their gratitude to The Water Research Foundation for funding this study, Jacobs for feedback and review of the study material, the members of the Sustainable Water Infrastructure Management (SWIM) lab at Virginia Tech for their support and guidance, and the participating water utilities for their valuable input and response to this study.
FUNDING
This study was funded by the Water Research Foundation (WRF project 4797).
AUTHOR CONTRIBUTIONS
R.D. and S.S. conceptualized the study, and wrote, reviewed, and edited the article.
DATA AVAILABILITY STATEMENT
All relevant data are available from an online repository or repositories: https://docs.google.com/spreadsheets/d/1kc4aB2xrrg3Fn-B2gmBS2-X048_gDsvEobcp8RCly3M/edit?usp=sharing.
CONFLICT OF INTEREST
The authors declare there is no conflict.