Data-driven asset management in urban water pipe networks: a proposed conceptual framework

Analytical tools used in infrastructure asset management of urban water pipe networks are reliant on asset data. Traditionally, data required by analytical tools has not been collected by most water utilities because it has not been needed. The data that is collected might be characterised by low availability, integrity and consistency. A process is required to support water utilities in assessing the accuracy and completeness of their current data management approach and defining improvement pathways in relation to their objectives. This study proposes a framework to enable increased data-driven asset management in pipe networks. The theoretical basis of the framework was a literature review of data management for pipe network asset management and its link to the coherence of set objectives. A survey to identify the current state of data management practice and challenges of asset management implementation in five Swedish water utilities and three focus group workshops with the same utilities were carried out. The main findings of this research were that the quality of pipe network datasets and the lack of interoperability between asset management tools were drivers for creating data silos between asset management levels, which may hinder the implementation of data-driven asset management. These findings formed the basis for the proposed conceptual framework. The suggested framework aims to support the selection, development and adoption of improvement pathways to enable increased data-driven asset management in municipal pipe networks. Results from a preliminary application of the proposed framework are also presented.


INTRODUCTION
S & W Projects) (Saegrov 2015), as well as Hafskjold (2010) and Yang et al. (2018), highlighted insufficient DQ and uncertainties in data collection and use as constraints limiting IAM implementation. Accessibility (unavailability of data), consistency (aggregation of data), interpretability, timeliness, accuracy, data quantity, integration and interoperability were reported as factors affecting the fitness of data for set objectives (Koronios et al. 2005; Halfawy 2008b; Woodall et al. 2014; Parlikad & Jafari 2016; Rokstad et al. 2016; Carriço et al. 2020). A more detailed description of the aspects reported to influence objective-driven data IAM follows below. Rokstad (2012) emphasised that data collection and recording is a primary source of degradation of DQ. Rokstad et al. (2016) further indicated that IAM data collection should be based on the intended use of the data and that data needs to support the strategic, tactical and operational levels of IAM. These levels differ significantly in the degree of detail and required accuracy of the data, yet all need to be aligned towards the water utility's strategic objectives, which is often difficult to achieve (Flintsch & Bryant 2009). Therefore, data collection strategies and requirements have to consider how the collected data will be used at the various decision levels. In addition, much of the data required by the new decision-support tools has traditionally not been collected by the municipalities, which often makes data availability another factor impeding the capabilities of IAM tools (Halfawy & Figueroa 2006; Tscheikner-Gratl et al. 2020). Data representativeness and survival bias during data collection have also been reported to influence data use by IAM tools such as deterioration models (Laakso 2020; Tscheikner-Gratl et al. 2020).

Data structure
Angkasuwansiri & Sinha (2018) proposed developing a standard data structure for sewer network AM, based on known failure modes and mechanisms. Tscheikner-Gratl et al. (2020) recommended a similar approach in their state-of-the-art review on sewer AM. These standard data structures enable the development of specific minimum datasets for reliable IAM analytics. Angkasuwansiri & Sinha (2018) further proposed a standard data structure for sewer networks based on municipal size (very small, small, medium and large) and data availability. The proposed data structure included 5-51 parameters, depending on the size of the municipality. These parameters were grouped into five classes based on sewer network characteristics: physical/structural, operational/functional, environmental, financial and others. Such a clearly defined and proposed data structure has not been found in the literature for water and stormwater pipe networks. However, water utilities may have locally defined data structures. Since obtaining new datasets may not always be easy, it may be more efficient to map available data to informational benefits based on IAM tools in use.
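The grouping of parameters into five classes described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the parameter names, the `required` list and the `coverage_by_class` helper are hypothetical placeholders and are not the parameter lists published by Angkasuwansiri & Sinha (2018).

```python
from enum import Enum


class ParameterClass(Enum):
    """The five parameter classes named in the text."""
    PHYSICAL_STRUCTURAL = "physical/structural"
    OPERATIONAL_FUNCTIONAL = "operational/functional"
    ENVIRONMENTAL = "environmental"
    FINANCIAL = "financial"
    OTHER = "others"


def coverage_by_class(required, available):
    """Return, per class, the share of required parameters that are available.

    required: dict mapping parameter name -> ParameterClass (hypothetical list)
    available: set of parameter names present in the utility's datasets
    """
    totals = {}
    found = {}
    for name, cls in required.items():
        totals[cls] = totals.get(cls, 0) + 1
        if name in available:
            found[cls] = found.get(cls, 0) + 1
    return {cls: found.get(cls, 0) / totals[cls] for cls in totals}


# Hypothetical minimum dataset for a small municipality (illustrative names only).
required = {
    "pipe_material": ParameterClass.PHYSICAL_STRUCTURAL,
    "pipe_diameter": ParameterClass.PHYSICAL_STRUCTURAL,
    "blockage_records": ParameterClass.OPERATIONAL_FUNCTIONAL,
    "soil_type": ParameterClass.ENVIRONMENTAL,
    "renewal_cost": ParameterClass.FINANCIAL,
}
available = {"pipe_material", "pipe_diameter", "blockage_records"}

coverage = coverage_by_class(required, available)
```

A per-class coverage of this kind is one possible starting point for mapping available data to informational benefits, since it shows which classes can already support the IAM tools in use.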

Integration and interoperability
During the last two decades, significant advances have been made in developing solutions towards the integration and interoperability of municipal IAM tools, which include the development of standards, frameworks and middleware (Halfawy et al. 2002, 2003, 2006; Halfaway et al. 2006; Beck et al. 2007, 2008; Vanier 2014; Carriço et al. 2020). The complexity of data integration and interoperability (levels at which the data can be operated as a single entity) is especially pronounced at data storage and structure levels. Much inefficiency has been attributed to the use of inconsistent data models and formats across IAM tools and diverse new data sources (Tscheikner-Gratl et al. 2020). This was reported to widen the already existing interoperability gap between municipal IAM tools (Gay & Sinha 2014). Despite the link between DQ and interoperability, a lack of interaction between DQ frameworks and interoperability still exists. In addition, the assessment of the interoperability of IAM tools in order to set objectives is a necessary but lacking step in IAM data management for pipe networks.
Digital advances and the resulting increase in computational capabilities have accelerated the development of real-time data collection (e.g. the Internet of Things and intelligent sensors) and data use (e.g. building information modelling, artificial intelligence and machine learning analytics, predictions, big data use, and virtual and augmented reality applications). These advances have led to new data models, such as the smart sewer asset information model (Edmondson et al. 2018), and to data-driven decision support tools, such as new/hybrid variants of evolutionary genetic algorithms and applications of graph theory (Meijer et al. 2018; Oyebode 2019; Shende & Chau 2019). However, the barriers preventing water utilities from adopting such models for pipe networks are data availability and interoperability between existing data models and such new data models (Badea & Badea 2019; Garramone et al. 2020). This interoperability hindrance to digitalisation has led to the development of new interoperability standards, such as the framework of the open specification for smart cities (Hernández et al. 2020). However, it is important to acknowledge that it may not be necessary for all systems or data to be interoperable. Therefore, a need exists for decision support systems to assess the current state of interoperability relative to set objectives for pipe networks' performance. There have also been limited or no methods to measure and evaluate the interoperability of IAM tools for pipe networks in a systematic way (Kasunic & Anderson 2004).

Existing DQ frameworks for urban water pipe networks
Several general DQ assessment frameworks, such as those described by Lee et al. (2002), Wang (1998), Pipino et al. (2002), Carlo et al. (2011) and Sebastian-Coleman (2013), have been developed for DQ assessment (Cichy & Rass 2019). These frameworks provide a broad basis for data assessment and profiling. However, few DQ frameworks specific to pipe network AM have been developed to assess municipal pipe networks' DQ relative to set objectives (Lin et al. 2006). An overview of some of these frameworks is presented in Table 1. The quantitative assessment of DQ using DQ dimensions is a limitation noticed in these frameworks. Interoperability issues, specifically syntactic, semantic and schematic heterogeneity of data for IAM as defined by Beck et al. (2008), are also not addressed by these existing frameworks.
The overall architecture of municipal databases for the IAM of pipe networks is expected to be defined with structured asset hierarchy data. Asset data needs to be grouped appropriately with pre-established links to informational benefits and aligned between IAM levels towards achieving set strategic objectives (Rokstad 2012). However, the lack of structured data, lack of standard datasets, inconsistent datasets, missing records, lack of records on past rehabilitation decisions, lack of data collection guidelines, and lack of integration and interoperability between IAM tools are considered some of the major hindrances to objective-driven data AM of municipal pipe networks (Halfaway et al. 2006; Rokstad 2012; Carriço et al. 2020).
The main objective of this study was to develop a conceptual framework including an application tool to enable increased chain management of set IAM objectives by establishing a link between IAM decision-making and data management. The methodology section highlights how this link is established by validating the literature findings using responses from an online survey and three focus group workshops. This link then formed the basis for the conceptual data-driven AM framework proposed in the discussion section, with results from a preliminary framework application. Data-driven IAM in the context of this study refers to an understanding of the data needed to achieve set goals (objectives), i.e. the right quantity and quality of data along with the appropriate level of data exchange between IAM tools to support set objectives and objective-driven data collection in IAM.

RESEARCH METHODOLOGY
The research model of Koronios et al. (2005) for the identification of DQ problems in AM, which combines both the technical, organisational and personal (TOP) perspectives approach (Mitroff & Linstone 1993) and the total DQ management (TDQM) framework (Wang 1998), was adopted and applied as the primary methodology for the study. This methodology focused on identifying evidence of data considered important for IAM analytics of municipal pipe networks from a data management perspective. It also focused on providing insights into the current data management practices with the aim of validating the literature review findings. Figure 1 presents the bottom-up approach of how both the literature review and findings from the survey and workshops have been used to develop the conceptual framework's theoretical base.

Uncorrected Proof
The main advantage of this methodology is that it enables the capture of domain knowledge from practitioners in water utilities. Domain knowledge is needed to identify the challenges of IAM analytics in municipal pipe networks from a data management perspective to develop an appropriate conceptual model that reflects the reality of current practice.

Online survey
The online survey was carried out using a real-time Internet-based Delphi approach because it omits sequential rounds and reduces the drop-off rate of experts (Gnatzy et al. 2011; Garson 2014). This approach is a well-established technique for utilising an expert panel's tacit experience and judgement to find consensus or informal opinions (Mitroff & Linstone 1993).
The questionnaire for the online survey consisted of four broad sections, covering the potable water pipe network (7 data items), stormwater pipe network (9 data items), sewer network (12 data items) and combined pipe networks (13 data items) at the strategic, tactical and operational IAM levels (Supplementary data 1, section II). Each data item in the questionnaire had a 5-point Likert scale, ranging from least to most important. The choice of information types investigated in the questionnaire was based on the literature review (Table 1 of Supplementary data 1, section I) and tacit experience of practitioners.
The expert panel for this study consisted of representatives from five Swedish municipalities/water utilities with combined estimated network coverage of 22.5% of the population of Sweden. The respondents were all involved in pipe network management. The roles and responsibilities of respondents included project engineer, pipe network engineer, investigation engineer, and water and wastewater strategist.
The expert panel's responses were evaluated using measures of central tendency and dispersion (mean and standard deviation) to determine consensus (Garson 2014). In this study, a consensus of high importance was considered a mean of expert panel responses ranging from 4.4 to 5.0 with a standard deviation of less than 0.5 at the strategic, tactical and operational levels (Figure 2). Details of a consensus of medium and low importance are provided in Figure 2. The implication of a consensus was considered at two levels. At the individual information level, a consensus was considered to be achieved when the mean and standard deviation of the expert panel indicated that an item was most or least important at a specific IAM level. At the overall level, a consensus implied an alignment of data considerations between IAM levels, capturing the domain knowledge to provide insight into the current data management practices.
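The high-importance consensus rule stated above (mean between 4.4 and 5.0 with a standard deviation below 0.5) can be sketched as follows. This is a minimal illustration, not the study's evaluation procedure: the function name and the example ratings are hypothetical, and the use of the sample standard deviation is an assumption because the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def is_high_importance_consensus(ratings, mean_range=(4.4, 5.0), max_sd=0.5):
    """Check the high-importance consensus rule for one data item at one IAM level.

    ratings: 5-point Likert scores given by the expert panel.
    The sample standard deviation (statistics.stdev) is an assumption.
    """
    m = mean(ratings)
    sd = stdev(ratings)
    return mean_range[0] <= m <= mean_range[1] and sd < max_sd


# Hypothetical responses from five utilities (not survey data).
agree = [5, 5, 4, 5, 5]   # high mean, low spread
split = [5, 3, 4, 5, 5]   # mean on the threshold, but the spread is too wide

agree_result = is_high_importance_consensus(agree)
split_result = is_high_importance_consensus(split)
```

The thresholds for medium and low importance are given only in Figure 2, so they are left out here; they could be passed through `mean_range` and `max_sd` once known.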

Focus group workshops
Three focus group workshops were held, each with between 5 and 9 participants (a mix of practitioners and engineers) from the same five Swedish municipalities/water utilities that participated in the online survey. These workshops were themed around (i) network-level AM for pipe networks, (ii) project-level renewal/rehabilitation (data collection and utilisation) and (iii) data integration and interoperability. Workshop themes were further subdivided into subjects and questions. The questions considered the current state of the art of data collection, storage and analysis to support IAM activities for each subject. Participants' responses were documented on paper. Responses were

RESULTS AND DISCUSSION
Online survey and focus group workshops: evidence of current data management practices
The analysis of the responses to the questionnaire sent to five Swedish water utilities is shown in Figure 2. Data collection and use are driven by the utilities' objectives/challenges, and the strategic goal of the sampled water utilities in this survey was resource-efficient, coordinated maintenance and renewal of water and sewer pipe networks through AM.
Predominantly physical pipe characteristics and trench details in GIS stored in the GIS database with unique ID and codes
Sparse records regarding project costs, reasons for renewal and co infra coordination at the street level mostly lacking structure
Historical data is available between 2 and 10 years
Lack of interoperability between tools, i.e. GIS inventory, hydraulic models, failure and maintenance database, SCADA system and customer complaints database
Increased adoption of digitalisation tools (sensors), i.e. smart flow meters; however, integration and interoperability of such data remain a concern
Reported challenges of data management for pipe networks are highlighted in bold.
At the strategic level, hydraulic capacity, operational failures and their consequences were considered the most important for AM of the potable water network. Simultaneously, the physical condition of pipes, construction and renewal methods, as well as customer complaints, were deemed of low importance. For the sewer network (stormwater, sewers and combined sewers), hydraulic capacity and consequences of operational failures, including flooding and overflows, were considered most important. The physical condition of pipes and environmental consequences of operational failures were of medium importance. However, the latter was considered important for the combined sewer network. Operational disturbances, customer complaints, renewal methods cost and remaining life estimation were considered of low importance.
At the tactical level, for the potable water network, hydraulic capacity and consequences of operational failures (leakages/bursts) were considered most important. Construction costs, customer complaints and exfiltration were deemed of medium importance and physical pipe condition of low importance. Among the sewer networks, a difference was observed between the combined network and the sewer and stormwater networks. For the combined network, hydraulic capacity, consequences of operational failures, physical pipe condition, overflows and basement floodings were considered of high importance. The environmental consequences of operational failures, construction costs and operational disturbances, i.e. blockages and infiltration, were regarded as having medium importance. Customer complaints, estimation of remaining life and flooding were considered least important. By contrast, the physical pipe condition and basement floodings were considered most important for stormwater and sewer networks. Hydraulic capacity, construction costs and all consequences of operational failures were considered to be of medium importance, while customer complaints and estimation of the remaining life of pipes were of low importance.
At the operational level, for both the potable water and sewer networks, construction and renewal costs, as well as operational failures (leakages, bursts, exfiltration, overflows, blockages), were considered most important. The physical condition of pipes and customer complaints data were regarded as having medium importance. Hydraulic capacity and all consequences of operational failure were considered to have low importance. Generally, leakages and pipe bursts were considered most important at all levels for the potable water network. Estimating the remaining life of pipes was considered of least importance for all sewer networks.
The questionnaire responses indicated a lack of consensus between IAM levels, i.e. an information type considered most important at the strategic level was observed to be of low importance at the operational level and vice versa. The observed lack of consensus in the responses reflects the different needs to support different decisions, meaning that a data management approach should take all the data value chains into account to support the full flow of information needed at the three decision levels. When information type was considered in the larger context of overall AM objectives for all pipe networks, the lack of consensus could also indicate the existence of data silos between IAM levels regarding data relevance. Similarly, Martenssoon & Rumman (2019) concluded that some of the most significant challenges to implementing IAM included a lack of definition of strategic objectives and a lack of information sharing, leading to data silos between IAM levels.
The observed lack of consensus in information types (Figure 2) between IAM levels may also indicate some expected consequences. These consequences include the presence of data silos leading to a lack of data visibility, fragmentation and data management inconsistencies in supporting strategic objectives. Findings from Laakso (2020) support the results above, highlighting the gap between potential and actual data collection and usage for water and sewer pipe networks. Additionally, findings from a scoping review by Bento et al. (2020) indicated that organisational data silos might assume different forms, i.e. information flow. Results also supported the use of the data and information management approach to define silos, identify their effects on the functioning of organisations towards achieving goals and driving behavioural/cultural change towards reducing the presence of silos.
A summary of the existing data management practices based on three focus group workshops is presented in Table 2. Various aspects of DQ and interoperability were observed as challenges to implementing IAM under each theme (bold) in Table 2. The responses indicated the need to overcome the lack of data, DQ and interoperability between AM tools if data management is to be improved in municipal pipe networks. The finding from the online survey and focus group workshops that DQ and interoperability between IAM tools influence the adoption of data-driven IAM of pipe networks is supported by previous studies (Rokstad et al. 2016; Carriço et al. 2020; Therrien et al. 2020).

DQ, interoperability and data-driven decision-making: the theoretical basis for the conceptual framework
Findings from the literature survey, the questionnaire and the focus group workshops were conceptualised into an assumption that formed the suggested framework's theoretical basis. The conceptualised assumption is that increased levels of DQ and interoperability between IAM tools will lead to a decreased presence of data silos between IAM planning levels and increased data-driven IAM, and vice versa. For example, a standardised representation of municipal pipe data may improve DQ and interoperability between IAM tools. A common and consistent data structure mapped to the strategic objectives of IAM may ensure the alignment between IAM levels and reduce the presence of data silos. Studies supporting this assumption include Khisro (2020), who demonstrated that DQ and interoperability are interrelated. Their results indicated that a lack of understanding of the relationship between DQ and interoperability leads to information management silos. Furthermore, higher levels of DQ can decrease the complexity and increase the reliability of interoperability.
By conceptualising this assumption towards data-driven IAM, decision-makers may get better insights into understanding dependencies and improvement pathways for alignment across IAM levels to support set objectives. This conceptual relationship is schematically presented in Figure 3 and illustrates that an increase in interoperability between IAM tools and the quality of datasets encourages a shift towards data-driven IAM. This shift may occur via pathways that entail data management models adopted or developed by municipalities. Figure 3 also highlights the significant factors which intrinsically affect interoperability and DQ for pipe networks. Ultimately, to move towards a more linear adoption of data-driven IAM in urban water pipe networks, there is a need to select an appropriate improvement pathway based on the set needs of DQ and interoperability.
The origin represents a region of AM implementation based on tacit knowledge and expert judgements. Points A, B and C in Figure 3 illustrate three examples of different kinds of decision-making pathways based on the data management models applicable in pipe network IAM by municipalities. Pathway A is mainly driven by interoperability between the current IAM tools. Such a pathway may, for instance, enable increased adoption of digitalisation and increased availability of data from sensors but is prone to data collection bias and a lack of data structure and accuracy. Pathway B is driven by DQ. For instance, such a pathway may possess medium- to high-quality datasets, i.e. structured, relatively complete with only a few missing records, accurate and updated. However, interoperability between available tools is low, e.g. pipes in the hydraulic model do not have the same ID as in the Geographic Information System (GIS) inventory, and data formats between both systems are inconsistent. Pathway C shows a linear move towards data-driven IAM, driven by DQ and interoperability in parallel. For example, such a pathway may consist of real-time identification of hydraulic anomalies via enhanced interoperability between hydraulic models and SCADA systems (control of processes by data acquisition from several sensors such as flow, pressure and H2S). From a practical view, the pathway D-E entails how data-driven IAM may be put into practice considering the balancing of alternatives, negotiations and political prioritisations involved in achieving objective-driven data IAM decision-making.
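The low-interoperability example given for Pathway B, where pipes in the hydraulic model do not carry the same ID as in the GIS inventory, can be checked with a simple identifier comparison. The sketch below is illustrative only: the `id_overlap` helper, the identifiers and the use of a Jaccard-style match ratio are assumptions, not part of the proposed framework.

```python
def id_overlap(gis_ids, model_ids):
    """Summarise how well pipe identifiers match between two systems."""
    gis, model = set(gis_ids), set(model_ids)
    shared = gis & model
    union = gis | model
    return {
        "shared": sorted(shared),
        "only_in_gis": sorted(gis - model),
        "only_in_model": sorted(model - gis),
        # Share of all known IDs that appear in both systems (Jaccard index).
        "match_ratio": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical identifiers; the hydraulic model names some pipes differently.
gis_ids = ["P-001", "P-002", "P-003", "P-004"]
model_ids = ["P-001", "P-002", "PIPE_3", "PIPE_4"]

report = id_overlap(gis_ids, model_ids)
```

A low match ratio on real data would point to the kind of identifier mismatch described above, before any format differences between the two systems are considered.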

Conceptual framework: data-driven IAM
The illustrated conceptual relationship (Figure 3) prompts the need for a framework that assesses the current status of DQ and interoperability between municipal pipe network datasets and AM tools in the progression towards ultimately advancing data-driven IAM. The proposed conceptual framework is presented in Figure 4. The DQ assessment (i) enables the evaluation of datasets for specific AM-based analytics. The interoperability evaluation (ii) assesses the current state of data exchange via schematic, semantic and syntactic heterogeneities between available AM tools. The data collection and informational benefit analysis (iii) evaluates the cost of data collection relative to the informational benefits obtainable from using this data with the available AM tools. The data collection and informational benefit analysis further allow the simulation of what informational benefits might be obtained from additional IAM tools with the available datasets. Together, all three assessments provide information to municipalities on existing data management models' performance, highlighting critical areas where routines for increased linear adoption of data-driven IAM can be planned. This proposed framework includes a spreadsheet-based tool. The framework's structure (schematically presented in Figure 4) allows for the analyses of the framework's core aspects (i, ii and iii) to either be applied in parallel or sequentially. The description of the methods used in each step of the framework to perform assessments and evaluations, including specific metrics and criteria, is provided in detail in Supplementary data 2.
Preliminary application of the suggested framework to sewer blockage management
The first results from the preliminary application of the suggested framework to sewer blockage management in one municipality showed that the static, inspection and failure data categories were the most available and structured for blockage management (Table 3). Commissioning data, hydraulic model inputs and maintenance data were the least available. The DQ assessment results for available data related to sewer blockages are presented in Figure 5.

The average DQ rating was 0.55, 0.50 and 0.35 for the static pipe data, inspection data and failure data categories, respectively. The aggregated average DQ rating of all three categories was 0.47. Static pipe data was observed to be relatively complete, accurate and accessible but needing improvements in metadata documentation, verification and definition of aims for data use. The inspection data category was observed to have a high level of stored data, accessibility and verification, but needing improvements in completeness, accuracy, documentation of metadata, the definition of analysis aims and resolution. The failure data category was observed to be effectively stored, very accessible and having good resolution. However, accuracy, completeness, verification and definition of the analysis aim for the data were lacking. Lee & Strong (2003) associated various aspects of DQ with data roles synonymous with IAM levels. First, the role of data collection may be related to the operational level of IAM with DQ dimensions of accuracy, completeness, accessibility and analysis (practical usability for defined intent). Secondly, the role of data custodian may be associated with the tactical level of IAM with DQ dimensions of accuracy, completeness and resolution of data. Thirdly, the role of data consumers might be related to the strategic level of IAM, with the DQ dimension of analysis (practical usability for defined intent). Based on these associations, the DQ for strategic IAM level decision-making may be considered low. At the tactical level, DQ is lacking regarding supporting inspection planning and failure analysis, and DQ appears highest to support operational-level planning and decision-making. Figure 6 presents the results of the interoperability evaluation based on a procedure and criteria defined in Supplementary data 2, section 2.
The overall score (0.17) indicates the level of interoperability between all IAM tools under consideration on a scale from 0 (no interoperability) to 1 (set level of interoperability). The target level of interoperability is achieved only between two IAM tools, i.e. the GIS database and the failure record database. All other IAM tools were assessed to have no interoperability.
The aggregated DQ assessment score and the overall interoperability score are plotted together (Figure 7) to illustrate the current state of IAM for sewer blockage management in the municipality.
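The two plotted values can be reproduced from the figures reported in the text. The sketch below assumes that the aggregated DQ score is the unweighted mean of the three category averages, which matches the reported 0.47; the variable names and the order of the coordinates are assumptions made for illustration.

```python
# Category averages reported in the text for the preliminary application.
dq_by_category = {
    "static pipe data": 0.55,
    "inspection data": 0.50,
    "failure data": 0.35,
}

# Unweighted mean across categories (assumed; it reproduces the reported 0.47).
aggregated_dq = sum(dq_by_category.values()) / len(dq_by_category)

# Overall interoperability score reported in the text.
interoperability = 0.17

# Current state as a (DQ, interoperability) pair; the axis order is an assumption.
current_state = (round(aggregated_dq, 2), interoperability)
```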
The current state of IAM for blockage management (Figure 7) is characterised by medium DQ and low interoperability between IAM tools in the municipality. Interoperability between available tools was observed to be low overall except for the GIS database and the failure records database, which were schematically integrated. Overall, the current state of DQ and interoperability suggests that blockage management is based more on intuitive IAM as opposed to data-driven IAM. The current state of DQ and interoperability also indicates the possible presence of data silos, implying a lack of a data management approach that ensures the alignment of set strategic objectives across IAM planning levels for efficient blockage management. These conclusions are supported by discussions with the municipal AM engineer. Improvement pathways may be considered relative to defined objectives; however, for this specific example, objectives have not been defined. Therefore, data-driven IAM for blockage management may be achieved by improving the interoperability level between current IAM tools, i.e. improving the overall interoperability score to at least the same score as the DQ assessment score. Furthermore, the results serve as a baseline to help in the definition of objectives. The informational benefits that may be gained from current IAM tools could be assessed to further aid planning and setting of objectives.
Figure 6 | Interoperability assessment matrix between available IAM tools (A-D) for blockage management analyses in the municipality. The grey colour indicates the target interoperability level. The perceived interoperability level is reported in the non-shaded cells horizontally and vertically between pairs of tools. The normalised average is equal to the vertical column sum divided by the sum of all vertical columns' expected interoperability score (3/18). The sum of all normalised averages gives the overall interoperability score (bold).
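The scoring described in the Figure 6 caption can be sketched as below. This is one reading of the caption rather than the procedure in Supplementary data 2: it assumes that each pair of tools is scored once, that the target level is 3 for every pair (6 pairs, so an expected total of 18 for four tools), and that tools A and B stand for the GIS database and the failure record database. The function name and the mapping of tools to letters are hypothetical.

```python
from itertools import combinations

TARGET_LEVEL = 3  # target interoperability level per pair of tools, as stated in the text


def overall_interoperability(tools, perceived):
    """Sum of per-column normalised averages, as read from the Figure 6 caption.

    perceived: dict mapping a frozenset of two tool names to the perceived level;
    pairs that are missing are treated as 0 (no interoperability).
    """
    n_pairs = len(list(combinations(tools, 2)))
    expected_total = TARGET_LEVEL * n_pairs  # 3 * 6 = 18 for four tools
    normalised = []
    for j, tool in enumerate(tools):
        # Column sum: perceived levels between this tool and the tools listed before it.
        column_sum = sum(
            perceived.get(frozenset((tools[i], tool)), 0) for i in range(j)
        )
        normalised.append(column_sum / expected_total)
    return sum(normalised)


tools = ["A", "B", "C", "D"]
# Assumed mapping: A = GIS database, B = failure record database (target level reached).
perceived = {frozenset(("A", "B")): 3}

score = overall_interoperability(tools, perceived)  # 3 / 18
full = overall_interoperability(
    tools, {frozenset(p): TARGET_LEVEL for p in combinations(tools, 2)}
)
```

With a single pair at the target level the score is 3/18, which rounds to the reported 0.17; if every pair reached the target level the score would be 1.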
Table 4 shows the results of a partial simulation of the information benefit analysis using the available data for blockage management (Table 3) and current IAM tools (A-D) (Figure 6). The four current IAM tools were mapped to 26 expected IAM informational outcomes (Table 4 of Supplementary data 2). The total number of tools currently available was grouped into four combinations (tool combination ID, Table 4). The benefits were assessed as the percentage completeness of all the possible informational outcomes attainable with current tools, weighted with the DQ based on Equation 3 of Supplementary data 2.
Considering the interoperability evaluation (Figure 6), combination ID 2 was most representative of the current interoperability status of AM tools in the municipality for blockage management and showed that only 17% of informational benefits were obtainable. If the target level of interoperability (3) between all four currently available IAM tools were achieved, informational benefits could be improved to about 62% (combination ID 4) without increasing data collection costs substantially. The current state of blockage management in the municipality may be considered as largely intuitive with medium DQ and low interoperability. The informational benefit analysis indicated that improving interoperability between all IAM tools to the set target level could yield additional benefits without a substantial increase in data costs, thus ultimately moving the current state of blockage management into the domain of data-driven IAM. Based on this research, it is not possible to make a more comprehensive recommendation of improvement routines because the preliminary application only considered a particular aspect of IAM for sewer networks. Consequently, only limited improvement pathways can be recommended. When the framework is applied across all pipe networks covering all aspects of IAM within the municipality, the results could facilitate feasibility advice for robust and integrated improvement routines.
Studies such as Kasunic & Anderson (2004) and Sas & Avgeriou (2020) indicated that scenario-based or architecture-style assessments are better suited to understanding interoperability measures and the associated trade-offs between systems, i.e. IAM data and tools. Similarly, the preliminary application results also indicated the need to consider possible trade-offs from framework outputs and recommendations relative to set objectives. For example, increased interoperability levels may also lead to increased exposure of municipal systems to cybersecurity threats and risks. Increased levels of DQ and interoperability may also have associated increased costs.