Feasibility of using smart meter water consumption data and in-sewer flow observations for sewer system analysis: a case study

Globally, smart meters measuring the water consumption with a high temporal resolution at consumers’ households are deployed at an increasing rate. In addition to their use for billing or leak detection purposes, smart meters may provide detailed knowledge of the wastewater inflow to the sewer systems in space and time and open up new types of system analyses aimed at closing the urban water balance. In this study, we first validate the smart meter data against other, independent water distribution data. Subsequently, we use a detailed hydrodynamic sewer system model to link the smart meter data from almost 2,000 consumers with in-sewer flow observations in order to simulate the wastewater component of the dry weather flow (DWF) and to identify potential anomalies. Results show that it is feasible to use smart meter data as input to a distributed urban drainage model, as the temporal dynamics of the model results and in-sewer flow observations match well. Furthermore, the study suggests that in-sewer flow observations may be subject to unrecognised uncertainties, which make them unsuitable for advanced investigations of the DWF composition, and this underlines the necessity of collecting data from independent sources. The study also exemplifies that digital system integration in the water sector may be complicated. However, overcoming these obstacles may improve both offline and real-time urban drainage management.


INTRODUCTION
The dry weather flow (DWF) describes the flow in the sewer system during periods without rain. This flow consists of wastewater from, for example, households, industry and institutions, as well as groundwater infiltration and rain-induced infiltration into the sewer that can take place for days and weeks after the end of a rain event due to the [...] by using the estimation of the wastewater flow distribution to pinpoint optimal heat pump locations and optimal wastewater reuse locations, respectively.
In residential areas, the wastewater flow is dominated by the citizens' behaviour. In upstream parts of the system or in small systems, the flow is intermittent (i.e. occurring at irregular intervals) and the detailed flow dynamics may not be captured if too large a sampling interval is used (Butler & Graham ; Elías-Maxil et al. ). The DWF of smaller areas is thus, in general, more difficult to estimate (Djebbar & Kadota ). Further downstream, the aggregation of different dynamic inputs from many small upstream sources results in a change in the nature of the wastewater flow, which forms a less dynamic pattern (Butler & Graham ). The DWF from a catchment has previously been studied by a range of authors. () fitted a partial least-squares model to DWF data to estimate the DWF in situations with missing data. All of these methods may be used to establish DWF patterns but will not give a real-time picture of the DWF. Such real-time information could be obtained from in-sewer measurements, but flow sensors are often sparsely distributed, since it is both expensive and impractical to cover a large urban drainage system (Djebbar & Kadota ). Highly spatially distributed real-time DWF information is therefore currently not realistic to obtain using only in-sewer observations. Water supply and urban drainage systems are intrinsically linked, since most of the consumed water ends up in the sewer system. The wastewater flow can thus be approximated by estimating the water consumption. Butler & Graham () and Elías-Maxil et al. () used a questionnaire and a probabilistic model to estimate the water consumption. Both studies subsequently modelled the resulting flow in the sewer system to obtain a spatial and temporal distribution of the wastewater flow. However, these methods only provide generalised flow patterns.
In contrast, smart meters measure the real-time water consumption in each household every hour or more frequently, and are increasingly implemented as part of the digitalisation of the water sector, mainly for billing and leakage detection purposes (Boyle et al. ; Monks et al. ). Data collected from the water supply system should be pre-processed before use, including validation and re-estimation of missing and invalid data (Kirstein et al. ). After this, the smart meter data can potentially be used to estimate the wastewater flow with a high spatial and temporal resolution. Furthermore, knowledge of the wastewater flow component can, in comparison with in-sewer flow data, be used to estimate the addition or loss of water through infiltration and exfiltration and thus contribute to closing the water balance of the urban drainage system. The concept of applying smart meter data to estimate wastewater flows has been mentioned in the recent literature. This study investigates the hypothesis that smart meter consumption data can be used directly to estimate the magnitude, timing and spatial distribution of wastewater flow, without the need for a water distribution network model. This hypothesis is tested using data from the city of Elsinore, Denmark. We validate the smart meter data by comparing the data with observations from the waterworks outlet, observations from the wastewater treatment plant (WWTP) inlet, and annual water consumption data from a year prior to the installation of smart meters. The smart meter data are also routed through a 1D hydrodynamic urban drainage model for simulating the flow dynamics. We compare the smart meter data and the simulated flow to in-sewer flow measurements from five locations in Elsinore city centre. This comparison forms the basis for assessing anomalies, including unreliable data and DWF components other than wastewater.
The comparison is undertaken by using both quantitative methods and logical reasoning to evaluate the wide range of possible sources of uncertainty that arise when employing large amounts of data in a real full-scale urban catchment study.

CASE STUDY AREA
The city of Elsinore, Denmark, is located 30 km north of Copenhagen and has 47,000 inhabitants. It covers an area of 18 km² and has a dense medieval centre, surrounded by suburbs developed mainly in the period 1930-1970. The potable water is supplied by four waterworks and the city contains one water tower. The drainage system is predominantly a combined system that leads most of the water to a centralised WWTP. The average yearly precipitation in the city is 670 mm. The upstream parts of the city have a low degree of imperviousness (20-40%), whereas the downstream part containing the city centre is more paved (imperviousness of 60-80%). The utility company has installed permanent smart meters in all the consumers' homes, and in-sewer sensors have been installed for a temporary monitoring campaign in the sewer system, which enables the type of investigations performed in this study.

Data from the water supply system
Around 19,000 MULTICAL21 smart meters with a temporal resolution of 1 h and a measurement uncertainty of up to ±5% (Kamstrup ) are permanently installed in households and industries for billing purposes (Figure 1). In the urban area, the meters are placed inside the households.
The smart meters cover the entire city, except for 14 consumers who still have manually read meters. There are no unmetered consumers. Around 2,000 of these smart meters are situated in the catchment upstream of the in-sewer sensors (Figure 1), which is the main area of interest in this study.
Contractor 1 undertook the installation of the smart meters and provided data from three separate weeks in 2018 and 2019 (Table 1). The raw smart meter data are accumulated volume readings. Due to the various data transmission and collection mechanisms of smart meter data, the raw data arrive at non-uniformly distributed time steps (for example, Kirstein et al. ). Contractor 1 filled the data gaps primarily using linear interpolation to obtain a uniform interval between flow values of 1 h. Kirstein et al. () showed that this interpolation procedure sufficiently represents the total consumption within an area as long as a great proportion of the individual consumers' raw smart meter data are not missing and if the raw data resolution is less than 2 h. Contractor 1 furthermore reported the quality of the data. The time periods in Table 1 were selected based on the availability of good quality data in periods without rainfall creating direct runoff in the sewer system.
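The gap-filling step described above can be illustrated with a minimal sketch. It assumes that each meter delivers accumulated volume readings at irregular times and that hourly flows are obtained by linearly interpolating the accumulated volume onto a 1 h grid and differencing. The function names and the example readings are hypothetical; this is not Contractor 1's actual procedure.

```python
from bisect import bisect_right


def interpolate(t_query, times, values):
    """Linearly interpolate values at t_query; times must be strictly increasing."""
    if t_query <= times[0]:
        return values[0]
    if t_query >= times[-1]:
        return values[-1]
    j = bisect_right(times, t_query)  # times[j - 1] <= t_query < times[j]
    weight = (t_query - times[j - 1]) / (times[j] - times[j - 1])
    return values[j - 1] + weight * (values[j] - values[j - 1])


def hourly_flows(times_h, cumulative_m3, start_h, n_hours):
    """Return the consumption in m3 per hour on a uniform 1 h grid."""
    grid = [start_h + k for k in range(n_hours + 1)]
    volume_on_grid = [interpolate(t, times_h, cumulative_m3) for t in grid]
    # Differencing the accumulated volume gives the volume used in each hour.
    return [volume_on_grid[k + 1] - volume_on_grid[k] for k in range(n_hours)]


# Hypothetical readings: time in hours, accumulated volume in m3.
times_h = [0.0, 0.5, 2.5, 4.0]
cumulative_m3 = [100.0, 100.1, 100.5, 100.8]
flows = hourly_flows(times_h, cumulative_m3, start_h=0, n_hours=4)
```

Because the flows are differences of an interpolated accumulated volume, their sum over the period equals the total metered volume, which is consistent with the observation that the total consumption within an area is well represented as long as the gaps are short.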
Hourly outflow data from the four waterworks supplying clean water to the city were obtained from MAGFLOW flowmeters type MAG3100 Water with an uncertainty of less than 5% (Siemens ) for specific days in each time period (Table 1). The utility estimated one of the waterwork outlets and the outlet from the water tower based on other observation points, meaning that the actual uncertainty is higher than the declared uncertainty for the instruments.
Furthermore, Contractor 2, who maintains the utility's hydraulic water distribution network model, provided a database with the annual water consumption readings for each individual consumer in Elsinore from 1 January to 31 December 2012. This database contains readings from manually read water meters (read once a year for billing purposes) and is the newest available independent source of water consumption data from before smart meters were installed in Elsinore. In Denmark, it is legally required that each consumer has a water meter at home, and these meters are owned by the utility that performs random control samples to ensure that the meters provide accurate and unbiased readings. The high Danish water price means that most people and the utility are very aware of large changes in the water bills, which further increases the credibility of the consumption data. The 2012 data have a declared uncertainty of ±2% and are thus the best estimation of the annual water consumption we can get that is independent of the smart meters.
Additionally, the utility provided the total amount of dis-

Data from the urban drainage system
Utility Elsinore has divided the city into wastewater catchments depending on the layout of the sewer system. These catchments, together with asset data, are described in the municipal wastewater plans, which in detail explain how wastewater and rainwater flows are and will be managed.
The asset data have been used as the basis for a detailed 1D hydrodynamic model of the sewer system, which is con- [...] Daily inflow data were also available for the WWTP (Table 1). Furthermore, the utility and Contractor 4 provided information on last calibration dates for the WWTP inlet sensor and the in-sewer sensors, respectively (see Table 1).

Extraction of smart meter consumption data
Utility Elsinore supplied data from 18,804 meters, of which 18,449 were georeferenced. Of the remaining 355 meters, 338 were successfully referenced using the QGIS tool 'MMQGIS'. We manually verified that the 17 non-referenced meters are located outside the upstream area of the in-sewer sensors and thus outside the main area of interest. Due to the General Data Protection Regulation, we could not obtain water consumption data at household level for the 18,787 georeferenced meters, and they therefore had to be aggregated into groups. It is important that all smart meters within a group discharge water to the same part of the sewer system.
Thus, we initially based the groups on the catchments from the Elsinore wastewater plans. About 20% of the 18,787 meters were discarded because they did not discharge wastewater to the WWTP used in this study, leaving 15,011 meters. It was important to get an accurate description of the flow dynamics in the area upstream from the in-sewer sensors, and we therefore further manually sub-divided the groups in this area from 19 groups in the wastewater plans to 111 groups with a total of 2,015 meters and between 2 and 38 meters in each group (see the group divisions in Figure 1).
This division was based on the general flow paths in the sewer system (described by the asset database). The overall result was 15,011 meters distributed in 245 groups, each containing up to 869 meters.
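The meter bookkeeping and the grouping step can be sketched as follows. The counts are those reported in the text; the function `aggregate_by_group`, the meter and group identifiers, and the minimum group size of two (taken from the smallest group reported above, not from a stated rule) are illustrative assumptions.

```python
from collections import defaultdict

# Meter counts reported in the text.
supplied = 18_804
georeferenced = 18_449
referenced_later = 338
kept_for_wwtp = 15_011

usable = georeferenced + referenced_later             # georeferenced meters
not_referenced = supplied - usable                    # meters left unreferenced
discarded_share = (usable - kept_for_wwtp) / usable   # share not draining to the WWTP


def aggregate_by_group(meter_to_group, meter_series, min_meters=2):
    """Sum hourly consumption per group; reject groups smaller than min_meters."""
    members = defaultdict(list)
    for meter, group in meter_to_group.items():
        members[group].append(meter)
    totals = {}
    for group, meters in members.items():
        if len(meters) < min_meters:
            raise ValueError(f"group {group} has fewer than {min_meters} meters")
        series = [meter_series[m] for m in meters]
        totals[group] = [sum(step) for step in zip(*series)]
    return totals


# Hypothetical example: four meters in two groups, two hourly values each.
meter_to_group = {"m1": "G1", "m2": "G1", "m3": "G2", "m4": "G2"}
meter_series = {"m1": [1.0, 2.0], "m2": [3.0, 4.0], "m3": [0.0, 1.0], "m4": [5.0, 5.0]}
group_totals = aggregate_by_group(meter_to_group, meter_series)
```

Only the group totals leave this step, so no household-level series is exposed downstream.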
We provided a list with the meters in each group to Con- [...] Furthermore, Contractor 1 returned the aggregated consumption data from all smart meters receiving water from the Elsinore Waterworks.

Summed smart meter data
We calculated the summed smart meter data for each of the five in-sewer sensor sub-catchments, $Q_{SM,loc}$, by summing the smart meter consumption, $Q_{SM,loc,i}$, of the $N_{loc}$ groups belonging to these sub-catchments at all time steps, $t$:

$$Q_{SM,loc}(t) = \sum_{i=1}^{N_{loc}} Q_{SM,loc,i}(t) \quad (1)$$

Simulated wastewater flow
The applied MIKE URBAN model ( [...]

Phenomena affecting the observed DWF
$Q_{SM}$ and $Q_{sim}$ may differ from $Q_{obs}$ if the system is affected by other water inflows and outflows than the smart meter-measured water consumption or due to erroneous data. To assess these phenomena, we calculated the average flows for the five sub-catchments (loc) over the three 1-week time periods, here shown for the simulated flow:

$$\bar{Q}_{sim,loc} = \frac{\sum_{t} Q_{sim,loc}(t)}{10{,}080\ \text{min} / \Delta t_{sim}} \quad (2)$$

Furthermore, we calculated the mass balances (change in volume, $\Delta V$) for each of the five sub-catchments based on the water consumption in the given sub-catchments as well as inflows from upstream sub-catchments ($Q_{obs,in,loc}$) and the outflow ($Q_{obs,out,loc}$) in the three investigated 1-week time periods:

$$\Delta V_{loc} = \sum_{t} Q_{SM,loc}(t) \cdot \Delta t_{SM} + \sum_{t = 2\ \text{min}}^{10{,}080\ \text{min}} \left( Q_{obs,in,loc}(t) - Q_{obs,out,loc}(t) \right) \cdot \Delta t_{obs} \quad (3)$$

A positive mass balance means that more water enters than leaves a catchment (thus, there may be a loss of water in the sewer system); a negative mass balance contrarily means that there may be an additional source of incoming water. The relative importance of the difference in volume as a function of the outflow from each sub-catchment was calculated as:

$$\frac{\Delta V_{loc}}{\sum_{t} Q_{obs,out,loc}(t) \cdot \Delta t_{obs}} \quad (4)$$

The difference between observed and simulated flows in the five sub-catchments, $Q_{res,loc}(t)$, was calculated based on the observed and simulated flows:

$$Q_{res,loc}(t) = Q_{obs,loc}(t) - Q_{sim,loc}(t) \quad (5)$$

These residuals may both be positive and negative, and exhibit constant, diurnal or seasonal variations. They may also vary according to the outside temperature, previous rainfall and pipe geometry. Table 2 lists three overall possible reasons for deviations as well as expected residual patterns.
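The summation, averaging, mass balance and residual calculations described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: flows are in L/s, time steps in seconds, the consumption term of the mass balance uses the summed smart meter data with its own time step, and all function names and example numbers are hypothetical rather than taken from the study.

```python
def sum_groups(group_flows):
    """Equation (1): sum the flows of all groups in a sub-catchment per time step."""
    return [sum(step) for step in zip(*group_flows)]


def mean_flow(flows):
    """Equation (2): sum of the flows divided by the number of time steps."""
    return sum(flows) / len(flows)


def mass_balance(q_sm, dt_sm, q_in, q_out, dt_obs):
    """Equation (3): consumed volume plus observed inflow minus observed outflow."""
    consumed = sum(q * dt_sm for q in q_sm)
    net_observed = sum((qi - qo) * dt_obs for qi, qo in zip(q_in, q_out))
    return consumed + net_observed


def relative_change(delta_v, q_out, dt_obs):
    """Equation (4): change in volume relative to the observed outflow volume."""
    return delta_v / sum(q * dt_obs for q in q_out)


def residuals(q_obs, q_sim):
    """Equation (5): observed minus simulated flow."""
    return [o - s for o, s in zip(q_obs, q_sim)]


# Hypothetical 2-hour example: two groups with hourly data (L/s),
# no upstream inflow, and 2-minute outflow observations (L/s).
q_sm = sum_groups([[1.0, 2.0], [0.5, 0.5]])
q_in = [0.0] * 60
q_out = [2.5] * 60
delta_v = mass_balance(q_sm, 3600.0, q_in, q_out, 120.0)   # litres
delta_v_rel = relative_change(delta_v, q_out, 120.0)
q_res = residuals([2.5, 2.5], [1.5, 2.5])
```

In this toy example the observed outflow exceeds the metered consumption, so the mass balance is negative, which in the terminology above points to an additional source of incoming water.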
Some of the uncertainties can be quantified, such as the declared uncertainty from the measurement devices (Table 1). The uncertainty associated with the equipment installed in situ may, however, be higher than declared by the manufacturer, which can be difficult to quantify without comparing the data with other, independent data sources.
The remaining potential uncertainties listed in Table 2 are also difficult to quantify without additional data, since these are often highly contextual and case-specific. One would thus need to make additional independent measurements in [...]

Observed flow smaller than simulated flow (negative residuals, Equation (5))
Consumed water not discharged to the sewer. Most of the consumed water ends up in the sewer system either directly, such as water used for showering, or delayed, such as the water used for washing clothes. However, some consumed water will end up elsewhere, such as water that evaporates during cooking or from wet laundry or water that infiltrates into the soil during car washing or gardening; the latter is assumed to be season dependent and may even infiltrate into the sewer system as discussed later. In Denmark, a conservative estimate is that at least 86% of the used water is discharged to the sewer system (HOFOR ). This is in alignment with results from, for example, Zhang et al. (). Since more water is used during the day than at night, the residuals will be larger during the day and thus exhibit a diurnal pattern. In coastal cities like Elsinore, discrepancies may also arise if ferries and trains take clean water on board in the city for kitchens, toilets and cleaning but discharge the foul water somewhere else or at another time of the day. This may lead to irregular residual patterns.
Leakage may occur in the supply pipe after the smart meter location, which means the metered consumption would not enter the urban drainage system and the observed sewer flow would be smaller than the simulated flow. If the smart meter in this case is placed at the property boundary, the water from a leakage in the supply pipe between the property boundary and the house is likely to infiltrate into the soil, a situation hard to diagnose. If the smart meter is instead placed inside a household, leakage after the smart meter is likely to lead to building damage which is easier to diagnose.
Exfiltration. Exfiltration from the sewer system may occur when there are cracks and fractures in the pipes and the groundwater level is below the sewer system. The groundwater level is generally lower in the summer than during winter, meaning that the exfiltration rate is expected to follow a seasonal pattern with peaks in the summer.
Observed flow larger than simulated flow (positive residuals, Equation (5))
Unaccounted for consumers. $Q_{sim}$ only contains data from consumers that have smart meters installed. Some consumers may, however, have older meters requiring manual [...]

Observed flow smaller than simulated flow (negative residuals, Equation (5)): consumed water not discharged to the sewer (diurnal); exfiltration (seasonal). Observed flow larger than simulated flow (positive residuals, Equation (5)): [...]

[...] and trains that take on board clean water when they set out and discharge it through the sewer system at their destination. This may result in irregular residual patterns.
Pumping of groundwater to the sewer system. Buildings and construction sites that are located partly underground may need to pump away groundwater and possibly intruding seawater if they are located on the coast. This water may be discharged into the ocean, infiltrated somewhere else or discharged to the sewer system. Such pumping would most likely occur both day and night leading to a constant deviation. If the groundwater level is only an issue in the winter, the residual pattern will be seasonal. Sea-level variations may, however, also affect the groundwater level and thus impact the amount of pumping.
Rainwater harvesting. Rainwater may be used for toilet flushing and laundry, and this will increase the flow in the urban drainage system compared with the recording by the smart meters. Since more water is consumed during the day, the residual pattern would follow a diurnal pattern.
An increase in water consumption from the water distribution system would be seen again when the rainwater tanks are empty.
Snow melting. In winter, precipitation falling as snow can cause a delayed runoff into the sewer system. This may increase the measured flow in the urban drainage system.
The temperature determines when the snow melts, and the residuals will thus follow neither a constant, a diurnal nor a seasonal pattern.
Infiltration. Infiltration into leaky sewers can arise from groundwater, rain, groundwater pumped from perimeter drains around buildings and construction sites, or leaking water supply pipes:
- Groundwater levels change slowly over the course of the year, and groundwater infiltration thus exhibits a seasonal change in the infiltration rate, which may also be affected by the sea water level.
- Rain-induced infiltration increases the sewage flow only after rain events and slowly decreases with time.
- Infiltration stemming from the pumping of groundwater exhibits a constant or seasonal pattern (see 'Pumping of groundwater to the sewer system').
- Leakage from the water distribution system into the urban drainage system is expected to be a function of the pressure in the water distribution system. If the system pumping is pressure controlled or if the pressure fluctuates due to variations in demand, the leakages could, to some extent, display a diurnal pattern. In Denmark, the water supply pipes in general have too large diameters due to former expectations of growth in the water consumption and requirements of being able to supply fire hydrants. Therefore, the pressure drops in the distribution systems are small, and temporal variations in demand will only have a minor effect on the pressure in the pipe system. The residuals would thus predominantly be constant over the day.
Sedimentation. Some sewer system flow observations, like the ones in Elsinore, are obtained by multiplying the velocity with the wetted area of the cross-section of the pipe. If sedimentation is present at the location of the sensor, the water level will rise and lead to a larger calculated than actual flow. The residuals will vary in size depending on the geometry of the pipe. Usually sensor companies will try to avoid installing flow sensors in pipes with a lot of sediments, but sedimentation is a dynamic process that may change after the installation of the sensor.
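The sedimentation effect can be illustrated with a small geometric sketch. It assumes a circular pipe, a flat sediment bed, and a sensor that converts the water level above the pipe invert to a wetted area without knowing about the sediment; the pipe size, velocity and depths are hypothetical, and the sensors used in Elsinore may work differently.

```python
import math


def segment_area(radius, depth):
    """Area of a circular pipe cross-section filled to the given depth above the invert."""
    depth = min(max(depth, 0.0), 2.0 * radius)
    half_chord = math.sqrt(max(0.0, 2.0 * radius * depth - depth ** 2))
    return radius ** 2 * math.acos((radius - depth) / radius) - (radius - depth) * half_chord


def apparent_and_true_flow(velocity, radius, water_level, sediment_depth):
    """Return (flow computed by the sensor, flow through the sediment-free area)."""
    apparent_area = segment_area(radius, water_level)
    # The sediment bed occupies the bottom of the pipe and carries no flow.
    true_area = apparent_area - segment_area(radius, sediment_depth)
    return velocity * apparent_area, velocity * true_area


# Hypothetical case: 0.5 m radius pipe, 0.6 m/s, 0.30 m water level, 0.05 m sediment.
apparent, true = apparent_and_true_flow(0.6, 0.5, 0.30, 0.05)
overestimation = apparent / true
```

The overestimation depends on the pipe geometry and on the ratio of sediment depth to water level, which is in line with the statement that the residuals vary in size with the geometry of the pipe.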

General reasons for deviations between observed and simulated flows (negative and positive residuals, Equation (5))
Erroneous smart meters, data transmission or data handling.

RESULTS AND DISCUSSION
Here, we (1) [...]

Looking at the other two time periods, the smart meters registered on average 14-17 L/s (15-17.5%) less water than $Q_{obs,WW}$. This deviation could be due to leakage in the water distribution network, errors in the waterworks' flow sensor (higher than the <5% reported uncertainty, potentially due to the partial estimation of the total waterworks outflow) or errors in the smart meter data set. The smart meter data set may be erroneous due to meter errors (higher than the 5% reported uncertainty), data transmission errors, faults in the data handling or missing consumers. The latter is, however, not assumed to be an issue, since all consumers are metered and less than 0.1% of the consumers still have manually read meters. The data in Figure 2 thus do not give a clear indication of the validity of the smart meter data.

[...] Figure 1). The deviation may thus instead be due to increased groundwater infiltration in February (winter) in the unsensed part of the WWTP catchment, but it may also be due to erroneous WWTP inlet observations above the reported uncertainty (Table 1).
Since the 2012 consumption data are independent from the smart meter data, we assume that the completeness and quality of the smart meter data set can be assessed by comparing the two data sets (see Figure 4). In October, November and February, the water consumption was 2, 4 and 10% higher than the mean, respectively.
We have therefore conservatively added a 10% uncertainty to all three time periods on top of the declared uncertainty of 2%. We have not added error bars to the simulated wastewater flow, $Q_{sim}$, since this is simply a post-processing of the smart meter data, $Q_{SM}$. These two differ from each other due to water generated in empty pipes of the model for the sake of numerical stability and because $Q_{sim}$ is affected by the routing time in the sewer system. Overall, the smart meter data and the consumption data from 2012 match well, taking the uncertainties into account. Figure 4 also shows a large difference between the 2012 water consumption data and $Q_{obs}$, which is larger than the declared [...] (Table 2) is limited, meaning that the smart meter data set is complete and sufficiently correct to represent the wastewater component of the DWF.

Assessment of anomalies, including other DWF components and erroneous data
Figures 3 and 4 show that there is a discrepancy between the smart meter-based wastewater flows, $Q_{sim}$ and $Q_{SM}$, and the observed in-sewer flow, $Q_{obs}$, which is so big that it exceeds the combined, declared uncertainty of the smart meter and in-sewer sensors (Table 1).
To assess possible anomalies, we look at the mass balances for the five sub-catchments for the three time periods (Equation (3) [...]

[...] (Table 1) and are calculated as $Q \pm Q \times \text{uncertainty}$. The error bars of $Q_{consumption,2012}$ likewise indicate the declared uncertainty (Table 1) as well as the seasonal variation (±10%) and have furthermore been shifted downwards to reflect a decrease in water consumption between 2012 and 2018 of 8%.
Figure 5 | Mass balances for each sub-catchment (Equation (3)) and the relative size of the change in volume compared with the outflow (Equation (4)) for the three time periods.

[...] of this expected behaviour, but it would be premature to draw any conclusions from this considering the large unexplainable variation for the remaining sub-catchments. Figure 6 shows the time dynamics of $Q_{sim}$, $Q_{SM}$ and $Q_{obs}$, and is arranged to resemble the flow through the sewer system as conceptualised in Figure 1. The flow is naturally smallest in 'WU', since this is the smallest of the five sub-catchments and does not receive water from any further upstream catchments. The axis for this plot is thus scaled differently than for the remaining plots. The flow increases through the system as more water aggregates. Most of the outflow from the most downstream catchment (WD) is generated in the most upstream catchment, EU, which also has by far the most consumers. The residuals between $Q_{obs}$ and $Q_{sim}$ (Equation (5)) are shown in the upper left corner of Figure 6 for all five sub-catchments. Figure 6 shows that the flows in WU match well in October and November. There are negative discrepancies between $Q_{obs}$ and the smart meter-based flows ($Q_{SM}$ and $Q_{sim}$) in EU in October and in WU in February, and many time periods and sub-catchments with positive residuals.
In the following, we use Table 2 (excluding unaccounted for consumers and uncertainty related to the smart meters, data transmission and data handling, which were previously deemed valid) and Figure 6 to deduce plausible causes for the observed discrepancies between the flow results from the various data sources.
The negative diurnal residuals likely do not stem from exfiltration from the sewer system since $Q_{res,EU}$ exhibits a clear diurnal pattern in October while exfiltration is expected to vary according to the season. Furthermore, a net exfiltration is not expected to occur in February (where the groundwater level generally is higher) without also occurring in October and November. Neither do the negative residuals stem from consumed water not discharged to the sewer system since residential areas like 'EU' and 'WU' should discharge at least 86% of the used water to the sewer system, whereas the observed differences on average are 60 and 40% in EU (October) and WU (February), respectively. Furthermore, the smart meters in Elsinore are located inside the houses, which means that it is fair to assume that leakage on the consumers' side of the smart meters is minimal and can be disregarded. It is not likely that the positive diurnal residuals stem from pumping of groundwater to the sewer system because this will not lead to diurnal residuals. [...] periods. This indicates that the differences are not due to slowly drifting sensors.
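The argument against non-discharged consumption in the paragraph above rests on simple arithmetic, sketched here. The 86% minimum discharge fraction and the average differences of 60% and 40% are taken from the text; expressing the differences relative to the consumed volume, and the function name, are assumptions made for illustration.

```python
MIN_DISCHARGE_FRACTION = 0.86  # conservative Danish estimate cited in the text


def explainable_by_non_discharge(relative_deficit, min_fraction=MIN_DISCHARGE_FRACTION):
    """True if a flow deficit could be explained by consumed water not reaching the sewer."""
    max_explainable_deficit = 1.0 - min_fraction  # at most about 14% of the consumption
    return relative_deficit <= max_explainable_deficit


# Average differences reported for EU (October) and WU (February).
observed_deficits = {"EU, October": 0.60, "WU, February": 0.40}
verdicts = {name: explainable_by_non_discharge(d) for name, d in observed_deficits.items()}
```

Both reported deficits are several times larger than the roughly 14% that this mechanism could account for, which is why it is ruled out as the main cause.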
Since the sensors were installed as duplicate sensors, it might at first seem unlikely that these observations are as uncertain as indicated by this study. It is, however, worth noting that the duplicate sensors are not truly independent since they are installed at the same location and use the same sensing technology. This means that they are likely to produce the same systematic errors when exposed to the same conditions and thereby confirm each other's miscon- [...] the summed smart meter data and the simulated flow will naturally be more pronounced the longer the water has travelled in the system and thus depends on the size of the catchment upstream of the flow observations.

Outlook
Sewer system analysis will likely never be the sole reason for installation of smart meters, which are typically installed for better operation and management of the water distribution system. The urban drainage sector can, however, still take advantage of these data, which will increasingly be collected in the future anyway. This study shows that water supply smart meter data can potentially be used to estimate the wastewater flow. The fact that smart meters thus become multi-purposed will likely make their operation more robust. Furthermore, the robustness of the approach also stems from the sheer number of smart meters, where a failure of a single meter will have much smaller consequences regarding the information level than the failure of a sewage flow sensor.
The use and comparison of data across the water supply, urban drainage and wastewater sectors and the subsequent anomaly analysis performed in this study is a tedious process due to the many aspects affecting both the water distribution and urban drainage networks. Furthermore, it is laborious to systematically gain access to and compare all relevant data sources. This was further complicated by the fact that not one single person could access all the models and data as they were managed and stored in different silos both within the utility and by different contractors.
To get the most value out of data, we need to break down these silos in the future and aim for more open standards for data exchange. Many utilities are currently taking the first steps towards using data in new and more integrated ways. This study shows that digitalisation is not easy, and sometimes the data quality remains unknown until the data are used and compared with other data sources.
More integrated data analysis and systematic uncertainty assessments are clearly needed to bring this field further ahead. The process of actually using data will provide important learnings regarding good practices within sensing and data accessibility, and enable the utility and their contractors to further refine their work processes.
The coupling between smart meter data and urban drainage models can be done offline for post-analysis of data and system performance (as done in this study). If the model is trustworthy, this coupling could also be done, and the sewer model run in real time, to get an up-to-date picture of the system state.
This would open up enhanced real-time data validation as well as entirely new ways of operating sewage infrastructure during dry weather, for example, for improved heat recovery from sewage systems or better sewer system control.

CONCLUSIONS
The current study aimed at using smart meter water consumption data to simulate the wastewater flow and to combine this information with in-sewer observations to detect system and data anomalies, such as infiltration, exfiltration and sensor errors.
Smart meter data were validated with data from other independent sources, including data from the waterworks' outflow, WWTP inflows and households' annual water consumption audits for a period prior to the installation of the smart meters. Subsequently, we illustrated the feasibility of using the smart meter data as input to a 1D hydrodynamic sewer model to simulate the wastewater component of the DWF. An estimate of the wastewater flow was thus obtained with a high spatial and temporal resolution. Even though there is still uncertainty related to this estimation, we believe that having this data-based approach to estimate the wastewater flow directly from its source (the water consumption) is an important step towards closing the water balance in urban drainage systems.
The main difference in the results from simply summing the smart meter data and using an urban drainage model is the [...] conditions. This underlines the necessity of using truly independent data sources from different measuring techniques for future data quality assessments.
The main take-away messages from this study are:
1. Using smart meter data as input to hydrodynamic urban drainage models is feasible and promising as a means of estimating the wastewater flow directly from water consumption data. This is an important step towards closing the water balance in urban drainage systems.
2. Installing multiple in-sewer flow sensors of the same type at the same location is not enough to assess the data quality. Including independent data sources using different sensor types in future investigations is paramount.
3. It is a tedious task to compare and analyse data from multiple sources. However, the only way to start breaking down data silos and realise the real quality of collected data is to start using it in integrated studies as exemplified in the case study.