Using multi-event hydrologic and hydraulic signatures from water level sensors to diagnose locations of uncertainty in integrated urban drainage models used in living digital twins

Digital twins of urban drainage systems require simulation models that can adequately replicate the physical system. All models have their limitations, and it is important to investigate when and where simulation results are acceptable and to communicate the level of performance transparently to end-users. This paper first defines a classification of four possible 'locations of uncertainty' in integrated urban drainage models. It then develops a structured framework for identifying and diagnosing various types of errors. This framework compares model outputs with in-sewer water level observations based on hydrologic and hydraulic signatures. The approach is applied on a real case study in Odense, Denmark, with examples from three different system sites: a typical manhole, a small flushing chamber, and an internal overflow structure. This allows diagnosing different model errors ranging from issues in the underlying asset database and missing hydrologic processes to limitations in the model software implementation. Structured use of signatures is promising for continuous, iterative improvements of integrated urban drainage models. It also provides a transparent way to communicate the level of model adequacy to end-users.


INTRODUCTION
Stimulated by the increasing amounts of monitoring data, expectations of digital twins (DTs) in the water sector are high, as they are anticipated to provide improved insights into and overview of the infrastructure systems for water distribution, drainage, and treatment (Fuertes et al. 2020; Therrien et al. 2020; Pedersen et al. 2021a; Valverde-Pérez et al. 2021). Multiple companies are currently exploring the DT concept, which is also widely debated in academia. Based on a thorough literature review across engineering disciplines, Pedersen et al. (2021a) suggested a definition that is used here as a conceptual frame. A DT for an urban drainage system (Figure 1) is a systematic virtual representation of the elements and dynamics of the physical system, organized in a star-structure (a representation of the Internet of Things (IoT)) with a set of features connected by data links. Coupled to the physical system, simulation models are among the most important features, and (at least) four different model categories are distinguished. Two of these are prototyping models used for planning or design purposes, and two are living models used for control or operation purposes, living here referring to the coupling of close-to real-time observations from an ever-changing physical twin with a
Such a classification of model performance at the whole-system level would assist iterative model improvements, allow assessing the uncertainties in the model, and help understand for which processes and management objectives a given model can provide trustworthy estimates. There has been a wide focus in the urban drainage community on ameliorating model errors through parameter calibration techniques targeting the hydrological (rainfall-runoff) part of integrated urban drainage models (e.g., Deletic et al. 2012; Breinholt et al. 2013; Tscheikner-Gratl et al. 2016; Vonach et al. 2019).
However, there has been relatively little focus on technical and structural errors that cause large problems in practice, such as faulty asset data, missing physical processes, and others. This paper aims to present a classification scheme that embraces all errors in integrated urban drainage models. It does so by combining the 'uncertainty frameworks' of Walker et al. (2003) and Gupta et al. (2012), in a manner somewhat similar to what was previously done by Del Giudice et al. (2015). However, here we additionally include the uncertainty 'locations' defined by Walker et al. (2003), i.e., 'context', 'input', 'model structure', and 'parameter'. Deletic et al. (2012) also looked into these uncertainties with a special interest in calibration methods and acknowledged the difficulty in addressing the model structural uncertainties we aim to detect.
The objective of this paper is to develop a framework tailored for practitioners in water utilities for use in their iterative improvements to the hydraulic network representation in existing integrated urban drainage models often used in DTs. To do so, three things are considered. First, a scheme for classifying various types of errors and uncertainties in integrated urban drainage models is presented. Then the paper introduces the concept of hydrologic and hydraulic signatures to the urban drainage community and highlights their use as the backbone of a systematic approach to error diagnostics. Finally, the framework is illustrated through application to a real case study with up to 10 years of monitoring data. Its utility is illustrated by identifying three different classes of errors in the available hydrodynamic model through multi-event signature comparisons.

CONCEPTUAL FRAMEWORK
Urban drainage modeling and surrounding states
The urban drainage model type considered in this paper is an 'integrated urban drainage model' (Bach et al. 2014), which is a 'classical' industry-standard type of model used to simulate entire urban drainage systems, from the moment a raindrop hits the ground until the water leaves the system via outlets and overflow structures or due to sewer surcharge or flooding. Several software products used in the urban drainage profession worldwide include this type of model, e.g., SWMM, InfoWorks, and Mike Urban. It consists of two distinct modules: a hydrologic surface runoff module that converts precipitation data into inflow to the pipe system for each sub-catchment in the system and a hydraulic model that distributes water throughout the entire pipe system by solving the full St. Venant equations. Additional model forcing components can be defined, such as wastewater inflow, pumped flows, and infiltration-inflow. The surface runoff module is lumped-conceptual on the sub-catchment scale but distributed at the entire urban drainage system-scale (Hansen et al. 2014). Several types of surface modules are available. The hydraulic model is physically distributed, and as for the surface module, several versions are available. This urban drainage modeling approach is sometimes referred to as 'detailed' or 'high-fidelity' (Hi-Fi) modeling, referring to the detailed manner, based on physical asset data, in which pipe-flow is described. However, the pipe-flow module also includes hydraulic structures in which the hydrodynamics are described in a lumped, conceptual manner, for example, manholes, overflow weirs, and pumps.
In Danish engineering practice, the 'classical' urban drainage modeling approach is generally expected to perform well for small and medium-sized rain events that lead to increased flow and occasionally to overflow and surcharge. This is because routine monitoring campaigns typically capture these events, where mostly paved surfaces contribute to rainfall-runoff, and therefore model comparisons are often made with these events. Furthermore, with the increasing exploitation of the models to simulate other operational domains, it is expected that the performance may be usable for dry-weather situations. However, there may be big challenges with simulating the wastewater flow in dry weather (infiltration-inflow may increase the flow seasonally or after long wet-periods, which is usually not included in the model) and with simulating flows during storm events where unpaved areas start contributing to the runoff, a process not well described in state-of-the-art models used in Danish engineering practice. In addition, weather patterns may influence the simulation quality; spatially wide-spread rainfall during frontal weather systems may, for example, be well represented by a single rain gauge located in or close to the catchment in question. In contrast, small-scale convective rainstorms causing local flooding may be missed by a single rain gauge. These processes and phenomena are either not included or not well represented in 'classical' urban drainage models, and there is a need to qualify under which conditions this is of concern, so that we can avoid attempting to find solutions to uncertainty originating from outside the model boundaries through calibration of parameters inside the model.
We use the term 'surrounding states' for any relevant information outside the model boundary that can help identify relationships and processes relevant to the system but are currently not included in the model. These may be indicators that potentially explain the uncertainty of the modeling approach or why the model fit is expected to differ for different operational conditions. For example, precipitation at low temperature may not give high water levels in the sewers, as precipitation will remain on the surface as snow and only later reach the sewers. High soil moisture may also lead to higher flows due to runoff from pervious surfaces and infiltration-inflow. Some rainfall types with limited spatial extent may furthermore be difficult to capture with only a few rain gauges. The impact of the surrounding states can be site- and model-specific, and therefore the ones found to have an impact in this paper may not have an impact in other models.

Classification of uncertainties
Three dimensions can characterize uncertainty: location, level, and nature (Walker et al. 2003; Warmink et al. 2010). The location is where the model's uncertainty manifests itself (e.g., context, input, model structure, parameters). The level of uncertainty ranges from statistical uncertainty to scenario uncertainty, qualitative uncertainty, and ignorance, suggesting that quantification is not easy for all kinds of uncertainty. Finally, the nature of the uncertainty distinguishes epistemic uncertainty (due to imperfect knowledge), natural variability (e.g., rainfall variability), and ambiguity (when there are multiple parallel knowledge frames). Gupta et al. (2012) identify three different locations of model uncertainty: (a) the conceptual model, which incorporates the physical and process structure of the system; (b) the mathematical model, which incorporates the spatial variability and equation structure; and (c) the computational model, relating to the model technical uncertainty. To identify and classify the sources of error in a model, we combine the two uncertainty frameworks (Table 1). We add temporal variability to the mathematical model location, as it is a large source of uncertainty in urban drainage system modeling. Table 1 exemplifies the different uncertainties with a non-exhaustive list. For example, input uncertainty may be related to both rainfall data (external forcing) and the elevation of a crest level (a system attribute).

Signatures for urban drainage systems
Signatures are quantitative measures derived from parts of time series to analyze patterns, such as peak level, duration of the level above a crest level, the area under the level curve, time of the peak, and others (Figure 2). They may provide insights into the underlying processes (in the case of observed time series) or into the model used to simulate the processes (in the case of simulated time series). Several studies provide examples of signatures (McMillan 2020a; Gnann et al. 2021). Analyzing each signature can help diagnose the likely source of a discrepancy (error) between model output and observations. In addition, peak level can give insight into local physical attributes. In hydrology, researchers have started identifying various process-based signatures (McMillan 2020b), which also needs to be done in the urban drainage field. This study uses signatures for a specific operational domain in the urban drainage system, namely rain-induced events. We look both at signatures characterizing hydrologic processes in the rainfall-runoff module and signatures characterizing hydraulic processes in the hydraulic module.

Graphic interpretation of multi-event signatures
Many hydrology studies use signatures to extract statistical properties of the system to be replicated in a lumped model, and the model is then assumed to be sufficient. Properties from a gauged area can even be transferred to similar ungauged catchment areas to model these (Hrachowitz et al. 2013). This approach can also be performed in urban hydrology. However, because urban hydrology is influenced by physical assets and direct runoff from paved areas to a larger degree than general hydrology, regulators and utility companies often require direct comparison of time series event by event, model vs. observation.
Signatures can be calculated for several physical sites across an urban drainage system, focusing both on characteristics of individual events and on multiple events to investigate persistence and trends in the error structure at the particular physical site. Errors here refer to the discrepancy between the signature of the modeled event and that of the observed event, and diagnostics should support identifying and classifying the source of the errors based on these comparisons. Figure 3 a2)). Scattered dots for high levels may indicate a high input uncertainty for large events (Figure 3(a3)), and a bend in the dots may indicate a missing process in the model (Figure 3(a4)). With water level data as a signature unit, one could, for example, find the crest level, the maximum level of the pass-forward pipe, and descriptions of other physical attributes in the model that must match reality (Figure 3(a5)). Different characteristics can also be extracted by comparing different signatures (Figure 3(b1)) and looking for clusters and variance in the displayed pattern.
Results presented in Figure 3 make it possible to classify the structural error/uncertainty according to Table 1. An assessment can then be made whether the cause of the error can be rectified or if the model is accepted by an informed evaluation of the model performance for the specific objective at hand. If required, further development of the models could be needed.

Case study: system, data, and model
The case area is located in the outskirts of Odense, Denmark (Figure 4), which is a relatively flat area. We use observation data from level sensors in hydraulic structures of three sites in upstream branches of the urban drainage system (Figures 4 and 5). Site 1 is a manhole in a normal flow-through pipe with a 1 m diameter. Site 2 is a flushing chamber upstream from a storage pipe (trunk sewer for combined sewer overflow (CSO) storage); the chamber fills with water from an internal overflow weir located higher in the chamber. A valve can be opened quickly to flush the volume pipe when it is empty. Finally, site 3 is an internal overflow structure. We used one-minute model input and level observations for up to 10 years with the number of rain-induced events (events observed in the water level observations that are induced by rain events) calculated as indicated in Figure 4 for each site (143 events at site 1, 578 events at site 2, 345-357 events at site 3). Observation data were cleaned for outliers as proposed in Pedersen et al. (2021b), but anomalies naturally occurring in urban drainage systems were not considered. Weather data (solar radiation, wind velocity, humidity and temperature) were extracted from the Danish Meteorological Institute's (DMI) open dataset for a weather station (Aarslev) 7 km away (DMI 2020). In addition, rainfall data were obtained from nearby rain gauge stations 5419, 5422, 5425 and 5427 (Jørgensen et al. 1998 (Pedersen et al. 2021b). The surface runoff module is based on the time-area principle, and the pipe-flow module is the MOUSE hydraulic engine. The utility company and its consultants estimated the imperviousness of sub-catchments from satellite data using spectral analysis and standardized imperviousness percentages for individual area classes (LNHwater 2017). The applied runoff model is 'Model A', which considers only impervious areas and is applied due to historical reasons in Danish engineering practice.
The pipe-flow module has a fixed calculation time step of 5 seconds. Rainfall input is from two nearby rain gauges, '5425 Braendekilde' and '5427 Dalum'. Wastewater input is included with a daily mean calculated based on the annual water consumption and a mean daily pattern with hourly time steps. Infiltration-inflow is not included in the model even though this is technically possible in the applied software using a built-in RDII-module; this is because of the utility's focus on improving the knowledge base on the physical assets rather than calibration of parameters in the lumped-conceptual model components.

Identification of rain-induced events
A time-varying, daily water level threshold (from now on referred to as a 'rain threshold') was calculated as the sum of the infiltration-inflow level and the extent of the daily fluctuations in dry weather (Figure 6). Rain-induced events were identified when the observed water level exceeded the rain threshold. The start of an event was defined as the point where the rising limb started sloping upwards. Events had to be at least three time steps long (3 minutes), and unique events were separated by at least 60 minutes. The equations to estimate the different components are defined below.

Infiltration-inflow level
The infiltration-inflow level in the system was calculated by applying a recursive, exponentially weighted moving average function:

$$y_t = \alpha x_t + (1 - \alpha)\, y_{t-1}$$

where $x_t$ is the observed daily minimum level at time (day) $t$, $y_t$ and $y_{t-1}$ are the calculated (exponentially weighted) daily minimum values at day $t$ and $t-1$, and $\alpha = 2/(\mathrm{span} + 1)$, where $\mathrm{span} \geq 1$ is a user-defined parameter defining how important the current observation is. With a span of 3 days, $\alpha = 0.5$, which means that the exponentially weighted value calculated today depends 50% on today's observed value, 25% on yesterday's observed value, 12.5% on the observed value two days ago, and so on.
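The recursion can be illustrated with a minimal Python sketch. It assumes the standard recursive form implied by the 50%/25%/12.5% weights described above, and it assumes that the series is initialised with the first observation; the function name and the initialisation are illustrative choices, not taken from the study.

```python
def ewma_daily_minimum(daily_min_levels, span=3):
    """Recursive exponentially weighted moving average of daily minimum levels.

    Implements y_t = alpha * x_t + (1 - alpha) * y_{t-1} with
    alpha = 2 / (span + 1).
    """
    if span < 1:
        raise ValueError("span must be >= 1")
    alpha = 2.0 / (span + 1.0)
    smoothed = []
    for x in daily_min_levels:
        if not smoothed:
            # Assumption: the first smoothed value equals the first observation.
            y = x
        else:
            y = alpha * x + (1.0 - alpha) * smoothed[-1]
        smoothed.append(y)
    return smoothed


# A single observation of 8 two days ago contributes 12.5% (= 1.0) today.
print(ewma_daily_minimum([0.0, 0.0, 8.0, 0.0, 0.0], span=3))
```

With a span of 3, the example reproduces the weighting stated in the text: the value 8 contributes 4 on its own day, 2 the day after, and 1 two days later.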

DWF-height
The height of the dry weather flow (DWF) indicates the range that the level will change due to variation in water consumption in individual households during a normal dry-weather day: This gives a variety of values as some days reach near maximum level when rain occurs. In Denmark, it rains approximately every third day. Therefore, it was assumed that the 0.5 quantile of the data would correspond to a standard dry weather day. The daily DWF-height was calculated based on the previous 365 days.
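As a rough sketch of the quantile step, the following Python function takes one value per day describing the daily level range and returns the 0.5 quantile over the previous 365 days. Two assumptions are made here that are not confirmed by the text: the daily range is taken as the daily maximum minus the daily minimum level, and the current day is excluded from the window.

```python
from statistics import median


def dwf_height(daily_ranges, window=365):
    """Median (0.5 quantile) of the daily level ranges over the previous days.

    daily_ranges: one value per day, assumed to be daily max minus daily min.
    Returns None for days without any history.
    """
    heights = []
    for day in range(len(daily_ranges)):
        # Assumption: only days strictly before the current day are used.
        history = daily_ranges[max(0, day - window):day]
        heights.append(median(history) if history else None)
    return heights


# A rainy day with a large range (0.90) does not dominate the estimate.
print(dwf_height([0.10, 0.12, 0.90, 0.11]))
```

Using the median rather than the mean keeps the estimate close to a typical dry-weather day even when a substantial share of the days in the window contain rain.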

Rain threshold
The rain threshold for a runoff event is calculated for each day as the sum of the infiltration-inflow level and the DWF-height.

Uncorrected Proof
The rain threshold was calculated with one value each day, and therefore very small events that occur during low DWF may not be detected with this approach.
Finally, we defined the events used for analysis in this study as the union of the events detected in the simulated and in the observed time series (see the number of events for each site in Figure 4).
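The event rules described in this section (exceedance of the rain threshold, a minimum length of three time steps, and at least 60 minutes between unique events) can be sketched as follows. This is a simplified illustration: it assumes one-minute data with a threshold value supplied per time step, merges exceedance periods before applying the minimum length, and omits the refinement of the event start to the foot of the rising limb. The function and parameter names are hypothetical.

```python
def detect_events(levels, thresholds, min_steps=3, min_gap=60):
    """Return (start, end) index pairs (inclusive) of rain-induced events."""
    # 1) Find contiguous runs where the level exceeds the rain threshold.
    runs = []
    start = None
    for i, (level, threshold) in enumerate(zip(levels, thresholds)):
        if level > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(levels) - 1))

    # 2) Merge runs separated by fewer than min_gap time steps.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # 3) Keep only events that last at least min_steps time steps.
    return [(s, e) for s, e in merged if e - s + 1 >= min_steps]


levels = [0] * 10 + [1] * 5 + [0] * 20 + [1] * 4 + [0] * 100 + [1] * 2 + [0] * 70 + [1] * 3
events = detect_events(levels, [0.5] * len(levels))
print(events)
```

In the example, the first two exceedance periods are only 20 minutes apart and are therefore treated as one event, while the two-minute exceedance is discarded as too short.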

Quantification of signatures
A range of signatures was calculated from the time series of the water level sensors in the three sites (Table 2), typically expressing max/min values, or integrating or differentiating values over time. For each event, the peak level, $h_{max}$, was calculated together with the time at which the peak occurs, $t_{hmax}$. The duration of periods with water above the crest level of a weir, $dur_{crestlevel}$, can be used as a signature to estimate the overflow volume. The 'Area Under Curve' (AUC) calculates a 'surrogate volume' for an event from a reference level (similar to the area under a flow hydrograph, but with a different unit). This signature attempts to estimate how much water is passing the manhole. However, attention must be drawn to the fact that different levels in the manhole will give different AUC values, so the AUC cannot be transferred directly to an estimate of volume. Furthermore, a given depth interval at low levels in a manhole may not contain the same volume as the same interval at higher levels. The level rate of change is a very simple way of analyzing the entire time series and is not applied to single events. It calculates the change in water level between two time steps.
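A minimal Python sketch of these signatures is given below. Since Table 2 is not reproduced here, the exact definitions are assumptions: the AUC is approximated as a simple sum of level excess above the reference level multiplied by the time step, the duration counts time steps strictly above the crest level, and the rate of change is a plain difference over a chosen lag. All names are illustrative.

```python
def event_signatures(levels, crest_level, reference_level, dt_min=1.0):
    """Signatures for one event from a regularly sampled level series (m)."""
    h_max = max(levels)
    # Time of the (first) peak, in minutes from the start of the event.
    t_hmax = levels.index(h_max) * dt_min
    # Duration with the water level above the crest level (minutes).
    dur_crestlevel = sum(1 for h in levels if h > crest_level) * dt_min
    # Surrogate volume: area between the level curve and the reference
    # level (unit: m * min), counting only levels above the reference.
    auc = sum(max(h - reference_level, 0.0) for h in levels) * dt_min
    return {
        "h_max": h_max,
        "t_hmax": t_hmax,
        "dur_crestlevel": dur_crestlevel,
        "auc": auc,
    }


def level_rate_of_change(levels, lag_steps=1, dt_min=1.0):
    """Change in level per minute between samples lag_steps apart."""
    return [
        (levels[i] - levels[i - lag_steps]) / (lag_steps * dt_min)
        for i in range(lag_steps, len(levels))
    ]


event = [24.0, 24.25, 24.5, 24.25, 24.0]
print(event_signatures(event, crest_level=24.125, reference_level=24.0))
print(level_rate_of_change(event))
```

Setting `reference_level` equal to `crest_level` gives the 'AUC above crest level' variant used as a surrogate for overflow volume.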

Surrounding states
This paper focuses on quantifying two surrounding states: soil moisture, and spatial variability of the rainfall input.

Soil moisture
We used a daily model for drought index prepared by DMI (Scharling & Vilic 2009) that takes five input parameters: solar radiation, wind velocity, humidity and temperature from a local national weather station (DMI 2020), as well as rain input from a nearby local rain gauge. Soil moisture was calculated as a single value per day, and to avoid influence from rain on the day of interest, the value from the day before is applied.

Conceptual model for rain input uncertainty
Non-uniform rainfall occurs for convective patterns, where the spatial extent of the rainfall may be smaller than the catchment area, so a rain gauge may not capture the event well. More uniform precipitation types are, for example, frontal rain events, where the intensity is roughly the same for a larger area and the uncertainty of the rain input to the model thus is lower.
Rain gauges within a certain distance of the catchment are assumed to affect the rain uncertainty. 5 km is the best estimate of this distance based on local spatial rainfall statistics in Denmark (Gregersen et al. 2013). One way to estimate this uncertainty is to look at the difference in recordings from the surrounding rain gauges. We here used the coefficient of variation (CV, standard deviation divided by mean) calculated from intensities recorded at the surrounding rain gauges that meet the distance criterion to the catchment area. The rain input was considered uncertain if CV > 0.5, and if the total depth of the event was above 3 mm or the maximum intensity over 10 minutes was above 3.3 mm in at least one rain gauge, which on average occurs five times per year (Arnbjerg-Nielsen et al. 2006). This is a rough estimate for a model describing rain input uncertainty. It is assumed that the rain uncertainty can be significantly improved by introducing weather radar data in the larger and more convective rain events. However, this is out of scope for this paper.
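The classification rule can be sketched in Python as below. The sketch assumes that the CV is computed across gauges from one value per gauge (here the event depth) and that the population standard deviation is used; the text does not specify either choice, and the function name and argument layout are illustrative.

```python
from statistics import mean, pstdev


def rain_input_uncertain(depths_mm, i10max_mm,
                         cv_limit=0.5, depth_limit=3.0, i10_limit=3.3):
    """Flag an event as having uncertain rain input.

    depths_mm: total event depth per rain gauge within the distance limit.
    i10max_mm: maximum 10-minute rainfall per rain gauge for the same event.
    """
    avg = mean(depths_mm)
    # Assumption: CV across gauges, using the population standard deviation.
    cv = pstdev(depths_mm) / avg if avg > 0 else 0.0
    # The event must also be significant in at least one gauge.
    significant = max(depths_mm) > depth_limit or max(i10max_mm) > i10_limit
    return cv > cv_limit and significant


print(rain_input_uncertain([1.0, 9.0], [0.5, 4.0]))   # local, intense event
print(rain_input_uncertain([5.0, 5.0], [1.0, 1.0]))   # uniform event
print(rain_input_uncertain([0.2, 1.8], [0.1, 1.0]))   # variable but very small
```

The size criterion prevents very small events, for which relative differences between gauges are easily large, from being flagged.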

RESULTS AND DISCUSSION
Direct time series comparison and signatures
Figure 7 shows the observed and simulated time series for a rain-induced event at site 3. Each signature is tabulated for the event, and differences between observed and modeled values are calculated. The simulated level (model) of the first peak at 00:30 is much higher than the observed, leading to overestimating the surrogate overflow volume (AUC above crest level).
There is a flattening of the curve for the second peak at 02:00, indicating the crest level (at level 24.35 m). The water level sensor has a lower bound at 23.81 m (zero-point), which is the same as the invert level in the manhole. If the invert slope inside the manhole is large, the difference between the zero-point and the invert level can be large, and using values in this range can lead to misinterpretations. The dash-dotted line at 24.06 m indicates the assumed top of the pass-forward pipe (according to the asset database).

Multi-event signature comparisons
Extracting signatures from several rain-induced events provides insight into general tendencies and increases the robustness of the analysis. Examples from all three sites ( Figure 4) were selected to illustrate how the signatures (Table 2) can be used to diagnose errors.   well. This is remarkable as water-levels above weir crests are neither constant over time nor the length of the crest, and this highlights the potential power of multi-event signature diagnostics as suggested here. The observed peak values seem bounded with 24.8 m as a maximum, whereas the modeled peak levels are generally much higher, including even pluvial flooding (ground level is at 25.5 m). This could indicate that there is an upper constraint in the measurements. The AUC also tends to be higher for the model.

Signatures illustrating system attribute input
Signatures calculated for events causing overflow can be seen in Figure 8(b). The modeled 'AUC above crest level' is generally larger than the observed. The overflow duration reaches the same maximum, but modeled values tend to be larger than observed. The characteristics of Figure 8(a) can also be found in Figure 8(c) (pass-forward pipe not well represented in the model). Figure 8(a)-8(c) indicate that there is a system replication deviation in at least two places: (1) the top of the pass-forward pipe seems to be lower than expected and thus simulated in the model, and (2) there is too much runoff in the model indicated by the AUC above crest level and the peak level. A field trip to the overflow structure revealed that the pass-forward pipe had an inlet orifice diameter of 160 mm in the manhole instead of 250 mm as specified in the asset database for the pass-forward pipe. Furthermore, the construction drawings showed (see SM) that the pass-forward pipe was intended to be 160 mm, but instead a 250 mm pipe was constructed. A 160 mm orifice opening of the manhole thus serves as a throttle in the overflow structure even though the pass-forward pipe is actually larger. Analysis of building construction drawings in the area upstream from site 3 also showed that the connected area was overestimated (see SM). The model was hence updated with a smaller connected area and a correct throttle diameter of 160 mm, and the results are shown in Figure 8(d)-8(f). The model still overestimates the peak levels above 24.8 m and the AUC above the crest level, but there are clear general improvements. Interestingly, the two flood occurrences originally simulated (Figure 8(a)) disappear when simulated with the modified model (Figure 8(d)). The apparent maximum peak level at app. 24.8 m was finally identified as a limitation in the sensor installation, as it was impossible to measure higher levels with the current sensor.

Signatures and surrounding states illustrating process knowledge and spatial variability
An example where signatures can help diagnose a gap in process knowledge is a multi-event comparison of modeled vs. observed peak levels combined with states of the surrounding environment in terms of soil moisture and spatial rain uncertainty, as illustrated for site 1 in Figure 9. Events with both high CV and high peak intensity over 10 minutes ('i10max') are highlighted and show a large spread in the figure. This confirms a well-known limitation of urban drainage modeling based on rain input from only a few rain gauges: spatio-temporal variability can lead to very large simulation uncertainty that prevents systematic comparison with in-sewer measurements. The remaining events are clustered in two groups, those in the wet period (above 50% soil moisture) and those in the dry period (below 50% soil moisture), and RMSE is calculated and depicted for both groups. The model overestimates the peak level for the dry-period events compared to the wet-period events. This may indicate that either the initial loss in the model needs to be differentiated according to soil moisture, which indicates a parameter assessment error, or that the imperviousness is lower for events with a low soil moisture content, which indicates a process knowledge error. Without giving a precise answer, we refer to the literature dealing with this specific knowledge gap in urban drainage modelling (Thorndahl et al. 2008; Davidsen et al. 2018; Nielsen et al. 2019).
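The grouping of events by soil moisture and the RMSE per group can be sketched as follows. The sketch assumes that events flagged with uncertain rain input have already been removed and that an event with exactly 50% soil moisture is assigned to the dry group; both are illustrative choices, as are the names.

```python
from math import sqrt


def rmse(modeled, observed):
    """Root mean square error between paired modeled and observed values."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(modeled, observed)) / len(modeled))


def rmse_by_soil_moisture(events, split=50.0):
    """RMSE of peak levels for dry and wet events.

    events: iterable of (modeled_peak, observed_peak, soil_moisture_percent).
    """
    groups = {"dry": ([], []), "wet": ([], [])}
    for modeled_peak, observed_peak, soil_moisture in events:
        # Assumption: events at exactly the split value count as dry.
        key = "wet" if soil_moisture > split else "dry"
        groups[key][0].append(modeled_peak)
        groups[key][1].append(observed_peak)
    return {
        key: rmse(mod, obs) if mod else None
        for key, (mod, obs) in groups.items()
    }


events = [(2.0, 1.0, 20.0), (4.0, 3.0, 30.0), (5.0, 5.0, 80.0)]
print(rmse_by_soil_moisture(events))
```

A clearly larger RMSE in one group, as in the example for the dry events, points to a dependence on the surrounding state that the model does not represent.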

Signatures illustrating equation structure and temporal variability
Figure 10 exemplifies for site 2 how signatures can help identify errors due to equation structure and temporal representation of process dynamics in the model. Levels are shown against the signature 'level rate of change' for short time intervals (1 and 5 min duration), which illustrates the response dynamics in the system. For example, a value of +4 m/min at level 14.2 m in the upper right panel (Figure 10) illustrates that the water can rise 4 m during 1 min from level 14.2 to 18.2 m. The observed response is much faster than the modeled, and the shape of the graphs cannot be replicated for shorter time steps than 5 min (bottom panels). The shape of the graphs indicates the physical limits of how high and low the water level in this structure can go. The observed values appear to have a cluster around -2 m/min (a drop) from level 19 to 16 m, which is due to the control of the flow regulator. Opening and closing the valve can occur in 12 seconds because it must flush the storage pipe up to three times when the flushing chamber is completely full. The model cannot simulate this as the control setting is implemented with a temporal resolution of one minute. This is why the modeled and observed patterns of 1-minute level changes differ. The figure also illustrates a minor discrepancy between the lowest point in the observed and modeled values because the sensor has a zero point that is higher than the invert level of the model. Therefore, we would never get the same values in these areas. The high levels (vertical axis) show that the observed values tend to include drops in the water level when the level is above the crest, which the modeled values do not. This may be due to an idealized replication of the control rules, which are not always executed perfectly in reality. For example, the valve must be open to give a drop of 4 meters in 1 min, and this is not allowed in the model unless the water level is below the crest level.

Limitations by applying water level sensors
Signature-based error diagnostics are developed for water level sensors in urban drainage systems. However, measured water levels are not always good indicators of flows and volumes because these can be affected by local conditions both upstream and downstream from the measurement site. Therefore, caution must be exercised in its use. Knowledge of the physical system becomes especially important when applying water level sensors to validate the model. However, most sensors in urban drainage systems are water level sensors, and therefore we need to find ways to extract knowledge from these as well. In addition, water level sensors are cheaper to install and more robust than flow meters, and caution must also be taken when using flow meters in partially filled pipes.
The validity of the observations is in this paper taken for granted. However, there could be time steps where the model cannot replicate the measurement due to, for example, blockages in the system, poor calibration of sensors or manual errors in the sensor setup, which ideally should be identified before conducting error diagnostics in models. We suggest making a model specifically for anomaly detection (e.g., Bertrand-Krajewski et al. 2003; Therrien et al. 2020; Palmitessa et al. 2021), but this is beyond the scope of the present paper.
Figure 9 | Multi-event signature comparisons for rain-induced events for site 1. Modeled vs. observed values of the signature 'max peak' together with two surrounding states: 'soil moisture' (indicated by a color scale) and 'spatial rain input uncertainty' (three categories indicated). RMSE is calculated for events that do not exceed the rain input uncertainty thresholds and is extracted into two categories: above or below 50% soil moisture.

The eyes of the data scientist
When validating urban drainage models, it is necessary to pay attention to the fact that stakeholders with different backgrounds and experiences may have different views of model adequacy, which will lead to different approaches to model construction. Gupta et al. (2012) illustrate three different viewpoints for defining model structural adequacy: (I) the engineering viewpoint where the evaluation is focused on functional adequacy, (II) the physical science viewpoint where the models must be in accordance with the physical system, and finally (III) the 'system science' viewpoint, which is a hybrid of the two previous ones. In (III), adequate simplified representation of a physical system is in focus, also referred to as the principle of parsimony. The last one favors the information content in observations as the model is compared to observational data (Gupta et al. 2012).
Different types of urban drainage models (control, operational, planning, and design; cf. Pedersen et al. 2021a) require different viewpoints. Control models may require little physical knowledge and instead obtain high accuracy by utilizing observations through data assimilation. Design models may require a good physical foundation and therefore favor the physical science viewpoint. However, the insights presented here on the limitations of an integrated urban drainage model probably require most modelers to accommodate the engineering viewpoint. For example, some hydraulic models (like the one used in this study) cannot resolve flow in pipes shorter than 10 m, so such pipes must be handled differently. Short pipes should either be merged with connecting pipes or kept as they are, in both cases leading to an inaccurate representation of reality. With increased automation of the model updating processes in living DTs (Pedersen et al. 2021a), the subjectivity of the modeler's choices may be reduced.
Are traditional calibration methods the best way to get good models?
The different viewpoints will greatly influence how a given model error is remedied. For example, the estimation of runoff from an area can be based on physical knowledge of the contributing areas and an acceptance of the apparent error in these estimates. On the other hand, it can also be based on an engineering approach where runoff-relevant parameters are calibrated against observations. Calibration of parameters will likely reduce the apparent error of the model, but it might also reduce the ability to interpret parameter values from a physical point of view. Ideally, calibration is done on a subset of events followed by validation on a separate set of events. However, the authors' experience suggests that many utility companies in practice end up calibrating only parts of the model, because observations are obtained in an ad-hoc manner, and mainly perform calibration whenever model-observation comparisons show very different behavior than expected. The danger of such an ad-hoc approach is that the tuning of parameters compensates for structural errors located in other parts of the model, like the errors shown in this paper in the asset database, surrounding states, and process dynamics. Therefore, it is suggested that applying a structured error diagnostics framework like the one developed here will be paramount for the success of any later calibration exercise.
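The split-sample idea mentioned above (calibrate on one subset of events, validate on a separate subset) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the procedure used in this study: `toy_model` is a hypothetical stand-in in which the response is simply proportional to rain depth, the single parameter `runoff_coeff` is found by a plain grid search, and the event values are made up.

```python
import math


def rmse(modeled, observed):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(
        sum((m - o) ** 2 for m, o in zip(modeled, observed)) / len(observed)
    )


def toy_model(runoff_coeff, rain_depths):
    """Hypothetical stand-in for a drainage model (linear in rain depth)."""
    return [runoff_coeff * r for r in rain_depths]


def split_events(events, every=2):
    """Deterministic split: every `every`-th event is held out for validation."""
    calib = [e for i, e in enumerate(events) if i % every != 0]
    valid = [e for i, e in enumerate(events) if i % every == 0]
    return calib, valid


def calibrate(events, candidates):
    """Grid search: pick the candidate coefficient with the lowest RMSE."""
    rain = [e[0] for e in events]
    obs = [e[1] for e in events]
    return min(candidates, key=lambda c: rmse(toy_model(c, rain), obs))


# Made-up events as (rain depth, observed response) pairs.
events = [(5.0, 2.6), (10.0, 4.9), (2.0, 1.1),
          (8.0, 4.0), (12.0, 6.2), (4.0, 1.9)]
calib, valid = split_events(events)
candidates = [round(0.1 * k, 1) for k in range(1, 10)]

best = calibrate(calib, candidates)
valid_rmse = rmse(toy_model(best, [e[0] for e in valid]),
                  [e[1] for e in valid])
```

The validation error is computed only on events that played no role in choosing the parameter. A low calibration error combined with a high validation error would suggest that the parameter is compensating for an error located elsewhere in the model.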
The developed framework is highly scalable to many sites in the system and provides the ability to learn across sites. In the service area of VCS Denmark, the suggested methodology will be applied to all observation sites (more than 400) during the coming years. The general framework of signature-based error diagnostics is in principle applicable to any type of model. This study argues for tailoring its use towards the detailed hydrodynamic part of integrated urban drainage system models that is based on a detailed representation of physical system attributes. The results are thus applicable to model codes other than the Mike Urban software applied in this study (e.g., SWMM and InfoWorks). This will improve the understanding of where the models generally fail due to limitations of the applied software, for example, conceptualizations of the hydrologic and hydraulic/hydrodynamic processes or numerical implementation, and where an error is site-specific and justifies a designated, local field investigation.

CONCLUSION
This paper presents a structured framework for diagnosing errors in integrated urban drainage models intended for continuous, iterative improvements of living digital twins. The framework provides a classification scheme, which divides different types of errors and uncertainty locations in urban drainage models into four categories: context, input, model structure, and parameter. We show that the concept of hydrologic and hydraulic signatures, i.e. characteristics of system functioning extracted from time series, can successfully be applied to urban drainage systems. While signatures are typically estimated from flow measurements in general hydrology, this paper shows that water level measurements can also be used for feature extraction. This is highly relevant for urban drainage systems where flow measurements can be difficult to obtain due to the harsh sewer environments.
The framework is applied to a real case study in Odense, Denmark, with more than 10 years of observations representing more than 500 rain-induced events available for a diverse set of structures in the local drainage system: a regular manhole, a flushing chamber upstream of a large storage pipe, and an internal overflow structure. Rain-induced stormwater events were identified and extracted from the time series. Two 'surrounding states' related to soil moisture and uncertainty of rain input due to spatial variability were defined to support this process. The diagnostic framework was shown to be an efficient way to identify the source of various errors, and it identified patterns in model performance that were not clear to the utility company beforehand. The provided examples identified errors in the utility's asset database, errors due to hydrologic processes not represented in the model, and mathematical limitations in the model code in terms of temporal resolution. In particular, the ability to find errors in the asset database is interesting, as it is of value to the entire utility organization. With the developed framework, it is easier to identify which actions are suitable to mitigate the errors and to improve the transparency of the model performance for end users.
With increasing digitalization, the introduction of living digital twins in utility companies, and a surge in the number of in-sewer sensors enabled by IoT, there is a clear need for automating model performance evaluation within a structured framework like the one developed here. Therefore, continued research should focus on matching specific signatures to specific types of model errors for improved guidance on the location of errors related to the representation of physical attributes. There is furthermore a need to define which signatures can serve as indicators of how suitable a given model is for different operational domains (dry weather conditions, 'every day' rain events, extreme rainfall), different objectives (simulation of overflow volumes, simulation of critical level exceedance), and different processes (infiltration-inflow, combined sewer overflows, manhole surcharging). This will greatly improve the transparency of model usefulness to end users.