ABSTRACT
Even though it has been established that a hyetograph's shape affects the results of hydrological simulations, common engineering practice does not always account for this fact. Instead, a single design storm is often considered sufficient for designing an urban drainage system. This study examines the impact that this design paradigm, combined with the uncertainty introduced by subjective choices made during the design process, has on the robustness of a designed system. To do so, we evaluated a set of individual designs created by engineering students using the same Chicago hyetograph as a design storm. We then created ensembles of hyetographs with the same precipitation volume and duration as the Chicago hyetograph and evaluated the designs' hydrological responses. The results showed that designs that performed equally well for the initial design storm responded differently to the storms in the ensembles and, consequently, showed different levels of robustness, which suggests that the current design approach needs to be adapted.
HIGHLIGHTS
Designing an urban drainage system (UDS) using a single design storm does not produce a robust result.
Storms with the same total precipitation volume but different temporal patterns can expose different vulnerabilities in a UDS.
Designer subjectivity introduces uncertainty to the performance of a UDS.
INTRODUCTION
The application of (computer) models for simulating rainfall–runoff and the hydraulic response of urban drainage systems (UDSs) is routine practice for (re)designing such systems. They serve as tools for estimating how different designs respond to various scenarios (e.g., climate change or urban development). It is understood that the uncertainty in the model results affects the final design. Many studies have focused on identifying the sources of this uncertainty and quantifying its influence on the model results (Dotto et al. 2014; Tscheikner-Gratl et al. 2019). The goal should not be eliminating uncertainty completely (which is impossible), but rather acknowledging and, where possible, reducing it (Deletic et al. 2012). Since uncertainty is bound to influence the outcome of the design process, it should also affect the choices made to structure this process. In other words, we should acknowledge that any design approach is also a source of uncertainty in itself (Refsgaard et al. 2006). In this work, we describe this as a structural uncertainty – stemming from assumptions and decisions that are implicit when following an approach (e.g., choice of design event(s) and performance metrics) – insofar as it is connected to the form of the model itself and its inadequacy to represent the processes present in the modelled system (Deletic et al. 2012).
Recognizing this type of uncertainty should prompt a reflexive look at our design approaches and at how the task of UDS design is framed (Craye et al. 2005). Ignoring it can create a false sense of confidence that the designed system will meet its set goals.
A common design approach is to set a goal (or a set of goals) to be met at the system scale (e.g., no flooding) and run simulations to arrive at an acceptable system design, using a design hyetograph to drive the model. This method results in a UDS that is heavily dependent on the design hyetograph used (Alfieri et al. 2008). Many guidelines prescribe the goals to be set as well as a rudimentary approach to selecting a design hyetograph. In Europe, the European Standard EN 752 ‘Drain and sewer systems outside buildings – Sewer system management’ defines return periods for flood protection. It serves as the basis for various national guidelines, such as the ÖWAV Guideline 11 in Austria, ‘DWA-A 118E’ (2006) in Germany, and national versions of the European Standard such as SS-EN 752 (SIS – Bygg och anläggning, 2017) in Sweden, BS EN 752 in the United Kingdom (British Standards Institution (BSI), 2017), and SN EN 752 in Switzerland. In the USA, the design of UDSs is described in ASCE (2013) and ‘Urban Drainage Design’ (Kilgore et al. 2024), and in Queensland (Australia), in the Queensland Urban Drainage Manual (Institute of Public Works Engineering Australasia (IPWEA), 2016).
All of these design approaches rest on a risk/cost assessment that balances investment against damage. As a result, they propose design return periods for rainfall events (sometimes depending on the land use, i.e., the expected damage), for which the runoff has to be calculated and the hydraulic design carried out. They often follow a frequency-based approach: the practitioner is given a return period, and the following steps are recommended to arrive at a hyetograph: (1) obtain an intensity–duration–frequency (IDF) curve, (2) calculate the critical rainfall duration and (3) create a design hyetograph using a temporal pattern from the literature (e.g., Keifer & Chu 1957; Chow et al. 1988; DWA-A 118E 2006; Veneziano & Villani 1999). Much attention has been paid to how IDF curves are constructed and used (the first two steps), with many researchers criticizing the assumption of climate stationarity that underlies standard IDF curve usage (Underwood et al. 2020). Others have proposed replacing IDF curves with stochastic models (Koutsoyiannis 2021).
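The sketch below illustrates step (3) for the Chicago pattern (Keifer & Chu 1957). It is a minimal example, assuming a hypothetical IDF curve of the form i = a/(D + b)^c (i in mm/h, D in min); the function names and the parameters a, b, c and r (time to peak divided by storm duration) are illustrative and are not those used in this study. It relies on the defining property of the Chicago storm: for every duration D, the window of length D around the peak contains the IDF depth for D, a fraction r of it before the peak and 1 − r after it.

```python
def idf_depth(duration_min: float, a: float, b: float, c: float) -> float:
    """Depth (mm) from a hypothetical IDF curve i = a / (D + b)**c in mm/h."""
    return a / (duration_min + b) ** c * duration_min / 60.0


def chicago_hyetograph(total_min, dt_min, r, a, b, c):
    """Block depths (mm) of a Chicago storm discretised into dt_min steps.

    r is the time to peak divided by the storm duration (0 < r < 1). For any
    window of length D around the peak, the window holds the IDF depth for D,
    a fraction r of it before the peak and 1 - r after it.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    n_steps = round(total_min / dt_min)
    if abs(n_steps * dt_min - total_min) > 1e-9:
        raise ValueError("total_min must be a multiple of dt_min")
    t_peak = r * total_min
    total_depth = idf_depth(total_min, a, b, c)

    def cumulative(t):
        # Cumulative depth (mm) from the start of the storm to time t (min).
        if t <= t_peak:
            return r * (total_depth - idf_depth((t_peak - t) / r, a, b, c))
        return r * total_depth + (1 - r) * idf_depth((t - t_peak) / (1 - r), a, b, c)

    cum = [cumulative(k * dt_min) for k in range(n_steps + 1)]
    return [cum[k + 1] - cum[k] for k in range(n_steps)]


# Illustrative parameters only (not the design storm used in this study).
blocks = chicago_hyetograph(60, 5, 0.4, 1000.0, 10.0, 0.8)
print([round(x, 2) for x in blocks])
print(round(sum(blocks), 2), round(idf_depth(60, 1000.0, 10.0, 0.8), 2))
```

Because the cumulative curve telescopes, the block depths always sum to the IDF depth for the full duration; only their arrangement in time depends on the chosen pattern.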
While in almost all other fields of engineering, safety coefficients are applied to the load, the system's capacity, or both to compensate for uncertainties in the design, to the authors' knowledge this practice has not been explicitly adopted in the urban drainage community. Recently, some countries have introduced climate change safety coefficients (Arnbjerg-Nielsen 2012). These coefficients account for the increased rainfall loads in climate projections, but they do not address the uncertainties in the designed system's performance. Instead, they aim to ensure that the system can handle anticipated future conditions with increased rainfall volumes, while retaining existing hyetograph shapes.
Conversely, the effect of the shape of the hyetograph on the design has been discussed less frequently. Several studies have examined the effects of the temporal pattern (Alfieri et al. 2008; Willems 2012; Funke et al. 2021) and resolution (Ochoa-Rodriguez et al. 2015) of rainfall on simulated runoff for both grey infrastructure and green infrastructure (GI). Even if a perfect IDF curve is constructed (and the climate stationarity assumption is validated), the choice of the hyetograph's temporal distribution remains open. Commonly used hyetograph shapes include block rainfall events (Mugume & Butler 2017), alternating blocks (Chow et al. 1988), BLUE (Veneziano & Villani 1999), Chicago (Keifer & Chu 1957) and Euler type II (DWA-A 118E 2006), all created using different methodologies and different sets of assumptions. However, they share a common postulate, namely that the design hyetograph transforms a rainfall event of a given return period into a flood event with the same return period (Maidment 1993). Yet it has been shown that this assumed 1:1 relationship between the two return periods does not hold for either natural or urban catchments (Wright et al. 2014).
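For comparison, the alternating block method (Chow et al. 1988) derives its temporal pattern from an IDF curve in a different way: depth increments for successive durations are sorted and placed alternately around a central peak. The following is a minimal sketch under a hypothetical IDF relation of the form i = a/(D + b)^c with illustrative parameters; placing the second-largest block to the right of the peak is one common convention and is assumed here, as are the function names.

```python
def idf_depth(duration_min, a, b, c):
    """Depth (mm) from a hypothetical IDF curve i = a / (D + b)**c in mm/h."""
    return a / (duration_min + b) ** c * duration_min / 60.0


def alternating_block(total_min, dt_min, a, b, c):
    """Alternating block hyetograph (after Chow et al. 1988); depths in mm."""
    n = round(total_min / dt_min)
    cum = [idf_depth(k * dt_min, a, b, c) for k in range(n + 1)]
    # Depth increments between successive IDF durations, largest first.
    increments = sorted((cum[k + 1] - cum[k] for k in range(n)), reverse=True)
    blocks = [0.0] * n
    centre = (n - 1) // 2
    for rank, depth in enumerate(increments):
        offset = (rank + 1) // 2
        # Largest block at the centre, then alternate right (odd rank) and left (even rank).
        blocks[centre + offset if rank % 2 else centre - offset] = depth
    return blocks


blocks = alternating_block(60, 5, 1000.0, 10.0, 0.8)  # illustrative parameters
print([round(x, 2) for x in blocks])
```

Both this method and the Chicago pattern preserve the IDF depth for the full duration; they differ only in how that depth is distributed in time, which is exactly the degree of freedom examined in this study.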
Acknowledging the possible mismatch between the task of designing a modern UDS and the methods commonly applied, researchers have been developing alternative approaches. Some take advantage of the increased availability of computational power to run long-term simulations (Dotto et al. 2011), which bypasses the problem of defining and selecting a design storm but requires long observed precipitation time series. This introduces a new set of difficulties, including the risk of accumulating computational uncertainties tied to the model structure and the numerical methods used (Beven 2011; Deletic et al. 2012). Others (Guo & Adams 1998; Guo & Zhuge 2008) use an analytical stochastic conceptualization of the rainfall–runoff model, which treats the rainfall and runoff time series as realizations of random variables and thereby better represents the dynamics of their temporal distributions. This also bypasses the design storm problem but is limited in describing the processes present in a catchment (Guo & Zhuge 2008). This approach can also seem arcane or cumbersome to practitioners, hindering wider uptake. A move from frequency-based to risk-based design has also been proposed as a way to tackle (or circumvent) the same problem (Markolf et al. 2021). Many of these alternatives have shown potential in research case studies, but they have not been consistently codified into guidelines. Most guidelines prescribe a design methodology driven by a single generic design storm, and some use regional hyetograph shapes (characteristic storm events) (Cordery et al. 1984; Ball et al. 2019). Still, the possibility that an ensemble of plausible storms may be necessary to design a UDS adequately (robustly) is not acknowledged.
In the design of UDSs, different system stability measures such as reliability, robustness, resistance, resilience or vulnerability linked to hazards and risks are used (Casal-Campos et al. 2018). Reliability refers to the consistent performance of a system over time without failure (Billinton & Allan 1992). Resilience is seen as the ability to return to normal operating conditions after disturbances (Mugume et al. 2015); together with resistance (describing that a system is not reacting to disturbances), they are seen as the main characteristics of an infrastructure system (Tahmasebi Birgani & Yazdandoost 2014). The concept of vulnerability describes a system-dependent property that links a specific hazard to its effect in the risk assessment framework (Hauger et al. 2006). In systems engineering, robustness is defined as the insensitivity to variations normally encountered in production and operational use (Blanchard & Blyler 2016).
For our study, adapting the definition of a robust plan given by Walker et al. (2013), we define a robustly designed UDS as a design that ‘yields outcomes that are deemed to be satisfactory according to some selected assessment criteria across a wide range of future plausible states of the world’. We address two influencing factors of the design process: (i) the influence of the hyetograph shape, rather than the volume, on the performance of the designed networks and (ii) the structural uncertainty inherent in the subjectivity of the designers' choices when defining the network characteristics. In our analysis, the effects of the second factor are considered in view of the information revealed by investigating the first factor (i.e., the influence of the designers' choices is gauged after each design is tested with multiple hyetographs).
METHODS
In this study, we investigated the response of different UDS designs of the same catchment to an ensemble of hyetographs with the same duration and total precipitation volume as the original design storm, but with different temporal patterns. By doing so, we can examine the influence of varying hyetograph shapes on different UDSs that were created by different designers using the same design storm.
Our process consists of three distinct parts: (1) creating the UDS designs, (2) creating the event ensembles, and (3) running simulations and evaluating the outcomes. The first step was to obtain a set of UDS designs for the same catchment whose differences result from the designers' subjectivity. To this end, over the course of three semesters, 52 MSc students were given the same exercise, in which they were asked to design a UDS following a common design guideline. For the second part, we used Multiplicative Random Cascade (MRC) models to create ensembles of hyetographs. The UDS designs were then tested with the hyetograph ensembles in SWMM 5.2 (Gironás et al. 2010), using swmm-api (Pichler 2022) as the Python interface. The modelling results were analysed to assess robustness. The methods used in these three parts are described in the following subsections.
Creating the UDS designs
This model was provided to the students as part of a course project, together with a Chicago design hyetograph as the hydraulic load for the simulations during the design process. Their task was to optimize the UDS within the limitations of the given layout. They were explicitly free to select the dimensions of the pipes, the retention basin (RB) volume and the weir (W) height, as well as the depth of the manholes and the elevation of the pipes (within local regulations, e.g., regarding frost depth and minimum/maximum slopes). The objectives were to cause no flooding in the catchment and no overflow to the river while minimizing the cost of the design, using a cost function taken from the literature (De Toffol et al. 2007; Sitzenfrei et al. 2013). This cost function depends solely on the volumes of the pipes and the RB. Although the exercise did not prevent the students from changing the Storm Water Management Model (SWMM) simulation settings (e.g., ponding), few did so, and with no visible effect.
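The cost function itself is not reproduced here. As a hedged illustration of a cost that depends only on volumes, the sketch below assumes a linear form with hypothetical unit costs per cubic metre of pipe and basin volume; the actual functional form in De Toffol et al. (2007) and Sitzenfrei et al. (2013) may differ, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipe:
    diameter_m: float
    length_m: float

    @property
    def volume_m3(self) -> float:
        # Internal volume of a circular pipe: pi * D^2 / 4 * L.
        return math.pi * self.diameter_m ** 2 / 4.0 * self.length_m


def design_cost(pipes, basin_volume_m3, pipe_cost_per_m3, basin_cost_per_m3):
    """Hypothetical linear cost that depends only on pipe and basin volumes."""
    pipe_volume = sum(p.volume_m3 for p in pipes)
    return pipe_cost_per_m3 * pipe_volume + basin_cost_per_m3 * basin_volume_m3


# Hypothetical network and unit costs, for illustration only.
network = [Pipe(0.4, 120.0), Pipe(0.6, 80.0), Pipe(0.8, 150.0)]
print(round(design_cost(network, 2000.0, pipe_cost_per_m3=5000.0, basin_cost_per_m3=1500.0)))
```

Because only volumes enter such a cost, designs with quite different splits between pipe and basin volume can end up with similar costs, which leaves room for subjective choices.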
Not all the designs submitted by the students met the given requirements, but initially we discarded only those that produced flood or overflow volumes above 10³ m³ (20% of the rainfall volume and approximately 30% of the runoff). This left 41 acceptable designs for further evaluation. By also considering designs that do not meet the design requirements, we aimed to assess their robustness across different temporal hyetograph patterns and to compare the magnitude and mode of failure with the initial results obtained from the design storm simulation. We observe two modes of failure in our system, flooding and overflow, both of which can be categorized as functional (Butler et al. 2017).
The differences between designs are a result of subjective choices made by each designer (student) within the constraints of the structure of the design approach and the software applied. By analysing these designs, we aim to focus on the impact of these subjective choices on the systems' performance.
Creating the storms
Sampling of the hyetograph ensemble was performed using MRCs, specifically the MRCSIT,SEP model developed by Pons et al. (2022a). We chose the model structure that depends on the timescale, intensity, and temperature (SIT) and also uses a stochastic element permutation generator (SEP), as it showed the best performance in Pons et al. (2022a). This temporal statistical downscaling method is usually applied to time series, but Pons et al. (2022b) suggested using it as a weather generator in a single framework for both time series and extreme events under climate change. In brief, the MRC model is trained on long time series with high temporal resolution, which allows it to downscale time series from a coarser to a finer resolution. To create an ensemble of n storms, we run the same (trained) model n times with the total precipitation depth and storm duration fixed, which yields n different storms because of the stochastic nature of the model. Moreover, the current study investigates extending the Highly Informed Design Evaluation Strategy (HIDES) (Pons et al. 2022b) from the local scale to the UDS scale by examining the potential of extreme event sampling for system design purposes. The HIDES originally used synthetic rainfall events derived from the MRCSIT,SEP models to analyse the robustness of designed green roofs. In Pons et al. (2022a), 10⁶ events were sampled; however, because the current study required simulating a set of catchment models, which increased the computational time, we used only 1,000 events per model as input. Using fewer events is expected to affect the estimated probability of failure but still allows the behaviour of the systems to be assessed under various probable hyetographs.
Indeed, by simulating 10,000 events for two selected designs (see Appendix), we calculated the maximum 95% confidence interval for their probability of failure (for any threshold) to be 5%. This amount of uncertainty does not affect the discussion of the results.
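The procedure behind this estimate is described in the Appendix and is not reproduced here. One plausible way to obtain such an interval, assumed for illustration only, is to bootstrap ensembles of 1,000 events from the 10,000 simulated totals of a design and record, for each threshold, the spread of the resulting exceedance frequencies. The function name, thresholds and the exponentially distributed stand-in data below are hypothetical.

```python
import bisect
import random


def exceedance_ci_width(totals, sample_size, thresholds, n_boot=500, seed=0):
    """Widest bootstrap 95% interval of P(Vtot > threshold) over all thresholds.

    Resamples ensembles of `sample_size` events (with replacement) from a
    larger set of simulated totals, records the exceedance frequency for each
    threshold, and returns the largest interval width found.
    """
    rng = random.Random(seed)
    estimates = {thr: [] for thr in thresholds}
    for _ in range(n_boot):
        sample = sorted(rng.choices(totals, k=sample_size))
        for thr in thresholds:
            # bisect_right counts the values <= thr in the sorted sample.
            n_above = sample_size - bisect.bisect_right(sample, thr)
            estimates[thr].append(n_above / sample_size)
    widest = 0.0
    for values in estimates.values():
        values.sort()
        lower = values[int(0.025 * (n_boot - 1))]
        upper = values[int(0.975 * (n_boot - 1))]
        widest = max(widest, upper - lower)
    return widest


# Hypothetical stand-in for 10,000 simulated Vtot values (m3) of one design.
rng = random.Random(1)
totals = [rng.expovariate(1 / 40.0) for _ in range(10_000)]
print(round(exceedance_ci_width(totals, 1_000, [25.0, 50.0, 100.0, 200.0]), 3))
```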
MRCSIT,SEP models for two locations were considered in the current study: (i) the MRCSIT,SEP model of Bergen, Norway, developed, calibrated and tested in Pons et al. (2022a) and (ii) an MRCSIT,SEP model calibrated and tested in the current study using a local dataset. The local model was calibrated and tested as in Pons et al. (2022a), using precipitation and temperature time series from 1993 to 2023, measured with a 10-min resolution at a weather station near the study site. Temperature was measured by two sensors located 5 and 200 cm above the ground, and precipitation by a tipping-bucket rain gauge.
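The MRCSIT,SEP model itself (weights conditioned on timescale, intensity and temperature, plus the SEP permutation step) is not reproduced here. The sketch below is a deliberately simplified, hypothetical stand-in: a microcanonical binary cascade with constant weight probabilities. It only illustrates the principle described above, namely that running a stochastic, mass-conserving cascade n times with a fixed total depth and duration yields n storms with identical volume but different temporal patterns. The parameter p_zero, the uniform weight distribution and the function names are assumptions.

```python
import random


def cascade_storm(total_depth_mm, levels, p_zero, rng):
    """Disaggregate a storm depth with a simplified microcanonical cascade.

    At each level, every box is split into two halves with weights (w, 1 - w).
    With probability p_zero all mass goes to one half (w is 0 or 1);
    otherwise w ~ Uniform(0, 1). The total depth is conserved at every level.
    """
    boxes = [total_depth_mm]
    for _ in range(levels):
        next_boxes = []
        for depth in boxes:
            if rng.random() < p_zero:
                w = float(rng.random() < 0.5)
            else:
                w = rng.random()
            next_boxes.extend((depth * w, depth * (1.0 - w)))
        boxes = next_boxes
    return boxes


def storm_ensemble(n_storms, total_depth_mm, levels, p_zero=0.2, seed=0):
    """Run the same stochastic cascade n_storms times with a fixed depth."""
    rng = random.Random(seed)
    return [cascade_storm(total_depth_mm, levels, p_zero, rng) for _ in range(n_storms)]


# Hypothetical example: 1,000 storms of 25 mm, each split into 2**4 = 16 blocks.
ensemble = storm_ensemble(1_000, 25.0, levels=4)
print(len(ensemble), len(ensemble[0]), round(sum(ensemble[0]), 6))
```

With four cascade levels, for example, a storm is split into 16 blocks; for a hypothetical 80-min storm, these would be 5-min blocks.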
Simulations
Simulations of all synthetic storms were run for all different designs using SWMM 5.2 (Gironás et al. 2010). The same assumptions that were made for the design of the models are made here (i.e., the storage unit is empty when the simulation starts and there is no dry weather flow in the system). In the models, a single rain gauge was used to input the hyetographs and the spatial distribution of the storms is considered uniform across the simulated area. Given the size of the catchment (22 ha), the effect of spatial variability in rainfall is not expected to be significant.
The results of the two ensembles of hyetographs (SYNTHLOC and SYNTHBER) were analysed independently to investigate the effect of training the MRC model on local datasets.
Water that ponds above the manhole level of a node is counted as flood volume, even if it is later reintroduced into the system (this only applies to four designs in which ponding was allowed in SWMM). Water that only causes node surcharge (i.e., raises the water level in a node above the crowns of all connected conduits but below the manhole level) is not counted.
After running simulations of all the selected designs for both storm ensembles, as well as for the original design storm used in the design process, we compared the results. Using the flood volume (VFL) and the overflow volume to the river (VOF) to characterize the response of a design gives information on the mode and magnitude of failure (if a failure occurred during a given simulation). Even though the initial design criteria require both VFL and VOF to be zero, we used a slightly more lenient threshold of 50 m³ on the total volume Vtot = VFL + VOF to define acceptable designs for this analysis. We do not use the initial strict threshold of no flooding and no overflow, to avoid classifying designs that produce negligible volumes as unacceptable. We chose 50 m³ as the cut-off between negligible and non-negligible volumes because it is less than 1% of the total runoff produced by the SWMM simulation using the design storm. At the same time, it still separates the cluster of designs that respond very well to the design storm from the rest of the designs. This assumption makes the results more interpretable, given the small number of available designs (41), 22 of which produce Vtot < 50 m³ for the Chicago design storm.
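The classification used in the following analysis can be summarized as a small decision rule. The sketch below assumes that Vtot = VFL + VOF, that a response is acceptable when Vtot is below 50 m³, and that any non-zero component contributes to the reported mode of failure; the function name is illustrative.

```python
def classify_response(v_fl, v_of, threshold=50.0):
    """Classify a simulation from its flood (VFL) and overflow (VOF) volumes in m3.

    Assumes Vtot = VFL + VOF; a response is acceptable when Vtot is below the
    threshold, otherwise the mode of failure is returned.
    """
    if v_fl < 0 or v_of < 0:
        raise ValueError("volumes must be non-negative")
    if v_fl + v_of < threshold:
        return "acceptable"
    if v_fl > 0 and v_of > 0:
        return "combined"
    return "flooding" if v_fl > 0 else "overflow"


print(classify_response(153.0, 0.0))
```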
RESULTS AND DISCUSSION
Subplots (b), (c) and (d) of Figure 3 show the median values of VFL and VOF for the 25, 10 and 5% most critical storms of both ensembles. The criticality of a storm was calculated by summing the Vtot values of all (41) designs when simulated with that storm.
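The selection of the most critical storms and the per-design medians can be sketched as follows. The matrix layout (one row per design, one column per storm), the rounding of the selected fraction and the function names are assumptions made for illustration; the example values are hypothetical.

```python
from statistics import median


def most_critical_storms(v_tot, fraction):
    """Indices of the storms with the largest Vtot summed over all designs.

    v_tot[d][s] is the total failure volume (m3) of design d under storm s.
    """
    n_storms = len(v_tot[0])
    criticality = [sum(row[s] for row in v_tot) for s in range(n_storms)]
    n_keep = max(1, round(fraction * n_storms))
    ranked = sorted(range(n_storms), key=lambda s: criticality[s], reverse=True)
    return ranked[:n_keep]


def median_per_design(volumes, storms):
    """Median volume of each design over the selected storms."""
    return [median(row[s] for s in storms) for row in volumes]


# Hypothetical results: 2 designs x 4 storms.
v_tot = [[0.0, 10.0, 5.0, 1.0], [0.0, 20.0, 2.0, 3.0]]
top = most_critical_storms(v_tot, 0.5)
print(top, median_per_design(v_tot, top))
```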
Looking at the designs that failed to meet the design criteria for the design storm (i.e., the designs in the shaded part of Figure 3), we see a varying response to the ensembles. Some designs seem to respond better to the ensembles in Figure 3(b), but when only the most severe storms of either ensemble are considered (Figure 3(c) and 3(d)), all but four designs produce median Vtot volumes larger than those produced by the design storm. These four exceptions produce a negligible Vtot even for the most critical storms of the ensembles. What they have in common is a very large RB (5,600 m³). In contrast, their pipe dimensions are not chosen conservatively (possibly to keep the total cost reasonable). As a result, these designs flooded when stressed with the design storm, which had a large peak (only slightly: three of the four had acceptable flood volumes), but behaved very robustly when stressed with the ensembles. The mode of failure of nine designs also changes when they are stressed with the ensembles: designs that failed only by flooding with the design storm change to either an overflow or a combined failure. The inverse is not observed.
More interestingly, the designs that met the criteria for the design storm also respond differently to the ensembles. All but three of them produce large Vtot values for the 10 and 5% most critical storms in the ensembles. The distributions of the median values of VFL and VOF (and Vtot) have a lower mean for these designs than for those that fail for the design storm (the null hypothesis of a permutation test (Good 1994) is rejected at the 5% significance level). Even so, this difference does not justify assuming that a design's performance for the design storm is a sufficient criterion for its overall performance. In short, even when considering only designs that performed well for the design storm, we cannot expect them to perform well for other storms with the same duration and precipitation depth.
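For readers unfamiliar with permutation tests, the sketch below shows a generic two-sided Monte Carlo version for a difference in means (in the spirit of Good 1994). Whether the analysis above used a one- or two-sided test, and how many permutations it drew, is not stated here, so these choices, the function names and the example data are assumptions.

```python
import math
import random


def _mean(values):
    # math.fsum is correctly rounded, so the mean does not depend on value order.
    return math.fsum(values) / len(values)


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns an estimated p-value; the +1 terms count the observed split.
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical median Vtot values (m3) for designs that pass or fail the design storm.
passed = [12.0, 40.0, 85.0, 150.0, 60.0, 30.0]
failed = [220.0, 310.0, 180.0, 95.0, 400.0]
print(permutation_test(passed, failed))
```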
These designs also clearly fail mainly through overflow. Three designs flooded (without overflow). They produce different flood volumes, but all three flooded at the same location, the RB. They share a design flaw that was not evident in the design storm simulations: the weir level was at the same height as the maximum depth of the basin. This makes overflow to the river impossible, so the RB floods once it reaches its capacity. Understanding the failure mechanism of these specific cases allows us to distinguish them from the other instances of flooding: although they fail by flooding, the failure is localized at the RB because of a clear design flaw, rather than spreading between nodes in the catchment. Four designs (of the 22 that meet the design criteria) also show a combined mode of failure when stressed with the 5% most severe storms of each ensemble. The flooding locations differed between designs, but they all tended to be at nodes near the outlet of the catchment. These nodes receive water from a larger part of the UDS (more than 30% of the total area). The different temporal patterns of the synthetic storms are more likely to stress these locations than upstream ones that are affected by only one or two sub-catchments.
Figure 3 reveals two main vulnerabilities of using a single design storm to design a UDS: the resulting designs are not guaranteed to handle events similar to the design event, and how they respond to such events cannot be anticipated from the design storm simulation. The two directions of Figure 3 give a qualitative overview of these vulnerabilities. In the horizontal direction, we see how different (equally acceptable) designs respond to the same rainfall event. In the vertical direction (from (a) to (d)), we see how different rainfall temporal patterns affect these designs.
All the weaknesses highlighted by the ensembles are connected either to the RB or to the pipes near the outlet of the catchment. This results from designs being tailored to the design storm: the pipes and RB are dimensioned so that they only just avoid failure under the exact conditions created by the design hyetograph. When stressed with different temporal patterns (e.g., a storm with a mild start that partly fills the basin, followed by a late peak that pushes the basin beyond its capacity), they can fail, even if these patterns have lower peak intensities than the design storm. This highlights a weakness of branched grey infrastructure systems: during the design process, the focus is solely on system reliability and resistance, while resilience and robustness are neglected (Hesarkazzazi et al. 2022). When such systems are subjected to an ensemble of expected storms and encounter conditions that were not simulated during design, they can fail.
The storms in the ensemble created using the MRC model that was trained on the Bergen dataset consistently produce larger volumes across all designs, compared to those in the ensemble created using the local MRC model. This could be attributed to the higher peaks observed in the storms of that ensemble (Figure 2).
We chose five designs representing different degrees of robustness to compare with each other (Figure 4(b)). The most robust one (lowest curve) is used as the reference design, and the other four are compared against it. Notably, the reference design produced a flood volume of 30 m³ for the design storm. Design 1 produced a flood volume of 153 m³, meaning that it failed even by our updated, more lenient criterion. Design 2, in contrast, has a worse overall response to the ensemble but produced a Vtot of zero for the design storm. Designs 3 and 4 both failed for the design storm and showed very low robustness to the ensemble. Overall, these five designs represent, to an extent, the different choices made by the designers, as each belongs to a group of designs with similar behaviour.
Designs 1, 2 and 4 use larger diameters than the reference design in major parts of the network. This does not improve their performance for any of the simulated storms, as the diameters used at those locations by the reference design are already adequate to route the runoff. The common flaw in all four of the designs is that they use a smaller RB than the reference design. Compared to the reference design, the storage capacity of the RB is decreased by 28, 42, 46, and 64% for Designs 1–4, respectively. They also use smaller pipes in some downstream locations. These are the two reasons Designs 1–4 are, in varying degrees, less robust than the reference design.
Design 4 has a combined mode of failure, producing Vtot above the acceptable threshold for the original design storm as well as most of the storms in the ensembles. This is because most of the downstream pipes as well as the RB are under-dimensioned.
The effect of the designers' subjectivity during the dimensioning of the systems is especially clear when comparing the reference design with Design 2. Although both systems performed well for the design storm, one designer (reference design) opted for a 74% larger RB and slightly smaller pipe diameters overall than Design 2. The result is a system that is almost 8 million Norwegian kroner (approximately 800,000 euros) more expensive but also considerably more robust (see Figure 4).
Limitations and outlook
This work is a case study that uses modelling results from one catchment to illustrate vulnerabilities of the design approach used for the UDS. Consequently, the results have limited generalizability. However, they may still serve as an example of the presence and source of structural uncertainty in the designs we studied. These designs are created by MSc students and not by real practitioners, but, given an explicit design approach and a limited decision space (fixed system topology, predefined subcatchment delineation and properties, and no GIs), they can be used as proxies for designer uncertainty.
Furthermore, the simulations do not consider possible malfunctions of the UDS (pipe blockages, RB sedimentation, etc.). This means that one dimension of robustness is overlooked so that we can focus specifically on the factors we consider (i.e., the hyetograph shape and design choices). In other words, only functional modes of failure are studied, and structural and operational modes of failure are not considered (Butler et al. 2017).
Future research could introduce more designer freedom (e.g., by combining the design approach with a GI placement framework) to investigate the effect of this added uncertainty on the robustness of the resulting designs. As we move towards more GI-based adaptation and more resilient designs, we will need to better understand the effect of every part of our design approaches and adapt our methods accordingly (Funke et al. 2021; Markolf et al. 2021). The effect of rainfall's spatial variability could also be investigated in the context of UDS design for larger urban catchments, since its effect on urban hydrodynamic modelling is already documented (Ochoa-Rodriguez et al. 2015). Future climate projections could also be used by training the MRC model on ‘future’ time series to create rainfall events that account for climate change. That would, of course, introduce a new set of uncertainties to the event ensemble, which should not be taken lightly. More generally, the effect of uncertainty on the robustness of UDS designs could be investigated further. For example, regarding the choice of design approach, quantifying and decomposing the structural uncertainty inherent in different guidelines could be the first step towards reducing it. The effect of the choice of software could also be investigated in future work (similar to Dotto et al. (2011)). Regarding the choice of simulation input data, we have shown that a single design storm might be insufficient to describe the behaviour of a whole catchment. Using an ensemble of design storms and setting an acceptable probability of exceedance are possible solutions. When creating such an ensemble, especially where the simulations are computationally demanding, it might be necessary to evaluate an initial set of events and select a relevant subset for use in the design process.
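The ensemble-based criterion suggested above can be expressed compactly: a design is accepted if the share of ensemble storms producing an unacceptable response does not exceed an acceptable probability of exceedance. In the sketch below, the 50 m³ threshold mirrors the one used in this study, whereas p_accept = 0.05, the ranking helper and all function names are hypothetical.

```python
def exceedance_probability(v_tot, threshold=50.0):
    """Share of ensemble storms whose Vtot (m3) reaches the threshold."""
    return sum(v >= threshold for v in v_tot) / len(v_tot)


def accept_design(v_tot, threshold=50.0, p_accept=0.05):
    """Accept a design if unacceptable responses are rare enough.

    p_accept is a hypothetical acceptable probability of exceedance.
    """
    return exceedance_probability(v_tot, threshold) <= p_accept


def rank_by_robustness(results, threshold=50.0):
    """Order design names from most to least robust (lowest exceedance first)."""
    return sorted(results, key=lambda name: exceedance_probability(results[name], threshold))


# Hypothetical Vtot values (m3) of two designs over a small ensemble.
results = {"A": [0.0, 120.0, 0.0, 75.0], "B": [0.0, 10.0, 0.0, 0.0]}
print(rank_by_robustness(results), accept_design(results["B"]))
```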
CONCLUSIONS
The results of this study show that following common engineering practice and designing a UDS with a single design storm does not consistently produce a robust system. This echoes the conclusions of studies from recent years (Watt & Marsalek 2013; Ng et al. 2020; Markolf et al. 2021), which demonstrate, in different ways, the shortcomings of current implementations of design storms. This points towards the need to incorporate robust decision-making (RDM) into our UDS design approaches. RDM is an approach that aims to reach robust solutions to problems involving deep uncertainties (Walker et al. 2013). Typically, an RDM process first defines the scope of variables and uncertainties to consider and simulates the response of the system to an ensemble of scenarios representing a variety of conditions within that scope. The results are then analysed, and vulnerabilities of the initial system are identified and addressed by adapting its design framework.
In this study, we narrowed down the scope to two influencing factors: (i) the influence of the hyetograph shape on the performance of the designed networks and (ii) the structural uncertainty inherent in the subjectivity of the designers' choices when defining the network characteristics. The results show a lack of robustness that is at odds with the level of trust placed in current UDS design guidelines. Necessarily, the vulnerabilities we can identify originate from the influencing factors we chose to investigate. The lack of robustness we observed is therefore a product of two vulnerabilities: (i) the inability of a single hyetograph shape to represent the full range of events that should be anticipated and (ii) the lack of consistency in the performance of the designs produced by the design approach.
These issues can be addressed by adapting the design approach. For example, introducing an ensemble of events could provide a more rigorous (and more realistic) benchmark for the designs. Along the same lines of exposing designs to a wider range of conditions, simulating multiple successive events can also reveal vulnerabilities in a system (especially in systems with high retention or detention capacities (Adams & Howard 1986)). Finally, aiming for more resilient designs (e.g., by requiring redundant connections in the network to avoid bottlenecks) can lead to better consistency and greater robustness (Casal-Campos et al. 2018; Hesarkazzazi et al. 2022).
ACKNOWLEDGEMENTS
The authors are grateful to the anonymous reviewers for their constructive feedback leading to an improved manuscript and to the anonymized students for their invaluable contribution to this work.
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.