Assessing the impacts of extreme weather events on potable water quality: the value to managers of a highly participatory, integrated modelling approach

A decision support tool was created to estimate the treatment ef ﬁ ciency of an Australian drinking water treatment system based on different combinations of extreme weather events and long-term changes. To deal with uncertainties, missing data, and nonlinear behaviours, a Bayesian network (BN) was coupled with a system dynamics (SD) model. The preliminary conceptual structures of these models were developed through stakeholders ’ consultation. The BN model could rank extreme events, and combinations of them, based on the severity of their impact on health-related water quality. The SD model, in turn, was used to run a long-term estimation of extreme events ’ impacts by including temporal factors such as increased water demand and customer feedback. The integration of the two models was performed through a combined Monte Carlo – fuzzy logic approach which allowed to take the BN ’ s outputs as inputs for the SD model. The ﬁ nal product is a participatory, multidisciplinary decision support system allowing for robust, sustainable long-term water resources management under uncertain conditions for a speci ﬁ c location.


INTRODUCTION
Effects of extreme events on reservoir water quality to treat the raw water to acceptable water quality standards, especially when dealing with river or reservoir waters (Whitehead et al. 2009).
Heavy rainfall events can introduce more natural organic matter (NOM) in reservoirs, by causing high river inflows that collect it from the superficial layers of soil around the catchment (Hughes et al. 2013). NOM can be measured through the use of surrogate parameters such as dissolved organic carbon (DOC) or water colour. Importantly, extreme rainfall events following dry periods are even more detrimental, since the amount of accumulated sediments, as well as faecal matter, will be high and will be mobilised all at once (Wright et al. 2014), reaching surface waters and leading to high levels of colour and turbidity, as well as high concentrations of pathogens. High turbidity itself also presents a pathogen risk factor, since it is typically associated with higher numbers of disease-causing microorganisms including Cryptosporidium spp. (Khan et al. 2013), some species of which are pathogenic to humans and the cause of numerous waterborne outbreaks of cryptosporidiosis (i.e., typically a short-term acute infection affecting the intestine) worldwide. Turbidity levels during extreme rainfall events increase further following landslides, due to the delivery of remarkable amounts of sediments (Lin et al. 2011). Interestingly, dry antecedent conditions would instead reduce the landslide risk, as the soil needs to be saturated before a serious slope failure risk develops (Ayalew 1999). The intensity, as well as the sequence, of drought/flood periods, are therefore both crucial in determining health-related water quality risks. The occurrence of bushfires is another extreme event that typically deteriorates water quality, by causing increased fluxes of nutrients and sediments (Smith et al. 2011). However, the worst water quality is often realised when a rainfall event occurs post-fire, as it delivers the ashes from the catchment to the reservoir; with the levels of turbidity and organic matter sometimes taking years to return to pre-fire conditions (Rhoades et al. 2011). Thus the temporal sequence of fire and rainfall events greatly affects the impact on reservoirs' water quality.

Extreme events and water quality: modelling challenges
Addressing and modelling the complex behaviour of a water system under future conditions is typically challenging due to: (1) a lack of empirical data for certain variables of such a system; (2) the high level of uncertainty involved with, among others, data quality and long-term climate and population projections; and (3) the multiple, complex and nonlinear interactions between the different components as shown in the previous section. Also, these water system components such as the catchment, treatment facilities, land uses and population growth are typically modelled separately and in isolation from one another.
Modelling uncertain model inputs, model parameters, and model structure has been traditionally performed through probability distributions and the computation of several simulations (i.e., Monte Carlo approach, for instance) with different initial conditions resulting in a distribution of model outputs around a 'best-guess' (Maier et al. 2016); however, such an approach is more appropriate when empirical data are at least partially available, and do not allow exploitation, e.g., qualitative data coming from expert opinion which a Bayesian network (BN) approach instead allows. In addition, the linear approaches used traditionally to couple water quality and extreme events do not readily provide the level of understanding required to evaluate the dynamic relationships between extreme events and how the system might respond. This is because of the tendency to respond to events (e.g., an extreme rainfall event) and trends (e.g., increasing NOM concentration/change in NOM character) rather than understanding the role of the underlying system that generates these (Senge 2006). For example, linking rainfall directly to NOM does not address the role that a preceding flood, drought or bushfire, or some combination of these events, might have played in setting up the amount and type of NOM that the reservoir will receive with a subsequent rainfall event. Nor does it necessarily provide the level of guidance needed to best manage the system in maintaining the expected level of water quality without consideration of unintended consequences that might arise. Modelling approaches such as system dynamics (SD) make it possible to account for complex variables interdependencies, delays and feedback loops which might affect such dynamic systems. Overall, a systems' modelling approach, using a combination of two types of models, i.e., BN and SD models, is suggested to address these issues in a holistic and management-focused manner. Most importantly, a participatory approach for model development would guarantee the exploitation of experts' knowledge and the deployment of the final developed tool.
BN uses a directed acyclic graph (DAG) to capture the underlying structure of the model network. As they are acyclic, feedback loops and temporal assessments are not typically supported within a BN framework (Uusitalo 2007). However, they represent an increasingly popular modelling technique, especially when the system being modelled presents a high degree of uncertainty and complexity. They have a number of advantages, such as the ability to handle missing data, incorporate experts' input as well as provide a rapid simulation with easily interpretable graphical outputs (Uusitalo 2007). Underpinning the DAG is Bayes' theorem, which provides the mathematical basis for calculating the conditional probabilities underlying each node within the BN. Each node is usually discretised and parameterised by specifying conditional probability tables (CPTs). An advantage of discretisation is the reduced computational requirements for calculating posterior marginal probabilities for each node compared to the approach of using probability density functions. The disadvantage is that discretisation inevitably leads to a coarser representation of the system. BN have been proved to be accurate modelling tools for water quality predictions (Avila et al. 2018), and they have been applied for risk assessment in relation to water pollution issues (Tang et al. 2016). They have also been applied to model vulnerability and uncertainty in emergency water management situations (Pagano et al. 2018), or to rank different water management options (Said 2006). The ability to incorporate experts' opinion to cope with limited data has also led to BNs development in the area of groundwater management (Mohajerani et al. 2017).
SD represents a modelling approach where the structure of a system is linked to its behaviour, thus facilitating the understanding of feedbacks and dynamic aspects within the system (Sterman 2000). SD, combined with stakeholder engagement, has been recommended as an effective method to deal with water resources management issues (Winz et al. 2009). SD has been used extensively to assist policymakers in several fields, including the water management sector (Stave 2003) and urban water pricing (Sahin et al. 2017. A SD was also developed to assess the vulnerability of a water utility to changes in water demand and quality (Gastelum et al. 2018). SD models facilitate the incorporation of stakeholder knowledge by involving them in the development of the causal loop diagrams (Chen et al. 2014) and parameter elicitation (Hovmand 2014). Stakeholder input is also drawn upon to develop the SD model structure (Sahin et al. 2015), such as for community-based SD (Hovmand 2014). They can also deal with uncertainty using techniques such as Monte Carlo modelling (Richards et al. 2011) and Bayesian calibration (Richards & Chaloupka 2009), although these can be computationally expensive and, in the case of Bayesian calibration, require specification of prior distributions.
These two modelling methodologies are both flexible and effective for water resources management options' assessment. BN deals better with risk and SD is more suitable to model feedback dynamics of the system (Sušnik et al. 2013). They have complementary advantages and disadvantages, and thus it has been suggested to use both models to complement each other for a comprehensive water management assessment (Sušnik et al. 2013). However, to the authors' knowledge, confirmed by recent reviews (Phan et al. 2016b), there have been very limited attempts to combine these two modelling approaches (Phan et al. 2016a;Wang et al. 2016), in particular no attempts of using BN knowledge to develop a SD that can deal with missing data and uncertainty.
In this study, a SD model was developed integrating data and outputs of a previously developed BN, in order to assess the effects of long-term climate changes based on their impact on health-related water quality parameters for a large reservoir and drinking WTP in Australia. A participatory modelling methodology was used to build the BN and thus the SD structure in the first place, which follows the accessible, robust, integrative, dynamic (ARID) approach (Bertone et al. 2016b). This approach was used to conceptualise the BN and SD models, supplement missing data through drawing upon expert opinion, and validating and refining the models. A key for the SD development herein described is also the combination of BN and SD by using BN outputs as SD inputs, in order to overcome the SD limitations (e.g., inability to deal with uncertainty and missing data). A novel combined Monte Carlo-fuzzy logic approach is proposed for this task. In this paper, the full methodology leading to the development of the BN and SD with a participatory approach is introduced, with a strong focus on the BN-SD integration and SD development. Next, the results of the SD modelling and related implications are presented. Last, the advantages of the proposed methodology are discussed and some concluding remarks are provided.

Development of the conceptual model
A key step to developing the BN and SD models used for assessing potable water quality was the development of a systems conceptualisation. Developing a systems understanding of the problem of water quality management through a conceptual model is a critical step in systems modelling, providing the model development with a clear purpose (Sterman 2000). It can also provide a shared understanding by the researchers and stakeholders about the relationship between the structure of the system and how this determines the behaviour of a system. Central to this approach is understanding how reinforcing (positive) and balancing (negative) feedbacks combine to produce complex dynamic system behaviour.
In this section we provide a brief overview of the development of the conceptual model. For a detailed description, the reader is directed to Bertone et al. (2016a). The first three tasks were completed during a stakeholders' workshop that included water managers, operators and water scientists.
1. Problem scoping (defining the problem, spatial domain, time frame, key issues). The objective was to gain a shared understanding of scope and dimensions of the problem. The three key water quality parameters which were identified as being critical for the water utility were turbidity, water colour and Cryptosporidium spp. Key thresholds, above which the normal level of service (LoS) could not be guaranteed, were also identified for each of these parameters. 2. Parameter identification and definition (predictors). In this case, the objective was to gain a shared understanding of the components. Unstructured interviews, where the experts were asked to identify the parameters affecting the key variables to be modelled, were used for this task. The identification of such predictors is based on expert knowledge of the case study location and therefore they would not be necessarily be the same for other reservoirs. 3. Expert knowledge elicitation to establish a shared understanding of the causal relationships and system structure that might help to explain the dynamic behaviour of the 'system' over time. Structured interviews, where the experts were asked to modify a preliminary conceptual model built based on the outcomes of the unstructured interviews, was used for this task. 4. Merging the individual conceptual models into a single integrated model. This was achieved by removing duplicate variables which appeared in more than one model, and linking the remaining variables of the three models based on the relationships identified through the previous steps.

Development of the Bayesian network
The final conceptual model that was developed (Step 4 above) was used as the basis for developing a BN. The steps used to develop the BN are provided below.
Step 3 in this process (conditional probability elicitation) was undertaken through a second facilitated workshop with a larger number of participants. The reader is directed to Bertone et al. (2016a) for a detailed description of the development process for the BN.
1. Transformation of the integrated conceptual model into a BN model structure. The BN model structure was defined using the methodological framework of Chen & Pollino (2012). As model parsimony is essential (but balanced against model accuracy), it is important to, in particular, reduce the number of states for each node (i.e., discretisation) to a minimum. This assists with producing CPTs that have a relatively limited amount of entries and therefore can be more easily populated by expert knowledge. 2. Identification of data needs/data availability for each node (expert vs empirical data). Through consultation with stakeholders, the available numerical data were collected; these included data for the weather and water quality-related nodes, for which historical records were available. In addition, the nodes requiring experts' input (i.e., not enough historical data available) were identified. 3. Elicitation of conditional probabilities for the BN, based on either experts' opinions or empirical data. Ten water experts, including managers, operators and scientists, were engaged through a workshop to provide conditional probabilities for each node in the BN. 4. Model testing: sensitivity analysis and scenario testing through top-down and bottom-up approaches. After verifying specific parts of the BN based on whether the prediction was consistent with historical data, the focus was transferred to the target variables (turbidity, Cryptosporidium spp., colour and LoS), and a sensitivity analysis was then conducted to check for the most influential patterns, i.e., to look for those combinations of inputs that had the greatest impact on the target parameters.

Development of system dynamics model
A SD model was developed to explore the dynamic behaviour resulting from inevitable feedback loops in this type of complex system. The authors have introduced a methodology, which allows for the integration of knowledge gathered through the BN development process (e.g., model structure, CPT, output data) to be incorporated into the SD model. This enables the modeller of such systems to benefit from the capabilities of SD (e.g., long-term scenario assessment, feedback loop inclusion, etc.) without being constrained by its limitations (e.g., lack of data, uncertainty).
Integrating Bayesian network knowledge into the system dynamics model It was necessary to modify the essential features of a SD model to account for uncertain parameters (Nasirzadeh et al. 2008). Various methods to do this exist, such as Monte Carlo modelling or Bayesian calibration, but the authors used a combined Monte Carlo-fuzzy logic approach to take advantage of the BN structure and its outputs. 'Fuzzy SD', in which fuzzy logic is integrated to SD for this purpose, was already highlighted by Tessem & Davidsen (1994) and it has now been enhanced through the Monte Carlo approach and incorporated as an essential component of ARID. This allowed for all the links between SD variables to be quantified, while simultaneously incorporating both uncertainty and missing data. A similar approach was used recently in another research application . Fuzzy logic was integrated into the SD modelling structure by assigning fuzzy numbers to certain system parameters based on the opinions of different experts involved in the participatory BN workshops. This was done in different ways based on what data were available for each variable and for each connected pair of variables. If historical data of consistent frequency and timeframe were available, the correlation between linked SD parameters could be determined using linear and nonlinear regression approaches, as performed in Khanzadi et al. (2012). When this was not the case, BN outputs provided guidance and input to the SD model to overcome such data limitations. For instance, wind was considered to be an important variable, yet the frequency and time period of the wind data available were not consistent with other data. Hence, a variable called 'strong wind risk' was created instead, as it was established through the BN modelling that only strong wind was of concern for the system being modelled. A lookup function was applied to estimate the strong wind risk based on the current season as specified in the BN's CPT. In addition, a Monte Carlo approach was adopted, where a quasi-random number within an appropriate range was added to the estimated variable in order to take the daily wind speed into account. In this way, probabilities from the BN were transformed effectively into numerical time series to be used for SD simulations. This combined Monte Carlo-fuzzy logic approach was used to assign correlations for a number of nodes (e.g., risk of asset failure, risk of bushfire) wherever it was considered appropriate and empirical evidence was not available. Normalisation of such variables was a by-product of such an approach, since the creation of variables representing a risk or a probability implies values between 0 and 1 (i.e., where 1 equates to 100% chance). However, whenever a certain, fuzzified node was required to have a sensible numerical prediction (i.e., 'crisp outputs'), it was de-normalised (using a process called 'defuzzification'), after calibrating the model with historical data and/or based on thresholds of concern.
System dynamics model structure and data The development of the SD modelling draws heavily upon standard sub-structures and archetypes described elsewhere (Sterman 2000;Senge 2006;Sahin et al. 2015), although readapted based on the context-specific conceptual model and BN. Population growth dynamics were included to evaluate how an increased water demand might affect the reservoir level and in turn the impacts of extreme events. The authors, based on discussion with stakeholders, also incorporated a number of other variables deemed important when assessing the dynamics of such a system, in contrast to the static BN. These were mainly management-related variables, and included: (1) the potential use of alternative sources of water supply; (2) chemical dosages and, in turn, the treatment capacity of the WTP in relation to the key designated water quality variables; and (3) customers' feedback in relation to water aesthetics (complaints) and health (e.g., number of people getting sick from drinking treated water). Figure 1 shows the final structure of the conceptualised system which underpins the numerical calculations of the SD. 'Alt. sources supply', 'Alternative intakes', 'Avoidance capacity' and 'Treatment capacity' represent management options.
Some of the main inclusions, as mentioned above, were: (1) population growth projections; and (2) downscaled climate change projections in terms of evaporation, wind strength and rainfall amount. Through the SD model, it was possible to consider a number of balancing feedback loops (Figure 2). For instance: • An increase in turbidity (or colour, or Cryptosporidium spp.) in the treated water (i.e., turbidity in bold font) leads to an increase in the dosage of chemicals (provided the treatment capacity is not reached), which leads to an increase in the turbidity removed during treatment, which leads to a decrease in turbidity in the treated water and therefore a further adjustment in the chemicals' dosage.
• An increase in turbidity (or colour, or Cryptosporidium spp.) in the raw water leads to an increased need to adjust the intake depth at the dam wall, which leads to an increase in avoidance capacity (i.e., the ability to choose a water depth with better water quality), which leads to a decrease in turbidity in the raw water and therefore a reduction in the need of further adjusting the intake depth.
• If a bushfire's size is large, the total bushfire damage will increase, which will increase the need for larger bushfire management investment such as controlled fires, which leads to an increase in the amount of fuel burnt, which decreases the amount of fuel available for a fire, which leads to a smaller fire when this occurs.
• An increase in turbidity or colour in the treated water can cause a decrease in customers' satisfaction due to potential aesthetic issues with the water; this would cause complaints which would lead to a response by the WTP operators to increase the chemicals' dosage to remove more turbidity from the raw water. This would cause a decrease in turbidity in the treated water and thus an increase in customers' satisfaction and so on.
• An increase in Cryptosporidium spp. in the treated water might cause health issues for some customers; as a result, the water utility would aim at increasing its Cryptosporidium spp.-related treatment capacity (e.g., check coagulation or efficacy of disinfection processes), which would lead to more Cryptosporidium spp. being removed from the raw water, less Cryptosporidium spp. in the treated water and fewer health issues for customers. The SD model could be edited to account for other scenarios too, such as issuing boil-water advisories instead.
Population growth rates were taken from existing projection reports provided by the water utility. These were used to estimate the change in daily water consumption, which was also based on available reports from the water utility, with seasonal adjustments also extracted from data presented in those reports. Changes in future water consumption behaviours were not considered, but they could be easily integrated in such models when costs and benefits of certain policies, such as of mandatory efficient taps and shower heads (Beal et al. 2012) are assessed.
In terms of climate change projections, historical rainfall and wind data for the study locations were collected from the Australian Bureau of Meteorology (BoM), while recent annual total evaporation amounts in megalitres (ML) from the reservoir of this research study were collected from recent water utility reports. Based on climate change projections for the area, rainfall and wind intensities might increase by up to 30%, while evaporation might increase by 20%. These projections were extracted from modelling outputs by the Commonwealth Scientific and Industrial Research Organisation (CSIRO), which were in turn based on Intergovernmental Panel on Climate Change (IPCC) assessments. Hence, as explained later, several potential combinations of the expected changes were applied by running scenarios with different coefficients (between 1 and 1.3 for the rainfall and wind-related variables, and between 1 and 1.2 for evaporation). It must be emphasised that these numbers are specific for the case study location and if the model is modified and applied to a new water supply system, new projections should be considered.
The wind coefficient was used in combination with other variables to estimate the 'strong wind risk'. As mentioned before, BN outputs (e.g., CPTs) in this case were used to estimate what is this  risk based on season, where 'strong' was defined as being higher than 5 m/s based stakeholders' opinion and BN modelling outputs. These seasonal risks taken from BN CPTs were incorporated into 'seasonal wind variability' through a lookup function based on 'season'. 'Strong wind risk' was then calculated as the normalised product of 'seasonal wind variability', the wind coefficient, and 'daily wind variability', i.e., a random number generator based on historical variance of wind speed data, in order to account for wind speed variability. Among others, this variable affects the 'mixed layer depth' of the reservoir. This is affected by the turnover occurrence (set to occur in winter, thus if 'season' ¼ 'winter', then 'mixed layer depth' ¼ 'water level') since the lake is fully mixed during turnover, but also proportional to the average wind speed of the last 3 days (not shown in Figure 1's model view), with stronger winds leading to a larger mixed layer (Read et al. 2011). This, combined with the water level, will affect how many alternative intakes would be available for the water utility to draw the water from. Since historical inflow values, which include extreme conditions such as the Millennium Drought period, were available, the rain coefficient was directly applied to the inflow values, i.e., inflow values were supposed to directly increase up to 30%. This, combined with evaporation, would also affect the status of the 'storage volume' variable, which is given by the difference between inflow and outflows, where the outflow variable is defined by the following set of equations: where EF(t) is the environmental flow at time t, Ev(t) is the evaporation amount at time t, and spill is calculated as the difference between the sum of storage volume SV and Inflow minus the storage capacity MaxSV. Rather than inflow, the rainfall amount in the SD model affects other variables such as 'bushfire size' directly. Two 'if-then-else' conditions were set, specifically, if: 1. Rainfall(t) .5 mm 2. Fire trigger ,0.6 Then 'bushfire size' ¼ 0. This is to account for a wet catchment, or for the lack of triggers such as lightning strikes or uncontrolled burns, essentially minimising the chance of a fire. If neither of these conditions are met, then bushfire size is calculated based on the 'amount of fuel' available, multiplied by the sum of 'strong wind risk' and 'fire trigger' (both normalised variables). For the purpose of this modelling exercise, this bushfire variable is kept normalised. Similarly, the rainfall amount would directly affect the normalised 'landslip size' variable, which is directly proportional to 'area vulnerable to slip' too.
When the storage level falls below specified thresholds, it was also assumed that water can be accessed from alternative sources (e.g., desalination plant, other reservoirs) until the level increases back to acceptable levels. Moreover, scenarios with decreased forested area within the catchment (a possible consequence of population growth and land-use change), were also hypothesised. Previous studies (Warziniack et al. 2016) showed empirical evidence of a negative correlation between forest cover and turbidity, as well as treatment chemicals' costs. We also added a chemicals' dosage variable as a way to control the treated water quality, as well as customers' satisfaction and health. However, these were kept qualitative as numerical data were either not available (e.g., customers' satisfaction) or estimating the correlation would require the collection of several other datasets. This was the case with chemicals' dosages, since they are affected by other water quality parameters. Bertone et al. (2016c) developed a model to predict chemical dosages based on several water quality input variables, but a whole new model would be required for this specific location before a correlation could be integrated into the SD model. Nevertheless, the authors believe the qualitative incorporation of such parameters provides a better qualitative understanding of feedbacks and time delays present in such a system, and future work could incorporate more quantitative assessments of the impact of such loops.
The SD model was developed using Vensim DSS v6.3 software package (Ventana Systems, Inc.). This software facilitates the creation of interactive interfaces for any of the variables in the model. These interfaces then enable the user to change the values of the variables (by dragging the related slider to a new position) based on their interest in evaluating a particular scenario. This allows the user to explore how hypothetical changes may affect the other variables and the system as a whole, over time.

System dynamics model deployment and validation
In order to assess the relative importance of storage volume and other key climate parameters numerically, a sensitivity analysis was undertaken by considering different ranges of those variables ( Table 1). The selected ranges for the climate-related parameters were based on climate change projections for the region as previously described; while the remaining parameter ranges were based on potential impacts of increased population (i.e., increased water demand, land-use change).
The effects of variations in these parameters were estimated against the number of days with predicted insufficient LoS, i.e., inability to deliver safe potable water to the consumers via the WTP ('level of service' in Figure 1). Unlike traditional sensitivity analyses where one parameter only is varied at each time to check the effect on the model's outputs, in this case, given that for SD models there are a number of nonlinear effects among interrelated variables (e.g., a 'rain' increase on its own might considerably affect the long-term predictions, but with a simultaneous increase in 'evaporation' the effects might be negligible), it was deemed important to run several scenarios where not only the variable of interest for the sensitivity analysis was changed, but also the other four parameters were varied within the predefined ranges. The simulation outputs were then clustered and compared based on the input values of the variable of interest. Scenarios were run by considering different combinations of potential input changes within the ranges from Table 1, specifically by taking either one of the extreme values or the average value of change (e.g., 0%-15%-30% for rain amount increase); three different input options per variable ensured consistency with, e.g., climate change projections' outputs and, at the same time, model parsimony (i.e., keeping the overall simulation time manageable); future work could focus on running more scenarios to detect more localised nonlinear behaviours.
Each simulation was run over a time period of 25 years and the number of days for which the risk of LoS being satisfied was lower than 90% (i.e., relatively high chance of delivering unsafe water) was recorded. Such risk of LoS not being guaranteed is based on a combination of the risks of colour, turbidity and Cryptosporidium spp. to be over the specified thresholds. Consensus about these thresholds was determined by the stakeholders during the first BN workshop and they were 40 NTU for turbidity, 60 CU 400 for colour, and 10 oocysts/10 L for Cryptosporidium spp. (IFA method, adjusted for recovery). These are based on the quality of the raw water, while the final LoS is in relation to the treated water. The way the thresholds were combined to obtain the estimation of LoS was based on experts' opinion incorporated into the BN and is not linear; for instance, the treatability of raw water with both high turbidity and high colour is generally higher than when the raw water has high colour only, due to higher stability of the flocs. It is also dependent on potential failures of the treatment infrastructure, which is also linked to weather conditions as well as management strategies. There are difficulties in validating numerical SD models of natural systems, since most of them are unique (Orestes 2004). Common validation procedures include testing the model against a portion of historical data, as well as consulting experts to identify issues of insensible patterns (Sojda 2007). Given the purpose of this particular model and the lack of appropriate, complete historical data for testing, a number of different validation procedures, based on those adopted in similar systems modelling projects (Sahin et al. 2017), were applied: 1. Comparison of model outputs with historical data: this was done to calibrate the de-normalised model outputs ('turbidity', 'colour', 'Cryptosporidium') over historical data, before running simulations over future scenarios. 2. Comparison of model outputs with findings of existing reports. 3. Stakeholder consultation: In addition to the workshop organised to define the model structure and input data, the logic behind the model and its outputs was discussed with and visually inspected by stakeholders to ensure its robustness. 4. Sensitivity analysis: this step, described above, also ensured that sensible and logical results were obtained from the simulation. The research team also kept in mind that 'unexpected' did not imply 'illogical', since the complexity of such system may lead to unpredictable, nonlinear, multiple interactions which an expert could not have foreseen. Hence, when this occurred, the reasoning behind such outputs was interrogated through a bottom-up approach, i.e., by checking each sub-component of the SD model for potential errors/inconsistencies.

RESULTS AND DISCUSSION
The results of the BN model were reported fully in Bertone et al. (2016a). The main findings were as follows: • Landslip events, triggered by flood events (i.e., large runoff) are predicted to lead to the worst effects on the ability of the WTP to deliver safe potable water. Conditions would be exacerbated further if the storage level was low (i.e., less dilution, less alternative intake depths to draw raw water from) and if they occurred during winter (i.e., during lake circulation, uniform water quality conditions throughout the water column occur and thus there are no alternative intake depths where better water quality could be drawn from).
• Individual stakeholder's input greatly affects the results. When grouped by category of stakeholder, the predicted absolute and relative severity of different extreme events could be substantially different. For instance, input provided by water managers did not lead to landslides being identified as the predominant extreme event issue; and data elicited from treatment operators led to consistently higher risks being estimated, most likely due to their personal operational experience in dealing with treatment challenges compared to scientists, for example.
Below, a detailed analysis of SD model results, and related implications, is presented.
SD model results Figure 3 shows the number of days during the next 25 years during which defined thresholds would be exceeded for each simulation that was run (.100). The frequency of Cryptosporidium spp. exceedance was always the lowest, though the health implications (i.e., magnitude) could be higher. Colour exceedances were quite stable across the simulations, which might imply that either (1) colour is not very sensitive to future changes or (2) the threshold chosen by the stakeholders is quite high. Based on a deeper analysis of the simulation results (e.g., average daily colour levels), it seems that the latter is the most sensible explanation. Finally, turbidity was the most sensitive (i.e., having the highest fluctuations) water quality output parameter, and was often the one with the highest number of days exceeding the established threshold; however, the high variability implies that in certain scenarios, colour exceedances would be more than turbidity exceedances, over an average 25-year timeframe. This is interesting given the nonlinear relationships between water colour, turbidity and water treatability. Figure 4 shows the results of the SD sensitivity analysis. In each sub-figure, the simulations' results are grouped based on the value of the parameter of interest. For instance, in Figure 4(a), simulation results are divided in two groups, i.e., those with the current evaporation rate as input, and those with an increased (þ20%) evaporation rate as input. Simulations within a group still deliver different results due to variations in other input parameters (rain, wind, forested area, trigger volume for use of alternative sources). For each group, the figures show the 'best' (i.e., lowest number of days with unacceptable water quality) scenario, the worst scenario, and the median scenario.
It can be noticed, first of all, how a change in the evaporation rate (Figure 4(a)) does not affect the system at all. Horizontal lines reflect how the simulation outputs were practically the same after increasing the evaporation rate by 20%. This could be explained by the option of alternative sources of water, which can be triggered whenever the storage level falls below a predefined threshold. Hence, although the storage level might decrease slightly quicker, the trigger level for the use of alternative sources (Figure 4(c)) seems to be a more important variable. As evidence, Figure 4(c) shows a nonlinear (exponential) increase in potential unacceptable LoS days when such a threshold is reduced from 0.7 (i.e., 70% of full storage volume) to 0.4 and then to 0.1. In particular, the upper bound increases more sharply than the median. This means that, with a consistently low storage volume, there is an increased likelihood for a few potential combinations of extreme events leading to extremely negative consequence for the LoS. Consistent with the BN results, a low storage volume leads to less dilution of potentially high turbidity, colour and Cryptosporidium spp. and dramatically reduces the resilience of the treatment facilities. Since an intake tower that allows raw water to be drawn from multiple depths is used for the reservoir being modelled in this study, a low storage volume decreases the usable depth of the water column, reducing the alternative intake depths' options available to avoid poor water quality. A low storage volume also increases the ability of factors such as wind (whose force will increase in the future) to mix the full water column outside of winter times, when the lake regularly de-stratifies regardless due to the colder temperatures. With the lake fully mixed more often, the water quality would be uniform for such periods, nullifying the utility of a multiple intake tower. Although going to lower storage volumes is clearly not to be recommended and is not in the current plans of the water utility, population growth, extreme drought conditions and limited water supply from other sources (out of the scope of this modelling exercise) could pressure the water managers to consider drastic options and draw more water from this reservoir. The SD results allow the managers to envisage the negative water treatment-related consequences of such hypothetical decisions. Although more pronounced than for evaporation, the variation in poor water quality frequency given a reduction in forested areas (e.g., due to urbanisation) is quite limited (Figure 4(b)). Similar to results shown in Figure 4(c), the upper limit also increases more than the median. The main reason for an increase in days of poor LoS due to reduced forested area, according to the models, is an increase in the area vulnerable to landslips. This, however, assumes that targeted interventions to stabilise the soil are not undertaken. A revised version of the model could take into account such management options. Another reason for elevated risk is that there is an increased likelihood of more intense surface runoff, since impervious urban areas result in less infiltration compared to forested areas.
Finally, Figure 4(d) shows the effects of both rain and wind on the frequency of unacceptable LoS. Rain, interestingly, shows a nonlinear behaviour. With a 15% increase in the amount of precipitation, there would be a considerable increase in occurrences, but if this increment goes beyond, e.g., 30%, then the frequency decreases again to lower than base case scenarios. This is because, despite enhancing flood-related risks through, for example, sewage overflows from sewage treatment plants within  the catchment (thus elevating risks associated with high Cryptosporidium spp.), or the occurrence of a landslip, it also reduces the risk of bushfires, as well as the frequency of low reservoir levels. Thus, it was found that increased precipitation amounts of above 15% would be actually beneficial in reducing water quality-related health risk. Such a conclusion was not apparent from the BN alone, since it does not have the capacity to assess such nonlinear, interlinked behaviour over time in a holistic way. Regarding wind, a small increment (10%) seems to lead to a contained elevation of the frequency of unacceptable water quality. A 30% increase in wind strength, however, would lead to negative consequences such as increased chance of infrastructure and treatment facility failures (e.g., power outages) as well as more frequent unseasonal lake mixing events reducing the avoidance capacity. Finally, it would also facilitate the spread of a bushfire. Thus, the strengthening of wind would have exponentially detrimental effects on the water treatment systems' ability to deliver safe drinking water consistently. It is possible, however, that stronger winds would be accompanied by an increase in rainfall amounts, mitigating the wind's effects, provided that such an increment is higher than 15% (i.e., the peak in rain-related LoS risk according to Figure 3(d)). This is demonstrated by the SD results showing that the worst case scenario (i.e., simulation leading to the highest number of days with LoS lower than 90%) was reached considering an increment in wind force by 30% and rain by only 15%, leading to 158 days poor LoS in the next 25 years. Nevertheless, with an increase in rain amount of 30%, the relative worst case scenario would only lead to less than 140 days of poor LoS. These outputs demonstrate the complexity of the interrelationships between climate changerelated parameters and how these can affect reservoir water quality with often unexpected, nonlinear patterns. The developed models however allow WTP managers to run several scenarios assessing the effectiveness of different intervention strategies in response to such predicted climate variations.

CONCLUSIONS
A novel, user-friendly risk assessment tool was developed to understand, rank and manage the effects of extreme events, or a combination of them, on the ability of an Australian water utility to deliver safe potable water to its consumers.
First, a combination of a participatory modelling and Bayesian approach was used and a BN developed around the designated key parameters of water colour, turbidity and Cryptosporidium spp., selected for their potential negative health effects on water consumers. Second, a SD model was developed, based on the original conceptual models. This was populated with historical, empirical time-series data and BN outputs through the application of an innovative, combined Monte Carlofuzzy logic approach. Such a SD model is complementary to BN and it could assess the effects of multiple, slow, temporal changes, as well as identify nonlinear behaviours not detectable with the BN alone.
As a consequence, although the BN established that rain-related extreme events (i.e., leading to landslides/floods) are qualitatively the most concerning ones, the SD model made it possible to integrate this information quantitatively. This helped illuminate, for instance, how increments in rainfall amounts over the next 25 years could increase the frequency of days with unacceptable water quality, but actually decrease such frequency after a certain rainfall threshold (i.e., þ15%) due to an associated, likely reduction in the occurrence of bushfires in the catchment. Regardless of the complexity and nonlinearities, findings point to an increased number of days of unacceptable water quality given expected climate and anthropogenic changes for this system, thus the water utility should look at potential solutions to increase the resilience to future exacerbated extreme events.
The applied methodology proved to be effective for this water management application. It addressed the high level of uncertainty and data limitations of the modelled system, and engaged a number of key stakeholders in defining and populating the models. This made the risk assessment tool more credible and more likely to be implemented within the organisation. Although the developed models are site-specific, the methodology is general and therefore the developed tools can be transferred and adapted easily to other water supply systems. Moreover, they are flexible enough to include new data (both empirical and qualitative) if and when they become available, as well as to edit or enrich the model's structure.
Future work could focus on refining the developed models, by collecting and entering new empirical data or other model outputs (e.g., hydrological models, full water network models, and new population or climate predictions). Also, the scope of the model could be expanded by including all the other interconnected water sources, modelling the full water supply network at once. More water treatment and management intervention options could also be included in the initial conceptual model, quantified through empirical evidence or experts' opinion, and modelled through both the BN and the SD. This would allow the water utility to deploy the tool to assess not only the negative impacts of different extreme events, but also the positive impacts of different intervention strategies.