An ethical decision-making framework with serious gaming: a smart water case study on flooding

Sensors and control technologies are being deployed at unprecedented levels in both urban and rural water environments. Because sensor networks and control allow for higher-resolution monitoring and decision making in both time and space, greater discretization of control will allow for an unprecedented precision of impacts, both positive and negative. Likewise, humans will continue to cede direct decision-making powers to decision-support technologies, e.g. data algorithms. Systems will have ever-greater potential to affect human lives, and yet, humans will be distanced from decisions. Combined, these trends challenge water resources management decision-support tools to incorporate the concepts of ethical and normative expectations. Toward this aim, we propose the Water Ethics Web Engine (WE)², an integrated and generalized web framework to incorporate voting-based ethical and normative preferences into water resources decision support. We demonstrate this framework with a 'proof-of-concept' use case where decision models are learned and deployed to respond to flooding scenarios. Findings indicate that the framework can capture group 'wisdom' within learned models to use in decision making. The methodology and 'proof-of-concept' system presented here are a step toward building a framework to engage people with algorithmic decision making in cases where ethical preferences are considered. We share our framework and its cyber components openly with the research community.


INTRODUCTION
Sensor networks, which are built on the back of the latest digital communication technologies, are increasingly being deployed in urban sewer networks and at regional scales to monitor flooding and water quality of rivers (Habibi et … way that was previously inconceivable (Kerkez et al.; Mullapudi et al.). Furthermore, the continued adoption of control technologies also introduces new stakeholders by which our water environment is actively manipulated (Rentsch). These new technologies allow for higher-resolution monitoring via novel sensors (Sermet et al.), social media (Sit et al.), and decision making (Demir et al.; Sermet et al.), both in time and space. Whereas historically our water infrastructure, such as stormwater ponds, was designed as passive structures, there is now the capability to control releases actively and precisely from a growing number of distributed water infrastructure assets, often as small as stormwater ponds.
A proliferation of new sensing and control locations means considerably more complex systems to address persistent water resources challenges such as flooding and water quality. Yet, to do so requires harnessing the unprecedented complexity of these new systems. Toward this end, there is a growing body of research on control schemes for water systems ( … performance with a coordinated and increasingly automated control approach and the promise of the new 'smart' water paradigm.
However, little work exists to determine whether control schemes with physically based objectives are consistent with socially normative expectations of 'right' and 'wrong' actions. Because the primary objective of these control studies is to investigate the performance of control strategies, many experimental designs use toy networks or simplified abstractions of real networks. One consequence of this experimental design is that the social and economic variations within and across the network are not considered in the development of control schemes, nor in determining the impact of the controller performance. Yet, when considering the deployment of these distributed technologies across an entire city or region, these demographic factors become relevant. Consider the case, for example, where some flooding is an inevitable outcome within the catchment area of a stormwater network with dynamic control capabilities; how ought a controller act to consider the societal implications of such flooding? Or, if a control scheme consistently recommends distributing damages in poor neighborhoods and benefits in richer neighborhoods, ought its directives to be followed? These normative questions of ought, across populations, landscapes, communities, etc., pose serious ethical and moral dilemmas, especially where negative impacts are unavoidable and uneven.
While the ethical questions of civil infrastructure are as old as the infrastructure itself, the challenge to incorporate moral and ethical preferences into the automated, algorithmic decision-making tools that water resources management will rely upon is, indeed, new. The novelty of this challenge is supported by two trends. First, because sensor networks and control systems allow for higher-resolution monitoring and decision making in both time and space, it follows that greater discretization of control will allow for an unprecedented precision of impacts, both positive and negative. A second and complementary trend is that, as the complexity of systems grows, humans will continue to cede direct decision-making powers to decision-support technologies such as data algorithms (Sermet & Demir). Systems will have ever-greater potential to affect human lives, and yet, humans will be insulated from these direct decisions. An important topic to explore is whether decision-support tools for water resources management can integrate socially normative expectations with physically based objectives.

Individual to institutional: ethics, morality, and machines
The challenge of incorporating social concepts into what are seemingly technical problems is non-trivial. The social and the technical are intertwined to make a sociotechnical problem (Jonoski; Vojinović & Abbott). This understanding motivates our review of both social and technical work as it relates to our efforts. Generally, we can consider ethics and morality along a spectrum that spans from the individual to the institutional. At the individual level, 'normative' ethics considers theories and schema to determine the appropriate actions of an individual or an agent. At the institutional level, studies of ethics and morals consider the power structures that shape norms, their compliance, and their impact on individuals and groups. Work in biology, philosophy, psychology, and sociology investigates questions along this spectrum. The field of machine ethics complements the efforts from the natural and social sciences by developing methods to incorporate decision making into artificial technologies that is consistent with humans' (society's) normative expectations of their behavior (Moor).
Here, we provide a brief background on these fields and how they relate to incorporating normative ethics into smart water systems. Though some disciplines make distinctions between ethics and morality, we use them interchangeably in the context of this effort. For consistency within various disciplines, we defer to their respective terminology when referencing their work.

Normative ethical theories in philosophy
In the study of ethics, or moral philosophy, there are three fundamental traditions: deontological, consequential, and virtue. Respectively, these camps choose to inspect morality from three overlapping, but not identical, questions: 'What is the right thing to do?', 'How is the best possible state of affairs achieved?', and 'What qualities make for a good person?' (Grayling). Importantly, each of these questions, and the responses of the ethical theories, are posed for individuals.
The deontological approach, or duty ethics, considers an action to be moral based upon a set of rules that deem an action permissible, impermissible, or obligatory (Alexander & Moore). A strength of the deontological approach is the clarity of the rule to direct actions. However, multiple rules can require contradicting actions, which is a key weakness of a purely deontological approach.
In contrast, the consequentialist tradition judges the correctness of an act on its outcome (Sinnott-Armstrong), in that the correct action is the action that leads to the best outcome by some specified objective function, such as maximizing happiness or minimizing costs. Egalitarianism and utilitarianism are well-known consequentialist moral theories. A concern for consequential approaches is which factors should be included in determining the normative value of an outcome. Though both are consequentialist in approach, utilitarian and egalitarian models can produce incompatible solutions depending on which factors are given importance. This tension is referred to as the equality-efficiency or the fairness-efficiency dilemma (Binmore).
In the virtue ethics tradition, virtues are the fundamental, irreducible unit by which to define normativity, meaning that they are derivative to neither the outcome of actions (consequential) nor duty to perform an action (deontological) (Hursthouse & Pettigrove). In agent-based virtue ethics, agents' motivations ascribe the rightness and wrongness of an act and agents learn virtue from 'exemplars of goodness' (Zagzebski). This understanding buoys the concept of learning models of normative behavior by observing the human performance of similar tasks.

Morality in the sciences
Evolutionary biology tells us that morals are adaptations to social living; when prehistoric humans began to form larger groups, survival was dependent upon the group, and therefore, what was best for it could supersede the priorities of the individual (Krebs). Psychology and the disciplines of biology are interested in the mechanisms and processes of the human brain as they relate to developing and making moral judgments (Haidt). As a distinction from other fields, sociology interrogates morality at scales beyond the individual. When sociology does focus on the morality of individuals, it is almost always in relation to a larger group. Contemporary sociology of morality includes both moral theorizing and experimental science to uncover moral truths (Bykov). What unites this work is not a shared substantive focus, but the recognition that moral evaluations and categorizations are an essential part of struggles in 'social fields' (Hitlin & Vaisey).
When considering how to incorporate moral and normative sentiments into intelligent infrastructure, the sociological literature provides intriguing insights. The theory of 'thick' and 'thin' moral concepts contends that there are thin moral concepts that are 'methodologically tractable' through hypothetical situational tests. Conversely, thick concepts, like dignity, integrity, humanness, etc., do not lend themselves to parsimonious description or measurement, making thick concepts more difficult to explore by experiment (Abend). Furthermore, there is a lack of evidence for how thin and thick concepts relate to each other. Thus, a holistic approach would incorporate exercises in which feedback on both thin and thick moral concepts can be collected. Furthermore, other studies show that higher social class predicts increased unethical behavior (Piff et …

Two primary concerns of deploying these technologies in a complex world are as follows: (1) their instantiated purpose and behavior may not be well understood and (2) the systems may take irrevocable acts before humans have the data to discern their error (Samuel).
These concerns connect to the questions of building fra-

MATERIALS AND METHODS
In this section, we describe the approach and technologies used to employ the methodological framework introduced previously. First, we describe how to identify relevant belief features. Next, we describe the process employed to collect people's preferences. This subsection includes a description of the generalized web framework developed to collect voting-based preferences, its web architecture, game play, and database architecture. Finally, this section details the post-play data analysis to derive preference models, how to use them to make decisions, and the analytics toolbox developed to support this exercise.

Identifying and assaying relevant belief features
Example methodologies to identify relevant belief features for decision making (step a) can be found in Lee et al.

Post-play analysis
In this section, we describe the methods used in the post-play analysis. This includes learning preference models, making decisions with these models, and the iterative process of improving and deploying them. We also describe our analytic toolkit to facilitate these efforts.

Learning preference models
… same scenarios, an estimate of the mean quality difference between options A and B, μ_AB, is calculated for each scenario as follows: … where Φ^−1(x) is the inverse cumulative distribution function (CDF) of the standard normal distribution, and C_A,B and C_B,A are the number of votes received for each option. For example, C_A,B = 27 would mean that 27 people prefer outcome A over outcome B.
Mean differences can be related to scenario belief features as:

μ_AB = β^T (X_A − X_B)    (2)

where X_A and X_B are the vectors of the belief features for option A and option B, respectively. The learned preference model of the group is the estimate of the belief feature weighting factors β^T and can be found using linear regression, or other classification techniques, given that:

μ_AB,i = β^T (X_A − X_B)_i,  i = 1, …, n    (3)

where μ_AB,i is the estimated mean utility difference of scenario i of the n shared scenarios and (X_A − X_B)_i is the difference in belief features of scenario i of the n shared scenarios.
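As an illustration of the learning step, the following minimal Python sketch estimates a mean quality difference from vote counts and then fits the belief feature weights by ordinary least squares. It is a sketch under stated assumptions and not the project toolkit: the unscaled probit form of the estimate, the clipping of unanimous votes, the absence of an intercept, and the function names `mean_quality_difference`, `solve_linear_system`, and `fit_beta` are all assumptions made for illustration.

```python
from statistics import NormalDist


def mean_quality_difference(votes_a, votes_b):
    """Estimate mu_AB as the inverse normal CDF of the share of votes for A."""
    total = votes_a + votes_b
    # Assumption: clip unanimous votes so the inverse CDF stays finite.
    share = min(max(votes_a / total, 0.5 / total), 1.0 - 0.5 / total)
    return NormalDist().inv_cdf(share)


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_beta(feature_diffs, mu):
    """Least-squares weights from the normal equations (X^T X) beta = X^T mu.

    feature_diffs holds one row (X_A - X_B)_i per shared scenario and mu holds
    the matching estimates mu_AB,i. No intercept is fitted (an assumption).
    """
    k = len(feature_diffs[0])
    xtx = [[sum(row[i] * row[j] for row in feature_diffs) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * m for row, m in zip(feature_diffs, mu))
           for i in range(k)]
    return solve_linear_system(xtx, xty)
```

With more shared scenarios than belief features and linearly independent feature differences, `fit_beta` returns one weight per belief feature; a singular system would raise a division error in this sketch.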
After a preference model β^T is learned, we can investigate its behavior. To do so, we need to calculate μ_AB in Equation (2) using the belief features of new scenarios. If μ_AB is positive, then choice A is selected. If μ_AB is negative, then choice B is selected. Decisions generated using the preference model can then be compared against various benchmarks, such as the historical performance of a system or against some specified definition of fairness and efficiency. Committing to this activity will require a reflection on whether the algorithm is meeting its purpose. After interrogating the behavior of the learned preference model, the researcher has a clear choice to continue to refine the preference model iteratively or incorporate the preference model into their operation. As a practical matter, one can choose to pursue both continuously and simultaneously.
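The decision rule described in this paragraph can be sketched in a few lines of Python. The weights and belief feature values below are invented for illustration (ordered as public cost, private cost, injuries, deaths, and environmental impact), and the function names are hypothetical rather than part of the project toolkit.

```python
def utility_difference(beta, x_a, x_b):
    """Dot product of the weights with the belief feature difference X_A - X_B."""
    return sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b))


def decide(beta, x_a, x_b):
    """Select A for a positive utility difference, B for a negative one."""
    mu_ab = utility_difference(beta, x_a, x_b)
    if mu_ab > 0:
        return "A"
    if mu_ab < 0:
        return "B"
    return "tie"


# Hypothetical weights: negative values express a preference to minimize impact.
example_beta = [-0.2, -0.5, -0.4, -0.9, -0.3]
option_a = [3, 1, 0, 0, 2]
option_b = [1, 2, 1, 0, 1]
choice = decide(example_beta, option_a, option_b)
```

In this made-up case, option A carries higher public and environmental costs but lower private costs and fewer injuries, and the weighted difference favors A.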
To facilitate the workflow of data retrieval, learning preference models, and using them to make decisions on new scenarios, we developed a data analytics toolkit using the Python programming language.The toolkit includes a data service interface with a PostgreSQL database, data structures and methods to streamline the workflow.The toolkit and documentation of workflow are provided in the project repository.

RESULTS AND DISCUSSION
In this section, the web framework and methodology are applied in a proof-of-concept use case with a group of undergraduate engineering students at the University of Iowa … within a section were served the same five scenarios (Table 1). All responses were recorded in a web-accessible database and related to unique user IDs and the class section keyword name.
To learn models from the collected data, votes were …

To analyze the impact of the learned models, we simulated decisions on all 17 scenarios using the learned models. To add reference outcomes, we also simulated decisions of five models that each prioritized only one of the belief features. In effect, these reference models made decisions based on a single criterion instead of some combination of the five criteria of the learned models.
Decisions, realizing the flood of the left or right scenario, for all 17 scenarios were made (Figure 4). The procedure is as follows: the dot product was taken between the vector (X_A − X_B) and each of the ten β^T models. The …

Models a, b, c, and d are the learned models from each class section, and Average is the average of each of these learned models. These models weigh multi-dimensional effects to determine a decision. These models contrast with the reference models, which only consider the impact along a single category.

… total damages. Interestingly, only one of the single-objective reference models performed better than any of the learned models.
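The scoring used to compare models in this section (normalizing the avoided damage in each category, summing the normalized scores, and ranking the result against every possible combination of decisions) can be sketched for a toy case. The three two-category scenarios below are invented, and the function names are hypothetical; flooding one side is taken to spare the damage listed for the other side.

```python
from itertools import product


def avoided_totals(scenarios, choices):
    """Per-category damage avoided: the damage of the option NOT flooded."""
    totals = [0.0] * len(scenarios[0][0])
    for (left, right), choice in zip(scenarios, choices):
        spared = right if choice == "L" else left
        for j, value in enumerate(spared):
            totals[j] += value
    return totals


def normalized_scores(scenarios, choices):
    """Scale avoided damage to [0, 1] between the minimum and maximum possible."""
    avoided = avoided_totals(scenarios, choices)
    scores = []
    for j, value in enumerate(avoided):
        lowest = sum(min(left[j], right[j]) for left, right in scenarios)
        highest = sum(max(left[j], right[j]) for left, right in scenarios)
        span = highest - lowest
        scores.append((value - lowest) / span if span > 0 else 0.0)
    return scores


def overall_rank(scenarios, choices):
    """Rank (1 = best) of the summed score among all 2^n decision combinations."""
    mine = sum(normalized_scores(scenarios, choices))
    better = sum(
        sum(normalized_scores(scenarios, combo)) > mine + 1e-12
        for combo in product("LR", repeat=len(scenarios))
    )
    return 1 + better


# Each scenario is (damage if left floods, damage if right floods).
toy_scenarios = [((5, 1), (1, 1)), ((2, 0), (4, 3)), ((1, 2), (1, 0))]
best_choices = ("R", "L", "R")
```

Enumerating every combination is feasible here because 2^17 is only 131,072; for much larger scenario sets a sampling approach would likely be needed.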

DISCUSSION
The framework presented here demonstrates the ability to use a voting-based system to aggregate human preferences to ethical decisions in smart water systems.Collected data can be used to learn models of preferred behavior which can then be used to make decisions on new scenarios.
This data-driven approach is novel in helping researchers …

This is an area of future work. Overall, we identify the inability of a single person, namely the researcher, to insert bias into the calculations as a positive outcome of using this framework.
Results from the experiment show that the models can choose outcomes that rank highly when considering the cumulative outcome. Models a and c appear to achieve their high ranking because they chose outcomes that did not favor private cost minimization at the expense of public costs. This observation is supported by the β values for public and private costs, which are approximately equal for model a. These findings are relevant only within the context of our theoretical proof-of-concept.
More robust studies are required to make that claim generally. Furthermore, the cumulative scores and ranking method used here communicate only a single concept of 'success'. It is reasonable to justify other performance metrics beyond summing the normalized results of each category. Rather, the cumulative outcome score analysis can be …

The second criticism is that the characteristic classifiers used in the moral dilemmas underdescribe scenarios and, in doing so, elicit not users' moral ideas but their biases along the dimension of the classifiers (Everett Jaques). In practice, it is unlikely that a strictly consequentialist framework would be operational in real-world scenarios.
Instead, a hybrid decision-making process would be employed. For example, a system could trigger automatic human oversight if a decision is anticipated to reach a specified damage threshold. Likewise, because the learned models predict whether one outcome is better than another, a mean utility difference between two outcomes close to zero suggests that there is a very weak preference. Thus, in these cases, a human review could also be triggered. These rule-based heuristics can set guardrails on the strictly consequentialist models, while also providing further opportunity for society-in-the-loop principles. Furthermore, a critical step for future work will be to explore how decisions derived from (WE)² preference models impact system outcomes via their integration with hydraulic models.
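A minimal sketch of such rule-based guardrails is given below. The threshold values, the damage units, and the function name `route_decision` are hypothetical and chosen only to illustrate the two triggers described above.

```python
def route_decision(mu_ab, expected_damage,
                   damage_threshold=100.0, indifference_band=0.1):
    """Return 'A', 'B', or 'human_review' for one pairwise decision.

    mu_ab is the mean utility difference from a learned preference model.
    Both thresholds are illustrative assumptions, not calibrated values.
    """
    # Guardrail 1: anticipated damage reaches the specified threshold.
    if expected_damage >= damage_threshold:
        return "human_review"
    # Guardrail 2: utility difference near zero signals a very weak preference.
    if abs(mu_ab) < indifference_band:
        return "human_review"
    return "A" if mu_ab > 0 else "B"
```

Only decisions that clear both checks are made automatically; everything else is deferred to a person, which keeps the learned model in an advisory role for high-stakes or ambiguous cases.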

CONCLUSION
Societal values are embedded into our built world, and water systems are no exception, but these values are rarely inspected as part of the scoping of technical solutions. Instead, values are treated as priors: immutable, unstated, and implicit as they relate to the objectives of infrastructure.
Yet, our infrastructure itself evolves.Increasing resolution in sensing and control of our water environment will allow for unprecedented precision of impacts, both positive and negative.At the same time, humans will continue to cede direct decision-making powers to decision-support technologies such as data algorithms.This new paradigm of smart and autonomous water systems will create new operational capabilities and new opportunities to [re]evaluate these values and explicitly incorporate them into operations.
The methodology and 'proof-of-concept' presented here are a first step toward building a framework for engaging people in algorithmic decision making in cases where normative and ethical preferences are considered. We developed the web-based (WE)², which is a generalized framework with serious gaming to collect normative preferences through paired comparison testing. Although our framework was designed for water applications, the framework is generalizable and can be used for any paired comparison exercise in any field. Preferences collected using (WE)² can then be used with our data analytics toolbox to build decision-support preference models and investigate their behavior. These resources, including documentation and tutorials, are shared openly and can be found in the project repository.
We observe that the strength of this framework is that it can prime conversations on values and system expectations at every step of the process, forwarding an iterative process.
By doing so, practitioners can work to unobscure AI, ML, and data-driven techniques from behind jargon and demystify the 'black box' processes of decision-support algorithms.We anticipate benefits in deploying our integrated framework in education, operational, and outreach contexts.
Efforts toward incorporating ethics and norms into smart systems must be considered in a sociotechnical context.Importantly, this means that the solution to the development of a technology that is faithful to a society's values may not necessarily be technical in nature at all.
Instead, findings from studies could support a structural, social solution as opposed to a solution reliant upon a technological artifice.Though aspects of the work can be technological, it should not preclude results finding that a structural or institutional solution is preferred.The application of AI, ML, and data-driven techniques to water sector problems does not alone make a system 'smart'.
Instead, 'smart' water should be conceived as the use of these tools to forward an explicitly recognized objective of the society.
al. ; Mullapudi et al. ; Jones et al. ; Yildirim & Demir ).Concurrently, control technologies are being deployed alongside sensors which allow for operators to actively manipulate these systems in places and in a This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
Overloop ; Mollerup et al. ; Schütze et al. ).Recent studies mainly focus on the challenges of controller design to meet the objectives of physically based water quantity (flooding; Sadler et al. ; Mullapudi et al. ; Sun et al. ) and sometimes quality (total suspended solids or different 'indicator' pollutants; Muschalla et al. ; Sharior et al. ; Troutman et al. ).The collective work demonstrates the potential for improved system al. ), that lower-class individuals are more likely to be compassionate to another's suffering (Stellar et al. ), that different social classes use different criteria in anticipated cost-benefit analyses (Trautmann et al. ), and that lower-class individuals are more likely to perform an unethical act if the act was to the benefit of others (Dubois et al. ).Together, these findings clarify the process and partners needed when operationalizing feedback on normative expectations of smart water systems.Integrating moral and normative sentiments into intelligent systems Machine learning (ML), artificial intelligence (AI), and data algorithms generally have beenand stand to beapplied in a wide range of scenarios from data augmentation (Demiray et al. ) to forecasting (Sit & Demir ; Xiang et al. ).There is, however, growing acknowledgement that, unexamined, these techniques can institutionalize the biases and structural prejudices that persist in the world, especially in decision-making contexts that directly affect humans such as in job hiring, evaluating credit scores, and predicting repeat offenders during parole processes (O'Neil ).Consequently, popular culture, industry, government, and academia have focused attention upon the intentional and ethical application of AI and created a proliferation of AI ethics frameworks (Hagendorff ; Jobin et al. ).
schemes.We then demonstrate the framework with a proofof-concept use case where decision models are learned and deployed to respond to flooding scenarios.The results indicate that the framework can capture group 'wisdom' in learned models and use this to make decisions.Furthermore, we share our generalized framework openly with the research community.The remaining sections are organized as follows.Section 'Materials and methods' provides details on the methodology of each step of the framework listed above, and cyber components used in handling the step.Section 'Results and discussion' demonstrates the framework with a proof-of-concept use case for decision making for flood response with a discussion of the framework.Finally, the 'Conclusion' presents the larger context of incorporating normative expectations into smart water systems.

… and Freedman et al., and include survey and interview techniques. Relevant belief features are those that someone would use to make a deliberative action. For example, in the case of choosing between two flooding outcomes, relevant belief features to be considered could be public costs, private costs, injuries, deaths, and environmental impact. Once belief features have been established, different scenarios can be created that vary along the belief features. The scenarios are presented to individuals in pairs. Participants must choose which outcome they prefer from each pair presented to them. Their choices are recorded and used later to learn a preference model.

To collect preference data from participants, we built an integrated web-based serious gaming platform. Serious gaming is used in a variety of fields for training, decision making, and education (Susi et al.). In water resources, serious gaming is used to explore multifaceted challenges such as multi-hazard mitigation (Carson et al.). Web-based serious gaming offers easily accessible and user-friendly interfaces with flexible architectures for various skill levels (Xu et al.). In many serious gaming applications, the game play offers the user an opportunity to explore real problems and strategies without the consequences of their actions impacting the real world. Instead, play informs a value system that guides behavior and action during a future, real-world event. Play is recognized as an important feature in the development of value systems and morals in humans. Furthermore, actions within the context of play (i.e. games) give rise to different value systems compared with a work context, even when considering the same topic (Bargheer). These value systems result in different moral treatments of the same topic, depending on whether one starts in the context of play or work. Gamification allows for parsimonious yet engaging descriptions of the ethical dilemmas at hand. More engaged users are more likely to play for longer and contribute more to the model development. The following subsections describe the web architecture, game play, and database architecture of our serious game approach.

Water Ethics Web Engine (WE)² architecture
The Water Ethics Web Engine (WE)² is an open and integrated framework that allows a rapid deployment of web applications to investigate moral preferences via pairwise comparisons. The framework comprises a PHP-based application engine and use case web template and includes a database architecture on the back end (Figure 1). Full documentation of the framework and source code can be found on GitHub: https://github.com/uihilab/WaterEthics-WebEngine. Researchers provide their application and experimental information to the engine via two configuration files (site_meta.json and scenarios.json) using the JavaScript Object Notation (JSON) format. These files, along with supplied images, are stored on a Web server and accessed during the game play by the application engine. Data generated during the game play are logged in a web-accessible database. Once data are collected, researchers have access to analytics tools and a portal for data export.
Case study information described above is stored on a Web server in a unique directory. When a user navigates their browser to the specific game version, given to them by the researcher, the engine generates the Web page from the content stored in case study files (i.e. the site metadata, scenario content, and the images). Scenarios displayed to participants are chosen randomly from the total set supplied.

Game play
Users are presented with a homepage that provides a brief mission statement on the purpose of the game and a button to start the game play. During the game play, users are presented with two scenario windows displayed side-by-side. Each scenario window contains descriptions of the event and an action button with a user-defined decision, e.g. 'Flood This' or 'Save This'. Descriptions come in some combination of three forms: an image, info bar, and written description. All description types are supplied by the researcher and are customizable. Users are instructed to use the descriptions to determine what outcome they prefer for the scenario. To choose the preferred decision, users click on the action button, at which point the decision is recorded and a new scenario is displayed. Once the user has provided their preference for all scenarios, they are guided to a results page that describes their aggregate preferences in relation to all others who have played the game and to the absolute possible outcomes along the belief features provided in the descriptions.
Figure 1 | System architecture of the web-based decision-making framework. The application engine, database, and researcher-supplied data are located on a Web server. When clients, or users, navigate to a project directory, the application engine builds the project using the information supplied in the static JSON files. After the game play, the researcher may use the post-play analytical tools to investigate results and learn preference models.
Figure 2 | Example scenario and game play provided by the (WE)² framework. Participants are served multiple scenarios where they are provided two outcomes and their descriptions as photos, info bars, and text (displayed when the information button is clicked). Upon clicking the action button, e.g. 'Flood This', the response is logged in the database and the participant is served another scenario.
aggregated by class section.Participants who did not provide a vote for all five scenarios were not included in model generation.Preference models were learned using multiple linear regression.Five models were learned including one for each class section and one model that is the average of the four class sections.Five model parameters (β values) describe each model, one parameter for each belief feature provided in scenarios (Figure3and Table2).A negative model parameter indicates the model's preference to minimize the impact along the given dimension.A positive model parameter indicates the preference to maximize the impact along the given dimension.This is due to the experimental design and our definition of marginal utility.For example, if it is preferred to minimize public costs, then the difference in public cost between the preferred option and the undesired option would be negative; or, that the preferred option results in fewer public costs.This negative value is multiplied by a negative β value to result in a positive contribution to the utility of that choice.All learned models minimize public costs to some degree; model a gives the most priority to public cost minimization and model c gives the least.However, no model gives minimizing public cost the highest utility against the other belief features.Models b, d, and Average give the highest utility to private cost minimization.Models a and c give the highest utility to the minimization of injuries.However, model a maximizes the minimization of public costs, private costs, and injuries almost equally.
result, the mean utility difference between option A (left) and option B (right) determines how each model would decide between the two outcomes: for a negative difference, choose option A (left), and for a positive difference, choose option B (right). Ties, which only occurred with the reference models, were noted and interpreted to mean that the outcome could go either way. In general, learned models voted similarly, agreeing on 11 out of 17 scenarios. To compare how the outcomes differed between the models, we calculated the total impact each model avoided along each belief feature and normalized it against the minimum and maximum possible avoidable damage (Figure 5(a)). A score of one indicates that the maximum damage was avoided, while a score of zero indicates that the minimum damage was avoided. All learned models avoided high levels of death and environmental impacts,

N details how many students provided preferences through the game play. The Scenarios column lists which five scenarios were served to each class section.

Figure 3 | Learned preference models for each of the four class sections and their average. Negative beta values indicate a preference to minimize the impact for a category.
while showing less agreement and less ability to avoid public costs. The same exercise was performed for the reference models (Figure 5(b)). Although each reference model scored maximally in its respective category, these example scenarios do not suggest that prioritizing only one category over all others is a productive strategy. Because each of the 17 scenarios has only two options, flood left or flood right, there are 2^17, or 131,072, unique decision combinations. We normalized and ranked all outcomes along each category, allowing us to build percentile curves (Figure 6). Only model a scored above the 50th percentile in all categories. The remaining four learned models performed above the 50th percentile in all but one category, public costs. Across the five categories and five models, 17 out of the 25 outcome scores ranked above the 90th percentile, which indicates strong performance for all learned models. To compute an overall outcome score for each of the 2^17 voting possibilities, we performed a simple summation of the normalized outcome scores of each category for each model. These overall scores, too, were ranked and percentile scores calculated (Table 3 and Figure 5(c)). Notably, model a achieved the highest possible overall score of 3.922 (out of a maximum of 5) for rank 1 of 2^17. Model c achieved a cumulative outcome score of 3.849 for a rank of 6 of 2^17. All learned models achieved scores that put them in the 98th percentile of all outcomes in avoiding

Figure 4 | Voting results from each of the five learned models and five reference models. Each model was used to vote, left or right, on the outcome of all scenarios generated for the flooding use case.
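The voting rule behind these results (a negative utility difference selects the left option, a positive one selects the right option, and zero is a tie) can be sketched as follows. This is an illustration only: it assumes each model is a single β vector applied to the feature difference (right minus left), and the function names and the tolerance used to detect ties are hypothetical.

```python
import numpy as np


def vote(beta, option_a, option_b, tol=1e-12):
    """Return 'left', 'right', or 'tie' from the sign of the utility difference.

    The utility difference is beta . (B - A), where A is the left option and
    B is the right option. `tol` is an assumed tolerance for detecting ties.
    """
    delta = np.asarray(option_b, dtype=float) - np.asarray(option_a, dtype=float)
    diff = float(np.dot(np.asarray(beta, dtype=float), delta))
    if diff > tol:
        return "right"
    if diff < -tol:
        return "left"
    return "tie"


def count_agreement(votes_by_model):
    """Count scenarios on which every model casts the same vote.

    votes_by_model: dict mapping a model name to its list of votes,
        one vote per scenario, all lists in the same scenario order.
    """
    per_scenario = zip(*votes_by_model.values())
    return sum(1 for votes in per_scenario if len(set(votes)) == 1)


if __name__ == "__main__":
    beta = [-1.0, -1.0]                      # prefers to minimize both features
    print(vote(beta, [5.0, 1.0], [1.0, 1.0]))  # B is less costly
    print(count_agreement({"a": ["left", "right"], "b": ["left", "left"]}))
```

Applying `count_agreement` to the votes of the five learned models over all 17 scenarios would give the kind of agreement figure reported in the text.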
learn the utility function from a large, potentially very large, cohort of people and not assume a priori an understanding of the utility function to be used to judge outcomes. To this end, water professionals and researchers can investigate how algorithmic components of smart water systems or disaster response perform in relation to people's normative expectations of right and wrong. The framework follows a consequentialist ethical theory, as the definitions of utility and performance rankings are based upon the outcomes of each scenario. However, learning utility functions without any a priori knowledge mitigates the limitations of traditional consequentialist approaches. As discussed earlier, consequentialist approaches come with limitations, such as which factors (belief features) to use to describe outcomes and the fairness-efficiency tradeoffs between different moral theories (utilitarian vs. egalitarian strategies). One strength of this approach is that belief features may also be crowdsourced, which allows many people to define an inclusive list of important belief features. Another strength is that, by learning the models of action from people's decisions, it is unnecessary to proclaim beforehand what fairness-efficiency tradeoffs should be made. Instead, the fairness-efficiency priorities are captured within the learned models. However, these learned models may not agree with the institutional understanding of rights or justice. Further effort may be required to integrate learned preference models with our ideals and aspirational sense of justice and fairness. Finally, this approach currently does not address the difficulty of decision making under uncertainty.

Figure 5 | Outcome scores are calculated as the avoided damage along a belief feature normalized by the maximum possible avoidance. Outcome scores were calculated for the learned models (a) and reference models (b). A model's cumulative outcome score is calculated as the sum of all outcome scores. Each model's cumulative score is ranked against all possible cumulative outcomes (c). Model a achieved the highest possible overall outcome score of 3.92 and a rank of 1 in 2^17.
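The scoring and ranking procedure summarized in this caption can be sketched as below. The sketch makes two assumptions that are not spelled out in the text: the damage avoided in a scenario is taken to be the impact of the option that was not chosen, and larger feature values always mean more damage. Scores are min-max normalized per belief feature, summed into a cumulative score, and ranked by brute-force enumeration of every left/right combination. All function names are hypothetical.

```python
import itertools
import numpy as np


def outcome_scores(option_a, option_b, choose_b):
    """Normalized avoided damage per belief feature, each in [0, 1].

    option_a, option_b: arrays of shape (n_scenarios, n_features).
    choose_b: booleans, True where option B (right) is chosen.
    Assumption: choosing one option avoids the damage of the other.
    """
    a = np.asarray(option_a, dtype=float)
    b = np.asarray(option_b, dtype=float)
    pick_b = np.asarray(choose_b, dtype=bool)[:, None]
    avoided = np.where(pick_b, a, b).sum(axis=0)
    lo = np.minimum(a, b).sum(axis=0)   # least damage that can be avoided
    hi = np.maximum(a, b).sum(axis=0)   # most damage that can be avoided
    span = hi - lo
    # Features with no spread get a score of zero instead of dividing by zero.
    return np.divide(avoided - lo, span, out=np.zeros_like(span), where=span > 0)


def rank_among_all(option_a, option_b, choose_b):
    """Rank a vote sequence's cumulative score against all 2^n sequences."""
    n = len(option_a)
    target = outcome_scores(option_a, option_b, choose_b).sum()
    totals = np.array([
        outcome_scores(option_a, option_b, combo).sum()
        for combo in itertools.product([False, True], repeat=n)
    ])
    rank = 1 + int(np.sum(totals > target + 1e-12))  # rank 1 is the best
    return rank, len(totals)


if __name__ == "__main__":
    a = [[10.0, 0.0], [0.0, 4.0]]
    b = [[0.0, 0.0], [0.0, 0.0]]
    print(outcome_scores(a, b, [True, True]))
    print(rank_among_all(a, b, [True, True]))
```

With 17 scenarios the enumeration covers 131,072 combinations, which remains tractable by brute force; the same array of totals could also be sorted to derive percentile scores.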

Figure 6 | Ranking curves for each of the five learned models along each of the five relevant belief features. Overall, the learned models performed well. Model a, which had the highest cumulative score when considering minimization of damages, scored highest among the models in three categories: deaths, injuries, and public costs. Yet, it also performed the worst in minimizing private costs and was third in environmental impact minimization.

Table 1 | Each class section is referenced by a letter, section name

Table 2 | Beta values for each preference model used to make decisions

Table 3 | Cumulative impact scores and ranks for each learned and reference model, out of a total of 2^17 possible outcomes.