ABSTRACT
Estimating the loss of life (LOL) resulting from dam failures is required for devising emergency action plans and strategies for alert issuance and evacuation. However, current models for simulating fatalities are computationally expensive, driven by highly uncertain variables, and not readily interpretable, which may limit their use in engineering and research. To circumvent these problems, we use the Polynomial Chaos Expansion (PCE) technique to approximate the LOL obtained from the agent-based model LifeSim and to propagate the uncertainty of its inputs (alerted population, mobilized population, alert issuance, and hazard identification) to the model responses. We also take advantage of the PCE spectral representation to assess the effect of each input on the LOL associated with a dam failure in an urban area in Brazil, considering efficient and inefficient alert and evacuation scenarios during the day and at night. The PCE error ranged from 10⁻³ to 10⁻², and the mean squared error between the metamodel output and LifeSim was between 1 and 2 fatalities. In the global sensitivity analysis, alert issuance and hazard identification contributed the most to the number of fatalities. These findings provide objective guidelines for implementing more effective safety measures, potentially reducing the LOL resulting from a dam break in the study area.
HIGHLIGHTS
We use a metamodel as a surrogate for a computationally expensive agent-based loss of life model.
We propagate input uncertainty through the metamodel.
We perform global sensitivity analysis by using Sobol indexes.
This research brings together information directly linked to major dam failures, loss of life and how to reduce them with efficient warning and evacuation systems, analyzing the uncertainties in hydrodynamic models.
INTRODUCTION
Simulating the number of fatalities is a paramount step for assessing the consequences of dam failures (Lumbroso et al. 2021). In effect, estimating such a variable may be useful for evaluating critical scenarios and defining appropriate strategies for reducing and communicating risk, either at the design stage of a hydraulic structure or for adapting existing policies for flood mitigation and prioritizing safety measures. Hence, in view of the growing number of constructed dams and, accordingly, the higher number of people at risk, recent literature has focused on advancing knowledge and improving models for more accurately describing the potential loss of life stemming from dam failures (Kalinina et al. 2021; Lumbroso et al. 2021; Silva & Eleutério 2023; Peng et al. 2024; Wang et al. 2024).
In general, models for estimating loss of life differ in complexity and in the underlying structures and are usually classified as empirical and dynamic (USACE 2021). Empirical models (e.g., regression equations) are based on historical cases, considering the mortality rate of the population at risk and the characteristics of the flooding event (Dawson et al. 2011). Dynamic agent-based models, on the other hand, rely on spatial information on the flooding event, the exposed buildings, and the population vulnerability, and account for the interaction of these factors on the individual's decision-making and on the causes of fatalities (Lumbroso et al. 2021).
According to Dawson et al. (2011), agent-based modeling is the most suitable approach to address the challenges of simulating processes and consequences of flooding events, as such models are designed to capture dynamic interactions and responses in a spatial environment. Under this rationale, the dynamic models Life Safety Model and LifeSim have stood out in the scientific literature (Aboelata & Bowles 2005; Johnstone et al. 2005). In particular, Kalinina et al. (2021) discussed the differences between the models, pointing out that the Life Safety model is intended to evaluate the individual behavior during flooding events (a person or a car), whereas LifeSim extends the simulation to large spatial scales, i.e., considering the analysis of many individuals, and provides alert and evacuation scenarios – which hence broadens its scope for application in areas with more heterogeneous land uses, complex and irregular human occupation, and more intricate evacuation paths.
The LifeSim model was initially incorporated in a simplified version into the Hydrologic Engineering Center's Flood Impact Analysis (HEC-FIA) program by the U.S. Army Corps of Engineers (USACE 2015), and then fully integrated into HEC-LifeSim (USACE 2018), which was recently updated to LifeSim 2.0 (USACE 2021). The model allows for simulating scenarios related to warning and evacuation systems for the exposed population (USACE 2021). As the progression of the flooding event and the actions of the exposed population are dynamic processes, LifeSim simulates their interactions through three main modules: the Shelter Loss Module, the Alert and Evacuation Module, and the Loss of Life Module (USACE 2021). In more detail, LifeSim describes the behavior of the affected population, conditioned on shelter conditions (type and height of buildings), demographic features (total population, occupation density, and age), and evacuation paths (road network and escape destinations), when an alert is issued in response to a flooding event that exceeds some critical threshold.
Besides applications in engineering, LifeSim has been increasingly used for research purposes, mostly for assessing influential factors associated with loss of life in different parts of the world (e.g., Bilali et al. 2021, 2022; Kalinina et al. 2021 and references therein). It should be noted, however, that LifeSim is affected by many distinct sources of uncertainty, particularly those related to highly uncertain (or poorly estimated), missing, or faulty inputs, many of which are averaged over large areas, and to the boundary/initial conditions of the simulations (e.g., the time of the dam failure or the time of the alert). As a result, the model outputs (loss of life) can be widely dispersed around the median point prediction. Thus, quantifying and propagating input uncertainty is necessary for both model scrutiny and decision-making.
To accommodate these uncertain conditions, LifeSim resorts to Monte Carlo simulations (USACE 2021). In this case, model inputs (but not necessarily all model parameters) are treated as random variables, which are independently sampled at each model run. Conditional on a particular input vector, the output is obtained deterministically, and by rerunning the model with a large number of new random inputs, an ensemble of predictions is obtained. This is arguably a computationally expensive approach, as LifeSim is a heavily parameterized model. Hence, a less expensive framework for propagating input uncertainty in practical applications is desirable: it would allow exploring the output space more comprehensively (through a much larger number of runs) and tracking the most critical trajectories of the model responses (Sudret 2007).
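A minimal sketch of this Monte Carlo scheme is given below. It assumes a hypothetical closed-form function `toy_loss_of_life` in place of a LifeSim run and independent uniform inputs over the ranges later listed in Table 4; the function, its decay constant, and the population at risk `par` are illustrative only and do not reproduce LifeSim.

```python
import math
import random
import statistics


def toy_loss_of_life(alerted, mobilized, t_issue, t_ident, par=1000.0):
    """Hypothetical stand-in for one deterministic model run (NOT LifeSim).

    alerted, mobilized: fractions in [0, 1]; t_issue, t_ident: delays in minutes;
    par: assumed population at risk.
    """
    total_delay = t_issue + t_ident
    # Assumption: the share of people who reach safety decays with the total delay.
    escaped = alerted * mobilized * math.exp(-total_delay / 600.0)
    return par * (1.0 - escaped)


def monte_carlo(n_runs, seed=1):
    """Draw each input independently per run; each run is then deterministic."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        alerted = rng.uniform(0.0, 1.0)
        mobilized = rng.uniform(0.0, 1.0)
        t_issue = rng.uniform(0.0, 1440.0)
        t_ident = rng.uniform(0.0, 1440.0)
        outputs.append(toy_loss_of_life(alerted, mobilized, t_issue, t_ident))
    return outputs


ensemble = monte_carlo(10_000)
deciles = statistics.quantiles(ensemble, n=10)  # 9 cut points: 10th ... 90th percentile
print("mean:", round(statistics.mean(ensemble)))
print("10th/90th percentiles:", round(deciles[0]), round(deciles[-1]))
```

The stand-in is cheap to evaluate; with LifeSim, each of the 10,000 draws would be a full simulation, which is what motivates replacing it with a metamodel.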
Surrogate models (or metamodels) are an appealing alternative for dealing with the high computational cost of agent-based models. Metamodels are flexible mathematical structures that can be estimated from only a few runs of the original numerical model and then used for extensive simulation at reasonable cost (Shields et al. 2019). Surrogate models have been applied in several fields, such as the investigation of bistable energy harvesters (Norenberg et al. 2022), wind turbine blades (Pavlack et al. 2022), marine turbines (Nispel et al. 2021), and urban drainage (Nagel et al. 2020). Regarding the estimation of loss of life in particular, Kalinina et al. (2021) applied surrogate models to a hypothetical dam in Switzerland and performed an ad hoc sensitivity analysis on some LifeSim input data, including population characteristics and the delays in communicating hazards and issuing alerts. The authors concluded that, among the analyzed variables, the total population had the largest impact on the model outputs.
Within a broader class of surrogate models, in this paper, we utilize the polynomial chaos expansion (PCE) technique (Sudret 2007) for estimating the loss of life related to dam failure. In short, PCE maps a given computational model into a combination of orthonormal polynomial functions, which allows the inexpensive computation of the deterministic model outputs (Marelli et al. 2022). Moreover, PCE is a well-suited technique for global sensitivity analysis as it allows trivially decomposing the variance of the model outputs, from which the Sobol coefficients of all orders, which summarize the contribution of the inputs, can be readily retrieved. Hence, as opposed to the current LifeSim implementation, estimating the influence of each of the input variables on the model responses through a PCE model is straightforward, which may be helpful for more accurately assessing risk and establishing more effective evacuation plans for critical (pre-defined) dam failure events.
We apply the PCE metamodel to a hypothetical dam failure study in the city of Belo Horizonte, Brazil – a densely populated urban area with a variety of economic activities that strongly modulate the affected population in distinct scenarios of failure (e.g., daytime and nighttime). The objective of the study is to provide a comprehensive assessment of loss of life estimation under distinct initial/boundary conditions for simulation, as well as to investigate how the input variables (and their interactions) affect the model estimates as the dam failure scenarios change. The main novelty of the study is that we focus on alert and evacuation systems, which are paramount for the elaboration of emergency action plans (EAPs) and for reducing the number of fatalities associated with dam failures, as discussed and presented in the papers by Lumbroso et al. (2021) and Kalinina et al. (2021), but, to the best of our knowledge, have not been tackled within previous research. The paper is structured as follows. In the next section, we describe the study area and utilized datasets, as well as the formalism and underlying assumptions of the PCE model. We then discuss the obtained results related to loss of life and perform the global sensitivity analysis to identify the more influential input variables. Lastly, we present the main conclusions of the study and the envisaged research developments.
METHODS
Case study – the Pampulha Reservoir
Feature | Description |
---|---|
Geographic coordinates | Latitude 19°50′44.69″S, longitude 43°58′1.43″W |
Purpose | Flood attenuation/landscaping |
Year of construction/inauguration | 1936/1938 |
Embankment crest elevation | El. 805.00 m |
Dam foundation elevation | El. 785.00 m |
Normal maximum water level | El. 801.00 m |
Overall height | 20.0 m |
Crest length | 450 m |
Total reservoir volume (up to crest) | 30,084,312 m³ (El. 805.00 m) |
Usable volume (up to spillway sill) | 10,009,628 m³ (El. 801.00 m) |
Embankment material | Compacted earth |
Emergency spillway | Side channel, 32.00 m wide (El. 801.00 m) |
Auxiliary spillway | Morning-glory (tulip) spillway, 12.54 m in diameter (El. 801.50 m) |
Source: Vianini Neto (2016).
The Pampulha Reservoir began operating in 1938, with the main purpose of supplying water to the municipality of Belo Horizonte. In 1954, piping caused the dam to fail, releasing about 12.6 million m³ downstream. Despite the material damage caused by this event, the flood wave claimed no lives. In 2016, the surroundings of the Pampulha Reservoir, encompassing the lagoon and its architectural monuments, were recognized as a World Heritage Site by the United Nations Educational, Scientific and Cultural Organization (UNESCO). Currently, it is a cultural and leisure symbol of the city of Belo Horizonte.
Hydrodynamic model
Characterization of the downstream valley
To characterize the downstream valley, we relied mainly on secondary data. The affected structures and population at risk were retrieved from the study of Nascimento et al. (2020), who developed the EAP for the downstream valley of the Pampulha Reservoir in the event of a dam rupture. The delimitation of the flood-affected area and the household population survey were, in turn, obtained from the statistical grid developed by the Brazilian Institute of Geography and Statistics (IBGE 2016).
This statistical grid allows a more detailed analysis of territorial divisions and provides data in smaller geographic units, consisting of a set of regular cells that subdivide geopolitical territories. This makes it possible to integrate data from distinct sources that are originally organized into incompatible geographic units. The smallest geographic unit of the Brazilian demographic census is the census tract, which does not have a regular shape. Thus, through statistical aggregation and disaggregation procedures, the information contained in these census tracts (population and households) is resampled to a 1 × 1 km resolution in rural areas and a 200 × 200 m resolution in urban areas, which then serve as regular, time-invariant units for modeling purposes (IBGE 2016).
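The IBGE procedure itself is not reproduced here. As an illustration of the disaggregation idea only, the sketch below uses simple areal weighting, assuming that residents are uniformly distributed within each census tract; the function name `areal_weighting` and the toy geometry are hypothetical.

```python
def areal_weighting(tract_pop, tract_area, overlap):
    """Disaggregate census-tract populations to grid cells by overlapping area.

    tract_pop[t]  -- population of census tract t
    tract_area[t] -- area of tract t (m^2)
    overlap[t][c] -- area of tract t lying inside grid cell c (m^2)
    Assumes residents are spread uniformly within each tract.
    """
    n_cells = len(overlap[0])
    cell_pop = [0.0] * n_cells
    for t, pop in enumerate(tract_pop):
        density = pop / tract_area[t]  # people per m^2 in tract t
        for c in range(n_cells):
            cell_pop[c] += density * overlap[t][c]
    return cell_pop


# Two irregular tracts spread over three 200 m x 200 m cells (40,000 m^2 each).
tract_pop = [300, 600]
tract_area = [60_000.0, 60_000.0]
overlap = [[40_000.0, 20_000.0, 0.0],
           [0.0, 20_000.0, 40_000.0]]
cells = areal_weighting(tract_pop, tract_area, overlap)
print([round(p) for p in cells])  # [200, 300, 400]
```

Areal weighting preserves the total population (here, 900 people), which is the minimum requirement for any aggregation/disaggregation between incompatible units.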
To define the statistical grid analyzed in our case study, we considered the limits of the dam failure flood map plus a 100-m buffer. This yielded 428 grid cells for further analysis. Data for the delimited area were extracted using QGIS v. 3.36.1, identifying 107,029 inhabitants and 32,868 households in the flood zone.
We then allocated the information extracted from IBGE's microdata. This allocation comprised two main stages. In the first, we used the algorithm built by Nascimento et al. (2020) to read the coded (numeric) microdata associated with each variable. The algorithm was updated so that people with disabilities (visual, hearing, mobility, and intellectual) could be aggregated into the variable ‘people with limited mobility’ in the LifeSim software. In the second stage, we characterized the structures identified in the area of interest with QGIS, which allowed a proportional and homogeneous distribution of the information among the households allocated in the study region.
To define the number of floors of the buildings, we assumed a homogeneous vertical distribution of households, i.e., the same number of apartments per floor. In addition, we considered buildings with four floors in the ‘apartment’ typology.
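A minimal sketch of this homogeneous allocation is shown below, assuming that a grid cell's population is spread as evenly as possible over its households and that an apartment building's households are spread evenly over its four floors. The integer-rounding rule, the helper `allocate_evenly`, and the example numbers are our own illustrative choices.

```python
def allocate_evenly(total, n_units):
    """Split an integer total over n_units so that unit sizes differ by at most one."""
    base, remainder = divmod(total, n_units)
    # The first `remainder` units receive one extra person (or household).
    return [base + 1 if i < remainder else base for i in range(n_units)]


# Hypothetical grid cell: 257 residents in 80 households.
people_per_household = allocate_evenly(257, 80)
# Hypothetical four-floor apartment building with 18 households.
households_per_floor = allocate_evenly(18, 4)
print(sum(people_per_household), max(people_per_household) - min(people_per_household))  # 257 1
print(households_per_floor)  # [5, 5, 4, 4]
```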
Regarding the presence of people at home during the day and at night, the following assumptions were adopted:
For the nighttime period, all people who return home daily were considered;
For the daytime period, we considered people who do not work, those who perform domestic work at home, those who work at home, students who attend school in the morning and do not work (from the sixth year of elementary school to the third year of high school), and 50% of the higher education students who do not work.
The population of each household was split into people over and under 65 years of age, preserving the household total.
Based on these assumptions, we estimated 105,577 people in households during non-business hours (02:00 a.m.) and 55,317 during business hours (02:00 p.m.).
Estimates of the population in educational (elementary, higher, state, federal, and private), health (Emergency Care Units and health centers), and social assistance institutions were obtained from the PRODABEL database. A total of 583 units located in the Pampulha, North and Northeast regions of Belo Horizonte were identified. However, only 84 units are located within the area of interest. We then assumed that 50% of the employees of the health and social assistance units work during the day and 50% during the night, due to their full-time operation.
Based on PRODABEL data, we could also identify the number of people involved in economic activities. The dataset presents the location and type of activity, the area used for the development of the activity, the size of the company, the start date, the legal nature, the corporate name, and the numeric identification. The database revealed 379,402 economic activity facilities in the municipality of Belo Horizonte, of which 12,324 are located in the area of interest.
Finally, to avoid the redundant count of people in households and economic activity facilities, we have relied on the following assumptions:
Only activities that were located outside exclusively residential areas were considered – this should avoid the double counting of the exposed population since those people who work in households in these areas have already been considered; and
Activities related to teaching, health, and social assistance were not considered.
After this filtering, 156,241 economic activity facilities were identified in Belo Horizonte, 5,070 of which are in the area of interest. The activities were then regrouped by working shift into night and day activities. Bars and establishments specializing in beverage services, entertainment venues, party and event halls, discos, and dance clubs were considered nocturnal activities (02:00 a.m.); all other activities were assumed to be daytime ones. As a result, the 5,070 economic activities were divided into 5,038 daytime and 32 nighttime activities.
The total estimated population was 101,865 people for business hours (02:00 p.m.) and 106,520 people for non-business hours (02:00 a.m.). Regarding construction material, all buildings were considered to be made of concrete, which, according to the stability criterion of USACE (2015), behaves similarly to masonry. Note that the transient (floating) population within the study area was not considered.
The road network represents the paths that can be used by evacuees on foot or in vehicles. The data were obtained from the OpenStreetMap platform, which is included in the LifeSim interface and allows the identification of road directions.
Loss of life modeling
Loss of life was modeled using the HEC-LifeSim software. The input data were taken from the study by Nascimento et al. (2020), who developed vulnerability maps and characterized the population and infrastructure in the downstream valley based on secondary data. HEC-LifeSim provides dynamic, spatially distributed modeling to estimate potential fatalities and direct economic damage from flood events (USACE 2021). The model explicitly simulates the alert of the hazard resulting from the flood event and the mobilization of the potentially exposed population, both inside buildings and on road networks (USACE 2021). The model iterations are performed using Monte Carlo simulation. Each simulation begins either at the first evacuation warning or at the first arrival of the flood wave, and ends when the failure hydrograph has completely propagated downstream or each mobilized group has completed its evacuation (USACE 2021).
The exposure and vulnerability scenarios were built upon data from the last demographic survey conducted by IBGE in 2010 – the most recent information available – and on the database from PRODABEL, following the rationale discussed in Silva & Eleutério (2023).
The HEC-LifeSim simulations were performed for daytime (02:00 p.m.) and nighttime (02:00 a.m.) failures under two scenarios: an efficient one, in which hazard identification, alert, and mobilization of the population are efficient; and an inefficient one, which represents a region poorly prepared to respond to a dam failure emergency with respect to the responsibilities of the dam owner, the alert system, and the mobilization of the population. To represent these scenarios, the software provides mobilization and alert curves, whose equation coefficients represent the stages of hazard identification, communication to emergency managers, alert issuance, first alert received by the population, start of evacuation, and arrival at a safe destination.
Surrogate models – PCE
Metamodels are surrogate functional forms intended to replace a complex model, with high computational demands, with a statistically equivalent and low-cost simulation model, which is built upon a limited number of runs of the true model (Le Gratiet et al. 2017; Sudret et al. 2017; Sudret 2021). Therefore, metamodeling has become a tool with important applications in the field of engineering and applied mathematics. However, due to the underlying complexity of its formulation, this technique has seen relatively little use in other fields (Marelli et al. 2022).
PCE is a technique used to model and propagate uncertainties in computer simulations with random inputs by approximating the model response in a basis of orthonormal polynomials (Sudret 2007). The concept was originally introduced by Wiener (1938); however, its application only began at the end of the twentieth century with the pioneering work of Ghanem & Spanos (1991). The main advantage of PCE is that the number of model evaluations required to estimate the output statistics is relatively low (Lüthen et al. 2022). Also, compared with other spectral representations, PCE shows faster convergence rates with increasing order of expansion (Sun et al. 2021).
Originally proposed to solve stochastic differential equations, PCE establishes a functional relationship between the system response and the random input variables, from which the mean and standard deviation of the random response can be determined (Sudret 2007). Xiu & Karniadakis (2002) generalized polynomial chaos to other families of orthogonal polynomials and applied it to the solution of elliptic partial differential equations with uncertain inputs. Sudret (2006, 2007) introduced the use of PCE for sensitivity analysis. PCE has the added benefit of allowing variance-based sensitivity indices (e.g., Sobol indices) to be estimated directly, which is the main motivation for choosing PCE as the metamodel in this paper.
Marelli et al. (2022) present the classical families of polynomials used as a basis for the expansion, each associated with a type of input distribution (Table 2). Multivariate basis functions are obtained as tensor products of these univariate polynomials, which makes it possible to efficiently propagate input uncertainty, account for output variability in complex systems, and straightforwardly perform sensitivity analysis.
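For reference, the standard PCE formulation (in the notation of Sudret 2007 and Marelli et al. 2022) is summarized below. Here M is the number of inputs (M = 4 in this study, per Table 3), A is the truncated set of multi-indices, and the last line gives the Legendre basis for an input that is uniform on [a_i, b_i]. The uniform assumption and any particular total degree p (e.g., p = 3, which gives C(7, 3) = 35 terms for M = 4) are illustrative, not necessarily the settings adopted in this study.

```latex
\begin{aligned}
\mathcal{M}(\mathbf{X}) &\approx \mathcal{M}^{\mathrm{PC}}(\mathbf{X})
  = \sum_{\boldsymbol{\alpha}\in\mathcal{A}} y_{\boldsymbol{\alpha}}\,\Psi_{\boldsymbol{\alpha}}(\mathbf{X}),
  &\qquad \Psi_{\boldsymbol{\alpha}}(\mathbf{x}) &= \prod_{i=1}^{M}\phi^{(i)}_{\alpha_i}(x_i),\\
\mathcal{A}^{M,p} &= \Big\{\boldsymbol{\alpha}\in\mathbb{N}^{M} : \sum_{i=1}^{M}\alpha_i \le p\Big\},
  &\qquad \operatorname{card}\mathcal{A}^{M,p} &= \binom{M+p}{p},\\
\mathbb{E}\big[\mathcal{M}^{\mathrm{PC}}(\mathbf{X})\big] &= y_{\mathbf{0}},
  &\qquad \operatorname{Var}\big[\mathcal{M}^{\mathrm{PC}}(\mathbf{X})\big] &= \sum_{\boldsymbol{\alpha}\in\mathcal{A}\setminus\{\mathbf{0}\}} y_{\boldsymbol{\alpha}}^{2},\\
z_i &= \frac{2x_i - a_i - b_i}{b_i - a_i}\in[-1,1],
  &\qquad \phi_k(z) &= \sqrt{2k+1}\,P_k(z).
\end{aligned}
```

Because the basis is orthonormal, the mean and variance follow from the coefficients alone, without further model runs.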
Distribution | Univariate polynomial family | Hilbertian basis |
---|---|---|
Uniform | Legendre | |
Gaussian | Hermite | |
Gamma | Laguerre | |
Beta | Jacobi | |
Source: Adapted from Marelli et al. (2022).
The relevance of the terms in the polynomial basis, however, is not uniform. Frequently, the most important terms in the expansion are those involving only a few variables, i.e., low-order interactions. This phenomenon is known as the sparsity-of-effects principle. Sparse PCE schemes exploit it to improve computational efficiency and the interpretability of results, allowing for a simplified yet meaningful representation of the phenomena being studied (Marelli et al. 2022).
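One common way to exploit the sparsity-of-effects principle is hyperbolic (q-norm) truncation of the candidate basis, which UQLab also supports. The sketch below, with a hypothetical helper `hyperbolic_set`, shows how lowering q discards interaction terms first for M = 4 inputs and total degree p = 3. Whether such a truncation was used in this study is not stated here.

```python
import itertools


def hyperbolic_set(n_vars, degree, q=1.0, tol=1e-9):
    """Multi-indices alpha whose q-norm (sum_i alpha_i**q)**(1/q) does not exceed degree.

    q = 1 gives the full total-degree set; q < 1 removes high-order
    interaction terms first while keeping all univariate terms up to degree.
    """
    kept = []
    for alpha in itertools.product(range(degree + 1), repeat=n_vars):
        norm = sum(a ** q for a in alpha) ** (1.0 / q)
        if norm <= degree + tol:  # tol guards against floating-point round-off
            kept.append(alpha)
    return kept


for q in (1.0, 0.75, 0.5):
    print(q, len(hyperbolic_set(4, 3, q)))
```

With q = 1 the set has 35 terms; with q = 0.5 only the constant and the 12 univariate terms (13 in total) survive, because even the lowest-order interaction (1, 1, 0, 0) has a q-norm of 4 > 3.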
In this study, we considered the alerted population, the mobilized population, the alert issuance, and the hazard identification as the random input variables. Probability distributions must be assigned to these variables before the metamodel is built and the PCE coefficients are computed. The output of the surrogate model is the number of fatalities due to the dam failure.
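A minimal, dependency-free sketch of a non-intrusive least-squares PCE follows. It assumes uniform inputs over the Table 4 ranges, an orthonormal Legendre basis, a total-degree truncation with p = 3, and 300 random training points. The helper names are hypothetical, and `toy_model` is an arbitrary polynomial used in place of LifeSim so that the exact answer is known; UQLab's own solvers (e.g., sparse regression) are not reproduced here.

```python
import itertools
import math
import random


def legendre_orthonormal(k, z):
    """Legendre P_k(z) scaled by sqrt(2k + 1): orthonormal for z ~ U(-1, 1)."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, z  # P_0 and P_1
    for n in range(1, k):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) z P_n - n P_{n-1}
        p_prev, p = p, ((2 * n + 1) * z * p - n * p_prev) / (n + 1)
    return math.sqrt(2 * k + 1) * p


def to_unit(x, low, high):
    """Map a uniform input on [low, high] to the standard interval [-1, 1]."""
    return (2.0 * x - low - high) / (high - low)


def total_degree_set(n_vars, degree):
    """All multi-indices alpha with sum(alpha) <= degree."""
    return [a for a in itertools.product(range(degree + 1), repeat=n_vars) if sum(a) <= degree]


def basis_row(z, alphas):
    """Evaluate every tensor-product basis function Psi_alpha at one standardized point z."""
    return [math.prod(legendre_orthonormal(k, zi) for k, zi in zip(alpha, z)) for alpha in alphas]


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_pce(samples_z, responses, alphas):
    """Ordinary least-squares PCE coefficients via the normal equations."""
    rows = [basis_row(z, alphas) for z in samples_z]
    m = len(alphas)
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(rows, responses)) for i in range(m)]
    return dict(zip(alphas, solve(gram, rhs)))


# Table 4 ranges for X1..X4, treated here as uniform (an assumption).
RANGES = [(0.0, 1.0), (0.0, 1.0), (0.0, 1440.0), (0.0, 1440.0)]


def toy_model(z):
    """Hypothetical polynomial response in standardized inputs (not LifeSim output)."""
    return 3.0 + 2.0 * z[0] + z[1] * z[2]


rng = random.Random(0)
xs = [[rng.uniform(lo, hi) for lo, hi in RANGES] for _ in range(300)]
zs = [[to_unit(x, lo, hi) for x, (lo, hi) in zip(row, RANGES)] for row in xs]
ys = [toy_model(z) for z in zs]
coeffs = fit_pce(zs, ys, total_degree_set(4, 3))
mean = coeffs[(0, 0, 0, 0)]
variance = sum(c * c for alpha, c in coeffs.items() if any(alpha))
print(round(mean, 6), round(variance, 6))  # 3.0 1.444444 (= 13/9)
```

Because the toy response lies in the span of the basis, the fitted mean (3) and variance (13/9) match the analytical values. For a real model, the approximation error would instead have to be estimated, e.g., on a validation set or by leave-one-out cross-validation.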
Global sensitivity analysis – Sobol indexes
The aforementioned uncertainty propagation methods provide insights into the effects of the input variables on the variability of the model response (Sudret 2007). Ranking the input variables according to these effects is known as sensitivity analysis, which can be classified as local or global. A local analysis varies one model parameter over a given range while keeping the others fixed. A global analysis, on the other hand, varies all the analyzed parameters simultaneously. Despite its higher complexity, the latter gives a better representation of the output sensitivity, since the parameters vary jointly throughout the numerical iterations and their interactions are therefore captured (Sudret 2007; Pavlack et al. 2022).
In this study, we use Sobol indices for global sensitivity analysis. The Sobol indices (Sobol 1993) are based on the decomposition of the computational model into a sum of summands of increasing dimension. Accordingly, this is a variance-based method, in which the total variance of the model output is expressed as the sum of the partial variances of these summands (Marelli et al. 2022).
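For reference, the Sobol–Hoeffding decomposition and the resulting indices are written below in standard notation, with u a nonempty subset of input indices and independent inputs assumed. The last line gives the partial variances of a PCE in an orthonormal basis, which is why the indices require no additional model runs once the coefficients are known.

```latex
\begin{aligned}
\mathcal{M}(\mathbf{x}) &= \mathcal{M}_0 + \sum_{i=1}^{M}\mathcal{M}_i(x_i)
  + \sum_{1\le i<j\le M}\mathcal{M}_{ij}(x_i,x_j) + \dots + \mathcal{M}_{12\dots M}(\mathbf{x}),\\
D &= \operatorname{Var}\big[\mathcal{M}(\mathbf{X})\big] = \sum_{\mathbf{u}\neq\emptyset} D_{\mathbf{u}},
  \qquad S_{\mathbf{u}} = \frac{D_{\mathbf{u}}}{D},
  \qquad S_i^{T} = \sum_{\mathbf{u}\ni i} S_{\mathbf{u}},\\
D_{\mathbf{u}}^{\mathrm{PC}} &= \sum_{\boldsymbol{\alpha}\in\mathcal{A}_{\mathbf{u}}} y_{\boldsymbol{\alpha}}^{2},
  \qquad \mathcal{A}_{\mathbf{u}} = \{\boldsymbol{\alpha}\in\mathcal{A} : \alpha_k > 0 \iff k\in\mathbf{u}\}.
\end{aligned}
```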
To calibrate the metamodel and estimate the Sobol indices, we used UQLab, developed by the Chair of Risk, Safety and Uncertainty Quantification at ETH Zurich, Switzerland (Marelli et al. 2022), and run on the MATLAB platform.
LifeSim takes input data in different formats, most of them as shapefiles. The results of the hydrodynamic model, which contain the hydraulic information needed for the simulations, can be exported in HDF format.
The structural inventory is provided as a shapefile with information on occupancy types, construction types, total population, population with limited mobility, number of floors, and foundation height. Most of this information comes from official surveys of the IBGE demographic census. The road network is exported from OpenStreetMap, a free and collaborative mapping platform. Based on these inputs, the study area is delimited, and the platform generates the road directions and official vehicle routes.
Emergency zones are defined to assign alert and mobilization curves, with the population characterized and grouped according to alert issuance and dissemination, as well as preparedness and hazard perception. The destinations of escape routes are entered as safe points to which the population can move to be protected from flooding; these are generally located on high ground outside the flooded area. Two important variables are the time to identify the hazard and the delay in communicating the alert, as they influence the alert and mobilization curves and, consequently, the number of estimated fatalities.
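To illustrate why these two delays matter, the sketch below assumes a simple exponential alert-diffusion curve that starts only once the hazard has been identified and the alert issued. This is an illustrative assumption, not LifeSim's alert or mobilization equations; the function `fraction_alerted`, the diffusion rate, the delays, and the flood arrival time are all hypothetical.

```python
import math


def fraction_alerted(t, t_ident, t_issue, rate_per_min=0.05):
    """Hypothetical cumulative share of the population alerted by time t (NOT LifeSim's curve).

    All times are minutes after the dam failure.
    t_ident: delay until the hazard is identified;
    t_issue: further delay until the alert is issued;
    rate_per_min: assumed rate of an exponential alert-diffusion curve.
    """
    t_alert = t_ident + t_issue
    if t < t_alert:
        return 0.0  # nobody has been alerted before the alert is issued
    return 1.0 - math.exp(-rate_per_min * (t - t_alert))


# Hypothetical location reached by the flood wave 60 min after the failure.
arrival = 60.0
efficient = fraction_alerted(arrival, t_ident=10.0, t_issue=10.0)
inefficient = fraction_alerted(arrival, t_ident=120.0, t_issue=240.0)
print(round(efficient, 3), inefficient)  # 0.865 0.0
```

Under these assumptions, long identification and issuance delays push the alert past the flood arrival, so no one at that location is warned in time regardless of the diffusion rate.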
The LifeSim outputs were retrieved by evaluating all data at the end of each simulation. Note that LifeSim is not currently open-source software. Accordingly, the surrogate model follows a non-intrusive approach, which does not alter the source code of the computational model studied in this research.
When the surrogate model is built on PCE, the Sobol indices can be calculated directly from the polynomial coefficients (Marelli et al. 2022). In UQLab, this is done in MATLAB by defining a Sobol-based global sensitivity analysis on top of the calibrated PCE metamodel; to obtain higher-order Sobol indices, the maximum Sobol order must be set accordingly in the analysis options.
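A minimal sketch of this post-processing step, independent of UQLab, is shown below. Given PCE coefficients in an orthonormal basis, keyed by multi-index, it groups the squared coefficients by the set of active variables to obtain first-order, total, and higher-order Sobol indices. The function name `sobol_from_pce` and the example coefficients are hypothetical.

```python
from itertools import combinations


def sobol_from_pce(coeffs, max_order=3):
    """Sobol indices from PCE coefficients (orthonormal basis, independent inputs).

    coeffs maps multi-index tuples (alpha_1, ..., alpha_M) to coefficients y_alpha.
    Returns (first_order, total, by_subset), where by_subset maps every tuple u of
    variable positions with 1 <= |u| <= max_order to S_u.
    """
    n_vars = len(next(iter(coeffs)))
    variance = sum(c * c for alpha, c in coeffs.items() if any(alpha))
    partial = {}  # S_u for every subset u that appears in the expansion
    for alpha, c in coeffs.items():
        u = tuple(i for i, a in enumerate(alpha) if a > 0)  # active variables of this term
        if u:
            partial[u] = partial.get(u, 0.0) + c * c / variance
    first = [partial.get((i,), 0.0) for i in range(n_vars)]
    total = [sum(s for u, s in partial.items() if i in u) for i in range(n_vars)]
    by_subset = {u: partial.get(u, 0.0)
                 for k in range(1, max_order + 1)
                 for u in combinations(range(n_vars), k)}
    return first, total, by_subset


# Hypothetical coefficients for inputs X1..X4 (zero coefficients omitted).
coeffs = {
    (0, 0, 0, 0): 5.0,  # mean
    (1, 0, 0, 0): 2.0,  # X1 alone
    (0, 1, 0, 0): 1.0,  # X2 alone
    (1, 1, 0, 0): 1.0,  # X1-X2 interaction
}
first, total, by_subset = sobol_from_pce(coeffs)
for i, (s, st) in enumerate(zip(first, total), start=1):
    print(f"X{i}: S={s:.3f}, ST={st:.3f}")
print("S_12 =", round(by_subset[(0, 1)], 3))
```

Here the variance is 4 + 1 + 1 = 6, so X1 has S = 4/6 and ST = 5/6, and the X1–X2 interaction accounts for 1/6 of the variance. The gap between total and first-order indices is what signals interactions.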
The Sobol index method was chosen for the global sensitivity analysis of the input variables as it is trivially related to the PCE model. The Sobol indices, up to order 3, were calculated for each scenario, without the need for additional samples from LifeSim, which is one of the advantages and justifications for the choice of the method. For the sake of simplicity, each input was associated with a symbol, as shown in Table 3.
Variable | Symbol |
---|---|
Alerted population | X1 |
Mobilized population | X2 |
Issuance of the alert | X3 |
Hazard identification | X4 |
RESULTS AND DISCUSSION
Loss of life model
From the two-dimensional hydrodynamic model and the input data for the loss of life model, we performed simulations in the LifeSim software. The simulations considered efficient and inefficient scenarios, during the day and at night. The developed scenarios are described below.
Inefficient scenario: This scenario represents a region that is poorly prepared for alert and evacuation. The coefficient ranges of the equations for the delay in alert dissemination and the delay in the start of mobilization were chosen to reflect this situation. For the interval between threat identification and alert issuance, a sufficiently long period was adopted, ranging from 0 to 24 h (i.e., from 0 to 1,440 min). Regarding the evacuation modes, we considered that 50% of the evacuees would evacuate on foot and the other 50% would use vehicles.
Efficient scenario: The choice of the most efficient coefficient ranges, with respect to the dissemination of the alert and the beginning of the mobilization, represents efficient emergency planning at all stages. Regarding the modes of evacuation, we considered that 50% of the evacuees would evacuate on foot and the other 50% would use vehicles.
These simulations revealed a significant reduction in the number of fatalities resulting from the implementation of efficient warning and evacuation systems and the adequate preparation of the population for evacuation. When the dam failure occurs at night, Figure 8 shows that an inefficient alert issuance scenario would entail, on average, 976 fatalities, whereas the efficient scenario would reduce this to an average of 319. For the daytime period, the inefficient scenario would result in an average of 1,047 fatalities, compared with 325 in the efficient scenario. In other words, implementing and operating efficient warning and evacuation systems in the area affected by the flood wave would reduce the number of fatalities by 69% during the day and 67% at night. In the efficient scenarios, the minimum toll was eight fatalities for the night shift and nine for the day shift. In contrast, in the inefficient scenarios, the minimum toll was much more severe, with 279 fatalities for the night shift and 334 for the day shift. These results highlight the importance of efficient warning measures to reduce the number of fatalities.
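The percentage reductions quoted above follow directly from the mean tolls; a short arithmetic check is given below (the helper name `relative_reduction` is ours).

```python
def relative_reduction(inefficient, efficient):
    """Percentage reduction in mean fatalities when moving to the efficient scenario."""
    return 100.0 * (inefficient - efficient) / inefficient


day = relative_reduction(1047, 325)    # daytime means
night = relative_reduction(976, 319)   # nighttime means
print(round(day), round(night))  # 69 67
```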
The results also indicate that the daytime breakout scenario (02:00 p.m.) is more critical as compared to the nighttime scenario (02:00 a.m.). This difference can be ascribed to commercial activities in the region during daytime hours, which implies a greater concentration of people exposed to risk during this period. Also, we note that, although not directly considered in the simulations, during business hours there would be an enhanced flow of people transiting through the risk area, which can have very negative effects on evacuation.
The results show significant potential in reducing human damage (fatalities) through the optimization of systems for identifying possible failures, disseminating alerts and organizing evacuation actions. This includes effective training of the population exposed to the hazard and the capacity building of the dam's emergency management team. In this sense, investing in technologies and strategies that improve the risk identification capacity and the efficiency of warning and evacuation systems is essential to ensuring an adequate response.
Surrogate model
The input variables were defined on the basis of their relevance to the elaboration of the EAP, as indicated by Lumbroso et al. (2021). However, for the failure of the Pampulha Reservoir Dam, no detailed information on plausible ranges of these input variables is available, which increases uncertainty about the realizations of the underlying processes. As a result, we considered the full range of the empirical curves used in LifeSim to estimate the alerted population, the mobilized population, the alert issuance, and the hazard identification. These ranges are shown in Table 4.
Variable | Unit | Range |
---|---|---|
Population alerted | Fraction | 0–1 |
Mobilized population | Fraction | 0–1 |
Issuance of the alert | Minutes | 0–1,440 |
Hazard identification | Minutes | 0–1,440 |
After determining the input variables, namely, the alerted population, the mobilized population, the alert issuance, and the hazard identification, it is necessary to choose the probability distributions that best represent them. For this purpose, the marginal distributions of the input variables were analyzed for constructing the PCE-based surrogate model. As virtually no information on the inputs is available in the study area, we adopted uniform distributions over the ranges in Table 4, which maximize uncertainty (entropy) over a bounded interval.
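A minimal sketch of this input model follows, assuming independent uniform marginals over the Table 4 ranges and plain Monte Carlo sampling (the sampling design actually used to generate the LifeSim runs is not specified here). In PCE, uniform inputs are associated with Legendre polynomials, so the sketch also maps each input to the reference interval [−1, 1]. The names `BOUNDS`, `sample_inputs`, and `to_reference` are hypothetical.

```python
import numpy as np

# Bounds from Table 4, in the order (alerted pop., mobilized pop., alert issuance, hazard id.).
BOUNDS = np.array([
    [0.0, 1.0],     # population alerted (fraction)
    [0.0, 1.0],     # mobilized population (fraction)
    [0.0, 1440.0],  # issuance of the alert (min)
    [0.0, 1440.0],  # hazard identification (min)
])

def sample_inputs(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n independent uniform samples of the four inputs (assumption: plain Monte Carlo)."""
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    return lo + (hi - lo) * rng.random((n, len(BOUNDS)))

def to_reference(x: np.ndarray) -> np.ndarray:
    """Affinely map physical inputs to [-1, 1], the support of the Legendre polynomials."""
    lo, hi = BOUNDS[:, 0], BOUNDS[:, 1]
    return 2.0 * (x - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(seed=42)
X = sample_inputs(1000, rng)  # e.g., 1,000 input sets that would be run through LifeSim
Z = to_reference(X)           # reference coordinates used by the PCE basis
```

Working in reference coordinates keeps the polynomial basis identical for all four inputs even though their physical units differ (fractions vs. minutes).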
To construct the surrogate model, it is necessary to choose the polynomial basis and its maximum degree, and then compute the coefficients of the polynomial expansion. Hence, to select the best-fit model in each scenario, we considered polynomial degrees ranging from 1 to 15 and truncation coefficients (q-norm) varying between 0.1 and 1.0. The polynomial degree was chosen on the basis of the lowest estimated error under cross-validation and the extrapolation quality of the model.
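The truncation coefficient corresponds to the hyperbolic (q-norm) truncation commonly used in sparse PCE: a multi-index α is retained when (Σ α_i^q)^(1/q) ≤ p, where p is the maximum degree. The sketch below is a hypothetical helper, not the authors' code; it enumerates this set for four inputs and illustrates that lowering q discards high-order interaction terms while keeping every univariate term up to degree p.

```python
import itertools

def hyperbolic_index_set(n_vars: int, degree: int, q: float) -> list[tuple[int, ...]]:
    """Multi-indices alpha with (sum alpha_i**q)**(1/q) <= degree, assuming 0 < q <= 1.

    For q <= 1 the q-quasi-norm is at least max(alpha), so searching the box
    {0, ..., degree}**n_vars is sufficient.
    """
    selected = []
    for alpha in itertools.product(range(degree + 1), repeat=n_vars):
        qnorm = sum(a ** q for a in alpha) ** (1.0 / q)
        if qnorm <= degree + 1e-12:  # tolerance absorbs floating-point round-off
            selected.append(alpha)
    return selected

# Size of the candidate basis for four inputs, degree 8, and several q-norm values.
for q in (0.5, 0.8, 0.9, 1.0):
    print(q, len(hyperbolic_index_set(n_vars=4, degree=8, q=q)))
```

With q = 1 this reduces to the usual total-degree set; for instance, four inputs and degree 2 give 15 terms. Note that the number of coefficients actually retained after a sparse regression step can be much smaller than the size of this candidate set.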
For the calibration of the surrogate model, a sample of 1,000 LifeSim output simulations was used. Four PCE metamodels were built: two for the pessimistic (inefficient) scenario of alert and mobilization (one for the daytime and one for the nighttime) and two for the optimistic (efficient) scenario (one for the daytime and one for the nighttime). Table 5 summarizes the results obtained in the construction of the PCE models. The calibration errors are close to zero and similar to those found by Kalinina et al. (2021), although no clear pattern in the goodness-of-fit emerged with respect to the proposed scenarios or the time of day at which the hypothetical failure occurs.
Scenario | Period | Error | Polynomial degree | q-norm |
---|---|---|---|---|
Pessimistic | Diurnal | 0.0074 | 8 | 0.9 |
Pessimistic | Nocturnal | 0.0124 | 15 | 0.5 |
Optimistic | Diurnal | 0.0082 | 11 | 0.8 |
Optimistic | Nocturnal | 0.0066 | 14 | 0.8 |
The pessimistic-diurnal scenario had the highest truncation coefficient among all scenarios (q-norm = 0.9), which resulted in a greater number of polynomial coefficients in the surrogate model: a total of 142 coefficients, as shown in Figure 9(a). On the other hand, the pessimistic-nocturnal scenario had the lowest truncation coefficient among all scenarios (q-norm = 0.5) and, despite converging to a polynomial degree of 15, required only 40 polynomial coefficients to stabilize the surrogate model (Figure 9).
In the optimistic scenario, the same truncation coefficient (q-norm = 0.8) was obtained for both periods (day and night). However, the daytime model converged to a lower polynomial degree (degree 11) than the nighttime model (degree 14). The daytime model nonetheless retained 61 polynomial coefficients, against 40 for the nighttime model (Figure 9).
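To illustrate how PCE coefficients can be calibrated from a set of model runs, the following sketch fits an orthonormal Legendre expansion by ordinary least squares on 1,000 samples, mirroring the sample size used here. It rests on stated assumptions: the full total-degree basis replaces the adaptive sparse (q-norm) basis selection described above, and `toy_model` is a hypothetical analytical stand-in for LifeSim, not the real model.

```python
import itertools
import numpy as np
from numpy.polynomial import legendre

def total_degree_set(n_vars: int, degree: int):
    """All multi-indices alpha with sum(alpha) <= degree (i.e., q-norm = 1)."""
    return [a for a in itertools.product(range(degree + 1), repeat=n_vars)
            if sum(a) <= degree]

def orthonormal_legendre(z: np.ndarray, n: int) -> np.ndarray:
    """Degree-n Legendre polynomial scaled to unit variance under U(-1, 1)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt(2 * n + 1) * legendre.legval(z, c)

def design_matrix(Z: np.ndarray, index_set) -> np.ndarray:
    """Psi[k, j] = product over inputs i of psi_{alpha_j[i]}(Z[k, i])."""
    Psi = np.ones((Z.shape[0], len(index_set)))
    for j, alpha in enumerate(index_set):
        for i, a in enumerate(alpha):
            if a > 0:
                Psi[:, j] *= orthonormal_legendre(Z[:, i], a)
    return Psi

def fit_pce(Z: np.ndarray, y: np.ndarray, index_set) -> np.ndarray:
    """Ordinary least-squares estimate of the PCE coefficients."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(Z, index_set), y, rcond=None)
    return coeffs

def toy_model(Z: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for LifeSim (NOT the real model), in reference coordinates."""
    return 300.0 + 120.0 * Z[:, 2] * Z[:, 3] + 40.0 * Z[:, 2] - 15.0 * Z[:, 1]

rng = np.random.default_rng(0)
Z_train = rng.uniform(-1.0, 1.0, size=(1000, 4))  # 1,000 training runs
y_train = toy_model(Z_train)
basis = total_degree_set(n_vars=4, degree=3)
coeffs = fit_pce(Z_train, y_train, basis)

# Evaluate the surrogate on fresh points.
Z_test = rng.uniform(-1.0, 1.0, size=(200, 4))
y_pred = design_matrix(Z_test, basis) @ coeffs
```

Because the basis is orthonormal, the coefficient of the constant term equals the mean of the response (300 for this toy function), which is what makes the moment and Sobol post-processing of a fitted PCE inexpensive.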
Cross-validation
Table 6 provides the cross-validation results for each proposed scenario. The two optimistic scenarios present very similar results, with smaller errors than their pessimistic counterparts.
Scenario | Period | MSE |
---|---|---|
Pessimistic | Diurnal | 1.89 |
Pessimistic | Nocturnal | 2.10 |
Optimistic | Diurnal | 1.41 |
Optimistic | Nocturnal | 1.43 |
Overall, the performance of the models is adequate. The largest MSE (2.10, pessimistic nocturnal scenario) corresponds to a root-mean-square error of about 1.5 fatalities with respect to the LifeSim estimates. In this context, the identified error was considered acceptable.
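One common way to obtain a cross-validation error for a least-squares PCE without refitting the model n times is the leave-one-out (LOO) identity e₍₋ᵢ₎ = rᵢ / (1 − hᵢᵢ), where rᵢ is the ordinary residual and hᵢᵢ the i-th diagonal entry of the hat matrix. The generic sketch below is not necessarily the procedure behind Table 6. It computes the LOO residuals this way on hypothetical data, checks them against brute-force refitting, and reports the LOO MSE, its square root (in output units, e.g., fatalities), and a variance-normalized version.

```python
import numpy as np

def loo_residuals(Psi: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out residuals of an OLS fit, computed without refitting.

    Uses e_(-i) = r_i / (1 - h_ii), where r are the ordinary residuals and h_ii is
    the diagonal of the hat matrix H = Psi (Psi^T Psi)^(-1) Psi^T = Q Q^T.
    """
    coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    residuals = y - Psi @ coeffs
    Q, _ = np.linalg.qr(Psi)           # reduced QR factorization
    leverage = np.sum(Q ** 2, axis=1)  # h_ii = squared row norms of Q
    return residuals / (1.0 - leverage)

def brute_force_loo(Psi: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Reference implementation: refit n times, each time leaving one sample out."""
    errors = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coeffs, *_ = np.linalg.lstsq(Psi[keep], y[keep], rcond=None)
        errors[i] = y[i] - Psi[i] @ coeffs
    return errors

# Hypothetical data: noisy quadratic response with basis [1, z, z^2].
rng = np.random.default_rng(1)
z = rng.uniform(-1.0, 1.0, size=200)
y = 10.0 + 4.0 * z - 3.0 * z ** 2 + rng.normal(scale=1.0, size=z.size)
Psi = np.column_stack([np.ones_like(z), z, z ** 2])

e_loo = loo_residuals(Psi, y)
mse_loo = np.mean(e_loo ** 2)       # squared output units
rmse_loo = np.sqrt(mse_loo)         # output units
relative_loo = mse_loo / np.var(y)  # normalized by the output variance
```

The square root of an MSE is what translates it back into fatalities: an MSE of 2.10 corresponds to roughly 1.45 fatalities. Whether the "Error" column of Table 5 is such a variance-normalized LOO error is an assumption not confirmed in this section.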
The MSE values found in this study were higher than those of Kalinina et al. (2021). However, those authors relied on a much better database for defining the marginal distributions of the inputs, which reduces uncertainty and yields less dispersed estimates during prediction. Despite the inherent uncertainties, and considering the hypotheses adopted by LifeSim, such as the alert and mobilization curves based on empirical data, the surrogate estimates were consistent with, and close to, those of the computational model (LifeSim). This indicates that the surrogate model has an acceptable degree of accuracy for estimating the number of fatalities in different scenarios.
After calibrating and validating the surrogate model, we compared its computational cost with that of LifeSim. To this end, we ran 100,000 simulations with each model: LifeSim required about 20 h, whereas the PCE model required 20 min. These times are similar to those reported by Kalinina et al. (2021) for the same number of simulations.
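In per-run terms, these timings imply roughly a 60-fold speedup. The arithmetic is shown below using the approximate times reported above. No other timing data are assumed.

```python
# Approximate wall-clock times reported in the text for 100,000 runs of each model.
n_runs = 100_000
lifesim_hours = 20.0
pce_minutes = 20.0

lifesim_seconds_per_run = lifesim_hours * 3600.0 / n_runs
pce_seconds_per_run = pce_minutes * 60.0 / n_runs
speedup = (lifesim_hours * 60.0) / pce_minutes  # ratio of total run times

print(f"LifeSim: {lifesim_seconds_per_run:.3f} s/run; "
      f"PCE: {pce_seconds_per_run:.3f} s/run; speedup ~ {speedup:.0f}x")
```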
Global sensitivity analysis
For the nighttime pessimistic scenario, the first-, second-, and third-order Sobol indices were calculated, and the results reinforced the importance of the variables ‘Alert Issuance’ and ‘Hazard Identification’ (Figure 10). The first-order indices, which evaluate the individual influence of each input averaged over the variations of the other inputs, were 0.1883 and 0.1891, respectively, for these variables. The second-order index for the interaction between ‘Alert Issuance’ and ‘Hazard Identification’ was 0.6015, indicating a strong influence of this interaction on the fatality estimates. In contrast, no significant pairwise interactions were found among the other combinations of inputs, as the corresponding second-order indices are very close to zero. The third-order index for the interaction among ‘Alert Issuance’, ‘Hazard Identification’, and ‘Mobilized Population’ was 0.0085 (Figure 10). Being very low, this value indicates a minimal influence of the combination of these three variables on the variance of the model outputs; the same holds for the remaining third-order indices, which are virtually null. Note that the first- to third-order Sobol indices sum to 1, as required.
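Because the PCE basis is orthonormal, Sobol indices follow directly from the expansion coefficients. The output variance is the sum of the squared non-constant coefficients, and the index of a subset of inputs gathers the terms whose multi-index is non-zero on exactly that subset. The sketch below implements this standard post-processing. The function names and the two-input example coefficients are hypothetical, chosen only so the result can be checked by hand. Mapping positions 0 to 3 to the four inputs in Table 4 order is likewise an assumption.

```python
import numpy as np

def pce_sobol_indices(index_set, coeffs):
    """Sobol indices of every input subset from an orthonormal PCE.

    index_set: list of multi-indices (tuples), one per coefficient.
    Returns a dict mapping a tuple of input positions, e.g. (2, 3), to its index.
    """
    alphas = np.asarray(index_set)
    c2 = np.asarray(coeffs, dtype=float) ** 2
    nonconstant = alphas.any(axis=1)  # the constant term carries the mean, not variance
    total_variance = c2[nonconstant].sum()
    indices = {}
    for alpha, partial_variance in zip(alphas[nonconstant], c2[nonconstant]):
        subset = tuple(int(i) for i in np.flatnonzero(alpha))
        indices[subset] = indices.get(subset, 0.0) + partial_variance / total_variance
    return indices

def indices_by_order(indices):
    """Sum the Sobol indices of each interaction order (first, second, ...)."""
    totals = {}
    for subset, value in indices.items():
        totals[len(subset)] = totals.get(len(subset), 0.0) + value
    return totals

def total_effect(indices, position):
    """Total Sobol index of one input: every term in which it appears."""
    return sum(value for subset, value in indices.items() if position in subset)

# Hypothetical two-input PCE: constant, x0, x1, x0*x1, and x0^2 (zero coefficient).
index_set = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
coeffs = [5.0, 1.0, 1.0, 2.0, 0.0]
S = pce_sobol_indices(index_set, coeffs)  # (0,) -> 1/6, (1,) -> 1/6, (0, 1) -> 4/6
orders = indices_by_order(S)              # the per-order sums add up to 1
```

The constraint that all Sobol indices sum to 1 holds by construction here, since every partial variance is divided by the same total variance.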
In the daytime pessimistic scenario, the first-order Sobol indices of ‘Mobilized Population’, ‘Alert Issuance’, and ‘Hazard Identification’ were 0.0354, 0.0910, and 0.1086, respectively (Figure 11). For the latter two variables, these values are lower than those found for the nighttime pessimistic scenario, but the mobilized population has some influence in this scenario; in fact, a larger number of people may be at risk in vulnerable workplaces during the day, which explains the higher contribution of this variable. The second-order index for the interaction between ‘Alert Issuance’ and ‘Hazard Identification’ was 0.7609 (Figure 11), comparable to that of the nighttime pessimistic scenario, and again no other significant pairwise interactions were observed. The third-order index for the interaction among ‘Alert Issuance’, ‘Hazard Identification’, and ‘Mobilized Population’ was 0.0011 (Figure 11), and the remaining third-order indices were below 0.0002. These values, even smaller than in the nighttime pessimistic scenario, indicate that three-way interactions contribute little to the variability of the model output.
In the daytime optimistic scenario, the first-order Sobol indices were 0.0900, 0.0280, and 0.0602 for ‘Mobilized Population’ (X2), ‘Alert Issuance’ (X3), and ‘Hazard Identification’ (X4), respectively (Figure 12). The values for X3 and X4 are close to those found in the previous scenarios. However, X2 contributed more in the optimistic daytime scenario, which is consistent with the importance of this variable in a scenario that assumes efficient alerting and mobilization of the population.
For the second-order analysis, the Sobol index was 0.7098 for the interaction between ‘Alert Issuance’ and ‘Hazard Identification’ (Figure 12), 0.0157 between ‘Mobilized Population’ and ‘Hazard Identification’, and 0.0121 for the interaction between ‘Mobilized Population’ and ‘Hazard Identification’. The contributions of X3 and X4 are close to those found in the previous scenarios, but the influence of X2 is stronger in this scenario and, therefore, so are its interactions with the other variables.
In the third-order analysis, the Sobol index was 0.0769 for the interaction among ‘Alert Issuance’, ‘Hazard Identification’, and ‘Mobilized Population’ (Figure 12), and 0.0069 for the interaction among ‘Alerted Population’, ‘Alert Issuance’, and ‘Hazard Identification’. In this scenario, interactions among these variables contribute more than in the pessimistic scenarios, together with a larger contribution at the individual level.
In the nighttime optimistic scenario, the first-order Sobol indices were 0.2106, 0.0275, and 0.0275 for ‘Mobilized Population’, ‘Alert Issuance’, and ‘Hazard Identification’, respectively (Figure 13). These results agree with those of the daytime optimistic scenario: X2 continues to have a larger contribution, owing to the importance of alerting and mobilization in the optimistic scenario. In the pessimistic scenarios, by contrast, X3 and X4 had greater contributions at all Sobol orders analyzed in this study. For the second-order analysis, the Sobol index was 0.6988 for the interaction between ‘Alert Issuance’ and ‘Hazard Identification’ (Figure 13), 0.0048 between ‘Mobilized Population’ and ‘Hazard Identification’, and 0.0048 for the interaction between ‘Mobilized Population’ and ‘Hazard Identification’. These values also confirm the prevalence of X3 and X4 at the second order, as found in all scenarios.
In the third-order analysis, the Sobol index was 0.0192 for the interaction among ‘Mobilized Population’, ‘Alert Issuance’, and ‘Hazard Identification’ (Figure 13), 0.0057 for the interaction among ‘Alerted Population’, ‘Alert Issuance’, and ‘Hazard Identification’, and lower than 0.001 for the other combinations. As in the daytime optimistic scenario, these variables contribute more to the model output, which can be explained by their importance for the evacuation and mobilization of the population in case of imminent danger. This differs from the pessimistic scenario, which assumes an unprepared population and an inefficient alert and mobilization system.
CONCLUSIONS AND RESEARCH DEVELOPMENTS
The use of computational tools to analyze the consequences of dam failure events is essential for a better understanding of the involved processes, as well as to provide guidelines for the planning of emergency actions, which, if efficiently implemented, may reduce the number of fatalities and environmental and socioeconomic damage. In this context, the present study resorted to the LifeSim agent-based model for investigating the loss of life that would result from the failure of the Pampulha Reservoir Dam, in Brazil. In addition, to provide a broader understanding of the influence of input variables in LifeSim responses, which is useful for designing guidelines for evacuation, we utilized the computationally inexpensive PCE technique for approximating the numerical model and the Sobol Indices for decomposing the variance of the model responses.
For the LifeSim simulations, we considered the inputs alerted population, mobilized population, issuance of the alert, and hazard identification as uniformly distributed variates (as we did not have detailed information on them in our study area) and assumed four distinct scenarios that differ in the efficiency of the alerts and the time of day at which the dam failure takes place. For a nighttime event, the inefficient scenario would result in 976 fatalities on average, whereas for the efficient one, the toll would fall to 319. A daytime event, on the other hand, would entail 1,047 and 325 fatalities for the inefficient and efficient scenarios, respectively. We note that, despite the very different activities carried out in the study area during the day and night, the number of fatalities was quite similar in both cases, which is likely a result of the highly uncertain conditions under which LifeSim is forced. Hence, further research on the ‘actual’ distribution functions of the input variables would be beneficial for properly distinguishing between these situations. In addition, the advantages of designing proper alert and evacuation strategies are noticeable: even under very uncertain inputs, the number of fatalities decreases considerably when the alert is properly communicated and the population is properly trained to react to imminent danger.
To more thoroughly explore LifeSim's output space, we built four metamodels that accurately approximate the numerical model responses under cross-validation; of course, the metamodels are heavily parameterized and, as a result, low bias was expected a priori. The metamodels considerably reduced the computational cost of the stochastic simulation of loss of life resulting from the dam failure. More importantly, decomposing the variance of the PCE model is straightforward, and computing the Sobol indices allowed the influence of the random inputs to be readily estimated. Alert issuance and hazard identification were, by far, the most important contributing factors in the pessimistic scenarios, but the alerted population and the mobilized population presented nonnegligible indices in the optimistic ones. The second-order analysis highlighted the importance of studying pairs of variables jointly, but joint effects of higher order were deemed too low to matter. Overall, our results highlight the importance of well-designed alert strategies, which should stand out as a priority for decision-makers. However, a limitation of this approach is that it depends on constructing a surrogate model that properly represents the computational model being studied.
To sum up, the combination of LifeSim and the PCE metamodel enabled propagating uncertainty at reasonable cost and further investigating the factors conditioning the behavior of the population in response to a critical dam failure, even in situations in which little information on alert and mobilization is available. We believe this knowledge may underpin updates to policies and safety measures for reducing the number of fatalities in the case of a dam failure. Of course, there are several aspects to improve in future work. In effect, other applications to well-documented structures and systems may provide additional insights for prescribing the probability distributions of the stochastic inputs. Moreover, we intend to assess whether large natural floods might be used as proxies for technological ones, at least in some locations of the study area, to gather more information on the behavior of the population at risk, which could, in turn, reveal further limitations of LifeSim in describing the evacuation process.
ACKNOWLEDGEMENTS
The authors acknowledge the support to this research from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG). The authors also wish to acknowledge the anonymous reviewers and editors for the valuable comments and suggestions, which greatly helped improve the paper.
DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.
CONFLICT OF INTEREST
The authors declare there is no conflict.