## Abstract

Coupled 1D2D models emerged as an efficient solution for a two-dimensional (2D) representation of the floodplain combined with a fast one-dimensional (1D) schematization of the main channel. At the same time, high-performance computing (HPC) has appeared as an efficient tool for model acceleration. In this work, a previously validated 1D2D Central Processing Unit (CPU) model is combined with an HPC technique for fast and accurate flood simulation. Due to the speed of 1D schemes, a hybrid CPU/GPU model that runs the 1D main channel on CPU and accelerates the 2D floodplain with a Graphics Processing Unit (GPU) is presented. Since the data transfer between sub-domains and devices (CPU/GPU) may be the main potential drawback of this architecture, the test cases are selected to carry out a careful time analysis. The results reveal the speed-up dependency on the 2D mesh, the event to be solved and the 1D discretization of the main channel. Additionally, special attention must be paid to the time step size computation shared between sub-models. In spite of the use of a hybrid CPU/GPU implementation, high speed-ups are accomplished in some cases.

## HIGHLIGHTS

A hybrid CPU/GPU coupled 1D2D model is presented.

Details of the acceleration technique and code implementation are provided.

A sensitivity analysis is performed in order to evaluate the model efficiency.

Runtimes for each model and the transference are analysed for different hybrid configurations.

The model is applied to a realistic case: a large stretch of the Ebro River.

## INTRODUCTION

According to a European Environment Agency (EEA) survey (EEA 2010), flood events are prone to occur more frequently in many river basins giving rise to catastrophic consequences. To control their impact on the environment and society, management plans are being developed and flood modelling is starting to represent an important tool for flood risk maps design, as considered in the European Floods Directive (C. o. t. E. U. European Parliament 2007). However, flood forecasting is still not as widely used by decision-makers as it could be, not because of the quality of the results, but due to the speed of the simulations (Leskens *et al.* 2014).

The reduction of computational time when simulating real or practical cases has been one of the most important challenges of computational fluid dynamics (CFD). In particular, when trying to reproduce flood events on a large spatial and temporal domain, the efficiency of the method becomes crucial to make a computational tool affordable and have a practical use. For this purpose, the most common strategies can be divided into two groups: model reduction and computation acceleration. The complexity of a real flow may require the consideration of the three-dimensional (3D) Navier–Stokes equations for the total representation of its nature. However, the prediction of hazards, such as flood events or evolution of pollutant discharges on rivers, does not need such a detail in many cases. Thus, the simplification of the model under shallow water conditions leads to the non-linear shallow water equations (SWE), whose resolution is simpler and faster than the complete equations. Additionally, the assumption of dimensional hypothesis, which allows the modeller to work with one-dimensional (1D) models, can also accelerate the computations. However, the limits of these models must always be analysed in order not to lose accuracy in the solution (Horritt & Bates 2002; Costabile *et al.* 2015).

For flood event simulations, the two-dimensional (2D) shallow water system of equations has been demonstrated to be suitable enough to reproduce the flow behaviour (Knijff *et al.* 2010; Sanders *et al.* 2010; Masoero *et al.* 2013; Lacasta *et al.* 2014; Vacondio *et al.* 2016). Thus, although 3D models are widely extended on other CFD applications, their use for flood events still presents unaffordable times for large-scale events even when using parallelization techniques for acceleration. Additionally, their computational cost is not compensated by the accuracy increment in comparison with simplified models since huge flood events have a natural 2D behaviour (Horna-Muñoz & Constantinescu 2018). Thus, model reduction is a common strategy when the application allows for it. However, although the use of 2D models provides accurate results when dealing with floods due to the proper representation of the 2D velocity field, their computational cost may still be excessive, especially due to the necessity of small cells in the river bed or in hydraulic structures (Caviedes-Voullième *et al.* 2012; Echeverribar *et al.* 2019). This drawback of 2D models has caused over the years an extended use of 1D models for flood simulation (Petaccia *et al.* 2012; Murillo & Garcia-Navarro 2014). 1D models are advantageous when simulating complex channel nets and have been widely used by authorities, decision-makers and flood modellers due to their relatively short computational time. However, they have a clear limit in the simulation of 2D velocity fields and have been proven not to be accurate enough for flood events (Horritt & Bates 2002; Costabile *et al.* 2015; Morales-Hernández *et al.* 2018). Alternatively, some researchers have proposed 1D2D coupled models to combine a 2D representation on the floodplain and a 1D numerical schematization for the main channel (Bladé *et al.* 2012; Domeneghetti *et al.* 2013; Morales-Hernández *et al.* 2013, 2015). These models lead not only to a reduction of the simulation time but also to a more accurate representation of the river channel flow, whose main velocity direction and geometry are better captured by 1D models. Nevertheless, even though these combined models present a speed-up in comparison with full 2D models, when dealing with large-scale events they still present a high computational cost.

On the other hand, there exist other techniques which are focused not on the number of operations to compute depending on the model and its dimensions but on the speed to manage all those operations. Based on parallelization techniques that divide the computing workload into different cores that solve the scheme simultaneously, different technologies can be found. One of the most common strategies is Open Multi-Processing (OpenMP) (Board 2015), that accelerates the calculations depending on the number of cores, which depends on the number of Central Processing Units (CPUs) available. Beyond this, high-performance computing (HPC) enerally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science or engineering. In particular, schemes can be implemented to run on graphics cards (GPUs) (NVIDIA 2010), which may contain up to thousands of cores in the same device and work together with only one CPU. The implementation of 2D models on GPUs was extended a few years ago and several 2D models running on a Graphics Processing Unit (GPU) can be found. They offer computational speed-ups turning 2D models into affordable tools for flood forecasting (Castro *et al.* 2011; Brodtkorb *et al.* 2012; Lacasta *et al.* 2014; Petaccia *et al.* 2016; Vacondio *et al.* 2016). Additionally, if several GPU devices are used, the acceleration may be even higher. However, in addition to the large computing facilities required, the communication time penalty is not always worth it for a large number of GPUs, requiring efficient implementations (Morales-Hernandez *et al.*; Sharif *et al.* 2020; Turchetto *et al.* 2020).

Therefore, the combination of these two strategies – HPC and model dimension reduction – is proposed to further increase the performance during a flood simulation. With the aim of carrying out this combination, the coupled 1D2D model proposed by Morales-Hernández *et al.* (2015) is adequately programmed for GPU devices, in the HPC context. In this model, previously validated running on CPU (Morales-Hernández *et al.* 2015), the main river bed is modelled so that the whole framework works as a pure 1D model when channel overflow does not occur (Morales-Hernández *et al.* 2016). During a flood event, the terrain adjacent to the river bed becomes inundated and a 2D model is used for the representation of the velocity field evolution on the floodplain. By means of an appropriate geometric link, the 1D and the 2D models are coupled. They are both based on the SWE and solved using a finite volume explicit upwind first-order numerical scheme with approximate Roe solvers that are able to deal correctly with wet/dry fronts, advance over dry beds and transient flows over irregular topography (Murillo *et al.* 2009; Morales-Hernández *et al.* 2015).

The computation of the combined 1D2D model consists of a combination of CPU and GPU algorithms and data transfers. The GPU programming procedures present nontrivial problems regarding the implementation efficiency of the coupled model. While the 2D model is running on the *device* (GPU), the cells for the 1D representation of the main channel run on the *host* (CPU). At this point, the computation of the coupled cell edges must be analysed to find the most efficient method of implementation. In this case, GPU computing was chosen also for these 1D2D shared edges.

The interest of the present analysis resides in the study of the combination of efficiency in the model and on the architecture of the parallelization, as the speed-up of the coupled model is not that obvious due to its particular hybrid implementation. This implementation is based on combining CPU and GPU algorithms for each of the sub-domains and the necessity of data transfers. Having a 1D2D fully GPU model could be a possibility depending on the application. However, in the cases of interest in the present study, the fast 1D frameworks and the high involvement of the floodplain in flood events place this hybrid model in an interesting position to be studied for cases where the main river does not represent a large proportion of the computation. Therefore, the aim is to analyse not only computational times depending on the existing models but also the influence of the implementation and the data transfer within the global computational time.

Depending on what the model is compared to, this particular code architecture could lead to different speed-ups that are schematized in Figure 1. Speed-up 1 (SU1) has been previously analysed in several works where a 2D model has been implemented on GPU. It is worth recalling the dependency not only on the algorithm implementation but also on the GPU device used. For instance, Lacasta *et al.* (2014) report approximately constant values of 60x with an optimized code. With the same purpose, Vacondio carried out a deep analysis of simulation times for different devices obtaining speed-ups that vary from 10x up to 200x (Vacondio *et al.* 2014). On the other hand, speed-up 4 (SU4), as mentioned before, has also been studied in previous works of the present model showing speed-ups up to 30x (Morales-Hernández *et al.* 2015). It is relevant to point out that speed-ups 1 and 2 are strictly computational speed-ups. However, speed-ups 3 and 4 require model reduction, and therefore, the speed-up comes with the penalty of accuracy and complexity (dimensional in this case) of the solution. Thus, this work aims to study speed-up 2 (SU2), the comparison between coupled models running on CPU and hybrid CPU/GPU, and speed-up 3 (SU3), the comparison of the hybrid CPU/GPU coupled model with a pure 2D model running on GPU.

For this purpose, different test cases are carried out. First, a stretch of the Severn River (UK), proposed by the UK Environment Agency (Environment Agency 2010), is used as validation test to ensure the proper implementation of the 1D2D coupled model on the hybrid CPU–GPU paradigm, comparing the obtained results with other models on a fixed mesh size. Once the model has been tested, a benchmark case is designed to analyse the sensitivity of the model to the data transfers depending on parameters that are susceptible to change depending on the case, such as the number of 1D cells, the contribution of floodplain due to discharge peak of the event and duration of the hydrograph. Finally, the model is applied to a real test case: the Ebro River (Spain). The simulation of two historical events in the middle reach of the river, encompassing 125 km of river within a 400 km^{2} area, is used to study the performance in real cases where the floodplain plays an important role and the presence of hydraulic structures involves higher resolution.

## GOVERNING EQUATIONS AND NUMERICAL SCHEME

### 1D SWE

*et al.*1980). The mass and momentum conservation laws are written for the channel longitudinal direction

*s*as follows:

*t*, and in space,

*s*, arewhere

*Q*(m

^{3}/s) is the discharge,

*A*(m

^{2}) is the cross-section wetted area, and

*g*(m/s

^{2}) is the acceleration due to the gravity. The source term vector contains the bed variation term, (m/m), and the friction slope, (m/m), where friction losses are represented by means of the Gauckler–Manning's roughness coefficient,

*n*() (Henderson 1966; Chanson 2004), present on the empirical Manning formula that defines (Morales-Hernández

*et al.*2015). Finally, (m

^{3}) and (m

^{2}) account for hydrostatic and longitudinal width variation pressure forces, respectively (Burguete & García-Navarro 2001).

### 2D SWE

*et al.*1980), taking into account the two directions of the flow for the momentum equations – is given bywherewhere contains the 2D conserved variables of the problem: water depth,

*h*(m), and the unit discharges in the

*x*- and

*y*-direction, (m

^{2}/s) and (m

^{2}/s), respectively. and represent the fluxes of those variables in the

*x*- and

*y*-direction. Finally, source terms are compacted in containing the friction and bed slopes, and , respectively, projected on the

*x*- and

*y*-direction (Morales-Hernández

*et al.*2015).

### Numerical scheme

*i*, where Equation (5) is integrated, the finite volume (FV) numerical scheme can be derived. The hyperbolic character of Equation (5) offers the possibility to calculate the real eigenvalues and eigenvectors of the Jacobian matrix. They are used to build the updating schemes.

*et al.*(2015), the expression for the 1D updating at each cell

*i*iswhere is the 1D grid size, are the Jacobian eigenvalues (with superscripts denoting the upwind discretization), are the Jacobian eigenvectors and are wave coefficients. Expression (6) represents the time updating each time step at each cell with the in-going contributions, as depicted in Figure 2(a). Finally, since the numerical scheme has an explicit nature, the time step size must be limited by the Courant–Friedrichs–Lewy (CFL) stability condition (LeVeque 1992):

*i*, , instead of . If a generic edge between cells

*i*and

*j*is called

*k*, the problem is projected onto

*k*and the matrix eigenvectors basis. Finally, the updating equation in a triangular mesh iswhere stands for the number of eigenvectors (or variables) and

*k*runs the number of involved neighbouring walls (with NE = 3 in the triangular case). Again, the time step has to be limited through the CFL condition:where

*k*loops over all computational walls and is defined at each cell depending on the area, , and the length of its

*p*neighbouring walls:

## RCE COUPLED MODEL AND EXTENDED NUMERICAL SCHEME

### Formulation of the RCE coupled model

The Riemann Coupled Edge (RCE) from Morales-Hernández *et al.* (2015) model is based on the definition of local Riemann problems between the 1D and the 2D models that will additionally contribute to the normal updating procedure of the cells in both 1D and 2D models, as represented in Figure 3. The coupled model is based on a new element of discretization: the coupling zone (CZ). Each CZ contains one 1D cell and an entire number of 2D cells that are laterally coupled at both left and right margins. It is important to note that at least one 2D cell must connect to a 1D cell. Some more details about the geometric linkage of the models can be found in Morales-Hernández *et al.* (2015).

The main idea of the RCE procedure is the conversion of the 1D values into 2D equivalent quantities, so the local Riemann problem at both sides of the CZ can be computed as an ordinary 2D problem. Therefore, additional contributions are computed in the lateral coupled edges (Morales-Hernández *et al.* 2015). According to the CZ angle, , the 1D cell values can be transformed into a 2D vector. In particular, the 1D model discharge at each cell, , contributes to the 2D cells properly projected, allowing an unstructured coupling. The detailed procedure for the angle, , calculation is explained in Morales-Hernández *et al.* (2015), as well as the final expression for eigenvectors and values for the coupled edges.

At the end, each Riemann problem can be projected over the normal direction of the vector defined as the normal vector of the coupled edge, , between both models. Consequently, each coupled cell in both 1D and 2D models has additional contributions coming for the coupled part and Equations (6) and (8) are modified subsequently.

### Extended numerical scheme

All the specific eigenvalues and eigenvectors, as well as other details of the coupled numerical scheme, can be found in Morales-Hernández *et al.* (2015).

## HPC IMPLEMENTATION

Present-day engineering problems involve a high amount of data management and the necessity of fast results requiring HPC methods. In this work, the NVIDIA CUDA (Compute Unified Device Architecture) toolkit is used to carry out the simulations on a GPU-based solution.

### GPU architecture and CUDA toolkit

The GPU was initially designed to deal with computer graphics operations. However, their development has turned these devices into a more general tool that can offer its capabilities for other purposes, such as the engineering problem acceleration. This approach is also known as GPGPU (General-purpose computing on Graphics Processing Units), and it allows the coder to implement algorithms that can run on a GPU hardware using high-level language. Particularly, NVIDIA developed the CUDA toolkit to run parallel solutions on GPU. It is a computing platform and programming model where the developer still programs in its familiar language (C, in this case) and incorporates extensions that express massive amounts of parallel operations.

It is important to mention that the GPU device has its own memory and this fact affects the method of algorithm implementation, since the location of the variables (CPU or GPU) must always be specified on the code and, usually, implies a double copy of the variables and a large amount of data transfer operations. Although some developments aim to make this transparent to the developer by means of a *unified memory*, the most common performance is based on memory copy operations. This point requires special attention due to two main implications. First, the performance of the GPU solution may be greatly reduced if there is a large number of transfer in the code. Secondly, this GPU performance may not be appropriate for small applications where the transfer operations have a weight higher than calculations. Note that GPUs were initially oriented to perform arithmetical operations on vector-based information. Because of this, original CPU codes used commonly have to be reorganized on these mentioned vector structures to implement them on CUDA.

### Details of code implementation

The implementation on GPU is not a trivial task when dealing with combined models with different parts that lead to different performance efficiencies. It is clear that, in the case of having a huge model including an entire river network where the number of 1D cells compares with the number of involved 2D cells, the 1D2D fully GPU could be justified since the global computational time would not be limited by the 2D floodplain, but by the river network. However, the focus here relies on the assumption that the number of cells in the 1D model is much lower than in the 2D model, and the number of computations in the latter is relatively large so as to require GPU acceleration. Then, the 1D framework is kept in its original CPU environment, avoiding the GPU porting. In particular, the original 1D code, like many CPU codes written in C, is organized using *Arrays of Structures*. Thus, the reorganization of the data into *Structures of Arrays* for the GPU implementation would be mandatory to get an acceptable performance on the GPU (Lacasta *et al.* 2014).

For all those reasons, in the present work, the 2D main loop and the coupled part are computed by the GPU, while the main channel, solved with a 1D model, is run on the CPU. The 2D structures (mesh and cells containing variables), as well as coupled structures (coupling zones, CZ), are mapped using *Structure of Arrays* and copied to the GPU at the beginning of the simulation. All the 1D structures (channels and cells with variables) remain on the CPU memory and are computed without this acceleration. Each time step, the results of the 1D models are transferred to the GPU to feed the CZ. This is sketched in Figure 4, where the operation sequence is outlined, distinguishing between the CPU and GPU codes.

The functions of the algorithm have been divided into four groups that will be used to quantify the computational cost of each group and to carry out a detailed performance analysis. These four groups that can be seen in Figure 4 are:

1D–1D cell edges computation and 1D model update;

1D–2D cell edges computation;

2D–2D cell edges computation and 2D model update; and

data transfer between sub-domains.

The pre-process functions will not be taken into account later for time analysis. Different colours have been assigned to those transfer processes, sketched in Figure 4, to enable an easy graphical visualization in the plots presented in next sections.

## SIMULATION CASES

In general, there is a noticeable speed-up when moving from a 2D model to a 1D2D model due to the substitution of the 2D refined river bed by 1D cross-sections. However, when a model moves on to the GPU, an additional speed-up analysis must be done due to the time-consuming data transfers. This is particularly so when there is a part of the code still running on the CPU, the 1D model in this case, and data transfer between sub-domains is performed each time step.

The test cases of this section aim to demonstrate, first, the accuracy of the model. The method is applied for that purpose to a reach of the Severn River (UK), whose data are provided by the UK Environment Agency with the aim of testing 1D2D models on a fixed size mesh. Secondly, the test cases intend to analyse the hybrid CPU/GPU coupled model performance regarding total simulation times and transfers. This is done with an idealized long river. Finally, it is important to study the limits of the model concerning speed-up when it is applied to a real case where the 2D floodplain governs completely the simulation time. To do so, the model is applied to a long reach of the Ebro River (Spain).

### Validation test case: the Severn River (UK)

The UK Environment Agency proposed different test cases for model validations (Environment Agency 2010). In this work, the 7th test case, a stretch of the Severn River (UK), is chosen to evaluate 1D2D model linking and exchanges of flood volume between the main channel and the floodplain running on GPU. As the intention was not to replicate an observed flood, upstream boundary conditions were designed by the UK EA, and the results of several models were provided to compare with.

This river consists of a 7 km long by 0.75–1.75 km wide floodplain, modelling a 20 km stretch of the Severn River. Boundary conditions are a hypothetical inflow hydrograph (see Figure 5(a)) and a downstream rating curve. An initial steady condition of 200 m^{3}/s is imposed. Several gauging points are spread over the domain to compare water level temporal evolution with other models. The location of these probes is also indicated in Figure 5(b) over the topography.

According to the benchmark test case rules, a 2D mesh for the floodplain has been built with 65,708 cells and linked with a 1D channel that contains 300 cells. In total, there are 858 coupling 2D cells that will need the information of the 1D cells to update their own variables every time step. Under these conditions, the GPU implementation is not so relevant due to the small efficiencies that these devices present for small cases. However, it might be worth mentioning that this 72 h flood event took 1,003.28 s for the model running on the GPU, whereas the CPU version lasted 16,903 s, getting a speed-up of 16.84x, due to the high influence of the 2D floodplain on the simulation.

There are three different areas with several measurement points on each. In Figure 6, the proposed model results are compared with the lower and upper envelope of other models (provided in Environment Agency 2010). Only one probe per floodplain/area has been chosen.

The accurate implementation of levees crest elevation is crucial for a proper representation of the volume transferred from the main channel to the floodplain. The original information contained in the 1D cross-sections regarding river embankments overlaps with the floodplain provided Digital Elevation Model (DEM) and the modeller must make a choice. This decision unavoidably involves different overtopping times and, although this generates discrepancies between different models, the results are all coherent and similar. The model of this work also provides results in the proper range.

### Sensitivity analysis case: long channel with a lateral floodplain area

The data transferred to the GPU due to coupled cells updating could involve a deceleration that may counteract the coupled model advantages. In order to test the sensitivity of this model to transfers, a test case has been designed with real river dimensions. The case involves a prismatic channel 86 km long, 8 m deep and 100 m wide; it has a surrounding plain flood-prone area with irregular micro-topography (created as a combination of random *sin* and *cos* functions); and it has a soft longitudinal slope ( m/m). The whole spatial domain encompasses 960 km^{2}. The performance analysis has been done focusing on different parameters: (a) number of 1D cells on the river bed (thus, a larger amount of data transfer) and (b) simulated hydrograph (thus, different floodplain areas to be computed on the 2D floodplain and different durations). With reference to Figure 1 in the introduction, SU2 and SU3 will be compared.

For the purpose of this comparison, two different effects are to be analysed. First, the speed-up decrease due to the data transfer between devices (CPU/GPU) and the influence of the CPU part of the code. Second, the differences on speed-up depending on the percentage of floodplain involved (wet). Figure 7 shows a zoom view of several meshes that have been designed for the study. There is one mesh for the 2D model with a refined riverbed that acts as reference (a), and several coupled meshes that vary the discretization refinement on the main channel from 100 (b) to 2,000 1D cells (f), maintaining the same resolution on the flood-prone area, so that this parameter does not affect the results. Considering the necessity of the model to have at least one 2D cell linked to each 1D cell, a finer 1D discretization is not considered since it would unavoidably lead to an extra increase of the 2D resolution. The full 2D mesh has been refined at the river bed for a proper bathymetry representation. Therefore, while the coupled mesh contains 207,317 triangular elements within the floodplain, the full 2D mesh has 454,159 triangular cells in the whole domain. Over these meshes, four hydrographs have been propagated. All of them can be seen in Figure 8(a), where the maximum flooded area generated by the highest peak (H1 or H2) and the lowest (H3 or H4) of them is also shown in Figure 8(b).

For the computational time analysis of the hybrid model, four chronometers have been set on the code in order to do a profiling study distinguishing between four different costs: 1D cells computation, 2D cells computation, coupled areas computation and data transfer between sub-domains (1D/2D) and, thus, devices (CPU/GPU). For this analysis, an NVIDIA GeForce GTX Titan Black has been used as a *device* in combination with an Intel Core i7-8700 as a *host*. Figure 9 shows, for each hydrograph, the profiling for each number of 1D cells. All the colours correspond to the colour code stated in the flowchart depicted in Figure 4.

In Figure 9, where the computational time is divided into the aforementioned groups, the remaining time consumption is gathered under the ‘other’ flag, so the whole column represents the total simulation time. The five different columns correspond to a different number of 1D cells on the river bed. All the graphs show how the 2D computation consumes the vast majority of the simulation time, followed by the coupled computation and the 1D edges. The transfer time between sub-domains, although not negligible, does not govern the total computational time even in the most unfavourable case (2,000 1D cells). However, it can be seen how the 1D time entails a potential bottleneck as the resolution (the number of cells) increases.

As depicted in Figure 4, the 1D domain is computed on the CPU and transferred from *host to device* in order that the coupled edges can be computed on the GPU with the main channel information. Thus, when increasing the number of 1D cells, the theoretical tendency should be an increase of 1D time and an increase of transfer time, while 2D and coupled edges times should remain approximately constant. This trend can be seen in Figure 10, where the normalized computational times show the relative importance of each part of the algorithm and the percentage of the 1D model increases with the number of cells, whereas the transfer time, directly proportional to the number of 1D cells, remains with the same importance. However, some exceptions are worth mentioning.

The coupled model presented needs a homogenization of the time step sizes between all the sub-domains (see Figure 4). Thus, the most restrictive mesh and flow condition governs the simulation time. As is usually higher than 2D cell edge sizes, the floodplain usually governs the global . However, some extreme cases can occur when too many 1D cells are used and their size decreases, leading to a restriction imposed by the 1D model that slows down the simulation time. This is especially seen in the case with 2,000 1D cells on the main channel. It is worth mentioning that this fact affects only the 2D model simulation time and has nothing to do with the model being implemented on GPU in this hybrid form, but rather with the generic algorithm for the 1D2D model.

When this model runs on CPU, the transfer time does not exist and the percentage of the total simulation time consumed by the 2D floodplain is higher. This is shown in Figure 11, where the same itemization of the different computational times already seen for GPU is depicted for the model running purely on CPU. Note that the 1D channel consumes the same time in both the hybrid and the CPU models. The difference resides in the 2D model and the importance of the data transfer. The relevance of each part can be seen in Figure 12, where normalized times are shown. The 2D model takes the highest percentage of the total simulation times at every case, although the finer 1D resolution (2,000 cells) gets a significant increase in the total runtime.

On the other hand, when a full 2D framework is used transfer time is also missing since the whole model runs on the GPU. In this case, the small cells at the main channel require a computational time much higher than the transfer time and the 1D time together so the speed-up of the coupled model is still worthwhile. Besides, in this case, the main channel is especially time consuming in comparison with the floodplain since it contains the smallest cells (see Figure 7). The global speed-up accomplished by the hybrid coupled model in comparison with the full 2D is the SU3, sketched at the introduction in Figure 1. According to this figure, SU2 is the comparison of the hybrid model with the full CPU coupled model. Both speed-ups are represented for each hydrograph in Figure 13 (H1 to H4, from upper left to lower right), and for the number of 1D cells in the main channel.

In this case, it is clearly seen that the main channel has a strong influence on the results, provoking a higher speed-up when the river bed is substituted by a 1D model (speed-up 3). For this reason, when the model is compared with its CPU equivalent (speed-up 2), the acceleration is not that high.

### Real test case: the Ebro River (Spain)

In this section, the coupled model is applied to a 125 km long stretch of the Ebro River (Spain) that encompasses a surface of 392 km^{2}. Two different historical events are propagated over the mesh and the same speed-up analysis is carried out to study the performance of the model under realistic conditions with complex and non-homogeneous floodplains. Figure 14(a) shows the upstream hydrographs for both events of different duration and peak discharge. The shortest, E1, corresponds to a 2010 flood event that lasted 5 days reaching 2,000 m^{3}/s. The second one, E2, stands for the 2015 event that lasted 21 days and, containing two peaks, reached more than 2,500 m^{3}/s.

The previous case has revealed how the 2D model may govern the flow, especially when the floodplain gets involved to a great extent. This effect can be seen even more dramatically if the floodplain contains hydraulic structures, as levees, that require a mesh refinement in order to capture their crest properly. This is the particularity of the present case, as can be seen in Figure 14(b), where a zoom view allows us to see the comparison of the coupled 1D2D mesh and the full 2D mesh is shown. The figure reveals how the presence of narrow levees between field crops forces the 2D mesh to contain cells much smaller even than those in the main channel and this increases the computational time (Echeverribar *et al.* 2019).

The geometry of the case also determines the proper number of 1D cells. In this case, although the flood wave is of low frequency and not too many 1D cells are necessary to reproduce proper transport velocities, the curvature of the river forces the necessity of small values of within the 1D framework (also seen in Figure 14(b)), leading to 2,201 cells on the river bed.

Figure 15 shows the computational time divided by operations for the two different hydrographs in the Ebro River using the coupled model. For the sake of clarity, both normalized (b) and absolute (a) times are shown. Obviously, the E2 event consumes more time due to its own hydrograph duration. However, normalized times show that the trends are the same in both cases. The transfer time is not so predominant and the floodplain and coupled edges (all computed on GPU) have a strong influence on global time. Finally, due to the extent of the flooded area where small cells are required to represent the levees and the high consumption of the 1D model, simulation times are higher than could be expected according to the idealized case previously simulated.

Table 1 shows the computational times of the coupled model in comparison with the full 2D model. Unlike the previous test case, the presence of small cells in the floodplain and their high influence, combined with a high number of 1D cells in the coupled model, causes a lower speed-up.

Model . | E1 . | E2 . |
---|---|---|

1D2D | 12 h | 2.4 h |

Full 2D | 15 h | 3.41 h |

Speed-up | 1.25x | 1.42x |

Model . | E1 . | E2 . |
---|---|---|

1D2D | 12 h | 2.4 h |

Full 2D | 15 h | 3.41 h |

Speed-up | 1.25x | 1.42x |

## CONCLUSIONS

Even though 2D river flow models have become widely extended due to the detailed view of the flooding pattern that they offer, their calculation on large areas still remains excessively time consuming. Coupled 1D2D models emerged in the past as an alternative to accelerate the computation in river flood simulation cases, avoiding the use of a large number of 2D cells in the main channel and leading to a reduction of the total computational time. On the other hand, nowadays, 2D models have been improved in terms of computational efficiency thanks to HPC. The proposed work has merged these two strategies seeking an even greater speed-up.

For the analysis presented in this work, the 2D model runs on GPU and the 1D model is solved on the CPU. The computational performance of the coupled model has been analysed. From the examples presented, it can be concluded that, even with the additional data transfer time, there exists a positive speed-up with respect to the full CPU 1D2D coupled model and the full 2D GPU model in all cases. However, a 1D2D fully GPU could be justified in case of having a number of 1D cells that compares with the number of involved 2D cells.

The sensitivity analysis has shown the influence of different parameters of the model on the results. First, the number of elements within the 1D framework implies a big difference in the computational time of the main channel, reaching values with the highest number of cells (2,000) that are around ten times higher than the most advantageous case (100 cells). However, this time does not govern the total computational time. At this point, it is worth recalling the linking strategy of the model in which each 1D cell requires at least one 2D cell linked and, thus, a higher resolution on the main channel would lead to a finer 2D mesh. Therefore, it would always be more time consuming than the 1D framework itself. Additionally, even with a large amount of 1D cells on the main river, the number of operations required by the 1D2D model is always lower than the number of operations on a refined 2D main channel that requires a high amount of small 2D cells in order to properly represent an irregular bathymetry of a river. There can be an effect of speed-up reduction if governs the flow, slowing down the floodplain calculation and, as a consequence, the global time. Although, even in this case, the speed-up in comparison with the full 2D GPU model is worthwhile, it could be a matter of future research to explore the possibility of not exchanging information between models at every time step. However, it must be taken into consideration that both models share the same characteristic time scale and magnitude of wave propagation celerity. Therefore, a local time step for each model could complicate the algorithm without a clear advantage. In any case, it could be concluded that the fact of having the 1D model implemented on CPU does not affect the global computational time significantly. The influence of the floodplain in the whole simulation has also been evaluated. Depending on the number of wet cells on the floodplain, their area and their velocity (thus, the time step size), the time consumed by the 2D scheme will be higher. The sensitivity test case, with a finely discretized bathymetry, has been subjected to different hydrographs studying the influence of the maximum flooded area.

After the sensitivity analysis in an idealized case, the coupled model is applied to a specific real case. The challenge lies in all the hydraulic structures that the flood-prone area contains, and the necessity of very small cells for their representation. This situation leads to lower speed-ups due to the extreme influence of the floodplain on the simulation time. Although a speed-up greater than one is accomplished, the profitability of the coupled model could be discussed for short cases. It is worth mentioning that the speed-up in these cases is fully dependent on the duration of the event. For the 5-day event, with a low peak, it is not that as high as for the long case with a higher peak.

The use of a coupled model can be extremely worthwhile for substituting a full 2D model for flood simulation when HPC is used, as in the proposed model. The use of the CPU only for the computation of the 1D scheme does not present a bottleneck and the efficiency of the model has been proved. Nevertheless, it is important to evaluate the type of simulation to be run so that the efficiency of the coupled model is ensured to be worth it. Another final fact that should be mentioned is the additional effort that the geometric link between sub-domains requires for mesh creation. Thus, although the presented results are positive and the model has proved its efficiency, each simulation case should be evaluated before choosing a coupled or a full 2D model.

## ACKNOWLEDGEMENTS

This work is part of the PGC2018-094341-B-I00 research project funded by the Ministry of Science and Innovation/FEDER. Additionally, Mario Morales-Hernández was partially supported by the U.S. Air Force Numerical Weather Modeling Program.

OpenMP Application Programming Interface. High-Performance Computing Center Stuttgart, Germany