## Abstract

The response capacities of urban flood forecasting and risk control can be improved by strengthening the computational abilities of urban flood numerical models. In this work, a GPU-based hydrodynamic model is developed to simulate urban rainstorm inundations. By simulating rainstorm floods in a certain area of Xixian New City, the established model can implement high-resolution urban rainstorm inundation simulations with significantly accelerated computing performances. The accelerated computation efficiencies of the different rainstorm event simulations under resolutions of 5 and 2 m are quantitatively analysed, showing that the absolute and relative speedup ratios for all scenarios of applying two GPUs range from 10.8 to 12.6 and 1.32 to 1.68 times as much as those of a CPU and a single GPU, respectively. The application of a large-scale rainstorm inundation simulation shows the excellent acceleration performance of the model compared to previous research. In addition, the greater the number of computational grids included in the simulation, the more significant the effect on the acceleration computing performance. The proposed model efficiently predicts the spatial variation in the inundation water depth. The simulation results provide guidance for urban rainstorm inundation management, and they improve the time and efficiency of urban flood emergency decision-making.

## HIGHLIGHTS

The GPU-based hydrodynamic model is used to evaluate sponge city rainstorm inundation.

Significant speedup of the model executed on multi-GPU compared to a single GPU.

The model is efficient for performing high-resolution sponge city inundation simulation.

Validation and speedup computing performance tests on real urban city-scale areas.

## INTRODUCTION

Floods are natural disaster phenomena throughout the world, causing economic loss, urban infrastructural damage and threats to people's lives (Iqbal *et al.* 2018; Rong *et al.* 2020; Luo *et al.* 2022a). Rapid urbanization and rainstorms are important factors that compound the effects of floods (Argüeso *et al.* 2016). Rapid urbanization can intensify the effects of extreme precipitation, and large quantities of impermeable surface areas result in increasing surface runoff (Argüeso *et al.* 2016). The surface roughness, which is a result of urbanization, increases the risk of flooding, as reported by Zhang *et al.* (2018). In addition, global warming causes more rainstorm events with higher frequencies, and increased numbers of extreme precipitation events result in an increase in pluvial and urban floods (IPCC 2021; Cao *et al.* 2022). In addition, since the 21st century, heavy floods have occurred in many Chinese cities that were caused by extreme precipitation events, especially in recent years. For example, China's heaviest flood disaster occurred in 2021 in Zhengzhou city; it was caused by a torrential rain event (Dong *et al.* 2022). However, extreme rainstorms are difficult to forecast and frequently cause floods that cause huge economic losses. Therefore, it is important to improve the time and efficiency of flood disaster emergency decision-making by improving the predictive capability of numerical modelling for urban rainstorms, effectively reducing the risk of urban floods.

Urban flood models are efficient tools for urban stormwater administration and flood forecasting (Qi *et al.* 2021; Luo *et al.* 2022b). The current urban flood modelling method involves the use of hydrological and hydrodynamic models. Hydrological models are more effectively used in large catchment flood simulations; however, they cannot describe the dynamic flooding process in detail, e.g., the surface water depth, inundated area and flow velocity (Liang *et al.* 2015; Liang & Smith 2015; Qi *et al.* 2021). However, the hydrodynamic model has an advantage in reflecting the surface flow process over complex underlying surfaces (Felder *et al.* 2017). Compared with the hydrological model, the hydrodynamic model has a computational advantage in having fewer input parameters (Xia *et al.* 2017). Thus, the hydrodynamic model is a powerful simulation method, and it is widely used in urban food modelling. Previous studies suggested that a two-dimensional (2D) hydrodynamic model based on shallow water equations (SWEs) can be effectively applied for more accurate urban flood simulations with higher accuracy terrain data (Morales-Hernandez *et al.* 2013; Liang & Smith 2015; Xing *et al.* 2022). The Godunov-type scheme finite volume framework is popular for addressing SWEs (Liang & Marche 2009; Hou *et al.* 2013a) and can be applied to address complex shallow water flows. In addition, most studies show that the full 2D hydrodynamic model is an effective simulation method to accurately predict urban flood dynamics and flood risk (Song *et al.* 2011; Rong *et al.* 2020; Qi *et al.* 2021). Overall, the full 2D hydrodynamic model has been proven highly accurate for flood simulations in complicated topographic conditions.

However, the full 2D hydrodynamic model suffers from computational efficiency limitations for high-resolution numerical simulations in high-resolution urban terrains. High-performance computing (HPC) techniques can be applied to overcome the low computational efficiency limitations. The most popular HPC approaches include Open Multi-Processing (OpenMP), Message Passing Interface (MPI) and Graphics Processing Unit (GPU). parallel computing. Among them, it was confirmed that the GPU computing method is the most promising flood model acceleration pathway, indicating that a full 2D hydrodynamic model using the GPU parallel computing approach can accelerate the model by several orders of magnitude compared to the base model running on a multi-CPU system approach (e.g., OpenMP and MPI) (Liang *et al.* 2015, 2016; Vacondio *et al.* 2016). For example, a single GPU can achieve a simulation speedup of 20 orders of magnitude compared to a model running on multicore CPU implementations (Smith *et al.* 2015; Ayyad *et al.* 2020). This is because a GPU includes more computational cores than a multi-CPU with larger spatial-scale computations. Thus, the GPU-based hydrodynamic model demonstrates accelerated computational efficiencies, which can shorten the execution time, and it is widely used for flood simulations (Kalyanapu *et al.* 2011; Lacasta *et al.* 2014; Liang & Smith 2015; Hu & Song 2018). However, the 2D hydrodynamic model's method, based on a single GPU, is applied in most current studies, which implies there are longer computational simulation times in larger domains for urban flood simulations. The poor calculation performance for the current numerical simulation hydrodynamic methods cannot address the needs of more practical problems. To meet the need for faster computations and larger domains, a multiple GPU parallel computing approach should be applied to large-scale rainstorm flood processes. Previous research reported that multi-GPUs attain superior computational speeds to address SWEs based on an explicit finite volume scheme (Sætra & Brodtkorb 2012). Based on the above discussion, the multi-GPU parallel computing approach could be identified as a promising strategy for flood simulation.

In this work, an efficient GPU-based hydrodynamic numerical model is proposed to evaluate the effect of rainstorms on sponge city inundation. The quantitative analysis of model computing speedup for the rainstorm inundation simulation implemented on two GPUs was early carried out in the sponge city of China. This model greatly speeds up the simulation computing by using an explicit finite volume scheme for addressing SWEs. Domain decomposition is applied to achieve multi-GPU parallel computing in each GPU under the 2D structured mesh domains. It is also helpful in improving the time and efficiency of urban flood disaster emergency decision-making. This paper is organized as follows. The full 2D hydrodynamic model is briefly introduced in Section 2, and a detailed description of the implementation on multiple GPUs is also provided. A case study and data resources are provided in Section 3. Section 4 provides the simulation results and a discussion. Conclusions and recommendations for further work are briefly provided in the last section.

## METHODS

### Numerical model

*t*represents the time,

*x*and

*y*are the Cartesian coordinates,

*q*is the flow variable vector,

*f*and

*g*are the flux vectors in the

*x*and

*y*directions, respectively, and

*S*is the source term vector, which contains the bed slope source term

*S*

_{b}and friction source term

*S*

_{f}. These vectors are expressed aswhere and are the unit-width discharges in the

*x*-direction and

*y*-direction, respectively. and

*h*are the bed elevation and water depth, respectively, while the water surface level is ;

*u*and

*v*are the velocity components in the

*x*-direction and

*y*-direction, respectively, and , ;

*g*is the gravity acceleration, and the value is 9.81 m/s

^{2}; represents the bed roughness coefficient calculated by the Manning coefficient

*n*and water depth using ;

*i*is the source and sink terms (contains rainfall and infiltration).

*et al.*2013a) is applied to discretize the SWEs, and the updated flow variables in the new time step are expressed as:where the superscript

*n*is the time level;

*i*and

*j*are the cell positions; is the time step; and are the cell sizes in the

*x*-direction and

*y*-direction, respectively.

For the numerical methods of the current model, the interface fluxes are determined by the Harten, Lax and van Leer approximate Riemann solver using the contact wave (HLLC) (Liang *et al.* 2015). To avoid numerical oscillations, second-order accuracy in space and time can be achieved by using the Monotone Upstream Scheme for Conservation Law (MUSCL) scheme (Liang 2010). The slope source and friction source terms are calculated by the slope-flux and splitting point-implicit methods (Liang & Marche 2009; Hou *et al.* 2013a), respectively. The two-stage explicit Runge‒Kutta time-integration scheme is employed to update the variables' values at a new time level (Song *et al.* 2011) and by applying the CFL (Courant‒Friedrichs‒Lewy) for controlling the time steps. The numerical instabilities of the water depth computation are improved under wet and dry complex topographies by using the format adaptive method proposed by Hou *et al.* (2013b), which improves the simulated accuracy and avoids unphysical high velocities at wet‒dry fronts. To summarize, the abovementioned numerical methods are employed to improve the computation stability and accuracy of the model. In addition, GPU parallel computing technology is used to accelerate the computational efficiency of the current model. The explicit calculation format of the proposed model is more favourable for adopting the GPU parallel computing method to achieve high-performance simulations.

### Implementation on GPU

In this section, the GPU parallel computation approach with a structured grid is introduced. The numerical methods are implemented on GPUs with parallel computation to speed up the simulations. The developed GPU-based 2D hydrodynamic numerical model was programmed using the CUDA/C ++ programming language on NVIDIA GPU hardware. After successful testing and validation, the single GPU model program was converted to a multi-GPU model program. Two GPU cards (NVIDIA GeForce GTX 1080) of the same type were used in this work, avoiding the unbalanced workload on different types of GPUs. Table 1 reports the characteristics of the two GPU cards. The GPU cards have a larger number of CUDA cores and RAM to accommodate large-scale numerical computing and meet the demand for high-performance simulations. The GPU models are tested on the same NVIDIA GPU hardware; the single GPU model is tested on GPU 1, and the multi-GPU model is tested on both GPU 1 and GPU 2. All of the simulations were executed on a Windows 64-bit operating system.

. | GPU 1 . | GPU 2 . |
---|---|---|

CUDA cores | 2,560 | 2,560 |

GPU RAM (GB) | 8 | 8 |

NVIDIA graphics card | NVIDIA GeForce GTX 1080 | NVIDIA GeForce GTX 1080 |

GPU frequency | 1,607 MHz | 1,607 MHz |

Bandwidth (GB/s) | 320 | 320 |

. | GPU 1 . | GPU 2 . |
---|---|---|

CUDA cores | 2,560 | 2,560 |

GPU RAM (GB) | 8 | 8 |

NVIDIA graphics card | NVIDIA GeForce GTX 1080 | NVIDIA GeForce GTX 1080 |

GPU frequency | 1,607 MHz | 1,607 MHz |

Bandwidth (GB/s) | 320 | 320 |

#### Implementation on a single GPU

*et al.*2015). The former executes parallel codes (kernels) to carry out the actual data computations on the GPU. The latter mainly executes the serial code, including the memory allocation and the exchanges between the GPU and CPU. Figure 1 illustrates the basic process of single GPU parallel computing. Most GPU models take advantage of this process to execute flood simulations with high computing performances, benefitting from the GPU parallel calculation implemented in the numerical algorithms (Vacondio

*et al.*2014; Liang

*et al.*2015; Dazzi

*et al.*2020). The single GPU model programming is designed in such a way that the data from the CPU are first transferred to the GPU after the GPU memory spaces are allocated. The data from the CPU contain the initial model input data and parameters, e.g., the hydrologic and hydraulic parameters, grid cell information and boundary parameters. Then, the parallelization process of the numerical algorithms is performed on the GPU, including interface flux and source terms, updated time steps and updated variable values for each grid cell at a new time level. After the completion of all the computational processes on the GPU, the computed data are finally transferred back to the CPU, and the output can be written synchronously to files for storage and postprocessing. The single GPU model is tested in terms of this basic process executed on GPU 1.

#### Implementation on multi-GPUs

The simulations take a long time to simulate flood processes over the large-scale computing domain implemented on a single GPU due to insufficient computing memory for the single GPU. Compared with a single GPU, multiple GPUs have multiple computing memory spaces to provide powerful computing efficiency for the numerical models. It is helpful to implement rapid simulations of the flood process, which aid in the effective forecasting of floods. The parallel computing process of the developed GPU-based 2D hydrodynamic numerical model implemented on two GPUs (both GPU1 and GPU2) is introduced in this section. Different from single GPU parallel calculations, the domain decomposition and data exchange from the neighbouring subdomains between GPU 1 and GPU 2 are the two critical aspects that need to be addressed.

*et al.*2022; Yang

*et al.*2022). This study first needs to decompose the entire domain into subdomains, and the number of subdomains is equal to that of the GPU. Two GPU cards are used in this work. Thus, the entire domain (

*M*×

*N*) is equally decomposed into two subdomains along the

*y*-direction, and there are the same number of rows as columns for each subdomain (

*M*/2 ×

*N*), as shown on the left of Figure 2. In addition, it is necessary to apply special treatments at the edges of the subdomains because every two neighbouring computed cells have a common edge. The flux calculation of the cells at the boundary is closely related to the neighbouring cells in the structured grid cell domain. It can be seen that the value of the neighbouring cells in the lower boundary of subdomain A is in subdomain B, instead of the value of the neighbouring cells in the upper boundary of subdomain B being in subdomain A. To ensure an accurate calculation and to update the new value of the interface flux values of a computed cell on an edge, the value of the neighbouring cells of a subdomain is allocated to another subdomain. The best method is to add a layer of computed cells to the boundary of two subdomains. To avoid excessive memory exchanges between the host and the device each time, one layer of computed cells (1 ×

*N*) is expanded on the boundary of each subdomain in this study, as shown on the right of Figure 2. Figure 2 shows the overlapping region, and the black lines in the middle are the common edges between two adjacent subdomains. One layer of computed cells (dark green area) in the lower boundary of subdomain A is expanded into the outer boundary in the upper boundary of subdomain B, and the dark green and green areas form a new subdomain B with the cell of (

*M*/2 + 1) ×

*N*. Similarly, one layer of computed cells (dark blue area) in the upper boundary of subdomain B is expanded into the outer boundary in the lower boundary of subdomain A, and the dark blue and blue areas form a new subdomain B with the cell of (M/2 + 1) × N. Finally, the entire domain is equally decomposed into two new subdomains (

*M*/2 + 1) ×

*N*. Domain decomposition is essential for the data exchange between GPU 1 and GPU 2. The domain decomposition will use multi-GPU parallel computing to improve computation efficiency.

For the second aspect, the data exchange from the neighbouring subdomains between GPU 1 and GPU 2, CUDA streams were used to solve the data exchange through asynchronous copying between the GPUs and the CPU. As mentioned earlier, one layer of computed cells (1 × *N*) is expanded on the boundary of each subdomain. The data exchange for the variables of these expanded one-layer computed cells (1 × N) needs to require an asynchronous copy between each GPU and the CPU. The GPU's NVIDIA GeForce GTX 1080 cards have enough bandwidth to avoid excessive data exchange times between the GPUs and the CPU, and they are also effective in accelerating the model's computational abilities. CUDA streams can be used for data communication and concurrent execution of the kernels between the host and the device. As shown in Figures 2 and 3, two CUDA streams are created for each of the two GPUs in the computing system. Stream 1 and Stream 2 are created to control the data communication and kernel computation for the other and overlapping regions on GPU 1 and GPU 2, respectively.

The detailed process of the data exchange using CUDA streams after the domain decomposition pre-process, when finished includes the following. First, creating CUDA stream 1 and stream 2 is used to control the variable computation in the corresponding GPU 1 and GPU 2. The variable data of subdomains A and B, after the decomposition of the computing area, are copied from the CPU to GPU 1 and GPU 2, respectively. Second, we copy the variable data in the lower boundary of subdomain A from the CPU to GPU 0 and fill it to the outer boundary (external layer) of the upper boundary of subdomain B. Similarly, we copy the variable data in the upper boundary of subdomain B from the CPU to GPU 0, and fill it to the outer boundary of the lower boundary of subdomain A (Figure 2). Third, after the completion of the above memory exchanges and data expansion, the first iterative calculation starts to be conducted synchronously on both GPUs and is completed. The new solution is updated; the updated variable data of the lower boundary of subdomain A are copied from GPU 1 to the CPU, and the updated variable data of the upper boundary of subdomain B are copied from GPU 2 to the CPU. The two kinds of updated variable data, which only have one layer of computed cells (1 × N), are stored in the CPU as backups for the next time step. Fourth, the described CPU-GPU data exchanges and computation methods are followed until the computing requirements and simulation task are completed. Finally, the variable data from GPU 1 and GPU 2 to the CPU are copied except for the variable data of the outer boundary; the final merged data of the entire computing domain are merged and stored in the hard disk. Thus, data exchange from the neighbouring subdomains between GPU 1 and GPU 2 can obtain the best computation performance based on a particular CUDA stream. The proposed CUDA streams are an effective and powerful way to overlap CPU-GPU memory exchanges for multi-GPU computations.

### Model performance assessment

The relative error (RE) of the simulated and measured inundation areas for urban rainstorm-inundated sites is used to assess the model's accuracy. The absolute speedup (AS) and relative speedup (RS) ratio are applied to assess the accelerated computing efficiency of the model, analysing the speedup effect of the GPUs to CPUs, and two GPUs to a single GPU, respectively.

## CASE STUDY AND DATA

### Study area

*et al.*2019). The study area is a core area of the sponge city construction in the Xixian New Area (Figure 4), and the area is approximately 14.5 km

^{2}, with a subhumid warm temperate climate. The annual average precipitation is 520.0 mm, and the rainfall is mainly concentrated from July to September. The study area is located south of the Xibao highway, north of Tongyi road, west of the Wei River and east of the Feng River. The seasonal distribution of rainfall shows uneven changes and high annual variations between years (Gao

*et al.*2020). Simulation studies of rainstorm inundation and stormwater processes in the Fengxi New City region have been conducted by the local government, and some of the studies adopted numerical models to evaluate rainstorm inundation and urban stormwater processes (Wang

*et al.*2019; Yang

*et al.*2021). As mentioned earlier, it is important to select this study area as a typical region that reflects the urban rainstorm inundations.

### Data sources

*et al.*(2015) as:where

*q*is the design rainstorm intensity, L/s × ha;

*P*is the design return period in years; and

*t*is the rainstorm duration in minutes.

To better determine the impact of rainfall on the computational efficiency of the rainstorm inundation simulation, rainfall return periods of 5, 20 and 50 years are used as the design rainstorm events in this work (Table 2). LiDAR and a multicamera measurement system equipped with an unmanned aerial vehicle are employed to capture the high-resolution terrain data (DEM) of the study area, with a resolution of 2 m (Figure 5(b)). The terrain data of 1 and 5 m resolutions are obtained by the analysis and interpolation of the Geographic Information System (GIS) data. Thus, the applied terrain resolutions of the computational domain are 1, 2 and 5 m, with squared cells of approximately 2.1 × 10^{7}, 5.37 × 10^{6} and 8.59 × 10^{5} squared cells, respectively. The land use map is captured by the LiDAR technique, including residential land, road, woodland, grassland and bare land in this study area (Figure 5(c)). The infiltration values of these five land use types are collected through measurements. In addition, the Manning coefficient values of different land uses were determined according to the parameter values resulting from previous studies that carried out flood simulations in the same study region (Table 3). The Manning coefficient values for setting up the model are summarized in Table 3, and the values are given as Qi *et al.* (2018). For the equivalent infiltration value of the urban pipe network, some studies have proposed a method that implies that constant rainfall infiltration can represent the designed drainage capacity of the urban pipe network, and the simulated results indicate that this method can provide high simulation accuracies (Wang *et al.* 2018; Hou *et al.* 2021). Therefore, the applied equivalent infiltration constant value of the urban drainage network capacity is the rainfall value of the 1-year return period design rainstorm, which was calculated by Equation (1), with a rainfall value of 12.8 mm/h.

No. . | Return period . | Rainfall (mm) . | Rainfall duration (h) . |
---|---|---|---|

1 | 5 | 38.87 | 2 |

2 | 20 | 61.33 | 2 |

3 | 50 | 76.17 | 2 |

No. . | Return period . | Rainfall (mm) . | Rainfall duration (h) . |
---|---|---|---|

1 | 5 | 38.87 | 2 |

2 | 20 | 61.33 | 2 |

3 | 50 | 76.17 | 2 |

No. . | Land use . | Manning coefficient (sm^{−1/3})
. |
---|---|---|

1 | Residential land | 0.015 |

2 | Bare land | 0.03 |

3 | Grass land | 0.06 |

4 | Wood land | 0.20 |

5 | Road | 0.014 |

No. . | Land use . | Manning coefficient (sm^{−1/3})
. |
---|---|---|

1 | Residential land | 0.015 |

2 | Bare land | 0.03 |

3 | Grass land | 0.06 |

4 | Wood land | 0.20 |

5 | Road | 0.014 |

## RESULTS AND DISCUSSION

### Model validation

^{6}grid cells at a resolution of 2 m. Two inundated sites were selected, and the inundation area of the inundated sites was measured after a rainfall (

*t*= 5 h), as shown in Table 4. The positions of the inundated sites and the comparison of the simulation and measured inundation are illustrated in Figure 6. Table 4 summarizes the measured and computed inundation area values. The relative errors of the measured and computed inundation areas of all the inundated sites are calculated by Equation (5), with values of 3.4 and 8.6%, respectively. The results indicate that the computed inundation areas are close to the measured inundation areas, and the acceptable errors imply that the established model has a higher accuracy.

Inundation position . | Area (m^{2}). | Relative error (%) . | |
---|---|---|---|

Measured values . | Computed values . | ||

S1 | 1,000 | 914 | 8.6 |

S2 | 800 | 773 | 3.4 |

Inundation position . | Area (m^{2}). | Relative error (%) . | |
---|---|---|---|

Measured values . | Computed values . | ||

S1 | 1,000 | 914 | 8.6 |

S2 | 800 | 773 | 3.4 |

### Assessment of model computing speedup

To explore the effect of accelerating the computing performance improvement of the established multi-GPU-based hydrodynamic numerical model in urban rainstorm inundation simulation, the model computing speedups are assessed by running on a multicore CPU (6 cores), single GPU (1 GPU), and multi-GPU (2 GPUs). The DEM data of the study area from Fengxi New City have a spatial resolution of 2 m with 5.37 × 10^{6} grid cells in the computational domain. The spatial resolution of the 5 m DEM data can be acquired by resampling the 2 m DEM, including the 8.59 × 10^{5} grid cells. Table 5 shows the simulation times for running on the different parallel computing techniques under both spatial resolutions for the different return periods (5a, 20a and 50a). In addition, insufficient memory will occur in the simulation that is executed on the CPU due to the larger grid cells in the computational domain. Thus, the urban rainstorm inundation simulations are executed on 1 GPU and 2 GPUs without a CPU under a resolution of 2 m in this work (Table 5). The results show that decreased simulation runtimes for the different return periods with a resolution of 5 m with 8.59 × 10^{5} grid cells, in descending order of magnitude, were the CPU, 1 GPU and 2 GPU models. A similar law shows that the simulation runtimes in descending order of magnitude are from the 1 GPU and 2 GPU models.

Grid cell resolution . | Cells number . | Return period/a . | Execution time on CPU/s . | Execution time on 1GPU/s . | Execution time on 2GPUs/s . |
---|---|---|---|---|---|

5 m | 8.59 × 10^{5} | 5 | 4,061 | 431 | 322 |

20 | 5,551 | 634 | 480 | ||

50 | 7,596 | 937 | 706 | ||

2 m | 5.37 × 10^{6} | 5 | — | 6,065 | 3,605 |

20 | — | 7,923 | 4,755 | ||

50 | — | 9,472 | 5,708 |

Grid cell resolution . | Cells number . | Return period/a . | Execution time on CPU/s . | Execution time on 1GPU/s . | Execution time on 2GPUs/s . |
---|---|---|---|---|---|

5 m | 8.59 × 10^{5} | 5 | 4,061 | 431 | 322 |

20 | 5,551 | 634 | 480 | ||

50 | 7,596 | 937 | 706 | ||

2 m | 5.37 × 10^{6} | 5 | — | 6,065 | 3,605 |

20 | — | 7,923 | 4,755 | ||

50 | — | 9,472 | 5,708 |

^{5}) in the 5 m resolution terrain. It can be seen that multi-GPU parallel computing has better-accelerated simulation advantages than a single GPU, indicating that the urban rainstorm inundation simulation executed on a multi-GPU can shorten the simulation calculation time. In addition, the RS ratio of multi-GPU to single GPU parallel computing ranges from 1.66 to 1.68 times, with a larger number of computing grid cells (5.37 × 10

^{6}) in the 2 m resolution terrain, as described in Figure 8. The significant RS ratio further proves that the proposed multi-GPU parallel computing method can effectively improve the computational efficiency of an urban rainstorm inundation simulation.

The comparison of the speedup ratio results for the two different grid cells of the computational domain regions under the same rainstorm event simulation in Figure 8, shows that it is obvious that the computational grid cells increase (higher spatial resolution); and the accelerated modelling performance implies significant increases with a larger speedup ratio. Additionally, it reveals that multi-GPU parallel computing can be a powerful technique to improve the computing efficiency, and it also has a great advantage in carrying out a more accurate simulation in the application of urban rainstorm inundations with larger grid cells at higher spatial resolutions.

The speedup effect of rainstorm inundation simulations at high resolutions executed on multi-GPUs is analysed. The high-resolution terrain data of 1 m are resampled from a spatial resolution of 2 m, with 2.1 × 10^{7} grid cells. The observed rainfall event that occurred on August 25, 2016, was selected as the input rainstorm. The designed simulation time lasted for 8 h. The simulation is executed on multi-GPUs, without on a single GPU because of the limited memory under larger grid cells. Table 6 shows the simulation time result of this study compared to previous research. The previous research applied a 2D hydrodynamic model based on a single GPU to carry out rainstorm inundation simulations of a small-scale region (Hou *et al.* 2018). The previous small-scale region (an area less than a third of the entire study area) belongs to part of our study area. As shown in Table 6, the execution times on a single GPU and two GPUs under 3.68 × 10^{6} and 2.1 × 10^{7} grid cells are 12.5 and 12.35 h using the same NVIDIA GeForce GTX 1080 GPU processor type, respectively. Comparisons of the results with similar research show that the proposed method in this study has more advantages in large-scale high-resolution calculations. In addition, although the actual execution time (12.35 h) on the two GPUs is smaller than the designed simulation time (8 h) under the 2.1 × 10^{7} grid cells at a resolution of 1 m, this is because the physical simulation flood process is more time-consuming on complex urban underlying surfaces in the larger grid cell domain. Therefore, compared to previous research on the speedup effect, the proposed model in this study provides an excellent acceleration performance in rainstorm inundation modelling.

. | Hou et al. (2018)
. | In this work . |
---|---|---|

Grid resolution | 1 m | 1 m |

Cells | 5.37 × 10^{6} | 2.1 × 10^{7} |

GPU hardware | 1 NVIDIA GeForce GTX 1080 | 2 NVIDIA GeForce GTX 1080 |

CPU processor | Intel(R) Core (TM)i7-4790 | Intel(R) Core (TM)i7-8700 |

Runtimes /h | 12.35 | 12.5 |

. | Hou et al. (2018)
. | In this work . |
---|---|---|

Grid resolution | 1 m | 1 m |

Cells | 5.37 × 10^{6} | 2.1 × 10^{7} |

GPU hardware | 1 NVIDIA GeForce GTX 1080 | 2 NVIDIA GeForce GTX 1080 |

CPU processor | Intel(R) Core (TM)i7-4790 | Intel(R) Core (TM)i7-8700 |

Runtimes /h | 12.35 | 12.5 |

### Application of rainstorm inundation simulation

*t*≥ 2 h). Thus, the results indicate that the established model can accurately simulate the spatial variation in the inundation region. Meanwhile, the proposed multi-GPU-based hydrodynamic model is efficient for predicting the spatial distribution of the inundation water depth to assess the urban rainstorm inundation risk.

### Discussion

This study proposes an effective GPU-based high-performance numerical model for urban rainstorm inundation simulations. According to the runtimes and speedup ratios of the model running on the CPU and GPU, as shown in Figures 7–10 and Table 6, it has obvious model performance advantages in the application of a rainstorm inundation simulation, with higher accuracy and acceleration computing, showing the spatial variations of the inundation water depths. The GPU-based rainstorm model is based on the solution to SWEs using the explicit finite volume scheme method. The explicit scheme method can map very well to the hardware architecture of GPUs, representing a kind of scheme that is well suited for GPU computing (Brodtkorb & Sætra 2012). Thus, GPUs have been used successfully for solving the SWEs to improve computing efficiency. In addition, some studies have applied the CPU and GPU models to produce urban flood simulations, indicating that the numerical model implemented on a GPU can address modelling time constraints more effectively than a CPU, and it can more accurately represent the flow characteristics in a complex topography (Bales & Wagner 2009; Kalyanapu *et al.* 2011). In addition, all of the simulations executed on a GPU server, or on different GPU hardware for different spatial resolutions, were analysed, and the speedup performance of the GPU model is more significant than that of the CPU framework (Kalyanapu *et al.* 2011). The same conclusion is drawn in this study as in the abovementioned studies. Thus, the GPU model is a scientific method, which can effectively be used for flood modelling in the future.

The previous researches indicated that CUDA performs better computing performance in the parallel computing applications (Fang *et al.* 2011; Du *et al.* 2012). Thus, the model has the better advantage to be implemented on the NVIDIA CUDA programming platform to carry out GPU HPC. The developed GPU-based 2D hydrodynamic numerical model was implemented on two GPUs (both GPU1 and GPU2) in this work. If the model performs simulations on multiple GPUs (more than 2GPUs), it is necessary to further decompose the entire computing domain into multiple subdomains based on the current method, and the data exchanges and calculations will be more complex. Thus, the multi-GPU-based flood model based on the proposed model can be further extended to multiple GPUs. Domain decomposition increases the communication and data exchange times between the GPUs and neighbouring GPUs. The row decomposition can minimize the overlapping areas of the domain and communication costs, while the multiple GPU (e.g., 2 GPUs) simulations simply require the exchanges of the overlapping regions between the two GPUs before each simulation step, as being described in Brodtkorb & Sætra (2012). However, the larger the number of GPUs, the faster the computing performance changes. Some researches show that data exchanges between different GPUs can also be achieved using the CUDA stream, CUDA/MPI and CUDA/OpenMP methods (Liang *et al.* 2016; Lončar *et al.* 2016). The most GPU models are tested on a single GPU (different type of GPU hardware) and multiple GPUs (same type of GPU hardware) (Lacasta *et al.* 2014; Liang *et al.* 2016; Hou *et al.* 2018; Gong *et al.* 2020). The computing domain decomposition may lead to an unbalanced workload among different types of GPUs. However, the two same types of GPU cards were selected to carry out the simulation before the model was set up, and each model computing is implemented on the two same types of GPU cards (NVIDIA GeForce GTX 1080 cards) adopting the GPUs parallel computing method in this work, avoiding unbalanced GPU workload on the different types of GPU. Thus, there will be a great computing performance of the data exchanged and updated variables using the same type of GPU.

The speedup effect of the proposed model compared to previous research has been analysed in Table 6. To more accurately describe the computing performance of the multi-GPU model, a comparison of the speedup performance for the established multi-GPU model and other GPU models reported in recent research is summarized in Table 7. Recent research developed an urban flood model to simulate floods in the same study area as this work (Gong *et al.* 2020). It is clear that a simulation on the same type of GPU hardware, cell number and same design rainstorm has a return period of 50 years, with different CPU processors and input parameters. Table 7 shows the relative result of the speedup ratio of this work compared to Gong *et al.* (2020), indicating that the multi-GPU model displays an excellent computational performance compared to the single GPU model under the same type of GPU hardware in the same study area. In addition, theoretically, the speedup ratio of model computing executed on two GPUs to a single GPU cannot exceed two times. The results show that the speedup ratios of model computing are less than two times in this work. There is a slightly greater than two speedup ratios, as shown in Table 7; this is because of some errors and different computing hardware (e.g., CPU processor, other hardware) in this study and the research by Gong *et al.* (2020). Besides, it is possible that compared to wet blocks, more dry blocks can be involved in calculation and memory allocation in inundations simulation of dry areas, while these computations can increase the consumption for both memory bandwidth and floating-point operations (Brodtkorb & Sætra, 2012). In order to avoid the too much cells computation in the dry area, the sparse memory can be used to solve this shortcoming. The method of dry blocks and sparse memory use are efficient method to improve to accelerate computation and reduce memory requirements as described in (Brodtkorb & Sætra 2012). The high levels of sparsity are exploited to accelerate simulation calculation and reduce memory requirements (Gale *et al.* 2020). Sparse simulations only carry out the wet block launch that needs to be calculated, significantly improving the computation performance by saving the computation and data transmission bandwidth. Moreover, the potential great computation performance of GPU flood model is exhibited through larger spatial-scale computations. Thus, the speedup ratio of flood model computing executed on two GPUs to a single GPU will be close to two times based on the dry blocks and sparse memory use method under a larger number of cell grids. Overall, the speedup performance of simulating urban rainstorm inundation using multi-GPU parallel computing is substantial.

. | Gong et al. (2020)
. | In this work . |
---|---|---|

Grid resolution | 2 m | 2 m |

Cells | 5.37 × 10^{6} | 5.37 × 10^{6} |

GPU hardware | 1 NVIDIA GeForce GTX 1080 | 2 NVIDIA GeForce GTX 1080 |

CPU processor | Intel(R) Core (TM)i7-7700 | Intel(R) Core (TM)i7-8700 |

Runtimes /s | 11,700 | 5,708 |

Speedup ratio of this work compared to Gong et al. (2020) | – | 2× |

. | Gong et al. (2020)
. | In this work . |
---|---|---|

Grid resolution | 2 m | 2 m |

Cells | 5.37 × 10^{6} | 5.37 × 10^{6} |

GPU hardware | 1 NVIDIA GeForce GTX 1080 | 2 NVIDIA GeForce GTX 1080 |

CPU processor | Intel(R) Core (TM)i7-7700 | Intel(R) Core (TM)i7-8700 |

Runtimes /s | 11,700 | 5,708 |

Speedup ratio of this work compared to Gong et al. (2020) | – | 2× |

In addition, based on the result of computing the RS ratios when the simulations are executed on 1 GPU and 2 GPUs, it was found that the higher the number of grid cells, the larger the acceleration performance for the 2-GPU model. The speedup ratio of multi-GPU to single-GPU parallel computing is positively correlated with the increase in the number of grid cells in the same study area. Based on the above analysis, the result shows that the number of grid cells has a significant impact on the accelerated computing performance using the multi-GPU parallel computing method. Based on the highly accurate simulation result, it is determined that multi-GPU parallel computing can be applied in numerical approaches to achieve faster computing speedup. Thus, the multi-GPU-based high-performance numerical model in this study can realize the rapid and accurate simulation of the storm flood process and has a significantly accelerated computing performance; it is a powerful method to apply to flood predictions.

## CONCLUSIONS

This paper presents a novel urban rainstorm inundation simulation method and also introduces a GPU-based 2D hydrodynamic model. The model is validated and applied in an idealized study area from the Fengxi New City region in China. The computational performance of the proposed model is assessed on a multicore CPU, a single GPU and two GPUs. The main conclusions are summarized as follows:

The proposed model adopted the domain decomposition method to decompose the computing domain into subdomains, providing an effective parallelization approach implemented on each GPU to improve the computational speedup for a high-resolution flood simulation. The model is validated using measured rainstorm events in the Fengxi New City region, and the computed inundation area shows good agreement with the measured area, indicating that the established model can reliably simulate the rainstorm inundation process.

The urban rainstorm inundation simulation is implemented on a multicore CPU, a single GPU and two GPUs under three kinds of designed rainstorm events at a terrain resolution of 5 m (8.59 × 10

^{5}grid cells). The computational speedups of using a single GPU and two GPUs range from 8.1 to 9.4 and 10.8 to 12.6 times, respectively, compared to a CPU model adopting the same numerical algorithms. Among them, the multi-GPU parallel computing technique implementing a 2D hydrodynamic model presents the largest accelerated computing performance.The proposed model is applied to simulate the different designed rainstorm events at resolutions of 5 and 2 m, with grid cells of 8.59 × 10

^{5}and 5.37 × 10^{6}, respectively. The results show that the RS ratio of two GPUs to single GPU parallel computing ranges from 1.32 to 1.34 and 1.66 to 1.68 times, respectively. The results indicate that the grid cells are sensitive to the accelerated computational performance of the model; more grid cells in the computing domain result in a significantly accelerated computing performance advantage of the model. The proposed model presents a higher computational speed to achieve fast high-resolution flood simulations as compared to other research over the same study area.

In summary, the proposed GPU-based 2D hydrodynamic model running on two GPUs provides a powerful parallelization method to achieve accurate and faster simulation of the urban rainstorm inundation process with the best computational performance. The multi-GPU model provides a reliable technical means for the rapid prediction of high-resolution urban rainstorm inundations. Compared to the current single GPU model, it is also helpful in aiding in the efficiency of flood disaster emergency decision-making, increasing the time for emergency responses for flood control. Thus, the multi-GPU model has excellent potential for rapid predictions of urban rainstorm inundations and efficient flood risk evaluations in the future. This study was primarily carried out on 2 GPUs. Further application development for the multi-GPU model implemented on more than 2 GPUs needs to be improved in future work.

## ACKNOWLEDGEMENTS

This work is partly supported by the National Key R&D Program of China (No. 2023YFC3008400); Science Fund for Creative Research Groups of the National Natural Science Foundation of China (No. 42221003); National Natural Science Foundation of China (No. 51609199, 52079106); and the Fundamental Research Funds for the Central Universities, CHD (300102263511). The authors would like to thank the State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China for providing the necessary computer equipment support. Besides, we are also thankful to the editor and the anonymous reviewers whose insightful and constructive comments helped us to improve the quality of the paper.

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

## CONFLICT OF INTEREST

The authors declare there is no conflict.