Abstract
The grid-based Xin'anjiang model (GXM) has been widely applied to flood forecasting. However, when the model warm-up period is long and the amount of input data is large, the computational efficiency of the GXM is obviously low. Therefore, a GXM parallel algorithm based on grid flow direction division is proposed from the perspective of spatial parallelism, which realizes the parallel computing of the GXM by extracting the parallel routing sequence of the watershed grids. To solve data skew, a Directed Acyclic Graph (DAG) scheduling algorithm based on dynamic priority is proposed for task scheduling. The proposed GXM parallel algorithm is verified in the Qianhe River watershed of Shaanxi Province and the Tunxi watershed of Anhui Province. The results show that the GXM parallel algorithm based on grid flow direction division has good flood forecasting accuracy and higher computational efficiency than the traditional serial computing method. In addition, the DAG scheduling algorithm can effectively improve the parallel efficiency of the GXM.
HIGHLIGHTS
Proposing a parallel algorithm based on grid flow direction division, speeding up the computation of the GXM.
Accurate flood forecasting facilitates flood management and flood control scheduling.
DAG scheduling algorithm can effectively improve parallel efficiency.
Graphical Abstract
INTRODUCTION
The hydrological model has always been a hot spot in hydrological research, which can be used for flood forecasting, hydrological mechanism research, hydrological and environmental process simulation and prediction. The hydrological model can provide decision support for flood control personnel, which is significant in flood control and disaster reduction. Hydrological models are divided into lumped and distributed hydrological models (Rui 2018). In the computation process of the lumped hydrological model, the spatial difference of the watershed is not considered and a set of unified and fixed parameters are adopted. Therefore, when the physical conditions of the underlying surface in the watershed vary greatly, the flood forecasting accuracy of the lumped hydrological model is not high enough. The distributed hydrological model takes into account the uneven distribution of rainfall in the watershed and the impact of the difference in the physical condition of the underlying surface (Rui 2017), so it has a high forecasting accuracy. Typical distributed hydrological models include the distributed Xin'anjiang model, a topography-based hydrological model (TOPMODEL) (Beven et al. 2021) and TOPKAPI model (Liu et al. 2005).
The distributed Xin'anjiang model can be divided into two types according to the partition granularity of the watershed: block-based distributed Xin'anjiang model and grid-based distributed Xin'anjiang hydrological model (GXM). The GXM is a distributed hydrological model based on the principle of the Xin'anjiang model and the digital elevation model (DEM) (Yao et al. 2009). GXM divides the watershed into high-precision grid units of the same size. The computation process of the GXM mainly includes evapotranspiration, runoff generation and water source division for each grid unit. Finally, according to the routing sequence of the upper and lower reaches of the watershed, the slope and channel confluence are computed grid by grid (Yao et al. 2021).
When GXM is computed serially, the computation time is long and computational efficiency is low, so GXM needs to be processed in parallel. The parallelization methods of distributed hydrological models are generally divided into parallelization in space, parallelization in time and parallelization in a combination of time and space (Ye et al. 2022). At present, the parallelization of distributed hydrological models mainly adopts the method of spatial parallelization. Liu et al. (2014) designed a parallel algorithm that stratifies the simulation units according to the flow direction for the distributed hydrological model Fully Sequential Dependent Hydrological Models (FSDHM). They parallelize the FSDHM by dividing simulation units (grids or sub-basins) without upstream and downstream dependencies into the same layer for simultaneous computation. Results have shown that the parallel algorithm can effectively improve the computational efficiency of FSDHM. Avesani et al. (2021), based on the Message Passing Interface (MPI) standard, designed a parallel hydrological model, named HYPERstreamHS. The model adopts a dual-layer parallelization strategy to fully exploit both spatial domain and parameters domain decomposition, and reduce the computational time by utilizing the available processors efficiently.
In this study, we proposed a GXM parallel algorithm based on grid flow direction division, which realizes the parallel computation of the GXM confluence process and improves the computation speed of the model. Aiming at the possible data skew in the computation process, a DAG scheduling algorithm based on dynamic priority was designed, which realizes task scheduling and improves the parallel efficiency in the computation process.
The following content of this article is a description of the GXM and parallel algorithm; data and research used in this experiment; a description and discussion of experimental results; finally, a summary of this research.
METHODS
Grid-based Xin'anjiang model
The GXM uses high-precision grids as the basic simulation unit. First, the three-layer evapotranspiration component and the runoff generation component are used to calculate the evapotranspiration and runoff of each grid unit. Then, the free water reservoir structure is used to obtain surface runoff, subsurface runoff and underground runoff from each grid unit. Finally, according to the Muskingum method, the slope and river confluence are computed grid by grid. The surface runoff, subsurface runoff and underground runoff of the grid in the watershed are computed to the outlet of the watershed (Li et al. 2007).
GXM parallel algorithm based on grid flow direction division
The most essential condition for components to realize parallel computing is that some grid units are independent and do not affect each other during the computation process (Liu et al. 2016). Therefore, the computation of independent components, including evapotranspiration, runoff generation and water source division components, is naturally parallelizable.
Computation-independent components only have time dependencies. In other words, in the cyclic computation process of a single period, the grid units in the watershed are independent, and there is no data exchange between grids. Therefore, we can evenly divide the simulation grid units into several parts for parallel calculation. However, when we parallelize space-dependent components, we cannot directly process grid units in parallel like computation-independent components. Because there is not only time dependence between grid simulation units, but also space dependence in the computation process of space-dependent components.




Schematic diagram of grid-by-grid confluence based on the Muskingum method.
Therefore, we proposed a GXM parallel algorithm based on grid flow direction. By constructing the grid flow direction matrix, cumulative catchment area matrix and the water system matrix of the watershed, the flow relationship between the grids in the watershed can be decomposed to find out the parallel routing sequence of the grid simulation units in the watershed. This way, grid unit sets with the same routing sequence are computed in parallel to realize the parallelization of flow routing components.






Process of generation of cumulative catchment area matrix (b) and water system matrix (c) by grid flow matrix (a).
Process of generation of cumulative catchment area matrix (b) and water system matrix (c) by grid flow matrix (a).
To decouple the computational relationship between the grids, we proposed the multi-level parallel grid division algorithm, which can obtain the parallel routing sequence of the grids in the watershed to realize the parallel computation of the GXM. The algorithm is described as follows:
- (1)
First, the routing order of the outlet grid in the watershed is initialized. We label the number of grids in the watershed as N and the catchment area of each grid as S. The grid (outlet grid of watershed) coordinate (
) with the largest catchment area (
) is searched according to the catchment area matrix. We set the routing order of this grid to
.
- (2)
- (3)
To compute the routing order of upstream grids, we find out the upstream grids of the current grid according to the grid flow direction. If the adjacent grid flows into the current grid, the routing order of the adjacent grid is set to
.
- (4)
After the routing order of all adjacent grids are computed, the adjacent grid is updated as the current grid. We repeat the above steps continuously until the adjacent grids cannot be found.
- (5)
We reverse the routing sequence obtained and then traverse all grids in the watershed. If the grid belongs to a water system, the routing order of the grid is retained. If the grid does not belong to a water system, the routing order of the grid is initialized to 1. Finally, the routing order of grids belonging to the water system is obtained.
- (6)
Initialize the routing order of slope grids: if the current grid is a slope grid, we judge whether the current grid has upstream and downstream grids according to the catchment area. If the grid has no upstream and downstream grids, the routing order of the grid is initialized to
.
- (7)
To compute the routing order of slope grids, we find the downstream slope grids according to the grid flow matrix and add 1 to the maximum routing order of all upstream grids.
- (8)
After the routing order of the adjacent grid is computed, the adjacent grid is updated to the current grid. Then we repeat the above steps until the routing order of all slope grids in the watershed is obtained.


These grid units with the same routing order can be computed in parallel because these grid units do not have upstream and downstream dependencies. The grid units of each routing order are counted to obtain the distribution table of the parallel routing order of the watershed. Table 1 shows the distribution table of the parallel routing order of the Tunxi watershed in Anhui Province. We can find that as the routing order increases, the number of slope grids decreases gradually, and the routing order of the watershed outlet grid is the largest.
Distribution table of watershed parallel routing order in the Tunxi Basin
Calculation order . | Channel grid . | Slope grid . | Amount of grid . |
---|---|---|---|
1 | 0 | 1872 | 1,871 |
2 | 0 | 493 | 615 |
… | … | … | … |
99 | 1 | 0 | 1 |
100 | 1 | 0 | 1 |
Calculation order . | Channel grid . | Slope grid . | Amount of grid . |
---|---|---|---|
1 | 0 | 1872 | 1,871 |
2 | 0 | 493 | 615 |
… | … | … | … |
99 | 1 | 0 | 1 |
100 | 1 | 0 | 1 |
DAG scheduling algorithm based on dynamic priority
Data skew is prone to occur between threads when the amount of input data is large. Therefore, we propose a DAG scheduling algorithm based on dynamic priority. We generate a DAG to represent computational relationships of grids in the watershed by connecting valid grids in the watershed according to the parallel routing order. First of all, the channel grids are used as the initial critical path. Second, we establish three task status queues including the ready queue, the running queue, the completed queue and put task nodes to be executed into the ready queue, and running task nodes into the running queue.
Model performance measures





DATA AND STUDY DOMAIN
Qianhe River, located in the south of Shanxi Province, belongs to the semi-arid basin, and the basin area of Qianhe River is 2,935 km2. The terrain is higher in the northwest and lowers in the southeast, the annual average rainfall is 696 mm. The yearly rainfall distribution in the Qianhe River Basin is uneven, mainly concentrated from June to September. The regional distribution of precipitation in the basin is primarily affected by climate, geographical and environmental factors and the precipitation gradually increases from north to south.
The vegetation leaf index, vegetation height and other parameters required by the GXM can be directly assigned according to the vegetation type of the grid unit by using the existing research conclusions. Parameters such as tension water storage capacity, free water storage capacity, soil inflow and subsurface runoff outflow coefficients can be estimated through their quantitative relationships with features such as topographic index, soil type, and soil layer thickness. The area proportion method based on the flow direction can estimate the proportion parameter of surface runoff entering the channel. On this basis, the observed hydrological data are used to calibrate and estimate the remaining parameters such as the full storage rate, and finally, the model parameter set of each grid unit can be obtained (Yao et al. 2021).
The underlying surfaces of the Tunxi Basin and the Qianhe River Basin use grid data with a resolution of 1 km × 1 km. We selected the rainfall and evaporation data of the Tunxi Basin in July 2020 and June 2016 and selected the rainfall and evaporation data of the Qianhe River Basin in July 2010, July 2018 and July 2016. We directly use the real-time data of surface rainfall stations, and use Thiessen polygon, inverse distance squared and other methods for spatial interpolation to obtain rainfall data in the watershed. Watershed evaporation data are computed from evaporation station data in the watershed.
RESULTS AND DISCUSSION
Efficiency analysis of parallel algorithms
Computation time comparison between the serial and parallel algorithm in the Qianhe River.
Computation time comparison between the serial and parallel algorithm in the Qianhe River.
Computation time comparison between the serial and parallel algorithm in the Tunxi Basin.
Computation time comparison between the serial and parallel algorithm in the Tunxi Basin.
From the above two comparison experiments of computation time, we can draw the following conclusions: compared with the original serial algorithm, the computational efficiency of the GXM parallel algorithm based on grid flow direction division has been significantly improved. As the amount of model input data continues to increase, the acceleration performance of parallel algorithms becomes more and more significant.
Qin et al. (2020) realized the parallelization of the confluence process of the WEP-L distributed hydrological model by designing a domain decomposition algorithm and implemented the task scheduling based on the greedy algorithm. The algorithm decomposes the whole watershed into the process of mainstream confluence and tributary stream confluence. Since the tributary hydrological areas are independent of each other and there is no confluence relationship between the tributary areas, each tributary hydrological area can be assigned to different threads to achieve parallel computing. It is verified by experiments that when the number of computing threads is 4, the speedup ratios of the tributary confluence process and the main branch confluence process both reach the maximum, which are 3.04 and 2.51, respectively. Compared with the above algorithm, the GXM parallel algorithm proposed in this paper has a finer granularity for the division of the underlying surface, and parallelizes the GXM by extracting the parallel routing sequence of the underlying surface grids of the watershed, thus having higher computational efficiency.
Accuracy analysis of the GXM parallel algorithm in flood forecasting
We selected a total of four flood events in the Tunxi Basin and the Qianhe River Basin for discharge simulation to verify the flood forecasting accuracy of the GXM parallel algorithm.
The four flood events are:
- (1)
Flood event 1 (Figure 12) that occurred in the Tunxi Basin from 21:00 on 1 July 2020 to 22:00 on 13 July 2020.
- (2)
Flood event 2 (Figure 13) that occurred in the Tunxi Basin from 21:00 on 17 June 2016 to 21:00 on 24 June 2016.
- (3)
Flood event 3 (Figure 14) that occurred in the Qianhe River Basin from 05:00 on 23 July 2010 to 18:00 on 27 July 2010.
- (4)
Flood event 4 (Figure 15) that occurred in the Qianhe River Basin from 00:00 on 10 July 2018 to 14:00 on 13 July 2018.
The model warm-up period of the four flood simulation experiments is set to 30 days. The resolution of the underlying surface of the two watersheds is 1 km × 1 km. There are 3,619 effective grids in the Tunxi Basin and 11,448 effective grids in the Qianhe River Basin. The evaluation indicators of model simulation performance are NSE, RMSE and RE.
Figure 12 shows the observed and simulated hydrograph of flood event 1. The peak time of the observed discharge is at 16:00 on 7 July 2020. At this time, the observed discharge is 5,040 m3/s, the simulated discharge calculated by the GXM is 4,968 m3/s and the error value of the simulated flow is 72 m3/s. The flood peak time calculated by the GXM is 19:00 on 7 July 2020. At this time, the observed discharge is 4,790 m3/s, the simulated discharge calculated by the GXM is 5,010 m3/s and the error value of the simulated discharge is 220 m3/s. At this time, the NSE of the GXM is 0.9745 and the RMSE of the GXM is 164.89 m3/s.
Figure 13 shows the observed and simulated hydrograph of flood event 2. The peak time of the observed discharge is at 02:00 on 20 June 2016. At this time, the observed discharge is 4,239 m3/s, the simulated discharge calculated by the model is 4,417 m3/s and the error value of the simulated flow is 178 m3/s. The flood peak time calculated by the model is 01:00 on 20 June 2016. At this time, the observed discharge is 4,213 m3/s, the simulated discharge calculated by the model is 4,446 m3/s and the error value of the simulated discharge is 233 m3/s. At this time, the NSE of the GXM is 0.9850 and the RMSE of the GXM is 141.28 m3/s.
Figure 14 shows the observed and simulated hydrograph of flood event 3. The peak time of the observed discharge is at 21:00 on 23 July 2010. At this time, the observed discharge is 2,000 m3/s, the simulated discharge calculated by the model is 1,857 m3/s and the error value of the simulated flow is 143 m3/s. The flood peak time calculated by the model is 20:00 on 23 July 2010. At this time, the observed discharge is 1,250 m3/s, the simulated discharge calculated by the model is 1,876 m3/s and the error value of the simulated discharge is 626 m3/s. At this time, the NSE of the GXM is 0.8230 and the RMSE of the GXM is 205.06 m3/s.
Figure 15 shows the observed and simulated hydrograph of flood event 4. The peak time of the observed discharge is at 07:00 on 11 July 2018. At this time, the observed discharge is 1,340 m3/s, the simulated discharge calculated by the model is 1,080 m3/s and the error value of the simulated flow is 260 m3/s. The flood peak time calculated by the model is 08:00 on 11 July 2018. At this time, the observed discharge is 1,330 m3/s, the simulated discharge calculated by the model is 1,110 m3/s and the error value of the simulated discharge is 220 m3/s. At this time, the NSE of the GXM is 0.8853 and the RMSE of the GXM is 110.34 m3/s.
The simulated hydrographs in the Tunxi Basin during 1 July–13 July 2020.
The simulated hydrographs in the Tunxi Basin during 17 June–24 June 2016.
The simulated hydrographs in the Qianhe River Basin during 23 July–27 July 2010.
The simulated hydrographs in the Qianhe River Basin during 23 July–27 July 2010.
The simulated hydrographs in the Qianhe River Basin during 10 July–13 July 2018.
The simulated hydrographs in the Qianhe River Basin during 10 July–13 July 2018.
Performance of the GXM in the Tunxi Basin and the Qianhe River Basin
Flood event . | RE (%) . | Peak time difference (h) . | NSE . | RMSE (m/s) . |
---|---|---|---|---|
Event 1 in Tunxi | −5.909 | 3 | 0.9754 | 164.89 |
Event 2 in Tunxi | 13.167 | −1 | 0.9850 | 141.27 |
Event 3 in Qianhe River | 14.518 | −1 | 0.823 | 205.06 |
Event 4 in Qianhe River | 11.861 | 1 | 0.8853 | 110.34 |
Flood event . | RE (%) . | Peak time difference (h) . | NSE . | RMSE (m/s) . |
---|---|---|---|---|
Event 1 in Tunxi | −5.909 | 3 | 0.9754 | 164.89 |
Event 2 in Tunxi | 13.167 | −1 | 0.9850 | 141.27 |
Event 3 in Qianhe River | 14.518 | −1 | 0.823 | 205.06 |
Event 4 in Qianhe River | 11.861 | 1 | 0.8853 | 110.34 |
CONCLUSIONS
This study proposes a GXM parallel algorithm based on grid flow direction division and a DAG scheduling algorithm based on dynamic priority to realize the parallel computing and task scheduling of the GXM. The GXM parallel algorithm and the DAG scheduling algorithm were applied in the Tunxi Basin and the Qianhe River Basin, the results have shown that the GXM parallel algorithm can significantly improve the computational efficiency of the GXM and the DAG scheduling algorithm can increase parallel efficiency and reduce resource overhead obviously. In addition, four flood simulation experiments were conducted in the Tunxi Basin and the Qianhe River Basin, the simulation results show that the GXM parallel algorithm has good flood forecasting accuracy and is useful for flood management.
In recent years, the graphics processing unit (GPU) has been widely used in the parallel computing of hydrological models. Freitas et al. (2022) used a CPU + GPU system to realize the parallelization of the Modelo de Grandes Bacias (MGB) hydrological model and achieved a significant performance improvement. In the future, we will continue to study how to reduce the communication overhead of the GXM between grids during parallel computing to further improve the computational efficiency of the GXM, and try to realize the parallelization of the GXM by utilizing GPU.
ACKNOWLEDGEMENT
This study was supported by the National Key Research and Development Program of China (2021YFB3900605) and the National Key Research and Development Program of China (2018YFC1508106).
DATA AVAILABILITY STATEMENT
Data cannot be made publicly available; readers should contact the corresponding author for details.
CONFLICT OF INTEREST
The authors declare there is no conflict.