Abstract

A parallel algorithm for 1D free-surface flow simulations in irrigation canals is shown. The model is based on the Hartree method applied to Saint-Venant equations. Due to the close-to-steady flow nature in irrigation canals, external and internal boundary conditions are linearized to preserve the parallel character. Gate trajectories, off-take withdrawals, and external boundary conditions are modeled as piece-wise functions of time, so there are discontinuities. To achieve a fully parallelized algorithm, an explicit version of the Hartree method is chosen, and external and internal boundary conditions are linearized around operation point. This approach is used to build a computer simulator, written in C-CUDA language. Two tests by ASCE Committee on Canal Automation Algorithms have been used to evaluate accuracy and performance of the algorithm. The Maricopa Stanfield benchmark has been used to prove its accuracy, and the Corning Canal benchmark to evaluate performance in terms of processing time. Surprisingly, solving a 12 hr-long prediction horizon with a cell size of about Δx= 10 m is less than 1 s on a Nvidia K40 card. Results were compared with a serial and a multi-CPU version of the same algorithm. The implementation that showed the best performance on different platforms is the one that uses GPU.

HIGHLIGHTS

  • A high-performance parallel algorithm for solving 1D-Saint Venant Equations in irrigation canals has been implemented. It is highly efficient enabling many simulations in a reasonable period of time. In this way, Genetic Algorithms can be used in the context of inverse problems in free-surface flow phenomena.

  • The GPU parallel algorithm used in the paper is very efficient making it possible to obtain steady flow solutions by using transient equations as an alternative to the step method.

  • The philosophy inherent in the methodology could also be applied to other phenomena, such as the flow in pipes. The resolution time of the transients is so short that the steady state could be found from the final state of a transient in a very large networks in real time.

INTRODUCTION

The development of numerical models that represents real systems usually involves a calibration phase, which implies the determination of unknown parameters of the phenomenon governing equations from the observed data. One of the most used methodologies are genetic algorithms which can involve thousands of simulations (Whitley 1994). Therefore, having efficient simulation algorithms is very important. Herein, the phenomenon studied is the free-surface flow in open canals which is described by means of the 1D-Saint-Venant's equations (1D-SVE). In the context of irrigation canal control – more specifically when model predictive control strategy is adopted – simulations are needed to have good predictions on responses to control actions (Martín-Sanchez & Rodellar 1996). Efficient simulators could also be useful in this field.

The use of graphical processing units (GPUs) for scientific calculation has increased in the last decade since scientists and engineers have been looking for alternatives to central processing units (CPUs) for high-performance computing (Anders & Kandrot 2010). Nevertheless, most GPU applications have been developed in the field of 2D and 3D computational fluid dynamics, and the number of 1D applications in civil engineering on hydraulics/hydrology is significantly lower. In this paper, we implement a parallel algorithm for solving 1D-SVE in irrigation canals that uses GPUs. Several works in the literature report that explicit methods have a greater potential than implicit to take advantage of the parallelization in the GPUs (Aissa et al. 2017; Bondarenco et al. 2017; Bessone et al. 2018). Thus, a numerical explicit scheme has been chosen for the algorithm.

According to Wylie (1969), all numerical methods, whether they are explicit, implicit, or characteristics, give similar results when compared to the observed data, depending more on the accuracy of the starting data than on the particular methodology. In this paper, a Hartree method with linear space interpolations has been implemented, that is, a first order version of the method of characteristics (MOC). The MOC was chosen since it is well well known for having many merits regarding the aspects of theoretical and physical interpretation for 1D-SVE modeling (Yang & Chiu 1993). Details of the Hartree method can be found in Katapodes (2016) and Litrico & Fromion (2009). Gómez (1988) considers the MOC one of the best methods to solve unsteady flow problems in prismatic canals. Soler et al. (2015) used the Hartree method in the context of inverse problems to reproduce the flow in a laboratory canal, and were able to accurately reproduce observed water depth hydrographs in a transient.

The application of the numerical techniques requires to discretize the domain x/t (space/time) and find the solution at certain points only. The canal is divided into cross sections and two variables are considered at each section: the water depth yi and the flow velocity vi. As the spatial discretization is fixed, the solution is always obtained at the same points of the canal. The solution for each instant depends on the solution in the previous instant. The specified-time-interval scheme has been used commonly in applying the MOC to unsteady open-canal flow problems. However, with the use of this scheme, the numerical error for the simulation results can always be induced due to the interpolation used to approximate the characteristics trajectory. In order to overcome the lack of accuracy, Yang & Chiu (1993) proposed to use MOC with higher-order splines for interpolation, and for demonstration example, cell sizes from Δx= 200 m to Δx= 800 m were used. Alternatively, it is possible to use the four-point implicit scheme, also known as the box scheme finite difference or Preissmann's method. This scheme is known to be a conservative method. Although the mesh size in the box scheme can be larger than in the MOC, the two methods give good results, both in terms of surface profiles in a time instant and as in terms of flow hydrograph at a section (Ostad-Ali-Askari et al. 2018). Another strategy for increasing accuracy is fine discretization, which has been adopted herein.

The fine linear discretization strategy has been adopted since the increasing number of computational nodes is compensated by its simplicity of computing. A fine discretization forces the reduction of the time step size according to the Courant–Friedrichs–Lewy (CFL) condition, so that the efficiency in terms of computation time is reduced. However, the concept of efficiency is not as simple as it first seems, since explicit schemes can make the most of parallel architecture. When considering explicit schemes, the solution of each node depends exclusively on the solution in the surrounding nodes. Thus, each node can be updated independently of the rest, therefore, it is not necessary to look for values in the memory far from the point which is being updated. Under these conditions, each point can be resolved in a different process unit at the same time, either CPU or GPU. To adopt an explicit scheme is amenable to multi-threading.

The box scheme finite difference along with external and internal boundary conditions results in a system of linear equations which must be solved for each time step. For a single canal, the coefficient matrix has a band width of five and can be solved by one of many banded matrix solvers. Direct solvers for banded linear systems cannot be performed in parallel without communication. This can be accomplished through a shared memory bus and synchronicity, things that may cause delays. Moreover, for network canal problems, sparse terms destroy the banded structure and the consequent loss in the locality of data deteriorates the performance in many-core platforms. On the other hand, the memory use of implicit solvers strongly limits the size of problem that can be solved.

Currently, the lines of research on new numerical schemes for 1D-SVE are aimed at improving the techniques based on finite volume. Specifically, new ways of solving the Riemann solver are investigated, such as in Hodges & Liu (2019). These techniques are very suitable for the complex flows that occur in rivers, such as in the presence of hydraulic jumps and very irregular geometries. These techniques do not contribute much in the case of the flow in irrigation canals, where it is not usual to find hydraulic jumps or other types of discontinuity. It is well known that the finite volume technique is based on the principle of conservation of momentum, and therefore, it is adequate to be applied in local phenomena with discontinuities where energy losses are unknown, like it occurs just downstream a gate (Cozzolino et al. 2015). In other words, finite volume techniques give non-continuous solutions. Conversely, when hydraulic variables are continuous, as happens the most in irrigation canals, the energy principle seems more appropriate to be applied, since the resulting solutions are always continuous.

According to these points, a totally explicit scheme based on the Hartree method with linear interpolations has been chosen. The method is included in the MOC category, and its presentation is the objective of the following section. In the past, the method was rejected due to its lack of accuracy, but it was a time when computers had low power so that they had to use very sparse space discretizations (Δx= 300 m in Gómez (1988) for example). Using CPU-GPU acceleration makes it possible to use strong discretizations (Δx= 1 m, for example) in large-scale canals (a 28 km-long canal, for example), so that, lack of accuracy is overcome.

The remaining sections of this paper are as follows. In the next section, the parallel algorithm is presented, some arguments that justify each one of its steps are given, and the reasons why the Hartree method has been chosen for the resolution of the 1D-SVE. Moreover, it describes in detail the used numerical scheme. It includes the discretization scheme used for the 1D-SVE and the treatment of boundary conditions, both the external and the internal. In the section titled ‘Illustrative examples’, a pair of hydraulically complicated scenarios are presented for testing the algorithm. These scenarios are based on benchmarks that can be found in the canal control literature. The examples show the advantages of parallel algorithms as opposed to traditional serial algorithms. The algorithm performance in terms of processing time to complete the scenario simulation is compared between different platforms.

PARALLEL ALGORITHM

Although the 1D-SVE can be applied in several research fields, such as river analysis systems or drainage system modeling, this paper is focused on the flow in irrigation canals. The flow in irrigation canals, which are often lined and prismatic-shaped, shows characteristics that differentiate it from river drainage system flow, and that influence the choice of the numerical scheme. Usually, water management in irrigation canals implies the use of electro-hydraulic devices that control the flow (such as the in-line gates) that remain in the same position for long periods of time. When a change of position occurs relatively quickly, it can be considered as an instantaneous movement, especially if the movement time is compared with the time the gates remain stopped. All that results in a close-to-steady state sub-critical regime where things occur very slowly. This behavior is common in irrigation canal control systems, and it is amenable to be described mathematically by means of piece-wise functions of time as the gate trajectories shown in Figure 1. As can be seen below, this definition is interesting for numerical parallelization purposes.

Figure 1

Sketch of a piece-wise function of time. The piece Ui(n) represents the control variable inside of the nth-Regulation Period. That is the period of time between instants Tn−1 and Tn. A position of the i-device at instant tk is denoted in minor letters by . Notice that there is a jump discontinuity at the end of each Regulation Period.

Figure 1

Sketch of a piece-wise function of time. The piece Ui(n) represents the control variable inside of the nth-Regulation Period. That is the period of time between instants Tn−1 and Tn. A position of the i-device at instant tk is denoted in minor letters by . Notice that there is a jump discontinuity at the end of each Regulation Period.

In the context of irrigation canal control, the flow control devices stay in the same position without moving most of the time. Only occasionally, the control algorithm changes the device positions according to a desired target (the period between movements is called sampling period, but is referred to as ΔT-long Regulation Period in this paper). As a result, unsteady flow varies weakly and slowly around the operation point at a close-to-steady canal state. In that way, it is possible to consider the following assumptions:

  • Flow control trajectories can be defined as ΔT-long piece-wise functions of time. Figure 1 plots a generic piece-wise function.

  • An explicit scheme with fine discretizations for the Saint-Venant equations can be used in the numerical model.

  • External and internal boundary conditions can be linearized in a small region around the operating point.

  • The number of CFL stability condition verification, and the corresponding time step size computations, can be reduced a great deal because it is not necessary to do this at every time step.

Taking into account these points, it is possible to apply efficiently the three previous basic strategies for parallelization by solving flow equations for a Regulation Period entirely in the GPUs with no global memory accesses, as can be seen below.

At the end of each Regulation Period, external and internal boundary conditions change instantaneously. For example, actuators change opening gates. Therefore, between two consecutive Regulation Periods, there is a discontinuity called, herein, Jump Step. Non-linearized original equations at discontinuities have to be solved using iterative techniques, in which the number of floating point operations varies very much depending on the flow conditions. This is the reason why it has been decided to compute Jump Step in the CPU. Broadly speaking, the problem of simulation in open canals is divided into two parts, depending on where they are solved: on the GPU or on the CPU. An algorithm that meets all of these prescriptions is written as follows:

  • Load initial data and initial conditions to the CPU:

  • Compute the first time step size:

CPU external loop

  1. Initialize Regulation Period Counter: n= 1

  2. Compute Jump Step:

  3. Compute the time step size

  4. Copy from CPU-memory to GPU-memory:

GPU inner loop

  • 1.

    k = 0; t = (n − 1) · ΔT; tend = n ΔT; dstOut=true

  • 2.

    if (dstOut=‘true’):

    Otherwise:

  • 3.

    k=k+ 1; t=t+ ΔtG ; dstOut=! dstOut;

  • 4.

    If t<tend go to GPU inner loop step 2

  • 5.

    if (dstOut=‘false’):

  • 6.

    Copy from GPU-memory to CPU-memory:

  • 7.

    n=n+ 1

  • 8.

    If nλ go to CPU external loop step 2.

where: and are two versions of water depth and velocity vectors in the GPU, whose components are values at sections, (YC,VC) are the same vectors but in the CPU, CFL(Y,V) is the CFL condition function for computing time step size, and JumpStepModelt,Y,V) and RegulationPeriodModelt,Y,V) are the parallel-in-space updating functions, which are described at the end of this section, and λ is the finite time prediction horizon. The algorithm continues until predictive horizon ends.

The algorithm has two loops: the external one implemented in the CPU, and the inner in the GPU. The NVIDIA compiler (nvcc) separates source code into CPU and GPU components: GPU functions are processed by nvcc, and the CPU functions are processed by standard compiler, like the gcc.

The following comments about the algorithm can be made:

  • Theoretically, the CFL condition has to be satisfied always, that is, at all time steps. Due to the close-to-steady flow nature, it is possible to update the time step size once (CPU external loop step 3), just before getting into the GPU inner loop. Since the largest changes in flow conditions are produced when control devices are re-positioned, updating the time step size has been located just after Jump step computation. Its computation is avoided in the inner loop. Vectors are accessed at global memory (herein, referred as CPU-memory).

  • Instances of communication and synchronization between different sub-tasks are usually some of the biggest obstacles to obtaining good performance in parallel programming. Updating the time step size according to the CFL condition is considered as one example of these because it cannot be fully parallelized. Searching a minimum value of a vector is a task non-parallelizable by definition since the candidate to be a minimum has to be compared with all components one by one. This is another reason why the CFL has also been placed into the CPU.

  • CUDA threads may access data from multiple memory spaces during their execution. Although each thread can access the same global memory, they have a private local memory (herein, referred to as GPU-memory), which is considered to be accessed in a way faster than in the global. Threads launched by the kernel RegulationPeriodModel to compute , have access to read from the private local memory. The kernel has been programmed in a way that it never needs to have access to global memory, and it only uses local private memory. That is shown in the GPU-inner loop steps 4 and 5.

  • Optimizing memory usage starts with minimizing data transfers with low bandwidth between the CPU and the GPU since large time intervals are spent on data movement rather than arithmetic. The CPU-GPU data transfer, and vice versa, has been limited once every external loop iteration.

  • Applying the Hartree method to the 1D-SVE results in an explicit system of two linear equations for each computational node. This feature makes this method very suitable for parallelization. Moreover, boundary and interior conditions can be linearized to preserve this characteristic. Moreover, since within a Regulation Period control variables keep constant, governing equations are simplified because their time-derivatives are canceled. Lumping together whole equations, a fully parallelizable set is obtained. The subroutine for solving the set is denoted by RegulationPeriodModel (GPU inner loop step 2). Details of how this set is obtained is one objective of this section. Vectors are accessed at local private memory.

  • Instantaneous changes in hydraulic variables are represented as piece-wise functions of time. Since the equations are nonlinear, they cannot be linearized because their variables are not continuous, and then their derivatives do not exist. Therefore, the original equations have to be solved by using iterative techniques. The set solving subroutine is called JumpStepModel function. This kind of method is inadequate for parallelization since the number of iterations required by each node depends on flow conditions and every time step. This is the reason why JumpStepModel has been isolated in the CPU, at external loop step 2.

  • In the second step of GPU inner loop, vectors and are alternatively exchanged to avoid having to update every iteration, and only vector pointers are passed through the function as parameters. This fact speeds up the inner loop.

In the rest of the section, numerical formulation details about the Hartree method and boundary conditions are given. It shows how these three functions are compiled: JumpStepModel(Δt,Y,V), RegulationPeriodModel(Δt,Y,V), and CFL.

Governing equations for 1D free-surface flow

The 1D-SVE describe the free-surface flow in prismatic canals. They are the result of the application of the principles of mass and momentum conservation:
formula
(1)
where x and t are the independent space and time variables, y is the water depth, v is the average velocity, A(y) is the wet area, T(y) is the free-surface width, S0 is the canal bottom slope, and Sf (y, v) is the slope friction. For a trapezoidal section,
formula
(2)
where m is the side slope, b is the bottom width, and P(y) is the wet perimeter.

The system (1) is applicable under flow conditions given in irrigation canals, where:

  • the curvature of the free-surface is small (vertical accelerations are neglected, and vertical pressure distribution is hydrostatic),

  • the canal slope is supposed sufficiently small such that its sin is practically null,

  • the energy dissipation term is being specified through Manning's equation which was defined for a steady state:
    formula
    (3)
    where n is Manning's coefficient, RH is the hydraulic radius, and Sf is the friction slope.
  • the changes in flow conditions are not fast enough to generate wave fronts.

Realize that flow conditions required by 1D-SVE to be applied are similar to that are often observed in irrigation canals. The MOC transforms the set (1) of partial differential equations into an ordinary differential equation set, named
formula
(4)

The mathematical process of transformation of system (1) to the equivalent (4) can be found in many references, like Ames (1977) or Ducheteau & Zachmann (1988), so will not be discussed further here. The system (4) describes the conditions of flow in a canal in the same way as (1), and it adds no new hypothesis in the transformation. However, it is limited in the way it is applied. The variable x, which was initially independent, is now dependent on time t, as it is understood in (4); then, (4(a)) will be true only along the curves, denoted by x+, that satisfy the equation (4(b)) and, in the same way (4(c)) will be true along the curves, denoted by x, that satisfy the equation (4(d)). In this way, the four equations of the system must be solved simultaneously.

Linear space interpolations

The Hartree method belongs to the MOC family, and it uses linear interpolations in the space direction (Figure 2).

Figure 2

Numerical resolution for the Hartree method.

Figure 2

Numerical resolution for the Hartree method.

The method basically consists of replacing the time-derivatives in (4) by the corresponding finite differences, giving the following algebraic set:
formula
(5)
where SfP = Sf(yP,vP) and SfQ = Sf(yQ,vQ). (5(b)) makes it possible to define the interpolated values at point P, as follows:
formula
(6)
where
formula
(7)
In the same way, similar expressions are obtained from (5(d)) for the point Q, as follows:
formula
(8)
where
formula
(9)
When yP, vP, cP, yQ, vQ and cQ have been computed, values at point R can be obtained by using expressions (5(a)) and (5(c)), as follows:
formula
(10)

The solution at instant k+ 1 for node i is explicit and it depends only on the values at the three nodes i − 1, i, and i+ 1, at instant k. Since these values are stored in a vector, they are closely stored in the memory, and the processor can quickly access them. In contrast, that does not occur when using implicit methods, where the solution at instant k+ 1 for node i depends on whole vectors at k − 1 and k. Another interesting property of (10) is that the system can be solved in a fixed number of float operations for all i. Both are key factors for a good parallelization because processing time remains constant for all nodes i at time instant k, and they are simultaneously solved by the corresponding codes without synchronization requirements. It can launch one thread for one node without waiting time for synchronization.

The free-surface flow in open canals is governed by the 1D-SVE, but other equations are needed to describe the flow throughout structures like gates, siphons, joints, jumps, weirs, off-takes, etc. Appropriate boundary conditions need to be specified at off-takes, check gates, head and tail end of the canal system, so that, additional relationships are also needed for describing upstream and downstream boundary conditions, and internal conditions for checkpoints located throughout the canal.

Upstream external boundary condition (gate at canal header)

The incoming flow rate to the irrigation canals is often regulated by means of gates located at the canal header, and the trajectory of the water level just upstream of the gate is known. Thus, upstream boundary condition can be defined by using one of the following gate-discharge relationships (Bautista & Clemmens 1999):
formula
(11)
where Equation (11a) is defined for submerged flow conditions and (11b) for free orifice flow; Hup(t) = water level trajectory at boundary, defined by a piece-wise function; A(y)·v = undershot sluicegate flow; cd = discharge coefficient; bg = gate width; d = drop at gate; u = gate opening; and, y = water depth downstream gate.

Flow in irrigation canals is usually sub-critical, and then, the second equation in (10) has to be satisfied along the x (Figure 3). According to this, and taking into account uniqueness theorems for hyperbolic systems (see e.g., Crandall 1956), the second equation in (10) together with (11) has to be satisfied at the same time.

Figure 3

Numerical resolution for upstream boundary.

Figure 3

Numerical resolution for upstream boundary.

Interpolating flow conditions at Q (yQ and vQ) by using (8) makes it possible to have a relationship between yR and vR at boundary point R. Therefore, a new set for solving the external upstream boundary condition is posed, namely:
formula
(12)
where fUP is a generalized expression which represents one of (11), or any other upstream boundary condition.

The set (12) is nonlinear in , so it has to be solved by using iterative techniques, like the Newton–Raphson method. As mentioned above, this kind of technique is inappropriate for parallelization. In order to avoid using it, a linearization of the equations around , is carried out. Since the opening gate u(t) and HUP(t) are defined as piece-wise functions of time, there are time instants where functions are not continuous, that is at Jump Step. This is a problem with linearization because their derivatives do not exist. Therefore, it is mandatory to use iterative techniques for solving (12) to solve Jump Step. This is the first reason why Jump Steps are solved at CPU external loop step 2.

Inside of a Regulation Period the system (12) can be linearized. Then, taking first order time derivatives, it gives:
formula
(13)
Since Hup and u keep constant, time-derivatives become null, and (13) can be simplified. Then, taking finite differences around in (13), gives:
formula
(14)
A new 2 × 2 system constituted by (14) and (5(c)) can be compiled, as follows:
formula
(15)
where, for the case :
formula
(16)
and for the case y<d+0.5u, qUP = 0. The determinant of the square matrix at the left-hand side of (15) never vanishes, so that, the solution can always be found.

Downstream external boundary condition (canal tail)

At the downstream end, canals often discharge water to a storage basin where a pumping station is usually found. Moreover, an orifice exists to empty the canal. Taking Qdw = Qdw(t) as the scheduled pumped off-take withdrawal, the downstream external boundary condition can be defined according to the following mass conservation equation:
formula
(17)
where Equation (17a) is for the case in which the orifice be operative, and (17b) for the non-operative case; A(yv = downstream canal tail discharge; ko = discharge coefficient of the orifice; yo = center of the orifice from canal bottom; and Qdw(t) = pump discharge hydrograph, defined as a piece-wise function.
The first equation in (10) has to be satisfied along x+ in Figure 4, and (17) has to be satisfied at the boundary. Interpolation flow conditions at P (yP and vP) make it possible to have a relationship between yR and vR at point R. Therefore, a new set for solving the external downstream boundary condition is posed, namely
formula
(18)
where fDW is a generalized expression which represents (17) or any other downstream boundary condition.
Figure 4

Numerical resolution for downstream node.

Figure 4

Numerical resolution for downstream node.

At a Jump Step, (18) has to be solved at CPU by Newton–Raphson as it happens in the upstream boundary conditions. However, inside of a Regulation Period, the system can be linearized and solved in the GPU, since QDW(t) keeps constant. Thus, taking first order time-derivatives in (18), it becomes
formula
(19)
Neglecting dQdw/dt, and taking finite differences around , obtains
formula
(20)
A new 2 × 2 system for downstream boundary is compiled, namely
formula
(21)
where, for the case of turned on
formula
(22)
and for the case of turned off, qDW = 0. The determinant of the square matrix at the left-hand side of (21) never vanishes, so that, solution is ensured.

Internal boundary condition (checkpoints)

Checkpoints are cross structures for canal flow control which contain gates in-line, off-take withdrawals, lateral weirs, pumping stations, and so on. Their function is to interfere with stream to regulate flow. Canal models are often sectioned between pools by checkpoints. Flow through checkpoints are usually regulated by gates. Herein, checkpoints are considered to have only a sluicegate and a pumped off-take withdrawal for simplicity, as seen in Figure 5.

Figure 5

Checkpoint sketch.

Figure 5

Checkpoint sketch.

Consider a sluice gate in a checkpoint located among two canal pools as Figure 6 shows. On the left, there is the upstream cell defined by points xn and xn−1, and the downstream one defined by x1 and x2. The objective is to compute a solution at from the solution at , that is, at points RU and RD from interpolated variables at P and Q.

Figure 6

Numerical resolution for internal boundary.

Figure 6

Numerical resolution for internal boundary.

The flow interaction of this control structure is described by two relationships, named the mass continuity and gate equations. Both equations are defined in terms of hydraulic variables at points RU and RD, as follows:
formula
(23)
where A(ynvn = flow coming from upstream; A(y1v1 = flow going downstream; QP = pumped off-take withdrawal; cd = discharge coefficient; bg = gate width; d = drop at gate; u = gate opening; yn = water depth upstream gate; and, y1 = water depth downstream gate. Equations (23) constitute the so-called internal boundary conditions, and have to be solved together with Equations (10) in a coupled way, giving
formula
(24)
where fN1 and fN2 are generalized expressions which represent (23(a)) and (23(b)), respectively, or any other equivalent checkpoint relationship.
To solve this set of four nonlinear equations, iterative techniques must be used, such as the Newton–Raphson method. At Jump Step these equations cannot be linearized by the same reasons as above, but inside of a Regulation Period they are linearized. Carrying out (24) a similar process of linearization around like on the external boundary conditions, the following system is obtained:
formula
(25)
where, for the case :
formula
(26)
and q1 = 0, otherwise.

In the context of the parallelization, all nodes are solved at the same time through the corresponding thread, and they have to be finished before starting the next time step, that means synchronization among threads. When solving (25), a 4 × 4 matrix is inverted for obtaining a double solution, so that, the checkpoint node requires the largest computational time required by any node. Then, it is highly recommendable to optimize the source code for this kind of node. One way for optimizing is to invert the matrix by Cramer's rule, since four elements of 16 are zero, so it is possible to optimize the source code by eliminating many operations. Note that this node has the largest number of floating point operations among all types of nodes defined in this paper. Optimizing reduces computing time significantly.

Courant–Friedericks–Levy stability condition

The Hartree method enables the use of the characteristics method with a fixed grid in the (x,t) plane. Let us assume flow variables are known at time instant k, and that we want to compute the flow at point k+ 1. The MOC is subject to the stability CFL condition – see Abbott (1979) or Ames (1977), for example – so that, the time step size becomes:
formula
(27)
where NC 1 is the so-called Courant number and Nx is the number of cells in which the canal is divided.

Computing Δtk with (27) is not totally parallelizable because it is necessary to compare all wave-speed values with each other. If a thread for each i to compute wave-speed is launched, it is not able to compare values among them until all threads are finished because threads are working independently from others. However, it could divide the domain into groups of cells, launch a thread per group to obtain the minimum value in the group and, when all threads are finished, compare values among groups. However, this is not a parallelized process, and threads need to use shared memory.

The computation of the CFL is the greatest obstacle to getting good parallel program performance, especially when it is calculated always for all time instant k. However, when flow is close-to-steady, it is possible to update the time step size once before getting into a Regulation Period. It has been decided to compute the time step size between two consecutive Regulation Periods, just after Jump Step at the third step of the algorithm. In this way, computation in the GPU is avoided. During Regulation Period time-size remains constant.

Theoretically, it is possible to take NC= 1 in (27) when the time step size is always computed; but if it is only computed at the Jump Step, it can help to take NC= 0.95.

Parallel-in-space updating functions

Lumping together into a set, (10) for all nodes inside of a canal pool, (15) for the upstream node, (21) for the downstream one, and (25) for checkpoints, a model for Regulation Periods can be implemented, which is denoted herein by:
formula
(28)
The locality principle from computer science says that items whose addresses are near one another tend to be referenced close together in time. When computing a thread, addresses of yi+1, vi+1, yi, vi, yi−1, and vi−1 seem to be close enough to follow the locality principle. The updating is carried out by calling a thread per each node i in the GPU.
At the same time, lumping together into another set, (10) for standard nodes, (12) for the upstream node, (17) for the downstream one, and (24) for checkpoints, the Jump Step Model can be implemented, and denoted by:
formula
(29)

By using one of these models, it is possible to update hydraulic unknowns for all sections from the variables at instant k. Thanks to this strategy, programming both CPU and GPU solvers is trivial. The parallel part of the algorithm consists of a single kernel that is processed in the GPU, in which each thread calculates the couple , so it does not present any difficulty. With respect to the CPU version implemented in FORTRAN, it only requires the ‘omp parallel do’ directive before the loop that runs through all the nodes in the inner loop by calling the update function of the instant variables k + 1; this function is similar to the one implemented in the GPU kernel.

ILLUSTRATIVE EXAMPLES

The ASCE Task Committee on Canal Control Algorithms (Clemmens et al. 1998) developed a series of test cases to test the general suitability of canal-control scheme logic. The cases consisted of defining several scenarios happening over two canals, and the objective was to find the way to canal control. The test cases covered a reasonable range of the proprieties and operating characteristics of irrigation canals. Among them, there is a control part where flow-rate changes are scheduled, and then, the control problem has feedforward character. In this part, the objective of the tests is to bring the canal from an initial steady-state to a final one.

GoRoSo is a feedforward control algorithm for irrigation canals based on sequential quadratic programming (Soler et al. 2013a), with which it is possible to calculate gate trajectories that smoothly carry the canal from an initial steady-state to a final steady-state by keeping water depths constant at overall end points of each pool, that is at checkpoints. Constraints on gate movements are required for stabilizing the control algorithm, and to avoid gate stroking movements which are amenable to generate possible changes in flow regime. The algorithm makes it possible to calculate gate trajectories by constraining gate movements into a user-defined amount. The application of the GoRoSo method to the tests can be found in Soler et al. (2013b). The two canals defined in these test cases together with internal and boundary conditions obtained from the feedforward control algorithm define two scenarios for validation of the parallel algorithm.

The first example shows how by using a fine discretization it is possible to overcome the numerical diffusion that is inherited from the Hartree scheme, and the second example is provided in order to show the high performance in terms of computing speed. The second example is also solved by means of the HEC-RAS platform in order to compare simulation results and diffusive errors (https://www.hec.usace.army.mil/software/hec-ras/). The HEC-RAS is a computer program to model the water flows through open canal systems and the calculation of surface water profiles (Barkau 1992). For unsteady flow, HEC-RAS solves the full, dynamic, 1-D SVE using the above-mentioned box four-point implicit scheme, first used by Preissmann (1961). Since flow conditions in the scenario are close-to-steady state, and subcritical, HEC-RAS does not show the numerical instability problems associated with regime changes.

To evaluate the numerical diffusion and quantify the diffusive losses generated by the first-order interpolation, it is proposed to apply the conservation mass principle to the whole domain x/t of integration by calculating the following numerical diffusion error index:
formula
(30)
where QPi(j) is the pumped off-take withdrawal i during the jth- Regulation Period, λ is the number of regulation periods in which the prediction horizon is divided, L is the canal length, and q0 is the gravity off-take and canal tail defined in (17).

Description of the test cases

Scenarios are implemented on two different canals, each with eight pools separated by checkpoints with sluicegates (Figure 7).

Figure 7

Canal profiles for the tests (taken from Liu et al. (1998)).

Figure 7

Canal profiles for the tests (taken from Liu et al. (1998)).

The first canal is based on the 9.5 km-long lateral canal WM within the Maricopa Stanfield Irrigation and Drainage district in Central Arizona. It is a steep canal characterized by high Froude number and little storage. More details are given in Table 1.

Table 1

Maricopa Stanfield Canal characteristics

PoolS0 (m/m)Pool length (m)nb (m)m (m/m)Canal depth (m)d (m)k0, y0 (m)
0.0020 100 0.014 1.0 1.5 1.1 1.0 – 
II 0.0020 1,200 0.014 1.0 1.5 1.1 1.0 – 
III 0.0020 400 0.014 1.0 1.5 1.0 1.0 – 
IV 0.0020 800 0.014 0.8 1.5 1.1 1.0 – 
0.0020 2,000 0.014 0.8 1.5 1.1 1.0 – 
VI 0.0020 1,700 0.014 0.8 1.5 1.0 1.0 – 
VII 0.0020 1,600 0.014 0.6 1.5 1.0 1.0 – 
VIII 0.0020 1,700 0.014 0.6 1.5 1.0 1.0 0.9486833/0.4 
PoolS0 (m/m)Pool length (m)nb (m)m (m/m)Canal depth (m)d (m)k0, y0 (m)
0.0020 100 0.014 1.0 1.5 1.1 1.0 – 
II 0.0020 1,200 0.014 1.0 1.5 1.1 1.0 – 
III 0.0020 400 0.014 1.0 1.5 1.0 1.0 – 
IV 0.0020 800 0.014 0.8 1.5 1.1 1.0 – 
0.0020 2,000 0.014 0.8 1.5 1.1 1.0 – 
VI 0.0020 1,700 0.014 0.8 1.5 1.0 1.0 – 
VII 0.0020 1,600 0.014 0.6 1.5 1.0 1.0 – 
VIII 0.0020 1,700 0.014 0.6 1.5 1.0 1.0 0.9486833/0.4 

The second canal is based on the 28 km-long upstream portion of Corning Canal in California, and is essentially flat and has significant storage. More details are given in Table 2.

Table 2

Corning canal characteristics

PoolS0 (m/m)Pool length (m)nb (m)m (m/m)Canal depth (m)d (m)k0/y0 (m)
0.0001 7,000 0.020 7.0 1.5 2.5 0.2 – 
II 0.0001 3,000 0.020 7.0 1.5 2.5 0.2 – 
III 0.0001 3,000 0.020 7.0 1.5 2.5 0.2 – 
IV 0.0001 4,000 0.020 6.0 1.5 2.3 0.2 – 
0.0001 4,000 0.020 6.0 1.5 2.3 0.2 – 
VI 0.0001 3,000 0.020 5.0 1.5 1.9 0.2 – 
VII 0.0001 2,000 0.020 5.0 1.5 1.9 0.2 – 
VIII 0.0001 2,000 0.020 5.0 1.5 1.9 0.2 1.08465229/0.85 
PoolS0 (m/m)Pool length (m)nb (m)m (m/m)Canal depth (m)d (m)k0/y0 (m)
0.0001 7,000 0.020 7.0 1.5 2.5 0.2 – 
II 0.0001 3,000 0.020 7.0 1.5 2.5 0.2 – 
III 0.0001 3,000 0.020 7.0 1.5 2.5 0.2 – 
IV 0.0001 4,000 0.020 6.0 1.5 2.3 0.2 – 
0.0001 4,000 0.020 6.0 1.5 2.3 0.2 – 
VI 0.0001 3,000 0.020 5.0 1.5 1.9 0.2 – 
VII 0.0001 2,000 0.020 5.0 1.5 1.9 0.2 – 
VIII 0.0001 2,000 0.020 5.0 1.5 1.9 0.2 1.08465229/0.85 

Even though original tests present gravity off-takes, withdrawals are modeled as scheduled pumping stations, and take place even when water depth is lower than the orifice center (y0). This is one of the more difficult situations to control because withdrawal ratios are scheduled and are independent from water depth in the canal. Only the last canal pool is discharged through orifices. Sluicegate discharge coefficients are cd= 0.61. Prediction horizon length is 12 hr in both scenarios. The control objective consists of bringing the canal from an initial steady-state to a final one by following the respective set scheduled gate trajectories.

In the Maricopa Stanfield Canal test, prediction horizon is divided into λ= 144 regulation periods about ΔT= 300 [s] (5 min), and in the Corning Canal test into λ= 48 pieces about ΔT= 900 [s] (15 min). Pumping discharge ratios remain constant during the first 2 hours of the prediction horizon until, at instant 2 h, whole ratios change to the final state ratio until the horizon ends. Then, they remain constant again until the prediction horizon ends. The upstream boundary condition keeps constant along the prediction horizon at a water depth: HUP= 1.1 [m] for the Maricopa Stanfield case and HUP = 3.0 [m] for the Corning Canal case.

Figure 9 shows the gate trajectories that bring both canals from the initial state to the final one described in Figure 8. Feasible gate opening changes are constrained into 3 cm for the case of Maricopa Stanfield and 10 cm for the Corning Canal case.

Figure 8

Initial and final steady-states for the Maricopa Stanfield and Corning Canal test cases.

Figure 8

Initial and final steady-states for the Maricopa Stanfield and Corning Canal test cases.

Figure 9

Piece-wise gate trajectories along the prediction horizon. They are constituted by λ= 144 pieces of ΔT= 300 s for the Maricopa Stanfield example and by λ= 48 pieces of 900 s for the Corning Canal case.

Figure 9

Piece-wise gate trajectories along the prediction horizon. They are constituted by λ= 144 pieces of ΔT= 300 s for the Maricopa Stanfield example and by λ= 48 pieces of 900 s for the Corning Canal case.

Simulation results

When gate trajectories shown in Figure 9 are applied to the models described by the characteristics given in Tables 1 and 2, the evolution of the water depths at checkpoints is shown in Figure 10.

Figure 10

Water depth hydrographs at checkpoints 2 to 9 obtained by the parallel algorithm along the 12 hr-long prediction horizon.

Figure 10

Water depth hydrographs at checkpoints 2 to 9 obtained by the parallel algorithm along the 12 hr-long prediction horizon.

As seen in Figure 10, water depths show small oscillations around values of the target, which are given in Table 3. These oscillations are sufficiently small to consider being good enough for the regulation. This occurs except at CP 9, where the water level rises up about 10 cm from the target for approximately 15 min, but without overflowing.

Table 3

Definition of the initial and final steady-states of the prediction horizons

PoolMaricopa initial and final water depths at downstream (m)Maricopa initial flow (m3/s)Maricopa final flow (m3/s)Corning Canal initial and final water depths at downstream (m)Corning Canal initial steady-state discharge (m3/s)Corning Canal final steady-state discharge (m3/s)
0.9 2.0 2.0 2.1 2.7 13.7 
II 0.9 1.8 1.8 2.1 2.5 12.0 
III 0.8 1.6 1.4 2.1 2.2 10.2 
IV 0.9 1.4 1.4 1.9 2.0 7.5 
0.9 1.2 1.4 1.9 1.7 7.2 
VI 0.8 1.0 1.4 1.7 1.5 7.0 
VII 0.8 0.8 1.1 1.7 1.2 6.2 
VIII 0.8 0.6 0.9 1.7 1.0 5.0 
PoolMaricopa initial and final water depths at downstream (m)Maricopa initial flow (m3/s)Maricopa final flow (m3/s)Corning Canal initial and final water depths at downstream (m)Corning Canal initial steady-state discharge (m3/s)Corning Canal final steady-state discharge (m3/s)
0.9 2.0 2.0 2.1 2.7 13.7 
II 0.9 1.8 1.8 2.1 2.5 12.0 
III 0.8 1.6 1.4 2.1 2.2 10.2 
IV 0.9 1.4 1.4 1.9 2.0 7.5 
0.9 1.2 1.4 1.9 1.7 7.2 
VI 0.8 1.0 1.4 1.7 1.5 7.0 
VII 0.8 0.8 1.1 1.7 1.2 6.2 
VIII 0.8 0.6 0.9 1.7 1.0 5.0 

Maricopa Stanfield diffusive error

The Maricopa Stanfield case has been chosen to analyze the accuracy of the parallel algorithm since the water depth profiles are highly curved, leading to large diffusive errors. The left-hand panel of Figure 8 shows the highly curved water depth profiles for the Maricopa Stanfield scenario. In this situation, using straight lines in interpolation generates significant errors, which can be reduced by using fine discretization. In order to show this, four different levels of discretization have been implemented, that is, from Δx= 10.0 to Δx= 0.5. As Table 4 shows, diffusive error is reduced when discretization is fine enough. Therefore, the accuracy of the Hartree scheme is good enough when a fine discretization is used.

Table 4

Maricopa Stanfield scenario results

Cell size Δx (m)NodesεProcessing time (s)
10 958 0.04213 2.4 
1,908 0.02317 5.7 
9,508 0.00497 63.0 
0.5 19,008 0.00251 180.0 
Cell size Δx (m)NodesεProcessing time (s)
10 958 0.04213 2.4 
1,908 0.02317 5.7 
9,508 0.00497 63.0 
0.5 19,008 0.00251 180.0 

A computer with a CPU (Intel® Core™ 2 Duo CPU E7500 @ 2.93 GHz × 2) and a GPU (NVIDIA GeForce GT 710/PCIe/SSE2 710 CUDA cores running at 954 MHz) has been used to simulate these models. As the last column in Table 4 shows, processing times are low enough to use this numerical scheme in the real-time canal control context.

Corning canal scenario diffusive error

When the HEC-RAS model discretized by a space cell size about Δx= 10 m runs constrained by a time step size lower than 30 s, similar results to the Hartree method are obtained (Figure 11).

Figure 11

Discharge hydrograph by HEC-RAS model (left) and parallel algorithm (right).

Figure 11

Discharge hydrograph by HEC-RAS model (left) and parallel algorithm (right).

From Figure 11, the following comments can be made:

  • Simulation results can be considered as equivalent.

  • The gate trajectories carry the canal to the final steady state.

  • The canal only needs a transient of about 3 hours to achieve the final steady-state.

  • The differences between both hydrographs are due to the fact that the gate formulation used in the HEC-RAS model is different from the one used here (11).

  • Curves are drawn using 721 points, that is a point every minute.

As Table 5 shows, the parallel algorithm has an equivalent diffusive error as the HEC-RAS model.

Table 5

Numerical diffusive error index from parallel algorithm and HEC-RAS model

Parallel algorithm ɛ = 0.0007987 
HEC-RAS model ɛ = 0.0007374 
Parallel algorithm ɛ = 0.0007987 
HEC-RAS model ɛ = 0.0007374 

Corning canal scenario processing time evaluation

To test the parallel algorithm performance, five Corning Canal models with different level of spatial discretization were implemented, and the processing time required by each model was compared.

Two parallel versions were implemented, one in CPU and one in CPU-GPU, the first being a version implemented in FORTRAN and the second one using C-CUDA. The characteristics of the hardware where the tests were run are:

  • Equipment 1: A computer with a CPU (Intel® Core™ 2 Duo CPU E7500 @ 2.93 GHz × 2) and a GPU (NVIDIA GeForce GT 710/PCIe/SSE2 710 CUDA cores running at 954 MHz). It can be considered as a representation of the cheapest equipment that can found in the market.

  • Equipment 2: A computer Dell PowerEdge R720 (2 processor Intel Xeon (R) CPU E5-2620v2 @ 2.1 GHz × 6) and a GPU (NVIDIA K40 GPU 2880 CUDA cores running at 745 MHz, Kepler Architecture, 12GB DDR5). This can be considered as a moderate server of a data center.

  • Equipment 3: A computer Dell PowerEdge R740 (2 Intel Xeon Gold 6138 @ 2 GHz × 20 processor) and a GPU (NVIDIA Tesla V100 GPU 5120 CUDA cores running at 1,370 MHz, 32GB DDR5). This corresponds to a last generation server from a data center.

Performance test 1

In this test, the processing time corresponding to three levels of spatial discretization are compared (Δx= 10 m, Δx= 5 m, and Δx= 1 m). For this test, three different solvers were implemented. The first is a serial version on CPU using the FORTRAN language, the second is a parallel CPU implementation using OpenMP with a shared memory strategy, and the third is a parallel version in C-CUDA which runs on CPU-GPU as described in the ‘Parallel algorithm’ section. It is worth mentioning that the gfortran-7 version is used for the CPU code, so the code is translated to C by the compiler before compilation. On the other hand, the parallel CPU version is compiled with the ‘-fopenmp’ directive, that allows activation of the parallel execution. The three solvers were run on Equipment 1, 2, and 3, using 2, 12, and 40 CPU-cores for the parallel CPU version, respectively, while the CPU-GPU version uses the graphics card of each equipment. Table 6 summarizes the execution times obtained in each case.

Table 6

Processing time of the performance test 1

EquipmentCell size Δx10 m5 m1 m# Cores
# Nodes2,8085,60828,008
Serial version 25.80 s 103.8 s 2,672.4 s 1 (CPU) 
Parallel CPU 15.01 s 55.80 s 1,382.4 s 2 (CPU) 
Parallel GPU 3.44 s 10.57 s 232.7 s 710 (GPU) 
Serial version 42.22 s 169.6 s 4,193.5 s 1 (CPU) 
Parallel CPU 5.07 s 19.38 s 460.9 s 12 (CPU) 
Parallel GPU 0.837 s 1.705 s 12.34 s 2,880 (GPU) 
Serial version 25.17 s 100.2 s 2,501.2 s 1 (CPU) 
Parallel CPU 2.122 s 6.976 s 142.0 s 40 (CPU) 
Parallel GPU 0.288 s 5.578 s 3.011 s 5,120 (GPU) 
EquipmentCell size Δx10 m5 m1 m# Cores
# Nodes2,8085,60828,008
Serial version 25.80 s 103.8 s 2,672.4 s 1 (CPU) 
Parallel CPU 15.01 s 55.80 s 1,382.4 s 2 (CPU) 
Parallel GPU 3.44 s 10.57 s 232.7 s 710 (GPU) 
Serial version 42.22 s 169.6 s 4,193.5 s 1 (CPU) 
Parallel CPU 5.07 s 19.38 s 460.9 s 12 (CPU) 
Parallel GPU 0.837 s 1.705 s 12.34 s 2,880 (GPU) 
Serial version 25.17 s 100.2 s 2,501.2 s 1 (CPU) 
Parallel CPU 2.122 s 6.976 s 142.0 s 40 (CPU) 
Parallel GPU 0.288 s 5.578 s 3.011 s 5,120 (GPU) 

From Table 6, the following comments can be made:

  • When the HEC-RAS model runs on Equipment 1 constrained with a user-defined time step size about 30 s, it takes about 17 s to complete the simulation. When the step size is 10 s, it takes about 28 s. It is important to note that only a HEC-RAS serial version is available.

  • The processing times for the serial version of the code are similar in the three sets of computer equipment. This is due to the fact that the clock frequencies of the CPUs have not evolved in recent years (2006, 2013, 2017, respectively, for each CPU), moreover, Equipment 2 has a lower clock frequency consuming more time. Regarding the parallel CPU version, it is shown that the code scales considerably with the number of parallel processes, obtaining gains of 1.9, 9.1, and 17.6 compared to the serial versions in the case of Δx= 1 m, which is the most expensive computationally.

  • For the GPU versions, a substantial improvement can be seen in computing time regarding the parallel solver in CPU. The speed-up obtained with the different GPU versions is compared with the faster CPU parallel version (Equipment 3). The speed-up for the GT710 card is 0.6 for all discretization levels, due to the saturation of the relatively few GPU cores that the card has. Speed-ups from 2.5 to 11.5 were obtained on the K40 card and from 7.3 to 47.2 on the V100 card, showing that the code scales very well when refining the mesh and/or increasing the number of GPU cores available.

  • As the finest case (Δx= 1 m) shows, it is possible to solve relatively large problems on GPU in a reasonable time. Moreover, when using the parallel GPU version, the time required to solve the problem depends only on the number of available threads. Since this method implies trivial parallelism, it would be easy to use multi-GPU allowing larger problems to be solved over long periods of time. Implementation of the Hartree method in fine meshes reduces interpolation errors and, according to the times obtained in GPU, it is suitable to use on optimization problems.

  • Finally, it is worth mentioning that the cost of a small card like the GT710 (2017) is around 50 USD, while the Xeon Gold 6138 (2017) processors are around 5,224 USD, considering that the processing time for both configurations is of the same order (speed-up of 0.6 in favor of the CPU). If the costs of the K40 and V100 cards (approximately USD 750 and USD 10,000, respectively) are considered, a speed-up of 11.5 and 47.2 in the processing time may not be as convenient from a cost/speed-up point of view. However, we emphasize that a hydraulic moderator can obtain acceptable results in terms of time and precision at a very low cost when using low-profile graphics cards.

Performance test 2

The purpose of the second test is to check the relationship between the processing time required by a time step and the cell size of the model, while keeping the number of time steps constant. As required by the CFL condition, the time step size Δt depends on the cell size Δx, and if the number of time steps is required to be changed, only the processing time can be modified. However, since the Hartree scheme is linear, one can easily estimate the processing time in an approximate way. For example, by doubling the cell size the processing time must be divided by 2 to obtain the same number of time steps.

Five numerical models were implemented, each one with different discretization levels (Δx= 1 m, Δx= 2.5 m, Δx= 5 m, Δx= 7.5 m, and Δx= 10 m). A processing time has been determined for each level. The tests were performed on Equipment 1 using the GPU parallel solver. Results can be found in Table 7.

Table 7

Results of the performance test 2

Δx (m)PH (hr)TS (step)PT (s)η = PT/TS (ms/step)
1.0 1.2 28,498 24.216 0.850 
2.5 27,390 9.828 0.359 
5.0 27,375 5.263 0.192 
7.5 27,366 3.829 0.140 
10.0 12 27,358 3.439 0.126 
1.0 1.75 39,825 33.915 0.852 
2.5 36,514 13.116 0.359 
5.0 36,485 7.188 0.197 
7.5 12 36,470 5.146 0.141 
10.0 16 36,461 4.294 0.118 
1.0 2.5 57,145 48.548 0.850 
2.5 54,740 19.739 0.361 
5.0 12 54,689 10,567 0.193 
7.5 18 54,662 7,634 0.140 
10.0 24 54,637 6,433 0.118 
Δx (m)PH (hr)TS (step)PT (s)η = PT/TS (ms/step)
1.0 1.2 28,498 24.216 0.850 
2.5 27,390 9.828 0.359 
5.0 27,375 5.263 0.192 
7.5 27,366 3.829 0.140 
10.0 12 27,358 3.439 0.126 
1.0 1.75 39,825 33.915 0.852 
2.5 36,514 13.116 0.359 
5.0 36,485 7.188 0.197 
7.5 12 36,470 5.146 0.141 
10.0 16 36,461 4.294 0.118 
1.0 2.5 57,145 48.548 0.850 
2.5 54,740 19.739 0.361 
5.0 12 54,689 10,567 0.193 
7.5 18 54,662 7,634 0.140 
10.0 24 54,637 6,433 0.118 

Δx [m] = cell size; PH [hr] = prediction horizon; TS [st] = total number of time steps; PT [ms] = processing time; and time-per-step ratio η = PT/TS in milliseconds per step.

Viewing the last column, it is possible to conclude that each cell size has its own approximate ratio η, independently from Prediction Horizon, so that, η depends uniquely on cell size Δx. It means that the processing time has to be proportional to Prediction Horizon, as expected. Figure 12 shows the curve η vs Δx which is obtained from Table 7.

Figure 12

Trial 2 results: time-per-step ratio vs level of discretization.

Figure 12

Trial 2 results: time-per-step ratio vs level of discretization.

From Figure 12, it can be concluded that the curve η vs Δx is clearly nonlinear. The ratio η by the models Δx= 5 m, Δx= 7.5 m, and Δx= 10 m shows small variations among each other, but models Δx= 1 m, Δx= 2.5 m, and Δx= 5 m show larger variations. That could indicate a limit point below which, the ratio of time-per-step η increases strongly. That suggests the existence of a trend change limit point. Obviously, this point is associated with a specific hardware, in this case, Equipment 1, and decays considerably in Δx= 2.5 m. In other words, discretization below Δx= 5 m mean that this execution configuration does not have enough ‘parallel capacity’ and, therefore, the model is inappropriate for such configuration.

SUMMARY AND CONCLUSIONS

Due to its lack of precision, using the Hartree method was dropped in open canal flow simulations. However, it should be taken into account that giving up this method occurred in a context with less computing power, where the thick grid used was mandatory and the interpolation error very large. The Maricopa Stanfield benchmark test 1.2 by the ASCE Task Committee on Canal Automation Algorithms has been used for evaluating the accuracy of the parallel algorithm. The example has been chosen since water depth profiles are highly curved, and this fact is amenable to obtaining large diffusive errors. It has been found that diffusive error is reduced when discretization is fine enough. Therefore, the accuracy of the Hartree scheme can be considered good enough when a fine discretization is used. The use of the parallel codes makes it possible to implement models with a high level of discretization that can be solved in relatively small times.

Parallel codes in CPU and GPU were developed for the resolution of free-surface flow for open channels. Numerical parallelization was achieved by using the explicit Hartree method applied to the Saint-Venant equations, and by making linear the equations for external and internal boundary conditions. As the trajectories of the gates, boundary conditions, and off-take withdrawals can be modeled with piece-wise functions due to the flow behavior in irrigation canals, the prediction horizon has been discretized in Regulation Period and Jump Step. Taking these ideas into account, a solver has been implemented that solves the non-linear Jump Step part on the CPU and the linear Regulation Period part on the GPU.

The Corning Canal benchmark test 2.2 by the ASCE Task Committee on Canal Automation Algorithms was used for the performance evaluation. It was found that the parallel version in C-CUDA in GPU allows the problem to be solved in much less time than the FORTRAN version in CPU.

Finally, a study of the sensitivity of the ratio of computation speed to the level of discretization has been carried out. As a result, the existence of a trend change limit point is suggested. This point is associated with specific computer equipment used by the calculations. Discretization below this point gives lower speed ratios and their behavior seems to be linear and less sensitive to changes in discretization. In contrast, dense discretization is more sensitive to changes in the cell size.

DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.

REFERENCES

REFERENCES
Abbott
M. B.
1979
Computational Hydraulics. Elements of the Theory of Free Surface Flows
.
Pitman Publishing
,
London
,
324
p.
Aissa
M.
Verstraete
T.
Vuik
C.
2017
Toward a GPU-aware comparison of explicit and implicit CFD simulations on structured meshes
.
Computers & Mathematics with Applications
74
(
1
),
201
217
.
Ames
W. F.
1977
Numerical Methods for Partial Differential Equations
.
Academic Press Inc.
,
Orlando, FL
,
USA
.
Anders
A.
Kandrot
E.
2010
CUDA by Example. An Introduction to General-Purpose GPU Programming
.
Pearson Education
,
Boston, MA
,
USA
.
Barkau
R. L.
1992
UNET, One-Dimensional Unsteady Flow Through a Full Network of Open Channels
.
Computer Program
,
St. Louis, MO
,
USA
.
Bautista
E.
Clemmens
A. J.
1999
Response of ASCE task committee test cases to open-loop control measures
.
Journal of Irrigation and Drainage Engineering
125
(
4
),
179
188
.
Bessone
L.
Gamazo
P.
Storti
M.
2018
Performance evaluation of different schemes for the resolution of a nonlinear diffusion equation in GPGPU
.
Mecánica Computacional
XXXVI
,
605
625
.
Bondarenco
M.
Gamazo
P.
Ezzatti
P.
2017
A comparison of various schemes for solving the transport equation in many-core platforms
.
The Journal of Supercomputing
73
(
1
),
469
481
.
Clemmens
A. J.
Grawitz
B.
Schuurmans
W.
1998
Test cases for canal control algorithms
.
Journal of Irrigation and Drainage Engineering
124
(
1
),
23
30
.
Cozzolino
L.
Cimorelli
L.
Covelli
C.
Morte
R. D.
Pianese
D.
2015
The analytic solution of the Shallow-Water Equations with partially open sluice-gates: the dam-break problem
.
Advances in Water Resources
80
,
90
102
.
http://doi.org/10.1016/j.advwatres.2015.03.010
.
Crandall
S. H.
1956
Engineering Analysis – A Survey of Numerical Procedures
.
Wiley Interscience
,
New York
,
USA
.
Ducheteau
P.
Zachmann
D. W.
1988
Ecuaciones Diferenciales Parciales. (Partial Differential Equations)
.
McGraw-Hill
,
Mèxico City, Mexico
.
Gómez
M.
1988
Contribución al estudio del movimiento variable en lámina libre, en las redes de alcantarillado. Aplicaciones. (Contribution to Study of the Unsteady Free Surface Flow in Sewer Systems. Applications.)
Doctoral thesis
,
UPC
,
Barcelona
,
Spain
.
Hodges
B. R.
Liu
F.
2019
Timescale interpolation and no-neighbour discretization for a 1D finite-volume Saint-Venant solver
.
Journal of Hydraulic Research
.
https://doi.org/10.1080/00221686.2019.1671510
Katapodes
N. D.
2016
Free-Surface Flow: Computational Methods
.
Butterworth-Heinemann
,
Oxford
,
UK
.
Litrico
X.
Fromion
V.
2009
Modeling and Control of Hydrosystems
.
Springer
,
Dordrecht
,
Heidelberg, London, New York
.
Liu
F.
Feyen
J.
Malaterre
P. O.
Baume
J. P.
Kosuth
P.
1998
Development and evaluation of canal automation algorithm CLIS
.
Journal of Irrigation and Drainage Engineering
124
(
1
),
40
46
.
http://doi.org/10.1061/(ASCE)0733-9437(1998)124:1(40)
.
Martín-Sanchez
J. M.
Rodellar
J.
1996
Adaptive Predictive Control: From the Concepts to Plant Optimization
.
Prentice Hall
,
London
,
UK
.
NVIDIA Corporation 2012
Cuda C Best Practices Guide, Version 4.1, Santa Clara, CA, USA
.
Ostad-Ali-Askari
K.
Shayannejad
M.
Eslamian
S.
2018
Comparison of solutions of Saint-Venant equations by characteristics and finite difference methods for unsteady flow analysis in open channel
.
International Journal of Hydrology Science and Technology
8
(
3
).
doi:10.1504/IJHST.2018.10014535
.
Preissmann
A.
1961
Propagation des intumescences dans les canaux et rivières. (Propagation of the Mobile Jump in Canals and Rivers)
.
First Congress of the French Association for Computation
,
Grenoble
,
France
, pp.
433
442
.
Soler
J.
Gómez
M.
Rodellar
J.
2013a
Goroso: a feedforward control algorithm for irrigation canals based on sequential quadratic programming
.
Journal of Irrigation and Drainage Engineering
139
(
1
),
41
54
.
Soler
J.
Gómez
M.
Rodellar
J.
Gamazo
P.
2013b
Application of the GoRoSo feedforward algorithm to compute the gate trajectories for a quick canal closing in the case of an emergency
.
Journal of Irrigation and Drainage Engineering
139
(
12
),
1028
1036
.
10.1061/(ASCE)IR.1943-4774.0000640
.
Soler
J.
Gómez
M.
Bonet
E.
2015
Canal monitoring algorithm
.
Journal of Irrigation and Drainage Engineering
142
(
3
).
https://doi.org/10.1061/(ASCE)IR.1943-4774.0000982
Whitley
D.
1994
A genetic algorithm tutorial
.
Statistics and Computing
4
(
2
),
65
85
.
https://doi.org/10.1007/BF00175354
.
Wylie
E. B.
1969
Control of transient free-surface flow
.
Journal of the Hydraulics Division
95
,
347
362
.
Yang
J. C.
Chiu
K. P.
1993
Use of characteristics method with cubic interpolation for unsteady-flow computation
.
International Journal for Numerical Methods in Fluids
16
,
329
345
.