The calibration of parameters in complex systems usually requires a large computational effort. Moreover, it becomes harder to perform the calibration when non-linear systems underlie the physical process, and the direction to follow in order to optimize an objective function changes depending on the situation. In the context of shallow water equations (SWE), the calibration of parameters, such as the roughness coefficient or the gauge curve for the outlet boundary condition, is often required. In this work, the SWE are used to simulate an open channel flow with lateral gates. Due to the uncertainty in the mathematical modeling that these lateral discharges may introduce into the simulation, the work is focused on the calibration of discharge coefficients. Thus, the calibration is performed by two different approaches. On the one hand, a classical Monte Carlo method is used. On the other hand, the development and application of an adjoint formulation to evaluate the gradient is presented. This is then used in a gradient-based optimizer and is compared with the stochastic approach. The advantages and disadvantages are illustrated and discussed through different test cases.

## INTRODUCTION

Calibration is related to the optimization process of finding the minimum of a functional usually defined as a relation between the objective quantity and that simulated. There are many fields where calibration is required. This process is more necessary when using black-box models or simplified deterministic models. In some ways, the model may acquire a predictive character when calibrated. The calibration can be understood as an optimization process, where the error between the model estimate and the data set is minimized.

In addition, there are many phenomena that can be modeled using partial differential equations (PDEs) that represent conservation laws. The application of mathematical models to describe physical processes is helpful when performing simulations of possible scenarios using computational facilities (Burguete *et al.* 2014). Despite the quality of these mathematical models for predictive purposes, some parameters underlie the simplifications in the model. Consequently, calibration is usually required to adjust the model to the physical reality (Lacasta *et al.* 2015b).

The application of the shallow water equations (SWE) for the simulation of open-channel flow has been widely used, in particular, for irrigation water delivery (Chanson 2004; Chaudhry 2007). The equations can also be extended to include regulation elements. The flow in those hydraulic structures does not follow the shallow water hypothesis. Their presence is usually modeled by means of steady state continuity and head loss equations that somehow represent what lies behind these phenomena (Morales-Hernández *et al.* 2013).

There are three broad optimizer families: gradient-based methods, derivative-free search algorithms, and evolutionary methods (Chaparro *et al.* 2008). This work deals with the first and second families. While the latter do not require gradient information to perform the calibration, the former must have information about the variation of the objective function with respect to the controlled variable. The advantage of these methods appears when the performance function is sufficiently smooth, and the first derivative, with respect to the controllable parameter, drives the optimization method to the optimal value. On the contrary, and when local minima exist, these methods can be inefficient and may also provide non-optimal solutions. These situations can be avoided using other global optimization methods not based on gradient information, such as stochastic methods. While they can probably more easily detect the optimal interval in the global solution space, they are expensive in terms of evaluations.

In this case, the use of Lagrange multipliers within variational principles leads to the definition of adjoint variables and the corresponding differential equations governing them. Their solutions can help to obtain the sensitivity of a pre-defined objective function. This family of methods has largely been developed over the last 30 years (Giles & Pierce 2000). However, and due to increasing computational capacity, adjoint methods have been applied in different forms in many fields: from airfoil shape optimal design (Castro *et al.* 2007; Bueno-Orovio *et al.* 2012) to estimating *Greeks* in the financial derivatives industry (Giles & Glasserman 2006). In the context of free surface flow hydraulics, the use of the adjoint method to obtain the adjoint system of equations intended for control was described in Sanders & Katopodes (2000), Piasecki & Sanders (2002), and Ding & Wang (2006). Interesting applications have also been recently described for data assimilation (Lai & Monnier 2009; Lacasta & García-Navarro 2016) and field data reconstruction (Lacasta *et al.* 2015a). Such works highlight the capacity of the adjoint method for providing an estimate of the gradient that can be used for control purposes. Regarding hydraulic model calibration, the widest application is the identification of roughness parameters. There are several works on this, such as Fread & Smith (1978) and Wasantha Lal (1995). In addition, in Pappenberger *et al.* (2005), an application of a Monte Carlo (MC)-based optimization procedure for roughness parameter identification can be found, while in Ding & Wang (2005), the adjoint equations of the one-dimensional (1D) SWE are used for the same purpose.

The non-linear SW flow equations must be solved numerically and this can be done by different numerical methods. Several techniques applied to such equations have been reported recently as promising, such as second-order MUSCL schemes (Hou *et al.* 2015) and higher order numerical schemes (Díaz *et al.* 2013; Navas-Montilla & Murillo 2015). Also, the emerged discontinuous Galerkin method has been used to compute them (Xing *et al.* 2010; Kesserwani & Liang 2015; Kesserwani *et al.* 2015). Despite the apparent good properties of these methods, the enormous computational need may be a drawback when dealing with practical applications. In addition, not only the accuracy, but also the quality in terms of conservation and well-balancing of the numerical solution is desirable when dealing with these equations (Noelle *et al.* 2006). The use of a first order upwind (FOU) scheme, such as that of Murillo & García-Navarro (2010), has been demonstrated to be a good option because of the acceptable trade-off between numerical quality and computational requirements. In the context of SWE, the use of first-order schemes is common, since in the majority of the flows concerning realistic applications, the source terms dominate over the convective terms, second-order diffusive terms are not included in the formulation and the difference between the numerical results from first- and second-order models becomes negligible (Petaccia & Savi 2002; Murillo *et al.* 2009). In this work, forward solver and adjoint solver are both based on a well-balanced and exhaustively tested FOU finite volume scheme.

The objective of this work is to compare two widely used approaches for the calibration of a computational model based on the SWE. On the one hand, a gradient descent (GD) method is used as an example of gradient-based optimization. On the other hand, the use of MC following the method of Burguete & Latorre (2015) is included as an example of *brute-force* calibration methodology. In this work, the calibration is aimed at the estimation of a discharge coefficient in order to model the outflow from a lateral hydraulic structure. In order to evaluate the performance of each technique, three cases different in nature are proposed. They are related to the calibration of (1) an approximately reachable situation (the error in the optimal solution is not equal to zero), (2) an affordable target level at some point, and (3) an affordable target level located in a position separated by an internal boundary condition (a sluice gate). The structure of the paper is as follows: below, the mathematical model and the adjoint formulation are described. The next section includes the numerical scheme used to obtain a discrete solution for both problems and then the mathematical optimization problem is formulated, emphasizing the gradient-free optimization method used in this work as well as the gradient formulation based on the adjoint information. This is followed by a section in which the gradient-based optimizer and MC method are compared in three different cases, and finally, conclusions are drawn.

## MATHEMATICAL MODEL

### Shallow water equations

*et al.*1980; Chanson 2004; Chaudhry 2007). A one-dimensional flow in a unitary fixed-width channel can be expressed as follows: where the conserved variables (

*h*,

*q*) are the water depth and unitary discharge, respectively defined in space and time and

*g*the gravitational acceleration.

*n*is the Gauckler–Manning's roughness coefficient, and the bed slope depends on bottom depth

*z*. Additionally, the hydraulic radius is the ratio of the channel's cross-sectional area

*A*to the wetted perimeter

*P*. The lateral discharge is considered as a mass sink term that can be formulated as:

*h*is the upstream water depth at the gate location and is the gate opening. Figure 1 illustrates a channel including a lateral gate as described. It is important to take into account that (3) deals with the formulation of a simplistic Bernoulli-based formulation of the flow assuming atmospheric pressure downstream of the gate. This implies that the discharge rate depends only on the water depth upstream of the gate. can be used to account for some uncertainties related to losses due to different causes such as construction irregularities, and obstruction or bed changes from sediment deposition with slope changes. This quantity is bounded by when realistic situations are simulated (Henderson 1996). The source term in (3) is defined at the location of the gate and it is assumed that it will vanish in the rest of the channel. It is important to note that some authors (Piasecki & Sanders 2002; Baume

*et al.*2005; Fovet

*et al.*2010) include the lateral flow influence in the momentum equation, whereas others, such as Malaterre

*et al.*(1998), Ding & Wang (2006), and Xu

*et al.*(2011), prefer to neglect that effect. In this work, the lateral offtake is considered a point mass contribution without effect on the momentum equation.

### Performance function

*T*) is the simulated time period and are the coordinates of the reach under consideration. The error metric may be defined by considering measured water depths. A very typical function to measure the deviation between an estimation and a measurement is the mean square error (MSE) which may include saddles, but no local minimum; this may be convenient for some minimization methods.

### Adjoint formulation

**P**is obtained by multiplying the mass conservation equation by the adjoint variable (also known as Lagrange multiplier) and the momentum conservation equation by the adjoint variable :

**P**can be integrated by parts leading to: Assuming first order optimality condition for These variations on (7) can be formulated taking increments with respect to

*h*and

*q*. Since in this case we need to adjust the discharge coefficient , variations of (6) with respect to its dependent variable are also included leading to: Increments are also taken in the objective function

**J**: Considering (7), whenever the following is satisfied: can be rewritten as:

Equation (14) is then defined as the sensitivity of the error and will be very useful in the development of the gradient expression to be included in the optimization method. Due to the dependence on (14) with respect to , (10) must be solved numerically.

## NUMERICAL SCHEME

*u*

*=*

*q*/

*h*is the velocity and the wave celerity. The second is the adjoint system (10) comprising : where Equations (15) and (17) can be solved using the same numerical technique. In this case, both are solved using a finite volume formulation using an explicit Euler scheme for the time integration and a FOU scheme for the spatial integration (Burguete & García-Navarro 2004). For a given cell

*i*with edges and

*i*+ 1/2, a locally linearized problem is formulated. Using an upwind Riemann solver, the updating expression for the physical equations follows the scheme: where is the integration time step and the distance between the center of each computational cell. The updating procedure (19) takes into account the sign of the contribution from each edge, where the approximate eigenvalues can be defined as , and the eigenvectors are , . This compact formulation requires one to define with: and where the source term contribution satisfies . It is also worth mentioning that the linearized source terms are discretized using an upwind scheme.

The terms denoted refer to differences between the right and left states of each cell , while are averaged values . It is important to note that this notation is valid only in the numerical scheme framework and it is different from the one used in the adjoint system deduction where denoted the variation operator in time–space domain.

*i*the wave celerity satisfies: One of the most representative drawbacks when solving the adjoint PDEs is the requirement to store information about the physical system at all points in time. The impact in terms of memory requirements can be estimated as taking into account that, in the one-dimensional case, not only the number of cells, but also the required time steps to perform the computation are not too large. Typically when using the 1D approach, problems that model tenths of kilometers using hundreds of cells require minutes to simulate days.

## MATHEMATICAL OPTIMIZATION

Mathematical optimization can be understood as the process of searching for the optimal value that minimizes a given functional. This searching can be done using different techniques. They are usually classified into two main groups: gradient-based and gradient-free optimizers. The former option requires the evaluation of the gradient for use as a guide when looking for the optimum. This is very efficient when dealing with convex problems. Gradient-free methods perform the operation over the whole domain and they are very convenient when multi-modal and noisy functions need to be minimized. These methods offer a suitable technique to cover the domain where the variable to be calibrated can be found.

The most relevant detail when dealing with the MC method is the *brute-force-*based methodology which is based on finding the optimum. In contrast to the gradient-based methods, MC just reduces the window for finding the optimal solution. Another important difference from the gradient-based methodology is the non-deterministic character of MC calibrations.

### MC method

Gradient-free methods usually deal with global optimization problems where the whole process is based on heuristic and stochastic algorithms (Moles *et al.* 2003; Zabinsky 2013; Sedaghatdoost & Ebrahimian 2015). These methods include adaptive stochastic methods such as MC inspired methods. They look for the solution by random sampling. Instead of locating the optimal value by means of the information of the gradient, it is possible to cover the domain to seek the minimum and recursively reduce that domain. This reduction occurs when a set of the best options among all those evaluated is found. There is an interesting discussion of MC methods in Atanassov & Dimov (2008).

**Result:** A new value for closer to the solution that minimizes **J**

**input:****output:**m = 0;

**while****do**

/* **Generate****random trials within the interval** ***/**

;

** /* Best values****defines the new interval */**

;

;

;

*m* + +;

**Algorithm 1:** MC method.

The whole computational cost of this method includes the evaluation of each trial. If the method has not converged before or is reached, the computational cost is in terms of the number of simulations. While the method does not require additional parameters, the complexity comes from the definition of the number of seeds that the method has to include at each time. The trade-off between the maximum number of iterations, , and the number of elements generated per iteration, , is not easy to balance.

### Gradient-descent method

*h*,

*q*), the relation between the error and the parameter to be controlled can be expressed as follows:

*m*denotes the iteration of the method and denotes the step-length used by the method. The optimization procedure using this method is illustrated in Figure 3. Considering the initial value of and obtaining the value of , descent is performed with step-size , leading to the new value of .

The integration of (30) in the full optimization procedure is detailed in Algorithm 2.

**Result:** A new value for closer to the solution that minimizes **input:****output:**

m = 0;

**while****do**

;

**if****then**

**/* converged on critical point** **/***

**return**;

**else**

**if****then**

**/* converged on an****value** **/***

return;

**else**

**if****then**

**/* It has diverged** **/***

;

**else**

;

*m* + +;

**Algorithm 2**: Gradient-descent method.

The algorithm includes a little modification from the classical gradient-descent method regarding the step-length used in the optimization. When using the fixed step-length , convergence is not guaranteed. This is because there are situations in which, when *overshooting* a long way using some value of the gradient, the method may diverge. A typical strategy is the use of an initial that may be increased by a constant factor (1.1 for example) or halved if the functional estimation is amplified. Nonetheless, more sophisticated step-sizes can be applied (Yuan 2008).

The cost of this optimizer considering the computation of the gradient using the adjoint variables is where is the computational cost of the forward shallow water model run and is the cost of the adjoint solver. The amount of computational effort can be approximated by since each iteration needs to solve the problem forwards and backwards in time, and they require the same number of integration time steps as well as the same number of computational operations. Comparing both, the cost of the stochastic method and the cost of the gradient-based method , the performance in terms of evaluations varies depending on: (1) the convergence ratio: whether MC employs the best trials or the gradient method uses faster directions will determine the number of iterations required to find the optimum; and (2) the initial guess/initial interval: the distance of the initial guess to the optimal solution as well as the definition of the adequate interval may have an important impact on the performance of the method.

Additionally, and for each method, the correct choice of the step-size in the gradient-descent method, or the number of trials used in each iteration in the case of stochastic optimization, may also degrade or improve the performance of the method.

The gradient-based method shows an intrinsic weakness compared with the MC method. While each trial of the MC method can be performed in parallel (each simulation can be computed independently), the parallelization of the gradient-based method requires the parallelization of the numerical solver, which is more complicated. In this work, both methods have been analyzed in sequential mode, which means that each trial is run after the previous trial has finished as in the case of the gradient method.

## TEST CASES

In order to benchmark the previously introduced optimization methods, the gradient-based optimization method based on adjoint equations is compared with a MC optimization method. Due to its stochastic nature, MC results can change between one repetition and the next. Nevertheless, this effect has not been analyzed in this work. In addition, the computational cost of both methods is defined as the number of simulations required to converge to the desired state. Otherwise, the measurements may be misleading, because MC processes are intrinsically independent; thus, they may be directly parallelized. In the case of the gradient-based method, the computational optimization of the technique must be applied to the solver of both the direct and the adjoint models. Due to this, performance of each method is evaluated in terms of the number of simulations required to reach their objective functions.

*L*= 1,000

*m*is defined by a constant rectangular unitary width section with . The channel is defined with a constant Gauckler–Manning's roughness coefficient of and a constant slope of 0.011 (see Figure 4).

The optimizers are configured using the same tolerance and, in the case of GD, that tolerance is also used for and . The initial interval for MC covers the entire search domain for the coefficient that is . Furthermore, it is configured to perform the calibration with .

The step-size range for the gradient method is and the initial step-size is defined differently in each case.

### Test case I

In the first test case, one lateral gate is located at and the measured or desired water depth is defined at (see Figure 4). The target function is also plotted in Figure 4.

A constant inlet of is assumed to define the initial steady state condition for the simulated case and is kept constant at all times. Regarding these conditions and using (3), it is possible to find some to adjust the results to the objective water depth . This is an *invented* analytic function which does not correspond to any realistic water depth measurement and is used only for demonstrating that this method is able to adjust the parameter even when the measurement does not correspond to any possible solution. The GD method is configured with . Additionally, the initial guess for the method is .

### Test case II

For the second test case, the objective water depth history is defined by the depths recorded during a previous simulation run with . The setup is essentially the same as the previous one, replacing designed conditions with simulated results. Theoretically, the minimum of the functional in this case should be min **J** = 0. As the gradient method may require some stop condition as a convergence criterion, the tolerance is established to . Moreover, the initial for the gradient method is defined further than in the previous case .

As in the previous case, the GD method shows a higher convergence rate. Not only the speed that the GD method reaches when obtaining the solution, but also the quality of the method providing the gradient estimation is highlighted. While requiring the equivalent of two simulations for each gradient step, the convergence ratio for each iteration is higher than for the MC method. Moreover, GD strictly converges to an optimal value of since is reached. In contrast, the MC method reaches its maximum number of iterations without reaching the tolerance.

**J**is plotted against the different values of obtained by each method. It is possible to observe how the GD method describes the curve better and faster than the MC method.

### Test case III

The third test case is a variation of the second. In this case, two transverse gates have been included. There are several techniques for dealing with such structures in the model. For instance, in Cozzolino *et al.* (2015), a robust and efficient numerical treatment is incorporated into the numerical method, providing a dynamic detection of free and submerged regimes. In the present work, they have been formulated as internal boundary conditions following Morales-Hernández *et al.* (2013), which provides a simpler method good enough for relatively uniform flow regimes. The idea is to analyze the possible influence that these internal boundaries have over the control.

A forward simulation using was performed previously and the temporal evolution at is used as the target water depth.

The result of the calibration using the stochastic methodology provides bounds on the optimal solution with the best option within the interval of with an error of . The GD calibration calculates with . The number of steps for non-gradient optimizations required in the process includes 64 forward simulations, while in the case of the gradient-based simulation only seven are required, and seven more for the adjoint system integration. It represents an acceleration of of the GD method against the MC method.

This case has detected one of the most relevant weaknesses of the MC method: it may not converge to the optimal solution. As in the previous case, the GD method converges to an optimal value of .

*et al.*(2012), a deep analysis of the application of the MC method highlights the dependence of the convergence ratio on the number of trials analyzed and the threshold used. This also implies an increase in the number of simulations. Not only the quality of the calibration in terms of convergence rate, but also the quantity of steps in the process, establish strong differences between these methodologies.

These results highlight a new possibility, where a combination of these methods can obtain the solution more quickly. The MC method can be used to generate the initial guess for the GD method, which then can be used to descend more rapidly to the optimal solution.

## CONCLUSIONS

In this work the comparison of a MC-based optimization and an adjoint method-based optimization method to calibrate discharge coefficients of lateral gates on 1D channels is presented. The techniques have been checked using three test cases.

The adjoint technique has the drawback of the non-generalized way of calculating sensitivities. However, when developing it for a particular PDE system, it offers a powerful procedure for performing optimal control. Being a gradient-based method, it requires the evaluation of the error gradient. The present work has been based on a simple semi-fixed, step-length algorithm which, in some cases, may vary the performance depending on the step-length used. Other gradient-based methods such as BFGS or L-BFGS may be used with the previous information, including an optimal step-length that may improve the convergence rate of the method.

On the contrary, MC methods are capable of obtaining good results with a simpler implementation for this type of problem. Previous information, other than the range where the optimal solution may be found, is not required. This is an advantage when calibration is related to an ill-posed problem or a complex functional, but not for such cases as those presented in this work.

The MC method provides two advantages: independence from the model to be calibrated and the intrinsic capability of working in parallel. On the other hand, the GD method shows a greater convergence rate, but it is sensitive to the initial guess used. Results suggest that a combination of these methods, the MC method for choosing the initial guess and then the GD method for descending in the correct direction, may generate an effective optimization method. The main disadvantage of the GD method based on the adjoint formulation is that it must be carefully reformulated in each different application.

Future work may include the advantage of MC in choosing an appropriate initial guess for use in the gradient-based optimizer. Moreover, the same technique may be applied to the 2D system of equations. However, limitations on the time–space dimension of the problem may be found due to the memory storage requirements of the adjoint method. Nevertheless, more sophisticated objective functions and optimization applications can be addressed in the two-dimensional framework, where calibration is a very common task in practical applications. It is also feasible to develop a very similar technique for related uses, such as the calibration of the friction coefficient used in Manning's formula following the steps presented in the first part of the work and then comparing both methods for this application.

## ACKNOWLEDGEMENTS

This research was partially funded by the Spanish MINECO/FEDER through the Research Project CGL2015-66114-R and by Diputación General de Aragón, DGA, through FEDER funds. The first author was also supported by the Spanish Ministry of Economy and Competitiveness fellowship BES-2012-053691.