The calibration of parameters in complex systems usually requires a large computational effort. Moreover, it becomes harder to perform the calibration when non-linear systems underlie the physical process, and the direction to follow in order to optimize an objective function changes depending on the situation. In the context of shallow water equations (SWE), the calibration of parameters, such as the roughness coefficient or the gauge curve for the outlet boundary condition, is often required. In this work, the SWE are used to simulate an open channel flow with lateral gates. Due to the uncertainty in the mathematical modeling that these lateral discharges may introduce into the simulation, the work is focused on the calibration of discharge coefficients. Thus, the calibration is performed by two different approaches. On the one hand, a classical Monte Carlo method is used. On the other hand, the development and application of an adjoint formulation to evaluate the gradient is presented. This is then used in a gradient-based optimizer and is compared with the stochastic approach. The advantages and disadvantages are illustrated and discussed through different test cases.
INTRODUCTION
Calibration is related to the optimization process of finding the minimum of a functional usually defined as a relation between the objective quantity and that simulated. There are many fields where calibration is required. This process is more necessary when using black-box models or simplified deterministic models. In some ways, the model may acquire a predictive character when calibrated. The calibration can be understood as an optimization process, where the error between the model estimate and the data set is minimized.
In addition, there are many phenomena that can be modeled using partial differential equations (PDEs) that represent conservation laws. The application of mathematical models to describe physical processes is helpful when performing simulations of possible scenarios using computational facilities (Burguete et al. 2014). Despite the quality of these mathematical models for predictive purposes, some parameters underlie the simplifications in the model. Consequently, calibration is usually required to adjust the model to the physical reality (Lacasta et al. 2015b).
The application of the shallow water equations (SWE) for the simulation of open-channel flow has been widely used, in particular, for irrigation water delivery (Chanson 2004; Chaudhry 2007). The equations can also be extended to include regulation elements. The flow in those hydraulic structures does not follow the shallow water hypothesis. Their presence is usually modeled by means of steady state continuity and head loss equations that somehow represent what lies behind these phenomena (Morales-Hernández et al. 2013).
There are three broad optimizer families: gradient-based methods, derivative-free search algorithms, and evolutionary methods (Chaparro et al. 2008). This work deals with the first and second families. While the latter do not require gradient information to perform the calibration, the former must have information about the variation of the objective function with respect to the controlled variable. The advantage of these methods appears when the performance function is sufficiently smooth, and the first derivative, with respect to the controllable parameter, drives the optimization method to the optimal value. On the contrary, and when local minima exist, these methods can be inefficient and may also provide non-optimal solutions. These situations can be avoided using other global optimization methods not based on gradient information, such as stochastic methods. While they can probably more easily detect the optimal interval in the global solution space, they are expensive in terms of evaluations.
In this case, the use of Lagrange multipliers within variational principles leads to the definition of adjoint variables and the corresponding differential equations governing them. Their solutions can help to obtain the sensitivity of a pre-defined objective function. This family of methods has largely been developed over the last 30 years (Giles & Pierce 2000). However, and due to increasing computational capacity, adjoint methods have been applied in different forms in many fields: from airfoil shape optimal design (Castro et al. 2007; Bueno-Orovio et al. 2012) to estimating Greeks in the financial derivatives industry (Giles & Glasserman 2006). In the context of free surface flow hydraulics, the use of the adjoint method to obtain the adjoint system of equations intended for control was described in Sanders & Katopodes (2000), Piasecki & Sanders (2002), and Ding & Wang (2006). Interesting applications have also been recently described for data assimilation (Lai & Monnier 2009; Lacasta & García-Navarro 2016) and field data reconstruction (Lacasta et al. 2015a). Such works highlight the capacity of the adjoint method for providing an estimate of the gradient that can be used for control purposes. Regarding hydraulic model calibration, the widest application is the identification of roughness parameters. There are several works on this, such as Fread & Smith (1978) and Wasantha Lal (1995). In addition, in Pappenberger et al. (2005), an application of a Monte Carlo (MC)-based optimization procedure for roughness parameter identification can be found, while in Ding & Wang (2005), the adjoint equations of the one-dimensional (1D) SWE are used for the same purpose.
The non-linear SW flow equations must be solved numerically and this can be done by different numerical methods. Several techniques applied to such equations have been reported recently as promising, such as second-order MUSCL schemes (Hou et al. 2015) and higher order numerical schemes (Díaz et al. 2013; Navas-Montilla & Murillo 2015). Also, the emerged discontinuous Galerkin method has been used to compute them (Xing et al. 2010; Kesserwani & Liang 2015; Kesserwani et al. 2015). Despite the apparent good properties of these methods, the enormous computational need may be a drawback when dealing with practical applications. In addition, not only the accuracy, but also the quality in terms of conservation and well-balancing of the numerical solution is desirable when dealing with these equations (Noelle et al. 2006). The use of a first order upwind (FOU) scheme, such as that of Murillo & García-Navarro (2010), has been demonstrated to be a good option because of the acceptable trade-off between numerical quality and computational requirements. In the context of SWE, the use of first-order schemes is common, since in the majority of the flows concerning realistic applications, the source terms dominate over the convective terms, second-order diffusive terms are not included in the formulation and the difference between the numerical results from first- and second-order models becomes negligible (Petaccia & Savi 2002; Murillo et al. 2009). In this work, forward solver and adjoint solver are both based on a well-balanced and exhaustively tested FOU finite volume scheme.
The objective of this work is to compare two widely used approaches for the calibration of a computational model based on the SWE. On the one hand, a gradient descent (GD) method is used as an example of gradient-based optimization. On the other hand, the use of MC following the method of Burguete & Latorre (2015) is included as an example of brute-force calibration methodology. In this work, the calibration is aimed at the estimation of a discharge coefficient in order to model the outflow from a lateral hydraulic structure. In order to evaluate the performance of each technique, three cases different in nature are proposed. They are related to the calibration of (1) an approximately reachable situation (the error in the optimal solution is not equal to zero), (2) an affordable target level at some point, and (3) an affordable target level located in a position separated by an internal boundary condition (a sluice gate). The structure of the paper is as follows: below, the mathematical model and the adjoint formulation are described. The next section includes the numerical scheme used to obtain a discrete solution for both problems and then the mathematical optimization problem is formulated, emphasizing the gradient-free optimization method used in this work as well as the gradient formulation based on the adjoint information. This is followed by a section in which the gradient-based optimizer and MC method are compared in three different cases, and finally, conclusions are drawn.
MATHEMATICAL MODEL
Shallow water equations








Performance function




Adjoint formulation
Equation (14) is then defined as the sensitivity of the error and will be very useful in the development of the gradient expression to be included in the optimization method. Due to the dependence on (14) with respect to , (10) must be solved numerically.
NUMERICAL SCHEME
The terms denoted refer to differences between the right and left states of each cell
, while
are averaged values
. It is important to note that this notation is valid only in the numerical scheme framework and it is different from the one used in the adjoint system deduction where
denoted the variation operator in time–space domain.



MATHEMATICAL OPTIMIZATION
Mathematical optimization can be understood as the process of searching for the optimal value that minimizes a given functional. This searching can be done using different techniques. They are usually classified into two main groups: gradient-based and gradient-free optimizers. The former option requires the evaluation of the gradient for use as a guide when looking for the optimum. This is very efficient when dealing with convex problems. Gradient-free methods perform the operation over the whole domain and they are very convenient when multi-modal and noisy functions need to be minimized. These methods offer a suitable technique to cover the domain where the variable to be calibrated can be found.
The most relevant detail when dealing with the MC method is the brute-force-based methodology which is based on finding the optimum. In contrast to the gradient-based methods, MC just reduces the window for finding the optimal solution. Another important difference from the gradient-based methodology is the non-deterministic character of MC calibrations.
MC method
Gradient-free methods usually deal with global optimization problems where the whole process is based on heuristic and stochastic algorithms (Moles et al. 2003; Zabinsky 2013; Sedaghatdoost & Ebrahimian 2015). These methods include adaptive stochastic methods such as MC inspired methods. They look for the solution by random sampling. Instead of locating the optimal value by means of the information of the gradient, it is possible to cover the domain to seek the minimum and recursively reduce that domain. This reduction occurs when a set of the best options among all those evaluated is found. There is an interesting discussion of MC methods in Atanassov & Dimov (2008).




Scheme of the MC-based optimization method. First iteration for a given interval I0 (left) and following iteration with the sub-interval I1 (right).
Scheme of the MC-based optimization method. First iteration for a given interval I0 (left) and following iteration with the sub-interval I1 (right).
Result: A new value for closer to the solution that minimizes J
input:output:
m = 0;
whiledo
/* Generaterandom trials within the interval
*/
;
/* Best valuesdefines the new interval */
;
;
;
m + +;
Algorithm 1: MC method.
The whole computational cost of this method includes the evaluation of each trial. If the method has not converged before or
is reached, the computational cost is
in terms of the number of simulations. While the method does not require additional parameters, the complexity comes from the definition of the number of seeds that the method has to include at each time. The trade-off between the maximum number of iterations,
, and the number of elements generated per iteration,
, is not easy to balance.
Gradient-descent method


Scheme of the gradient-descent method. Evolution of from
(left) to
(right) following the direction
.
Scheme of the gradient-descent method. Evolution of from
(left) to
(right) following the direction
.
The integration of (30) in the full optimization procedure is detailed in Algorithm 2.
Result: A new value for closer to the solution that minimizes
input:
output:
m = 0;
whiledo
;
ifthen
/* converged on critical point /*
return;
else
ifthen
/* converged on anvalue /*
return;
else
ifthen
/* It has diverged /*
;
else
;
m + +;
Algorithm 2: Gradient-descent method.
The algorithm includes a little modification from the classical gradient-descent method regarding the step-length used in the optimization. When using the fixed step-length , convergence is not guaranteed. This is because there are situations in which, when overshooting a long way using some value of the gradient, the method may diverge. A typical strategy is the use of an initial
that may be increased by a constant factor (1.1 for example) or halved if the functional estimation is amplified. Nonetheless, more sophisticated step-sizes can be applied (Yuan 2008).
The cost of this optimizer considering the computation of the gradient using the adjoint variables is where
is the computational cost of the forward shallow water model run and
is the cost of the adjoint solver. The amount of computational effort
can be approximated by
since each iteration needs to solve the problem forwards and backwards in time, and they require the same number of integration time steps as well as the same number of computational operations. Comparing both, the cost of the stochastic method
and the cost of the gradient-based method
, the performance in terms of evaluations varies depending on: (1) the convergence ratio: whether MC employs the best trials or the gradient method uses faster directions will determine the number of iterations required to find the optimum; and (2) the initial guess/initial interval: the distance of the initial guess to the optimal solution as well as the definition of the adequate interval may have an important impact on the performance of the method.
Additionally, and for each method, the correct choice of the step-size in the gradient-descent method, or the number of trials used in each iteration in the case of stochastic optimization, may also degrade or improve the performance of the method.
The gradient-based method shows an intrinsic weakness compared with the MC method. While each trial of the MC method can be performed in parallel (each simulation can be computed independently), the parallelization of the gradient-based method requires the parallelization of the numerical solver, which is more complicated. In this work, both methods have been analyzed in sequential mode, which means that each trial is run after the previous trial has finished as in the case of the gradient method.
TEST CASES
In order to benchmark the previously introduced optimization methods, the gradient-based optimization method based on adjoint equations is compared with a MC optimization method. Due to its stochastic nature, MC results can change between one repetition and the next. Nevertheless, this effect has not been analyzed in this work. In addition, the computational cost of both methods is defined as the number of simulations required to converge to the desired state. Otherwise, the measurements may be misleading, because MC processes are intrinsically independent; thus, they may be directly parallelized. In the case of the gradient-based method, the computational optimization of the technique must be applied to the solver of both the direct and the adjoint models. Due to this, performance of each method is evaluated in terms of the number of simulations required to reach their objective functions.


Sketch of the configuration used in Test case I and II, with the location of the probe and the location
of the lateral gate (upper panel). The lower panel includes objective water depth
at
for Test case I and for Test case II and
axis represents gate opening (m).
Sketch of the configuration used in Test case I and II, with the location of the probe and the location
of the lateral gate (upper panel). The lower panel includes objective water depth
at
for Test case I and for Test case II and
axis represents gate opening (m).
The optimizers are configured using the same tolerance and, in the case of GD, that tolerance is also used for
and
. The initial interval for MC covers the entire search domain for the coefficient
that is
. Furthermore, it is configured to perform the calibration with
.
The step-size range for the gradient method is and the initial step-size is defined differently in each case.
Test case I
In the first test case, one lateral gate is located at and the measured or desired water depth is defined at
(see Figure 4). The target function
is also plotted in Figure 4.
A constant inlet of is assumed to define the initial steady state condition for the simulated case and is kept constant at all times. Regarding these conditions and using (3), it is possible to find some
to adjust the results to the objective water depth
. This
is an invented analytic function which does not correspond to any realistic water depth measurement and is used only for demonstrating that this method is able to adjust the parameter even when the measurement does not correspond to any possible solution. The GD method is configured with
. Additionally, the initial guess for the method is
.
Test case I: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
Test case I: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
Convergence of the error and for Test case I using MC (left) and GD (right).

Test case I: Evolution of with respect to
for each method. Dots represent the different MC evaluations.
Test case I: Evolution of with respect to
for each method. Dots represent the different MC evaluations.
Test case II
For the second test case, the objective water depth history is defined by the depths recorded during a previous simulation run with . The setup is essentially the same as the previous one, replacing designed conditions with simulated results. Theoretically, the minimum of the functional in this case should be min J = 0. As the gradient method may require some stop condition as a convergence criterion, the tolerance is established to
. Moreover, the initial
for the gradient method is defined further than in the previous case
.
Test case II: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
Test case II: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
Convergence of the error and for Test case II using MC (left) and GD (right).
As in the previous case, the GD method shows a higher convergence rate. Not only the speed that the GD method reaches when obtaining the solution, but also the quality of the method providing the gradient estimation is highlighted. While requiring the equivalent of two simulations for each gradient step, the convergence ratio for each iteration is higher than for the MC method. Moreover, GD strictly converges to an optimal value of since
is reached. In contrast, the MC method reaches its maximum number of iterations without reaching the tolerance.

Test case II: Evolution of J with respect to for each method. Dots represent the different MC evaluations.
Test case II: Evolution of J with respect to for each method. Dots represent the different MC evaluations.
Test case III
The third test case is a variation of the second. In this case, two transverse gates have been included. There are several techniques for dealing with such structures in the model. For instance, in Cozzolino et al. (2015), a robust and efficient numerical treatment is incorporated into the numerical method, providing a dynamic detection of free and submerged regimes. In the present work, they have been formulated as internal boundary conditions following Morales-Hernández et al. (2013), which provides a simpler method good enough for relatively uniform flow regimes. The idea is to analyze the possible influence that these internal boundaries have over the control.



Sketch of the channel and the location of the probe , the lateral gate
, and the two cross gates
. The lower panel includes objective water depth
at
for Test case III and
axis represents gate opening (m).
Sketch of the channel and the location of the probe , the lateral gate
, and the two cross gates
. The lower panel includes objective water depth
at
for Test case III and
axis represents gate opening (m).
A forward simulation using was performed previously and the temporal evolution at
is used as the target water depth.
Test case III: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
Test case III: Evolution of the water depth at through the calibration process for MC (left) and GD (right) methods.
The result of the calibration using the stochastic methodology provides bounds on the optimal solution with the best option within the interval of
with an error of
. The GD calibration calculates
with
. The number of steps for non-gradient optimizations required in the process includes 64 forward simulations, while in the case of the gradient-based simulation only seven are required, and seven more for the adjoint system integration. It represents an acceleration of
of the GD method against the MC method.
This case has detected one of the most relevant weaknesses of the MC method: it may not converge to the optimal solution. As in the previous case, the GD method converges to an optimal value of .



Test case III: Evolution of J with respect to for each method. Dots represent the different MC evaluations.
Test case III: Evolution of J with respect to for each method. Dots represent the different MC evaluations.

Convergence of the error and for Test case III using MC (left) and GD (right).
These results highlight a new possibility, where a combination of these methods can obtain the solution more quickly. The MC method can be used to generate the initial guess for the GD method, which then can be used to descend more rapidly to the optimal solution.
CONCLUSIONS
In this work the comparison of a MC-based optimization and an adjoint method-based optimization method to calibrate discharge coefficients of lateral gates on 1D channels is presented. The techniques have been checked using three test cases.
The adjoint technique has the drawback of the non-generalized way of calculating sensitivities. However, when developing it for a particular PDE system, it offers a powerful procedure for performing optimal control. Being a gradient-based method, it requires the evaluation of the error gradient. The present work has been based on a simple semi-fixed, step-length algorithm which, in some cases, may vary the performance depending on the step-length used. Other gradient-based methods such as BFGS or L-BFGS may be used with the previous information, including an optimal step-length that may improve the convergence rate of the method.
On the contrary, MC methods are capable of obtaining good results with a simpler implementation for this type of problem. Previous information, other than the range where the optimal solution may be found, is not required. This is an advantage when calibration is related to an ill-posed problem or a complex functional, but not for such cases as those presented in this work.
The MC method provides two advantages: independence from the model to be calibrated and the intrinsic capability of working in parallel. On the other hand, the GD method shows a greater convergence rate, but it is sensitive to the initial guess used. Results suggest that a combination of these methods, the MC method for choosing the initial guess and then the GD method for descending in the correct direction, may generate an effective optimization method. The main disadvantage of the GD method based on the adjoint formulation is that it must be carefully reformulated in each different application.
Future work may include the advantage of MC in choosing an appropriate initial guess for use in the gradient-based optimizer. Moreover, the same technique may be applied to the 2D system of equations. However, limitations on the time–space dimension of the problem may be found due to the memory storage requirements of the adjoint method. Nevertheless, more sophisticated objective functions and optimization applications can be addressed in the two-dimensional framework, where calibration is a very common task in practical applications. It is also feasible to develop a very similar technique for related uses, such as the calibration of the friction coefficient used in Manning's formula following the steps presented in the first part of the work and then comparing both methods for this application.
ACKNOWLEDGEMENTS
This research was partially funded by the Spanish MINECO/FEDER through the Research Project CGL2015-66114-R and by Diputación General de Aragón, DGA, through FEDER funds. The first author was also supported by the Spanish Ministry of Economy and Competitiveness fellowship BES-2012-053691.