Reverse flood routing in an open channel using a combined model of genetic algorithm and a numerical model


 Flood impact assessment in a river system is done with the help of flood routing and this process helps to determine the status of sensitive points of the route in terms of flood entry and the resulting risks for urban and rural areas. For flood routing, a hydrodynamic numerical model should be implemented and this model needs upstream and downstream boundaries. In some cases, the upstream boundary, which is usually a hydrograph, is not available due to the lack of facilities and it is necessary to be generated for numerical model implementation. The purpose of this study is to present an integrated method comprising an optimization model and a hydrodynamic numerical model for flood modeling in order to determine the upstream hydrograph using the measured downstream hydrograph along a river. The routing procedure consists of three steps: (1) generating a hypothetical upstream hydrograph using the genetic algorithm method; (2) hydrodynamic modeling using a numerical simulation model for flood routing according to the hypothetical hydrograph, which is generated in the first step; (3) comparing the calculated and observed hydrograph in the downstream by using a fitness function. This recommended procedure was named the Reverse Flood Routing Method (RFRM) and was then applied to Karun River, the largest river in Iran. Comparison of the final generated upstream hydrograph by the RFRM model with the corresponding measured hydrograph at the upstream boundary (here Ahvaz hydrometric station was assumed as an ungauged river location) shows the high accuracy of the recommended model in this study.


INTRODUCTION
Since the earliest recorded civilization, humans have settled near rivers because of the proximity to water supplies, natural needs for urban and rural populations, and advantages for agriculture activities, resulting in a significant percentage of people around the world living near rivers (Baldassarre et al. 2013). But living along rivers has dangers like floods, which are a natural phenomenon and whose occurrence is inevitable in many places and times and sometimes no human-made structures can stop them. However, there are some methods for mitigating the risk of their happening and the consequences. Many researchers have studied the causes and presented strategies for reducing the destructive effects of the flood. Flood control and management and also the design of suitable flood-resistance structures and specifying flood catchment areas on both sides of a river are dependent on the flood hydrograph, its peak discharge and base flow time. The continuity equation in steady form (i.e. Q ¼ AV) shows the relationship between discharge and the cross-section area. Therefore, in some places along a river it can be possible to relate the flow depth to discharge (Rating Curve) and then the recorded flow depth-time can be easily converted to discharge time or hydrograph. These equipped cross-sections are generally called hydrometric stations. In these types of stations, water elevations are continuously recorded and are converted to discharge using a pre-prepared accurate rating curve or stage-discharge relationship. Therefore, flood hydrographs are provided at these stations (Perumal et al. 2007;D'Oria & Tanda 2012). Construction, equipment, and maintenance of hydrometric stations require high costs for governments or water agencies, especially for natural channels with complex and challenging conditions. However, a compelling method that is currently under high development is the estimation of the river discharge through image velocimetry (e.g., Dal Sasso et al. 2020; Rozos et al. 2020). This method also needs sophisticated equipment and its uncertainty and accuracy due to some environmental factors such as turbidity and suspended sediments may make many problems for discharge measuring. For this reason, the number of such stations along a river or canal may be limited and is mostly dependent on the importance of the river area and also on the current and future development projects. Therefore, providing hydrograph at any point of a waterway or river needs to set up a hydrodynamic numerical model and run it for the desired reach of channel and time period. These types of models need all information like stage-discharge at upstream and downstream boundaries of the chosen distance during the selected period. To mitigate the flood risk, the knowledge of discharge at gauging stations and upstream or downstream of the section is essential (Zucco et al. 2015). Accurate information about flood characteristics is one of the critical issues for managing its disaster (Soleimani-Alyar et al. 2016). There are several hydraulic and hydrologic mathematical methods used for flood modeling in rivers (Pashazadeh & Javan 2020). A hydraulic model consists of the conservation and energy or momentum equations that use geometric and hydraulic data for flood routing. But a hydrologic model relies on limited parameters coupled with linear or nonlinear equations based on the continuity equation for flood routing, which is simple in comparison to a hydraulic model (Bozorg-Hadad et al. 2018). Flow hydrographs are one of the fundamental elements for flood modeling which is recorded as time series at hydrometric or gauged stations. The information of the peak discharge of a hydrograph is also essential for flood-risk assessment and management of water resource systems (Todaro et al. 2019). However, due to the lack of hydrometric and/or gauged stations, these hydrographs may not be available everywhere (Kaya et al. 2017;Jeslin & Sumam 2021). The Muskingum method is one of the hydrological approaches that has been used for many years due to its simplicity and acceptable accuracy (Norouzi & Bazargan 2020). The usual downstream routing of flood flow; that is, using the existing upstream hydrograph, is a stable process for the gradually varied subcritical flow in rivers, with the results being reliable. In contrast, reverse flood routing determines the upstream hydrograph based on the river geometry, flow hydraulic characteristics, and downstream hydrograph (Bruen & Dooge 2007). A part of performing reverse routing is to solve the 1-D Saint-Venant equations. Solving these equations requires initial and boundary conditions, and the discharge hydrograph at the ends of the river reach (Spada et al. 2017). Zucco et al. (2015) applied a method to generate the upstream hydrograph using the downstream hydrograph based on the Person Type-III distribution and only the continuity equation, and they applied their method for the Tiber River in Italy. The same method was used by Jeslin & Sumam (2021) for Chalakkudy River in India.
Hydrodynamic partial differential equations including continuity and momentum equations are generally solved using the 1-D numerical models (in rivers) in order to specify water elevations and discharges at any point and time during floods along the open channel. Since these models are deterministic, they must be solved within a specified domain with boundaries. In many rivers, the upstream boundary may not be provided due to some difficulties such as lack of facilities or being impassable. In this study, a procedure is deployed to develop an integrated model comprising a hydrodynamic mathematical model, which solves the 1-D Saint-Venant equations, and an optimization model accompanied by a genetic algorithm method. This combined program can generate the upstream missed hydrograph using the existing downstream hydrograph.

Governing 1D hydrodynamic equations
The basic formulation of unsteady one-dimensional flow in open channels defined as St. Venant equations. These equations comprise continuity and momentum equations, which are written in the following form, respectively (Cunge et al. 1980;Mohammadi & Kashefipour 2014: where Q ¼ discharge, A ¼ cross section area, x ¼ distance in flow direction, g ¼ gravitational acceleration, Z ¼ water surface elevation, R ¼ hydraulic radius, n ¼ Manning's roughness coefficient, q L ¼ lateral inflow or out flow (negative for out flow), T W ¼ top width of channel, t ¼ time and β ¼ momentum correction coefficient. These partial differential equations are generally solved using different types of numerical methods. For example, in the FASTER (Flow And Solute Transport in Estuaries and Rivers) model, these equations are numerically solved using an implicit staggered central finite difference scheme with variable grid size, which is accurate and unconditionally stable. More details regarding this type of numerical solution of the Saint-Venant equations can be found in Kashefipour (2002). In the MIKE11 commercial model, which was developed and released by DHI (Danish Hydraulic Institute), these equations are solved based on an implicit finite difference scheme developed by Abbott and lonescu (DHI Reference Manual 2017).

Introduction to optimization
The optimization process can be used in all sciences such as math, computer engineering, electronic, water engineering and so on in order to minimize an error, effort or energy, or to maximize profit or outcomes (Kramer 2017). Optimization is a technical and iterative procedure for making something better. In any process, there is a set of inputs and a set of outputs, as shown in Figure 1.
Optimization refers to finding the values of inputs in such a way as to get the 'best' output values. The definition of 'best' varies from one problem to another, but in mathematical terms, it refers to maximizing or minimizing one or more objective functions by changing the input parameters. The set of all solutions or values, which the inputs can take, make up the search space. In this search space, a set of points may exist that offers the optimal solution. The optimization's aim is to find that point or set of points in the search space.

Genetic Algorithm
Genetic Algorithm (GA), like many other optimization methods, is an iterative model and uses some key parameters whose values control the way to reach the best answer and these values are usually determined using trial and error. The GA model is generally used for a wide range of engineering subjects for optimization process. The GA model guide search strategies inspired by a biological process, derived from Darwin's principal of natural selection and survival of fittest (Sivapragasam et al. 2008). This method is frequently used to solve optimization problems, research and in machine learning. The general framework of most evolutionary algorithms is similar, that means they start with a population of random solutions which are evaluated by a fitness function. The most popular components of this algorithm are selection, crossover and mutation (Mirjalili et al. 2020). Crossover is an operator that allows exchanging genes between parents to produce a new solution. There are three main crossover methods in the literature, single-point crossover, double point crossover and multipoint crossover. For example, in single-point crossover, the chromosome of the two-parent solution changes before and after singlepoint. Crossover is the primary operator of the exploitation of GA. The mutation causes random changes in genes and is the primary mechanism of exploration in GA. There are many mutation techniques such as uniform, and non-uniform that can be used in a genetic algorithm. By using a selection operator, the best offspring solutions are selected to be parents in the new parental population to allow convergence towards optimal solutions. The selection process is based on the fitness value in a population. When we face a minimize situation, the low value is acceptable but in a maximize situation, the high value is preferred. There are many methods for simulating the selection. Roulette wheel selection is one way. More details about GA can be found in Haupt & Haupt (2004).

MATERIALS AND METHODS
Description of the procedure of developing the integrated model in this research is followed by using an application for Karun River, the largest river in Iran.

Case study description
Karun River is the largest and the only navigation river in the southwest of Iran. The basin area of this river is about 45,220 km 2 , with the 50-year average discharge being reported as about 600 m 3 /s. The selected reach length was about 50 km with 49 cross-sections ( Figure 2). This figure shows the study area domain; Ahvaz (cross-section 49 or C.S49) and Farsiat (cross-section 1 or C.S1) are the upstream and downstream boundaries, respectively. The measured hydrographs (Q-t chart) and time-series of water elevations (Z-t chart) were available at both boundaries for several periods; that is, the continuous measurements at both hydrometric stations. In this Corrected Proof study, the numerical FASTER model, which was initially developed by Kashefipour (2002), and genetic algorithm was used for simulation of the flow in Karun River and optimization, respectively. A piecewise equation was added to the FASTER model, enabling it to calculate the Manning's roughness coefficient at any spatial point and any time during simulation using flow discharge and depth. This dynamic equation makes the model very accurate for determining time-dependent discharge and depth at any point of the domain. This part of the FASTER model is fully described by Mohammadi & Kashefipour (2014). The FASTER model was set up for the aforementioned domain. All data needed for implementing the model for the years 2016-2017 and also the bed elevations for all cross-sections were provided by KWPA (Khuzestan Water and Power Authorities) company. Ahvaz and Farsiat hydrometric stations were defined as the upstream and downstream boundaries, respectively (see Figure 2). The Q-t data is usually defined for the upstream boundary and the Z-t data for the downstream boundary.

Calibration of FASTER model
Since the main part of the developed integrated model in this study is the flow simulation model, it was essential to ensure from the FASTER model accuracy in the prediction of discharges and water surface elevations. For this reason, the FASTER model was first calibrated using the existing data. The main part of all hydrodynamic numerical models like FASTER for precise predictions of water elevations and discharges is accurate estimation of roughness coefficients. It was found that the dynamic piecewise equations introduced for the Manning's roughness coefficient by Mohammadi & Kashefipour (2014) in the FASTER model needed to be slightly revised for this domain of Karun River. The original equations were replaced by the revised ones in the FASTER model. By these revised Manning's coefficients, we have validated the accuracy of the model predictions for the discharges and water surface elevations. These two equations are written as Equations (3) and (4). Comparison of these equations with the original ones in the FASTER model shows that for discharges of more than 1,100 m 3 /s, the roughness coefficients increased about 9%, and this value was about 27% for discharges of less than 1,100 m 3 /s. The coefficients 'a' and 'b' in Equations (3) and (4) were therefore specified as 1.09 and 1.27, respectively.  where y ave and u ave are the cross-sectional average of flow depth and velocity, respectively. In the hydrodynamic module of the FASTER model, the Saint-Venant equations including continuity and momentum equations are solved using an implicit staggered method with variable mesh size, which is unconditionally stable and accurate (Kashefipour 2002). This deterministic model is very suitable for modelling of flow and flood routing with minimal and acceptable uncertainty.

Optimization method
The aim of this study is to generate a suitable upstream hydrograph from the known downstream hydrograph. According to the GA procedure, an objective function was first defined (Equation (5)). This equation, which is written for the downstream discharges, must be minimum for the best answer of the upstream hydrograph.
where Q obs and Q cal are the measured and calculated discharge, respectively. The number of existing measured discharges for the downstream hydrograph was 10 and the corresponding discharges for the upstream hydrograph were unknown. The GA model works in several steps. Before starting step1, 10 discharge values (each one is a gen) are randomly selected from the range of (0.8Q-1.3Q), which makes a string (or chromosome). The number of population is selected (N pop will be constant for all steps up to reaching the best answer), which is the number of strings, say m.
Step 1: Each string is individually used as the upstream boundary and the hydrodynamic FASTER model is implemented and the downstream discharges are calculated.
Step 3: Evaluating of the fitness; the fitness for each string is calculated by and the probability of each string is also calculated by p j ¼ F j =T. The cumulative probabilities are calculated from the string or chromosome 1 to m (C 1 ¼ p 1 , C 2 ¼ p 1 þ p 2 , …, C m ¼ p 1 þ p 2 þ …þ p m ), it is obvious that all cumulative probability values are between (0 and 1). Now m (number of population or strings) numbers are randomly selected between (0,1), say R 1 , R 2 , . . . , R m . At this part of step 3, each new string (chromosome) is selected using the comparison of each R with two consecutive C; if R is smaller, the corresponding old string is selected. For example, for R 1 , we want to select the first new string, the value of R 1 is compared with C1-C2, C2-C3, C3-C4, …. Then if R 1 , C 3 the old string number 3 is selected for the new string number 1. This procedure is continued for R 1 , R 2 , . . . , R m . In this step, the poor fitness strings are omitted from the list and the strong fitness strings may be selected for two or more new strings.
Step 4: A probability (p c ) value is selected for crossover process. This value will be constant for all the process up to reaching the best answer. Now again m random numbers between (0,1) are selected (R 1 , R 2 , . . . , R m ), then all these numbers are compared with P c . All the strings (1 to m) corresponding to the R numbers smaller than P c are chosen for crossover. In this stage, according to the number of the selected strings (chromosomes), random integer numbers between (k ¼ 1-10; number of gens or discharges in each string) are selected. Then in the crossover process k gens of one string are replaced with the corresponding gens in another string (one-point crossover method). Finally, in this step m new strings are generated. Some of them remain unchanged because they did not participate in the crossover process.
Step 5: Mutation process; N mut is first chosen (this value remains constant up to reaching the best answer). This number shows the number of gens (discharges) that should be mutated. For all chromosomes (strings) the number of all gens is (L ¼ m Â 10), so each gen has a position between 1 and L. Now N mut integer numbers are randomly selected between1-L. The gens (discharges) corresponding to the selected number should be mutated. To do this, the GA model again selects a corresponding discharge between 0.8Q-1.3Q from the input downstream discharges data. At the end of this step, m new chromosomes or strings are produced (some of them remain unchanged). The steps 1 to 5 will be repeated several times to minimize Equation (5) and to get the best answer for the upstream hydrograph. The number of iterations depends on the desirable accuracy in prediction of the upstream hydrograph and also on the chosen parameters, N pop , P c , and N mut . For example, repeating steps 1 to 5 can be interrupted when the f function is less than a very small value (f ( e). The above described optimization model is written using the well-known MATLAB software.
Given that we face a minimization situation, the goal is to bring this function as close to zero as possible. By changing some of the parameters in the genetic algorithm, at last, seven different optimization models were written. These parameters were the number of members of the population (N pop ), percentage of crossover (P C ), number of mutations (N mut ) and number of iterations. These models were produced based on trial and error, previous experiences, and using two types of numbers, Integer (Models 1 to 3) and Real (Models 4 to 7) numbers, Table 1. The task of this part of the main program is to select the new set of discharges for the upstream boundary based on the defined models and the previous set of data and check for minimizing Equation (5) by comparison of the new set of calculated discharges for the downstream boundary with the corresponding measured values. The models defined in Table 1 were finally evaluated by the statistical parameter of Root Mean Square Error (RMSE), which is defined as:

Model development
The main model consists of three parts, including: 1-a numerical hydrodynamic model (here, the FASTER model), which solves the Saint Venant equations and determines water elevations and discharges at any point and time of simulation; 2-a genetic algorithm (GA) model, which produces the new set of discharges (hydrograph) for upstream boundary using the downstream discharges data and checking the fitness function (Equation (5)) for minimization, and 3-an interface for linking the above models and managing the number of iterations towards the best-optimized results. Ahvaz and Farsiat hydrometric stations were set as the upstream and downstream boundaries, respectively (see Figure 2). The Q-t data (hydrograph) and Z-t data for a ten-day period in 2016-2017 were defined for these boundaries, respectively. It is assumed that the upstream hydrograph does not exist, and the hydrodynamic model cannot be implemented without this hydrograph. Therefore, a hypothetical hydrograph is first produced for the upstream boundary by the GA part of the model based on the downstream hydrograph, as was mentioned above. The flow routing is then started along with this domain and continued up to the end of simulation time (i.e., ten days or 240 hours). At the end of the simulation time of the FASTER model, the downstream hydrograph is provided by the FASTER model. By the defined fitness function, the calculated and observed hydrographs in the downstream boundary are compared. At last, for the reasonable error, the generated hypothetical hydrograph is the final upstream hydrograph. Otherwise, the GA model produces another hypothetical hydrograph as was extensively described in section 3.3 and this cycle continues until the difference between calculated and observed hydrographs at the downstream boundary becomes close to zero. The interface part of the model transfers the information between the FASTER and GA models; that is, the upstream hydrograph from GA to FASTER and the downstream hydrograph from FASTER to GA. In this part of the model, a conditional statement is placed to control the number of iterations based on the calculated error. According to the number of iterations, say 10,000, and running FASTER and GA models, it is necessary to use a supercomputer to reduce the program running time. A flowchart of the model is shown in Figure 3.

RESULTS AND DISCUSSION
Optimization problems are generally solved using a series of repetitive mathematical calculations and, by defining one or more conditional statements, tries to get the best answers. For example, in this study, in each iteration, a GA model, a numerical hydrodynamic model, and an interface model are implemented, and running these models together with a usual computer takes a very long time, especially when the number of iterations is high. Also, it should be mentioned that according to the simulation time and the amount of the time step, the hydrodynamic model may take a considerable time to be completed in each iteration. Therefore, in this type of computer program, the time consumption of running a program is a key parameter. For minimizing the time-consumption of running a program, we used a supercomputer with a 64-core processor. However, selecting a suitable combination of the parameters and their values for the GA model (i.e., models 1-7, see Table 1) may significantly reduce the time for implementing the main model. In addition to program execution time, the performance and accuracy of the GA model, which produces the upstream hydrograph, is another key parameter. In this study, the number of iterations for all seven considered models in the GA model was the same and equal to 10,000. The most critical decision for implementing a genetic algorithm is to choose a good representation, Figure 3 | Flowchart of the model. otherwise leads to poor performance and answer. The results showed that the best combination of the GA parameters is model No.6 (see Table 1). Running the program revealed that after 3,030 iterations, the RMSE% of comparing the produced and real hydrographs for the downstream boundary was calculated to be less than 0.05, which indicated the high accuracy of the GA model. Furthermore, the implementation time of the program to reach the best answer is much shorter. The produced hydrographs for the upstream boundary by the different GA models are illustrated in Table 2, and the real hydrograph is also added to this table. The RMSE% based on

Corrected Proof
Equation (6) is also calculated and added to this table. As is shown in Table 2, model No.6 performed better than the other considered models. Figure 4(a)-4(g) compares the produced or estimated hydrographs by seven GA models with the measured or actual hydrograph at the upper boundary; that is, the Ahvaz hydrometric station in this study.
Knowing the river conditions in terms of different discharges is essential for many river engineering projects, such as numerical flow and water quality modeling in riverine basins, watershed management, designing hydraulic and water supply structures, and flood control and distribution plans. However, continuous measurement of discharges along a river network may require several hydrometric stations to be established, and this may be impossible due to budget shortage or severe conditions, or high instability of river sections. This research teaches us how to produce the missed or non-measured hydrographs for upstream from downstream or vice-versa. It should be said that the parameter values used in the GA model may not be applicable from one river to another. However, the most important and key achievement of this study is the instruction of the applied method and giving the hint for producing the upstream hydrograph using the downstream one. For many engineering processes in this subject such as water quality modelling (chemical, physical and biological) and sediment transport modelling within a domain, having an upstream hydrograph is inevitable, and sometimes providing it requires high cost.
Uncertainty is one of the most important and fundamental issues that greatly affects the accuracy of model predictions. Usually, models based on statistics such as hydrological models differ in accuracy and uncertainty depending on the time interval of forecast and the use of a number of parameters in the model. In this study, a deterministic hydrodynamic model was used and the only parameter that may affect its accuracy is the roughness coefficient. Accurate estimate of this coefficient can highly affect the model accuracy in prediction of water elevations and discharges at any point during flood (Pappenberger et al. 2005). The FASTER model uses a dynamic and variable roughness coefficient, so this makes it very accurate and reliable (Mohammadi & Kashefipour 2014).

CONCLUSION
In this study, an integrated model was developed comprising a GA model, a hydrodynamic numerical 1-D model, and an interface program. This mathematical model was written so it can be used for reverse flow routing to produce an upstream boundary hydrograph from the known downstream hydrograph. It was found that the GA parameters play an important role in producing precise hydrographs and best answers, but also reducing consuming computer running time. Here, we could generate the upstream hydrograph with a minimum RMSE error 0.04% for the GA parameters of N pop ¼ 10 (number of members of population), N mut ¼ 1 (number of mutations), and P c ¼ 0.9 (percentage of crossover) with minimum iteration.

DATA AVAILABILITY STATEMENT
All relevant data are included in the paper or its Supplementary Information.