## Abstract

Water allocation in agricultural lands, optimal design of hydraulic structures and climatic phenomena are the events in water management science that face hydrological uncertainties. The purpose of this study is to estimate the characteristics of surface runoff based on probabilistic and fuzzy analysis. Separation and generation of basic hydrological information, probabilistic modeling, fuzzy analysis, and optimization to achieve the solution were the main steps of the decision-making problem. Long-term hydrological data of the study area were collected, analyzed and used as a basis for the simulation model. In this study, a copula-based stochastic method was developed to deal with uncertainties related to rainfall and runoff characteristics as well as to address the nonlinear dependence between multiple random variables. The relationship between rainfall variables and flood characteristics was formulated through fuzzy set theory. The feasible domain of the fuzzy problem was searched using the non-dominated sorting genetic algorithm to find the optimal extreme points. The obtained solutions were used as a fuzzy response to calculate the flood of the Baghmalek plain in Khuzestan province in southwestern Iran. The results showed that the maximum model error occurred in predicting rainfall depth and flood volume, and the maximum rainfall rate and runoff flow could be calculated more accurately. Moreover, the developed fuzzy-probabilistic model was able to predict more than 90% of flood events within the defined fuzzy range.

## HIGHLIGHTS

A fuzzy relationship has been developed between rainfall and runoff.

The proposed method is applicable to basins without hydrology stations.

NSGAII was incorporated in a new framework to find the extreme points.

Probabilistic-fuzzy model was considered as a suitable structure for estimating surface flows.

## INTRODUCTION

Uncertainty analysis is one of the main components in the decision-making systems of the different engineering sciences (Alam *et al.* 2020a, 2020b, 2020c, Yang *et al.* 2020a; Qayyum *et al.* 2020; Abedini *et al.* 2020; Jalali Sarvestani and Charehjou 2021; Abedini & Zhang 2021; Ouyang *et al.* 2021; Chen *et al.* 2021; Sabernezhad 2021; Rivera-Diaz *et al.* 2021). In hydrological problems, probabilistic planning and fuzzy programming are included as two methods of uncertainty analysis (Wang *et al.* 2018, 2019). Probabilistic flood estimation as an uncertainty analysis method has been used in urban, environmental landscape and dam reservoir design based on the frequency analysis of a variable in a predetermined return period (Chao *et al.* 2018; Zhang *et al.* 2019). Hydrological uncertainties in decision-making systems are evaluated as a sustainability component to increase managerial reliability. Uncertainty in the hydrological phenomena should be incorporated when there are no specific values or complete knowledge of the decision variables of a problem (Haghighi & Zahedi-Asl 2014). In recent years, uncertainty analysis of a hydrological process has been considered in different fields of water resources management in hydraulic structure design (Rahimi *et al.* 2020), developing groundwater strategies (Lalehzari & Kerachian 2020a), agricultural water planning (Lalehzari *et al.* 2020), and flood routing (Rahimi *et al.* 2020).

Frequency distribution functions and *α*-cut decomposition are two types of technique to evaluate the rate of uncertainties in hydrological problems (De Michele & Salvadori 2003; Ahmadisharaf & Kalyanapu 2019). These techniques have been defined in the probabilistic and fuzzy frameworks, respectively. Hydrological uncertainties can be analyzed using univariate, bivariate, or multivariate distribution functions and numerous studies have been carried out based on this concept (Ahmad & Simonovic 2011; Chen & Guo 2019; Rahimi *et al.* 2020). Probabilistic modeling has been evaluated to take into account rainfall uncertainty in the design of urban runoff control systems (Chen *et al.* 2021), groundwater resources and water exploitation (Lalehzari & Kerachian 2020b), estimating the soil moisture balance in root zone under drought condition (Huang *et al.* 2021), and energy systems (Ma *et al.* 2021).

Rainfall variables that reflect climatic characteristics are critical to runoff monitoring and management in agriculture (Yang *et al.* 2015, 2020b; Zhang *et al.* 2020c; Xu *et al.* 2021; Li *et al.* 2021). The main goal of this study was to develop a prediction model to provide a fuzzy-probabilistic framework for estimating the flood amounts based on the long-term rainfall in Baghmalek Plain, Khuzestan province, Iran. Rainfall characteristics including depth and rate, and flood variables of volume and maximum flow rate, have been evaluated to generate the time series of information.

## METHODOLOGY

Figure 1 shows the step-by-step calculation process to solve the problem. According to the figure, after analyzing the hydrological information, a bivariate probability model has been developed to predict the return periods of rainfall and runoff. The structure required to estimate the runoff fuzzy values was then programmed using multi-objective optimization.

### Probabilistic modeling

There are many probability distributions in hydrology that are fitted with different conditions and data. The Probability Density Function (PDF) and Cumulative Distribution Function (CDF) are used for probability distributions to predict random hydrological events. Univariate distribution functions can be used as marginal functions to obtain the probabilistic estimation of a hydrological variable. Some methods of fitting the best marginal distribution are Chi-square, Kolmogorov-Smirnov, and Anderson-Darling.

Overall, the steps required to determine, select, and apply a probability distribution are: considering the characteristics and constraints of the random variable; evaluating the appropriate mathematical patterns (probability density function); performing a process to fit all distributions on sample data for estimating the coefficients; goodness-of-fit test to evaluate the suitability of patterns; selecting the most appropriate distribution by considering the results of the goodness-of-fit test; incorporating the model in frequency analysis and estimating return periods.

#### Chi-square test

^{2}statistic, then the critical value with the desired error level and degree of freedom n-k-1 (

*n*= number of layers and k = number of variables) are obtained. The general Chi-square formula is presented as follows:where,

*f*

_{o}is the observed frequency and

*f*

_{e}is the expected frequency. The chi-square function is obtained from the normal distribution. If the variable

*x*belongs to the normal population with mean

*μ*and standard deviation

*σ*and a random sample is selected from this population, the following distribution with degree of freedom

*n*will be obtained.

#### Kolmogorov-Smirnov test

*D*).where

*N*is the number of observations;

*C*is the hypothesized model that needs to be tested, and

*x*is observation in increasing order.

_{i}#### Anderson-Darling test

*C*

_{i}*=*

*F*(

*x*) and

_{i}, α*a*is the shape parameter. Computing statistics

*D*and

*A*by Kolmogorov-Smirnov and Anderson-Darling methods, and comparing them with corresponding critical values at a certain significance level, if the statistics are less than the corresponding critical values, our hypothesis can be accepted.

^{2}#### Multivariate frequency function

*x*

_{1}, …,

*x*

_{n}follow an arbitrary marginal distribution function

*F*

_{1}(

*x*

_{1}), …,

*F*

_{n}(

*x*

_{n}), respectively, there then exists a copula,

*C*(), that combines these marginal distribution functions to give the joint distribution function,

*F*(

*x*

_{1}, …,

*x*

_{n}) as follows (Sklar 1959):where,

*u*

_{i}*=*

*F*(

_{i}*x*) are the functions of the cumulative distribution of continuous margins. The copula consists of various functions that must be evaluated and selected. Archimedean joint functions are part of these functions that have been used in a wide range of hydrological studies. The application of the copula in various water engineering issues is described by Chen & Guo (2019).

_{i}To identify and select the most appropriate copula function, the correlations of the variables must be determined by *τ* Kendall and *ρ* Spearman methods. In the next step, the parameter *θ* for the copula family c(u,v) was calculated based on the correlation coefficients.

*θ*is a parameter of a data set (

*θ*≥ 1), and is calculated based on a sample of data. The value of could be estimated as follows:

### Fuzzy analysis

#### Fuzzy structure

*a*= 0,

*a*= 0.25,

*a*= 0.5,

*a*= 0.75, and

*a*= 1 and the fuzzy responses were obtained based on triangular functions. In each

*a-cut*for fuzzy analysis becomes an optimization problem to find the extreme points of the solutions as follows:

In fuzzy system optimization, the boundary responses are only required for determining the extreme points of the fuzzy domain. Figure 2 shows the framework of the fuzzy modeling using *α*-cut decomposition. This structure was summarized in the five levels of the fuzzy numbers to generate the flood hydrographs.

#### Multiobjective optimization

Simulation models and intelligent algorithms were incorporated into the hydrological, environmental and agricultural problems to develop the sustainable framework (Li *et al.* 2017, 2019, 2020; Awan *et al.* 2020; Bafkar 2020; Ebadi *et al.* 2020; Maina *et al.* 2020; Medvedeva *et al.* 2020; Nnaemeka 2020; Nwankwo *et al.* 2020; Simos & Tsitouras 2020; Mohammadi 2021; Sepahvand *et al.* 2021). Optimization techniques are the second step after the simulation models in this structure. Originally, the non-dominated sorting concept was defined to describe the economic efficiency of individuals in a society and its application in other sciences has expanded over time (Yang *et al.* 2008; Sreekanth & Datta 2010; Lalehzari *et al.* 2020). Accordingly, the high-income limit in a society is a certain criterion that is the result of the total assets of each individual. Therefore, an increase in an individual's income occurs only if the income of other members of society is reduced. In this concept, the desired goal of each member to achieve more benefit is in conflict with the goals of other members. Therefore, if it is possible to increase an individual's income without reducing the income of others, there is a Pareto improvement. It is necessary to determine a set of points to solve the problem, which is in a predefined optimal direction. The main concept in interpreting an optimal point is Pareto optimality.

*et al.*2016). Various methods are usually used to generate the optimal solution, most commonly via the application of constraints to define a feasible domain. Moreover, the optimal concept in problems with more than one objective function means finding solutions that satisfy each of the objectives to an acceptable level. Therefore, the decision-maker chooses one of the solutions according to the managerial strategy. The mathematical form of these concepts can be written as Equations (12) and (13):Subject to:where

*F*is the rainfall-runoff function and

*C*is the constraint of o = 1, 2 intended for the optimization process. Figure 2 shows an optimal front of the solution sorted by non-dominated sorting. This mechanism requires optimization tools to find new and more suitable solutions (Lalehzari 2017). For this purpose, the genetic algorithm method could be used to run the model.

_{o}*CDP*).where are the maximum and minimum values of the objective function o in the population, respectively, and O is the value of the objective function o in the solution i.

## DATA USED

The developed model was evaluated based on the long-term hydrological information (1973–2020) in Baghmalek plain. The study area is located in the eastern north of Khuzestan province has an area of approximately 62.4 km^{2} with geographical coordinates between 49° 39′ to 50° 11′ north longitudes and 31° 22′ to 31° 42′ east latitudes. The mean elevation of the plain is 743.5 m above sea level. Two rainfall characteristics used in this study for 42 hydrographs are illustrated in Figure 4. Furthermore, the rainfall duration for each event is shown in the outer circle of the figure. Moreover, the related flood events were considered to determine the rainfall and flood relationships. Table 1 summarizes the flood peak and volume.

Event no. . | Peak . | Volume . | Event no. . | Peak . | Volume . | Event no. . | Peak . | Volume . |
---|---|---|---|---|---|---|---|---|

m^{3}/s
. | MCM . | m^{3}/s
. | MCM . | m^{3}/s
. | MCM . | |||

1 | 310.1 | 44.5 | 15 | 580.9 | 86.4 | 29 | 605.3 | 64.6 |

2 | 510.5 | 135.3 | 16 | 721.2 | 137.1 | 30 | 406.0 | 151.2 |

3 | 212.9 | 59.2 | 17 | 641.6 | 117.3 | 31 | 1,441.0 | 259.3 |

4 | 660.8 | 230.2 | 18 | 757.2 | 130.5 | 32 | 750.0 | 113.7 |

5 | 276.0 | 59.7 | 19 | 245.4 | 50.3 | 33 | 469.2 | 78.2 |

6 | 742.0 | 419.0 | 20 | 1,338.5 | 292.8 | 34 | 1,013.0 | 195.9 |

7 | 95.7 | 18.7 | 21 | 969.0 | 157.5 | 35 | 542.0 | 106.7 |

8 | 531.8 | 146.3 | 22 | 1,022.8 | 237.1 | 36 | 651.0 | 124.5 |

9 | 645.6 | 95.9 | 23 | 879.8 | 169.9 | 37 | 658.0 | 263.7 |

10 | 234.0 | 53.3 | 24 | 856.8 | 231.0 | 38 | 784 | 117.6 |

11 | 300.4 | 86.4 | 25 | 920.6 | 135.0 | 39 | 376.2 | 64.1 |

12 | 269.0 | 74.7 | 26 | 361.3 | 63.1 | 40 | 266.31 | 73.99 |

13 | 322.8 | 77.3 | 27 | 323.3 | 69.6 | 41 | 654.19 | 227.85 |

14 | 412.7 | 64.5 | 28 | 802.2 | 143.9 | 42 | 959.31 | 155.89 |

Event no. . | Peak . | Volume . | Event no. . | Peak . | Volume . | Event no. . | Peak . | Volume . |
---|---|---|---|---|---|---|---|---|

m^{3}/s
. | MCM . | m^{3}/s
. | MCM . | m^{3}/s
. | MCM . | |||

1 | 310.1 | 44.5 | 15 | 580.9 | 86.4 | 29 | 605.3 | 64.6 |

2 | 510.5 | 135.3 | 16 | 721.2 | 137.1 | 30 | 406.0 | 151.2 |

3 | 212.9 | 59.2 | 17 | 641.6 | 117.3 | 31 | 1,441.0 | 259.3 |

4 | 660.8 | 230.2 | 18 | 757.2 | 130.5 | 32 | 750.0 | 113.7 |

5 | 276.0 | 59.7 | 19 | 245.4 | 50.3 | 33 | 469.2 | 78.2 |

6 | 742.0 | 419.0 | 20 | 1,338.5 | 292.8 | 34 | 1,013.0 | 195.9 |

7 | 95.7 | 18.7 | 21 | 969.0 | 157.5 | 35 | 542.0 | 106.7 |

8 | 531.8 | 146.3 | 22 | 1,022.8 | 237.1 | 36 | 651.0 | 124.5 |

9 | 645.6 | 95.9 | 23 | 879.8 | 169.9 | 37 | 658.0 | 263.7 |

10 | 234.0 | 53.3 | 24 | 856.8 | 231.0 | 38 | 784 | 117.6 |

11 | 300.4 | 86.4 | 25 | 920.6 | 135.0 | 39 | 376.2 | 64.1 |

12 | 269.0 | 74.7 | 26 | 361.3 | 63.1 | 40 | 266.31 | 73.99 |

13 | 322.8 | 77.3 | 27 | 323.3 | 69.6 | 41 | 654.19 | 227.85 |

14 | 412.7 | 64.5 | 28 | 802.2 | 143.9 | 42 | 959.31 | 155.89 |

## RESULTS AND DISCUSSION

The probabilistic framework used in this study involves the flood and precipitation information provided in daily time steps. The recorded data of 42 rainfall hydrographs (Table 1) were analyzed to formulate the flood volume and peak flow. Hydrological characteristics including peak rainfall (mm/day), cumulative rainfall depth of each hydrograph (mm), flood peak (m^{3}/s) and flood volume (million cubic meters (MCM)) were considered from daily precipitation and flood information. The relationship between rainfall and flood characteristics based on hydrograph information is shown in Figure 5.

The marginal frequency functions implemented in the probabilistic analysis of the rainfall peak and depth and peak flow and volume of the flood are summarized in Table 2. The results obtained by Chi-Squared, Anderson-Darling and Kolmogorov-Smirnov tests indicated that the generalized extreme values, Log Pearson, generalized extreme values, and Log-Normal functions were the best options for estimating the maximum peak (PR) and depth of rainfall (RD) and peak flow (PF) and volume of the flood (FV), respectively. The graphical illustration of the fitted curves for the four marginal distribution functions is shown in Figure 6.

Distribution functions . | Kolmogorov-Smirnov . | Anderson-Darling . | Chi-Squared . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

PR . | RD . | PF . | FV . | PR . | RD . | PF . | FV . | PR . | RD . | PF . | FV . | |

Gamma | 0.094 | 0.078 | 0.07 | 0.09 | 0.27 | 0.30 | 0.18 | 0.188 | 0.73 | 1.68 | 0.575 | 0.73 |

Gen. Extreme Value | 0.059 | 0.076 | 0.06 | 0.09 | 0.23 | 0.27 | 0.16 | 0.199 | 0.27 | 1.89 | 1.926 | 0.75 |

Gen. Gamma | 0.076 | 0.077 | 0.09 | 0.08 | 0.26 | 0.27 | 0.24 | 0.188 | 0.37 | 1.70 | 0.510 | 0.73 |

Inv. Gaussian | 0.095 | 0.074 | 0.07 | 0.09 | 0.29 | 0.25 | 0.16 | 0.193 | 0.67 | 1.71 | 0.596 | 0.72 |

Log-Logistic | 0.107 | 0.089 | 0.07 | 0.10 | 0.37 | 0.38 | 0.17 | 0.259 | 4.25 | 1.07 | 2.108 | 1.09 |

Log-Pearson | 0.081 | 0.069 | 0.07 | 0.09 | 0.24 | 0.21 | 0.17 | 0.186 | 0.36 | 0.61 | 0.598 | 0.72 |

Lognormal | 0.097 | 0.085 | 0.06 | 0.09 | 0.3 | 0.25 | 0.16 | 0.200 | 4.25 | 2.26 | 1.947 | 0.72 |

Normal | 0.090 | 0.076 | 0.11 | 0.10 | 0.44 | 0.26 | 0.56 | 0.698 | 2.38 | 2.03 | 3.907 | 1.75 |

Weibull | 0.082 | 0.097 | 0.08 | 0.08 | 0.24 | 0.52 | 0.2 | 0.218 | 0.38 | 1.90 | 0.530 | 1.84 |

Distribution functions . | Kolmogorov-Smirnov . | Anderson-Darling . | Chi-Squared . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

PR . | RD . | PF . | FV . | PR . | RD . | PF . | FV . | PR . | RD . | PF . | FV . | |

Gamma | 0.094 | 0.078 | 0.07 | 0.09 | 0.27 | 0.30 | 0.18 | 0.188 | 0.73 | 1.68 | 0.575 | 0.73 |

Gen. Extreme Value | 0.059 | 0.076 | 0.06 | 0.09 | 0.23 | 0.27 | 0.16 | 0.199 | 0.27 | 1.89 | 1.926 | 0.75 |

Gen. Gamma | 0.076 | 0.077 | 0.09 | 0.08 | 0.26 | 0.27 | 0.24 | 0.188 | 0.37 | 1.70 | 0.510 | 0.73 |

Inv. Gaussian | 0.095 | 0.074 | 0.07 | 0.09 | 0.29 | 0.25 | 0.16 | 0.193 | 0.67 | 1.71 | 0.596 | 0.72 |

Log-Logistic | 0.107 | 0.089 | 0.07 | 0.10 | 0.37 | 0.38 | 0.17 | 0.259 | 4.25 | 1.07 | 2.108 | 1.09 |

Log-Pearson | 0.081 | 0.069 | 0.07 | 0.09 | 0.24 | 0.21 | 0.17 | 0.186 | 0.36 | 0.61 | 0.598 | 0.72 |

Lognormal | 0.097 | 0.085 | 0.06 | 0.09 | 0.3 | 0.25 | 0.16 | 0.200 | 4.25 | 2.26 | 1.947 | 0.72 |

Normal | 0.090 | 0.076 | 0.11 | 0.10 | 0.44 | 0.26 | 0.56 | 0.698 | 2.38 | 2.03 | 3.907 | 1.75 |

Weibull | 0.082 | 0.097 | 0.08 | 0.08 | 0.24 | 0.52 | 0.2 | 0.218 | 0.38 | 1.90 | 0.530 | 1.84 |

PR, peak rainfall; RD, rainfall depth; PF, peak flood; FV, flood volume.

To determine the correlation between precipitation and flood characteristics, *τ* Kendall, *ρ* Spearman and *ρ* Pearson were used. The obtained correlation coefficients are summarized in Table 3. The correlation results according to the table showed the degree of dependence between random variables, which is commonly used in cases related to detailed functions. In the next step, the maximum likelihood estimator was used to compare the copula functions, as shown in Figure 7. The joint function with the smaller MLE value was considered as the superior joint function. The results showed that the Archimedean Gumbel was better than others according to the figure.

Correlation coefficients . | Index . | Rainfall . | Flood . | Peak values of rainfall and flood . |
---|---|---|---|---|

τ Kendall | 0.53 | 0.44 | 0.68 | |

ρ Spearman | 0.63 | 0.56 | 0.81 | |

ρ Pearson | 0.74 | 0.61 | 0.87 |

Correlation coefficients . | Index . | Rainfall . | Flood . | Peak values of rainfall and flood . |
---|---|---|---|---|

τ Kendall | 0.53 | 0.44 | 0.68 | |

ρ Spearman | 0.63 | 0.56 | 0.81 | |

ρ Pearson | 0.74 | 0.61 | 0.87 |

*d*) and rate (

*r*) and is the joint cumulative distribution function of random variables of

*d*and

*r*. is equal to 1 for the annual data set. The values predicted by the Archimedean Gumbel function for rainfall variables are illustrated in Figure 8. The effect of joint functions showed that the values shown in the figure are greater than their equivalent values in the marginal return period. The maximum depth and rate of the predicted rainfall with a return period of 500 years in the study area were predicted to be 398 and 385 mm/day, respectively. The previous studies showed that estimation of return period using multivariate distribution functions could be effective in evaluating flood-related projects (Maurer

*et al.*2018; Gao

*et al.*2019; Salas & Obeysekera 2019).

NSGAII (Deb *et al.* 2002) is used to identify the extreme points that are the two boundary points of the Pareto front and introduce them as fuzzy responses to the problem. The developed fuzzy triangular system for rainfall variables led to the probabilistic-fuzzy estimation of flood variables. The NSGAII optimization model searches for rainfall variables for each fuzzy space return period and calculates the extreme points of different *a*-cut levels. The fuzzy responses of the flood characteristics based on the different return periods of rainfall such as 5-year (a), 25-year (b), and 100-year (c) are illustrated in Figure 9. According to the developed concept, the predicted flood volume for the study area was estimated to be between 634 and 825 MCM for a 100-year return period (Figure 9(c)). This return period was recommended for reservoir dam design (Zhang *et al.* 2020a, 2020b). Consequently, in the 25-year return period, which is an appropriate time scale for flood diversion systems of water storage reservoirs, a flood volume of 125–164 MCM with a maximum flow of 865 m^{3}/s has been obtained. This flood can be caused by a rainfall of 82 mm/day or a depth of 142 mm.

## CONCLUSION

A framework was developed to estimate the flood hydrographs in different return periods of rainfall. Fuzzy set theory and bivariate frequency distribution function were incorporated to develop the decision-making system. Probabilistic analysis of rainfall characteristics showed that the generalized extreme values and Log Pearson were fitted for the peak point and depth of rainfall, respectively. The correlation coefficient between rainfall characteristics was calculated to be 0.68. Moreover, the Gumbel Archimedean copula function was obtained as a probabilistic model for estimating return periods using MLE criteria. The calculated probabilistic values were converted to flood values using an uncertain relationship based on *a*-cut decomposition. The triangular fuzzy function was evaluated as a suitable option for this prediction. The results showed that the developed model can predict the characteristics of flood hydrographs especially in a return period of fewer than 100 years. Future research could consider other rainfall and flood characteristics to reduce the proposed evaluation criteria.

## ACKNOWLEDGEMENTS

The Innovation Team Planning Project of Hubei (T201314).

## DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.