## ABSTRACT

Copulas are a vital statistical tool, particularly in hydrology, for understanding complex relationships among flood characteristics. This study focuses on three key flood features: peak discharge, flood volume, and flood duration, using trivariate copulas to capture their interdependencies. This is crucial because bivariate and univariate analyses fall short in considering all three factors simultaneously. To handle extreme flood values, L-moment is proposed over maximum likelihood estimation and inference function margin due to its enhanced reliability and susceptibility to outliers and extreme values. Akaike information criterion was employed to identify the best-fit marginal distribution and copula. The Lognormal distribution effectively models peak discharge, while Weibull and generalized extreme value distributions fit flood volume and duration best, respectively. Various copula families, including elliptical and Archimedean, are assessed, where Clayton copula emerge as the most suitable. This analysis demonstrates that when more flood features are considered together, the return period increases, indicating the reduced likelihood of occurrence. The trivariate case of the AND-joint return period surpasses the trivariate case of the OR-joint return period where the T_{P, V, D}^{AND}=5, 405.93 years, while T_{P, V, D}^{OR}=500.46 years. This comprehensive approach enhances hydrological modeling and decision-making for water resource management and flood mitigation projects.

## HIGHLIGHTS

All key flood variables were considered and the statistical modeling can be applied to other hydrological stations.

This paper uses the L-moment parameter estimation method which is very suited to the extreme events data.

Calculation of the return period will aid policy makers in providing early warning.

## INTRODUCTION

Due to their destructive effects, floods are among the worst natural catastrophes all around the world, necessitating great attention and effort from researchers and practitioners. According to the Department of Irrigation and Drainage (2017), the flooding that affected several Malaysian states in late December and early January 2022 has resulted in overall losses of RM6.1 billion (USD 1.46 billion). Floods can lead to substantial harm to infrastructure, loss of life, and enduring adverse impacts on both the economy and the environment. Hence, it is essential to mitigate the flood risk by minimizing the risks associated with flood occurrences.

Flood occurrence is characterized by strongly correlated variables such as peak discharge, volume, and duration (Ganguli & Reddy 2012; Tosunoglu *et al.* 2020; Klaho *et al.* 2022). They stressed that the three key elements of a flood must be understood to conduct flood modeling using historical flood data in the area to accurately predict future floods. Their assertion is founded on the belief that each of these variables plays a crucial role in influencing flood risk, making their inclusion essential for obtaining a more thorough and holistic understanding of the potential impacts of flooding, providing valuable insights into the joint behavior and interdependencies among these critical flood variables. Experts have brought up the unreliability of univariate frequency analysis where only one variable is considered and the fact that flood variables are dependent on one another (Ganguli & Reddy 2012; Daneshkhah *et al.* 2016). They claimed that the approach is unable to adequately represent input hydrographs or lower the level of uncertainty in flood analysis. As flood variables are dependent on each other, the need to model them simultaneously has arised. Theoretically, it is very difficult to build joint distributions between flood variables using multivariate probability distribution functions. This is because conventional multivariate analysis frequently relies on one or more of the following principles where the variables need to follow the same marginal distributions, and the hydrological variables are interdependent (Song & Singh 2010). In other circumstances, the data are assumed to follow a normal distribution, which is not the case in real-world practice. The main challenge of classic multivariate modeling is that it relies on the assumption that every element in a flood can be adequately characterized by a single family of parametric univariate distributions. However, this is usually not the case since each flood component may differ from one another (Tosunoglu *et al.* 2020). Due to the drawbacks of the conventional multivariate functions, copula functions provide an alternative that caters to the previously mentioned shortcomings as well as greatly improves the prediction of flood occurrence.

Copula has drawn great attention in many fields as it does not suffer from previously mentioned drawbacks. It can capture the correlation between several variables in addition to maintaining their joint relationship since they are free to select the joint dependency structure that best suits their needs as well as the best-fitted univariate marginal distributions (Latif & Mustafa 2021). The flexibility they offer has attracted numerous researchers, leading to widespread adoption for various purposes such as to derive bivariate rainfall frequency distributions from the Amite River Basin in Louisiana, United States (Zhang & Singh 2007). It is also applied to estimate drought length, interval time, severity, and lowest Standard Precipitation Index (SPI) value in various drought circumstances (Chen *et al.* 2013), to model the interdependence between rainfall and temperature for agricultural purposes in Scania (Cong & Brady 2012). Furthermore, the copula is used to examine the effects of climate change using the Gumbel–Hougaard copula function for future bivariate flood peak and volume variables (Goodarzi *et al.* 2020), as well as to assess key financial indices and performance returns (DiBmann *et al.* 2013). Copulas are useful not just for assessing flood risk but also for the risk associated with water quality. A Copula-based Bayesian network technique was suggested by Yu & Zhang (2021) to assess the water quality risk in a big drinking water reservoir in Tianjin, China, using diverse environmental risk indicators. Another study related to ecological and flood risk was conducted by Yang *et al.* (2021). They revealed that the entire volume and duration of floods are shown to be an important indicator to assess the hazards associated with flood events in an interconnected network of rivers and lakes, namely the Nansi Lake Basin, China using the copula-coupled hydrodynamic risk assessment model.

Before we delve into copula modeling, it is essential to understand the specific type of copula modeling used in flood risk analysis. The study of semiparametric bivariate copula in the Kelantan River Basin, Malaysia has been investigated by Latif & Mustafa (2021). Their study demonstrated that the Gaussian copula is recognized as the best-fitting copula function for constructing the bivariate joint distribution of flood peak and volume, while the Frank copula appears to be the best-fit copula for the remaining flood attribute pairs. Razmkhah *et al.* (2022) used various copula families in the Karoon III Basin, Iran, to simulate the bivariate joint distribution of flood peak, volume, base flow duration, and time to peak. They concluded that those four variables are dependent in various ways. Hence, it is crucial to consider all four variables when conducting flood risk analysis. A bivariate copula for flood frequency analysis was also conducted by Favre *et al.* (2004) to model peak discharge and volume at the Rimouski River in Quebec, Canada.

Both bivariate and trivariate flood frequency analysis for the Dez Dam Lake located in Iran were investigated by Klaho *et al.* (2022) where they concluded that bivariate return period calculations are more likely to be accurate for short-term compared to the trivariate cases, but trivariate conditional return period calculations for long-term periods are highly likely to be more reliable than those for bivariate cases. Their findings were validated when the actual flood data were compared with estimated values derived from both bivariate and trivariate copulas. Gumbel–Hougaard copula was used to derive trivariate flood frequency distributions which are flood peak, volume, and duration without making any assumptions about the independence, normality, or similarity of the marginal distributions of the flood variables using flood data from the Amite River in Louisiana (Zhang & Singh 2007). Based on the root mean square error and Kolmogorov–Smirnov (KS) test statistics, the proposed copula performs better than the trivariate normal distribution when it comes to fitting the empirical joint distribution.

Various parameter estimation of marginal distributions has been employed by several researchers in flood risk analysis including the method of moments (MOM) (Bezak *et al.* 2014), maximum likelihood estimation (MLE) (Latif & Mustafa 2020b), maximum pseudo-likelihood (MPL) (Latif & Mustafa 2021), and L-moment (Sraj *et al.* 2015). The results of fitting marginal distributions using the MPL approach on the flood characteristics of the Kelantan River Basin in Malaysia indicated that the flood peak discharge, volume, and duration are best approximated by the Lognormal-2P distribution, Johnson SB-4P distribution, and the Gamma-3P distribution, respectively (Latif & Mustafa 2020b). In another study, Lognormal-3P emerges as the best-fitted marginal for peak discharge and volume distribution, while Johnson SB fits the flood duration best for Armand Watershed in Iran when the parameter was estimated using the MLE technique (Amini *et al.* 2023). Sraj *et al.* (2015) employed the L-moment to estimate the marginal distribution of flood characteristics in the Sava River. The analysis revealed that the log-Pearson Type-III distribution was the most suitable for modeling discharge peaks and hydrograph durations, while the Pearson Type-III distribution was chosen to model hydrograph volumes.

During the process of parameter estimation methods of flood attributes for the Sava River in Slovenia, Bezak *et al.* (2014) examined various approaches, including the MOM, MLE, and L-moment. The results demonstrated that employing the L-moment for parameter estimation in the annual maximum (AM) analyses produced superior outcomes compared to the MOM and MLE methods. They also highlighted the benefits of conventional moments for smaller sample sizes and reduced skewness. In situations where data exhibit symmetry, the MOM is effective. However, due to the inherent asymmetry of flood data, the L-moment performs better. Hosking (1990) claimed that L-moment, similar to conventional moment, offers improved reliability and is less susceptible to outliers than the commonly used method, MLE. The L-moment yielded better test results as opposed to both the MOM and MLE approach when compared by Sraj *et al.* (2015) in their bivariate copula for flood frequency analysis in the Sava River.

This study aims to model the dependency between flood attributes which includes flood peak discharge, flood volume, and flood duration using a trivariate copula model. The L-moment will be employed in the parameter estimations of each flood attribute to address the challenges posed by skewness, outliers, and extreme values in flood data. Given the limited number of researchers utilizing the L-moment in flood frequency analysis, particularly in Malaysia, this research seeks to enhance the accuracy of flood analysis results by adopting a different parametric approach. This combined approach is essential to capture the complex dependence structure among the three critical flood attributes, which is beyond the scope of univariate and bivariate models. Such a comprehensive approach will enable a more holistic flood risk assessment. Then, a widely used copula family in the field of hydrology will be fitted, such as elliptical and Archimedean copulas to capture the complex dependence structure among flood features effectively.

## MATERIALS

### Study area and data

^{2}is a crucial river system that contributes as a significant freshwater source for Malaysia. JRB also plays a vital role in supporting the water needs and demands of the region and its neighboring country, Singapore (Tan

*et al.*2021). Johor, adjacent to Pahang and Melaka and located in the southern part of Peninsular Malaysia, is the southernmost state, as depicted in the left portion of Figure 1. The location of the JRB and the Rantau Panjang Gauging station can be observed in the right part of Figure 1 marked with red. Johor was among the most severely impacted states in Malaysia during the most recent flood events in late December 2021 and early January 2022. With a population of 4 million, the nearby villages suffered significant damage by the floods where tens of thousands of individuals were compelled to find refuge in relief facilities set up in schools and community centers (Chen 2023). Understanding the streamflow patterns and characteristics in the JRB is vital for effective flood mitigation and awareness. By conducting in-depth studies and analyzing the hydrological dynamics of the basin, local government agencies and decision-makers can gain valuable insights to develop informed strategies that minimize the impact of flooding events and ensure the safety of the communities residing in the region.

This study utilized a dataset of daily streamflow data acquired from the Rantau Panjang Gauging station, provided by the Department of Irrigation and Drainage, Malaysia. The dataset covers an extensive period of 49 years, ranging from 1972 to 2021. Each flood variables consist of 49 data entailing 49 flood events. By extracting and analyzing the essential flood variables of peak discharge, volume, and duration from this extensive dataset, we aimed to gain valuable insights into their dependency characteristics and patterns.

### Extraction of flood hydrograph

*q*) and end date (

_{s}*q*) of flood occurrences. Determining the duration of a flood involves the following formula:while the computation of flood volume can be found below:

_{e}*P*), flood volume (

*V*), and flood duration (

*D*) for a flood event, all of which can be derived from daily discharge data at a specific site of interest.

## METHODOLOGY

### Sklar's theorem

*C*, which effectively combines these marginal distribution functions. The copula

*C*yields the joint distribution , as depicted below by Sklar (1959).

If *F* is continuous, then the copula *C* is unique. Copula is superior due to its ability to distinguish between the modeling of marginal distributions and their dependence structures.

### Copula families

Several widely utilized copula in the field of hydrology were employed in this study including the elliptical families (Gaussian), and Archimedean families (Clayton, Gumbel, Frank, Joe) (Chen & Guo 2019; Klaho *et al.* 2022). Further description of each copula family can be seen in Table 1.

Copula type . | . | Parameter range . |
---|---|---|

Gaussian | The definition of implicit Gaussian copula is as follows: where is the distribution function of a bivariate normal with unit variances, zero means, and correlation parameter of the copula is the inverse of the univariate standard normal distribution function. Gaussian copula can be expressed as: and when , independence is applied | |

Clayton | ||

Frank | ||

Gumbel | ||

Joe |

Copula type . | . | Parameter range . |
---|---|---|

Gaussian | The definition of implicit Gaussian copula is as follows: where is the distribution function of a bivariate normal with unit variances, zero means, and correlation parameter of the copula is the inverse of the univariate standard normal distribution function. Gaussian copula can be expressed as: and when , independence is applied | |

Clayton | ||

Frank | ||

Gumbel | ||

Joe |

### Fully nested Archimedean copula

The two bivariate copulas and , which represents the dependency between variables and , respectively, make up the three variable asymmetric copulas. On the other hand, the outer copula is a function of the inner copula and , which forms the trivariate copula. According to Ganguli & Reddy (2012), the rank correlation coefficients such as Kendall's tau or Spearman's rho between the inner pair () must be greater than those between the other pairings () and () to be able to apply the FNA three-dimensional copula.

Next, we will explore every copula family along with the specific parameter ranges employed in this research, which include copulas from both the Archimedean and elliptical families (Bolbolian Ghalibaf 2020). The formulations for each copula family are presented in Table 1.

### Parameter estimation method

#### L-moment

*X*with a distribution function

*F*(

*x*) and a corresponding quantile function

*x*(

*F*). Let represent the order statistics of a random sample of size

*n*drawn from the distribution of

*X*. In this framework, the L-moment of the

*r*-th order for the random variable

*X*is computed as below according to Bílková (2014)

*X*, it is preferable to standardize higher L-moment () with

*r*≥ 3. The ratio of L-moment for the

*r*-th order of the random variable

*X*can be formulated as follows:

Table 2 displays the CDF and L-moment parameter estimation approach of the marginal distributions analyzed in this study (Asquith 2023). The generalized extreme value (GEV) and generalized Pareto (GP) distributions, each characterized by their respective estimated parameters: (location), (scale), and (shape). Additionally, the Weibull distribution was examined, with consideration given to the parameters (location), (scale), and (shape). Furthermore, the Lognormal distribution was included in the analysis, with a focus on the parameters (space), (location), and (scale). Lastly, the Pearson Type-III distribution was investigated and the parameter was used to represent the skewness of the distribution.

Distribution . | CDF . | L-moment . |
---|---|---|

Weibull | where | |

Lognormal | where is the CDF of standard normal distribution and | |

GEV | where | |

GP | where | |

Pearson Type-III | where , | , where denotes the ratio of the incomplete beta function. For more comprehensive information, additional details can be found in the work of Khan et al. (2021). |

Distribution . | CDF . | L-moment . |
---|---|---|

Weibull | where | |

Lognormal | where is the CDF of standard normal distribution and | |

GEV | where | |

GP | where | |

Pearson Type-III | where , | , where denotes the ratio of the incomplete beta function. For more comprehensive information, additional details can be found in the work of Khan et al. (2021). |

### Goodness-of-fit test

The KS test was applied to assess the goodness-of-fit of the fitted marginal distributions and copulas. The objective was to ascertain whether these distributions and copulas adequately captured the underlying data and to identify the best-fitting models for the analysis. The KS test is known to possess higher statistical power than the chi-square test, particularly when dealing with moderate sample sizes. However, as the sample size increases, both tests tend to exhibit similar levels of power. The previously mentioned author also highlighted another widely recognized goodness-of-fit test for assessing the appropriateness of the marginal distribution is the Anderson–Darling test. Nevertheless, the Anderson–Darling test is only available for a limited set of specific distributions (Ricci 2005). Therefore, to evaluate the goodness-of-fit for the fitted marginal distribution in this study, the KS test is employed due to its versatility and ability to accommodate a broader range of distributions (Ricci 2005).

*N*is the number of observations;

*C*(.) is the hypothesized model that needs to be tested; and is the observation in increasing order.

*et al.*2009):

In this context, represents the empirical copula distribution, while represents the parametric distribution based on *n* observed data. These two distributions will be evaluated and compared using the CvM test statistics. Additionally, denote the pseudo-observation generated from the copula, *C* derived from the data. The hypotheses of CvM test statistics are as follows:

The tested copula distribution fits the data well.

The tested copula distribution does not fit the data.

The null hypothesis will be rejected if the *p*-value is less than the chosen significance level, which in this case is set at 0.05. Hence, if the *p*-value is less than 0.05, it can be concluded that the fitted copula distribution does not fit the data well.

### Model performance analysis

*et al.*(2018) by identifying a distribution with the lowest AIC value. AIC was also used by Ni

*et al.*(2020) to choose the best-fitted copula. AIC was used in this study to determine both the best-fitted marginal distribution and the best-fitted copula. The formulation of AIC can be seen as follows:where

*c*is the copula with parameter , while

*k*

*=*1 for one parameter copula and

*k*

*=*2 for 2 parameter copula.

### Tail dependency analysis

*et al.*2022). Similar to the work of Requena

*et al.*(2013), more attention is given to the upper tail dependency in this study as it relates to the frequency analysis of extreme flood events for dam safety analysis. Poulin

*et al.*(2007) highlighted the need to consider upper tail dependence when choosing copulas to avoid underestimating the risk of flooding. This is because upper tail dependence is associated with the level of dependence between the extreme values of the variables under investigation. The upper tail dependence coefficient of copula can be estimated by:To determine which copula models best capture the dependency in the extremes, it is important to compare each copula's upper tail coefficient with the coefficient derived from the observed data. Hence, the tail dependence of the fitted copula model is estimated by Latif & Mustafa (2020b) using the nonparametric tail dependence coefficient estimation method, which is based on the Caperaa–Fougeres–Genest estimator as follows:where

*n*is the number of observations, while are the simulated data from the chosen copula model.

### Risk analysis

#### Calculation of flood return period

*et al.*2022). Within the context of a one-dimensional probability framework, the determination of return periods for hydrological or flood events that surpassed a prescribed threshold value represented as (

*X*

*≥*

*x*) involves the utilization of univariate CDFs (Latif & Mustafa 2020a). The estimation of these CDFs is accomplished by fitting the data through Equation (6) as presented below:where is the best-fitted marginal distribution and in the context of annual maxima-based extreme modeling, the symbol

*μ*denotes the mean inter-arrival duration between two consecutive episodes. Specifically, in this scenario, the value of

*μ*is explicitly set to 1, reflecting the interval between successive occurrences of extreme events within the annual data (Yue & Rasmussen 2002). The calculation for the flood return period of three-dimensional copulas for the OR case:where is the joint CDF that can be expressed through the best-fitted trivariate copula as well as the best-fitted univariate marginal distribution functions of each flood variable Alternatively, in the case of the AND condition where , the trivariate joint distribution event can be formulated as below:

*AND*case as demonstrated below. Let

*X*be peak discharge and

*Y*be flood volume. The copula Gumbel is fitted since they were found as the best bivariate copula for peak discharge–flood volume pair. For simplicity, for bivariate case, the formula for calculating the

*AND*return period can be found as follows:

The sample calculation of the 5-year joint return period for *AND* case between peak discharge and flood volume is 7.54 years.

### Research framework

## RESULTS AND DISCUSSION

### Dependence measure

For this study, the non-parametric approaches (Kendall's tau and Spearman's rho) were considered to measure the dependency structure of flood variables. The results can be found in Table 3.

Variables/dependence measure . | Kendall (p-value)
. | Spearman (p-value)
. |
---|---|---|

Peak discharge, volume (P–V) | 0.70 (0.00) | 0.87 (0.00) |

Volume, duration (V–D) | 0.54 (0.00) | 0.73 (0.00) |

Peak discharge, duration (P–D) | 0.28 (0.00) | 0.39 (0.00) |

Variables/dependence measure . | Kendall (p-value)
. | Spearman (p-value)
. |
---|---|---|

Peak discharge, volume (P–V) | 0.70 (0.00) | 0.87 (0.00) |

Volume, duration (V–D) | 0.54 (0.00) | 0.73 (0.00) |

Peak discharge, duration (P–D) | 0.28 (0.00) | 0.39 (0.00) |

*p*-value of all flood pair variables is significant at

*α*= 0.05 according to both Kendall's tau and Spearman's rho. Hence, we will choose between peak discharge–volume and volume–duration to couple them as a bivariate copula. However, looking at the highest pair correlation between the previously mentioned pair, peak discharge–volume revealed the highest dependency between them. Thus, they will be fitted as the bivariate copula which satisfies the condition of FNA copula, before including the third variable which is flood duration to be analyzed as the trivariate copula. Graphical representations such as Kendall's plot can be found in Figure 4 to illustrate the dependency and correlation between various flood parameters.

Kendall's plot in Figure 4 illustrates that there is a strong positive correlation between peak discharge–volume and volume–duration. The data point deviates far away and is positioned above the diagonal line, which means that these two pairs of flood attributes display strong positive dependence. Looking at the relationship between peak discharge–duration, the data points are located much closer to the diagonal line indicating a weak positive association as compared to the other flood pairs.

The dependency measure was examined visually with the use of the scatter pair plot illustrated in Figure 5 to support our findings. The relationship between peak discharge–volume as can be seen in the box between these two variables in Figure 5 indicates a strong dependency of 0.70 as per Kendall's tau. This pair exhibits the highest level of dependence. However, the scatter plot for the peak discharge–duration pair displays a weak relationship as the data point significantly deviates quite far from the diagonal line. Looking at the histogram distribution of all three flood attributes, the data are positively skewed which suggests a right-tailed distribution.

### Fitting marginal distribution

In this research, an investigation was conducted on several three parameters marginal distributions to assess the dependency of flood variables. The study aims to fit the best distribution to the flood variables which will be tested using several distributions such as GEV, GP, Weibull, Lognormal, and Pearson Type-III distribution.

Referring to the AIC values of the peak discharge tabulated above, it was observed that the Lognormal outperformed the other marginal distributions as it depicts the lowest AIC values. In contrast, the Weibull provided the best fit for the flood volume and GEV satisfactorily modeled the flood duration since both distributions portray the lowest AIC value. Based on the obtained *p*-values of the KS test being significantly higher than the predetermined significance level of 0.05, we fail to reject the null hypothesis and conclude that all distributions passed the KS test and follow Lognormal, Weibull, and GEV distribution for peak discharge, volume, and duration, respectively.

Flood variables . | Distribution . | Estimated parameters . | AIC . | KS (p-value)
. | ||
---|---|---|---|---|---|---|

Peak discharge (P) | Weibull | 626.13 | 0.08 (0.99) | |||

Lognormal | 597.97 | 0.06 (1.00) | ||||

GEV | 599.46 | 0.06 (1.00) | ||||

GP | 626.30 | 0.08 (0.99) | ||||

Pearson Type-III | 625.98 | 0.08 (0.99) | ||||

Volume (V) | Weibull | 801.59 | 0.08 (0.99) | |||

Lognormal | 806.10 | 0.06 (1.00) | ||||

GEV | 807.32 | 0.06 (1.00) | ||||

GP | 862.21 | 0.08 (0.99) | ||||

Pearson Type-III | 802.87 | 0.08 (0.99) | ||||

Duration (D) | Weibull | 421.97 | 0.10 (0.96) | |||

Lognormal | 349.35 | 0.08 (0.99) | ||||

GEV | 348.50 | 0.08 (0.99) | ||||

GP | 428.72 | 0.10 (0.96) | ||||

Pearson Type-III | 384.02 | 0.08 (0.99) |

Flood variables . | Distribution . | Estimated parameters . | AIC . | KS (p-value)
. | ||
---|---|---|---|---|---|---|

Peak discharge (P) | Weibull | 626.13 | 0.08 (0.99) | |||

Lognormal | 597.97 | 0.06 (1.00) | ||||

GEV | 599.46 | 0.06 (1.00) | ||||

GP | 626.30 | 0.08 (0.99) | ||||

Pearson Type-III | 625.98 | 0.08 (0.99) | ||||

Volume (V) | Weibull | 801.59 | 0.08 (0.99) | |||

Lognormal | 806.10 | 0.06 (1.00) | ||||

GEV | 807.32 | 0.06 (1.00) | ||||

GP | 862.21 | 0.08 (0.99) | ||||

Pearson Type-III | 802.87 | 0.08 (0.99) | ||||

Duration (D) | Weibull | 421.97 | 0.10 (0.96) | |||

Lognormal | 349.35 | 0.08 (0.99) | ||||

GEV | 348.50 | 0.08 (0.99) | ||||

GP | 428.72 | 0.10 (0.96) | ||||

Pearson Type-III | 384.02 | 0.08 (0.99) |

*Note*: Bold lettering indicates the best marginal distribution for each flood attribute with the lowest AIC values.

### Trivariate copula analysis

In this section, we explored the dependencies among three flood variables using five copula families. Before delving into the trivariate copula modeling for flood variables in the JRB, we will now start by fitting the copula to the bivariate flood pair with the highest correlation which is the peak discharge–volume pair. The bivariate modeling of peak discharge–volume is presented in Table 5.

Copula . | Parameter . | AIC . | CvM test (p-value)
. |
---|---|---|---|

Peak discharge–volume (P–V) | |||

Gaussian | 0.87 | (0.34) | |

Clayton | 3.20 | (0.30) | |

Gumbel (rotated by 180°) | 3.05 | (0.74) | |

Frank | 10.59 | (0.39) | |

Joe | 3.98 | (0.94) | |

Volume–duration (V–D) | |||

Gaussian | 0.73 | (0.95) | |

Clayton | 1.30 | (0.85) | |

Gumbel | 1.94 | (0.61) | |

Frank | 6.48 | (0.94) | |

Joe | 2.16 | (0.54) | |

Peak discharge–duration (P–D) | |||

Gaussian | 0.37 | (0.86) | |

Clayton | 0.30 | (0.24) | |

Gumbel | 1.23 | (0.10) | |

Frank | 2.41 | (0.44) | |

Joe | 1.19 | (1.00) |

Copula . | Parameter . | AIC . | CvM test (p-value)
. |
---|---|---|---|

Peak discharge–volume (P–V) | |||

Gaussian | 0.87 | (0.34) | |

Clayton | 3.20 | (0.30) | |

Gumbel (rotated by 180°) | 3.05 | (0.74) | |

Frank | 10.59 | (0.39) | |

Joe | 3.98 | (0.94) | |

Volume–duration (V–D) | |||

Gaussian | 0.73 | (0.95) | |

Clayton | 1.30 | (0.85) | |

Gumbel | 1.94 | (0.61) | |

Frank | 6.48 | (0.94) | |

Joe | 2.16 | (0.54) | |

Peak discharge–duration (P–D) | |||

Gaussian | 0.37 | (0.86) | |

Clayton | 0.30 | (0.24) | |

Gumbel | 1.23 | (0.10) | |

Frank | 2.41 | (0.44) | |

Joe | 1.19 | (1.00) |

*Note*: Bold lettering indicates the most satisfactory copula model with the lowest AIC.

Table 5 presents a comparison of AIC values for various bivariate copula models, aiding us in selecting the optimal copula. Five copula functions from the elliptical (Gaussian) and Archimedean (Clayton, Gumbel, Frank, Joe) families will be fitted to the peak discharge–volume pair. Based on the results presented, the rotated Gumbel copula emerged as the best fit for the peak discharge–volume pair, exhibiting the lowest AIC values. On the other hand, both Gaussian and Frank were selected as the best-fit copula for volume–duration and peak discharge–duration pair, respectively. Evidently, each of the fitted copulas has successfully passed the CvM test, as indicated by their *p*-values exceeding the 0.05 threshold level.

Referring to Table 6, the results of the fitting process for the trivariate copula model with five well-known copula families were presented. It was discovered that the Clayton copula had the lowest values for AIC, suggesting a greater model fit and further demonstrating its suitability as the best copula for effectively capturing the dependency structure of the trivariate flood elements within the JRB. These findings offer strong evidence and support for the claim that, in this particular situation, the Clayton copula is the best option for illustrating the interdependencies between the flood factors which are peak discharge–volume–duration.

Copula . | Parameter . | AIC . | CvM test (p-value)
. |
---|---|---|---|

Peak discharge–volume–duration (P–V–D) | |||

Gaussian | 0.54 | (0.65) | |

Clayton | 0.83 | (0.50) | |

Gumbel | 1.50 | (0.60) | |

Frank | 3.74 | (0.63) | |

Joe | 1.68 | (0.84) |

Copula . | Parameter . | AIC . | CvM test (p-value)
. |
---|---|---|---|

Peak discharge–volume–duration (P–V–D) | |||

Gaussian | 0.54 | (0.65) | |

Clayton | 0.83 | (0.50) | |

Gumbel | 1.50 | (0.60) | |

Frank | 3.74 | (0.63) | |

Joe | 1.68 | (0.84) |

*Note*: Bold lettering indicates the most satisfactory copula model.

*λ*-function. The comparison of various copula families for trivariate flood variables is shown in Figure 7. When comparing the Clayton copula to other copula families, Figure 7 illustrates that the theoretical function (grey continuous line) and the empirical function of the observations (red line) coincide with each other best which further indicates that the fitted copula aligns with the empirical values. Our graphical findings are further supported by Table 6 when Clayton copula depicts the lowest AIC values. After closer consideration of the

*p*-values derived from the CvM test in Table 6, it was found that all fitted copula offers a satisfactory fit to the data as their

*p*-values exceed the 0.05 threshold. After examining all fitted copula analytically and graphically, Clayton copula seems to be the best fit for trivariate copula of flood variables in the JRB.

### Tail dependence analysis

As highlighted by previous researchers, tail dependency coefficients were employed to choose an appropriate copula.

The nonparametric estimator of the upper tail dependence coefficient, for peak discharge–volume (0.3041), volume–duration (0.2159), and peak discharge–duration (0.0231), which were computed using Equation 20 was compared with the upper tail dependence of copula, , obtained in Table 7. The tail dependence , of Gaussian, Clayton, Rotated Gumbel, Frank, and Rotated Joe copulas are all 0, and the values are close to the tail dependence coefficient that was obtained using Equation 20 which are 0.3041 and 0.2159 for both peak discharge–volume and volume–duration pairs. Hence, we can fairly say that all copula fits the data well. Looking at the pair variables of peak discharge–duration, Gaussian and Frank copula managed to capture the tail dependency of the variables as the copula tail dependence, obtained is 0, which were close to the nonparametric tail dependence coefficient, , which is 0.0231. Therefore, when also considering the tail dependence requirement for the copulas that have been computed, the family of copulas that was chosen based on AIC and goodness-of-fit tests is further validated by considering the tail dependency analysis. Different copula functions for generating the bivariate distribution of flood peak discharge–volume pairs, particularly in Malaysia, have been described by Latif & Mustafa (2020b), where Gaussian emerges as the best-fitted copula. However, for peak discharge–duration pair, Frank copula appears as the best-fit for both study in the Kelantan River Basin (Latif & Mustafa 2020b) and the JRB.

Copula . | Peak discharge–volume (P–V). | Volume–duration (V–D). | Copula . | Peak discharge–duration (P–D). | |||
---|---|---|---|---|---|---|---|

. | . | . | . | . | . | ||

Gaussian | 0.87 | 0 | 0.73 | 0 | Gaussian | 0.37 | 0 |

Clayton | 3.20 | 0 | 1.30 | 0 | Rotated Clayton (180°) | 0.30 | 0.09 |

Rotated Gumbel (180°) | 3.05 | 0 | 1.94 | 0 | Gumbel | 1.23 | 0.24 |

Frank | 10.59 | 0 | 6.48 | 0 | Frank | 2.41 | 0 |

Rotated Joe (180°) | 3.98 | 0 | 2.16 | 0 | Joe | 1.19 | 0.21 |

Copula . | Peak discharge–volume (P–V). | Volume–duration (V–D). | Copula . | Peak discharge–duration (P–D). | |||
---|---|---|---|---|---|---|---|

. | . | . | . | . | . | ||

Gaussian | 0.87 | 0 | 0.73 | 0 | Gaussian | 0.37 | 0 |

Clayton | 3.20 | 0 | 1.30 | 0 | Rotated Clayton (180°) | 0.30 | 0.09 |

Rotated Gumbel (180°) | 3.05 | 0 | 1.94 | 0 | Gumbel | 1.23 | 0.24 |

Frank | 10.59 | 0 | 6.48 | 0 | Frank | 2.41 | 0 |

Rotated Joe (180°) | 3.98 | 0 | 2.16 | 0 | Joe | 1.19 | 0.21 |

In Figures 8 and 9, the simulated data from the chosen copula were compared with the actual observation. Based on the scatter plot in the upper row of both figures, it is evident that the chosen copula is considerably more consistent and able to preserve the overall pattern of both the survival or rotated Gumbel copula fitted for (*P*, *V*) pair and the Gaussian copula suited for (*V*, *D*) pair.

In addition, the rotated Gumbel copula for (*P*, *V*) flood characteristics and the Gaussian copula for (*V*, *D*) flood characteristics exhibit very similar tail dependency behavior with the actual data, as shown by the Chi-plot and the K-plot in the second and third rows in both Figures 8 and 9. After a thorough analysis that was done both analytically and graphically, it was confirmed that the chosen copula managed to capture the tail dependency of flood variables in JRB.

### Flood risk analysis

Mathematically speaking, the univariate return period of the specified flood features, is obtained from the univariate CDF of the flood variable that fits the data well. The flood return period serves as a probability measure. For instance, if a flood event has a return period of 50 years, it indicates that there is a 2% chance or 1/50 likelihood of that particular flood occurring in any given year. As an example, a flood event characterized by a 1,000-year return period is anticipated to surpass its magnitude, on average, once every 10 centuries. Referring to Table 8, it is revealed that the 1,000-year flood return period corresponds to a flood peak discharge or *P* of 1,318.339 m^{3}/s, a flood volume *V* = 7,276.214 m^{3}, and a flood duration *D* = 82 days.

T . | P . | V . | D . | . | . | . | . | . | . | . | . |
---|---|---|---|---|---|---|---|---|---|---|---|

5 | 290.09 | 2,144.46 | 26 | 3.74 | 7.54 | 3.54 | 8.52 | 3.06 | 13.76 | 2.97 | 9.24 |

10 | 386.19 | 2,876.91 | 32 | 6.93 | 17.99 | 6.64 | 20.24 | 5.60 | 46.61 | 5.46 | 21.34 |

20 | 491.49 | 3,582.01 | 38 | 13.03 | 42.95 | 12.66 | 47.60 | 10.63 | 169.00 | 10.46 | 49.36 |

50 | 647.32 | 4,483.85 | 47 | 30.65 | 135.67 | 30.18 | 145.76 | 25.65 | 989.40 | 25.46 | 149.20 |

100 | 779.26 | 5,148.35 | 54 | 59.13 | 323.83 | 58.68 | 337.89 | 50.65 | 3,867.43 | 50.46 | 343.40 |

200 | 924.46 | 5,800.47 | 61 | 114.86 | 772.96 | 114.70 | 780.20 | 100.66 | 15,288.55 | 100.46 | 788.69 |

500 | 1,138.46 | 6,646.56 | 73 | 278.52 | 2,441.37 | 279.80 | 2,347.45 | 250.66 | 94,872.30 | 250.46 | 2,361.69 |

1,000 | 1,318.34 | 7,276.21 | 82 | 546.93 | 5,827.38 | 551.17 | 5,385.53 | 500.66 | 378,579.9 | 500.46 | 5,405.93 |

T . | P . | V . | D . | . | . | . | . | . | . | . | . |
---|---|---|---|---|---|---|---|---|---|---|---|

5 | 290.09 | 2,144.46 | 26 | 3.74 | 7.54 | 3.54 | 8.52 | 3.06 | 13.76 | 2.97 | 9.24 |

10 | 386.19 | 2,876.91 | 32 | 6.93 | 17.99 | 6.64 | 20.24 | 5.60 | 46.61 | 5.46 | 21.34 |

20 | 491.49 | 3,582.01 | 38 | 13.03 | 42.95 | 12.66 | 47.60 | 10.63 | 169.00 | 10.46 | 49.36 |

50 | 647.32 | 4,483.85 | 47 | 30.65 | 135.67 | 30.18 | 145.76 | 25.65 | 989.40 | 25.46 | 149.20 |

100 | 779.26 | 5,148.35 | 54 | 59.13 | 323.83 | 58.68 | 337.89 | 50.65 | 3,867.43 | 50.46 | 343.40 |

200 | 924.46 | 5,800.47 | 61 | 114.86 | 772.96 | 114.70 | 780.20 | 100.66 | 15,288.55 | 100.46 | 788.69 |

500 | 1,138.46 | 6,646.56 | 73 | 278.52 | 2,441.37 | 279.80 | 2,347.45 | 250.66 | 94,872.30 | 250.46 | 2,361.69 |

1,000 | 1,318.34 | 7,276.21 | 82 | 546.93 | 5,827.38 | 551.17 | 5,385.53 | 500.66 | 378,579.9 | 500.46 | 5,405.93 |

It can be inferred that the simultaneous occurrence of trivariate flood characteristics is less common in the *AND* scenario and more common in the *OR* joint scenario. For instance, the return period for 1,000 years is determined to be 546.93 years, which is highly likely to occur than the *AND-*joint return period years. The *OR-*joint return period for 1,000 years of years and years, where both cases have a higher return period and are less likely to occur for 1,000 years than the other peak discharge–duration bivariate case, where . Looking at the trivariate case, the years, which is very much lower and less likely to happen than the *OR-*joint return period case of years. To conclude, The *OR-*joint return period for the trivariate case is more frequent than all of the bivariate cases. However, the *AND-*joint return period shows different results where the years, which is higher and less frequent than all other bivariate cases and trivariate case for the *AND-*joint return period. This outcome is consistent with the research conducted by Latif & Simonovic (2022), which investigated compound flooding events. Their research findings indicated that the *AND*-joint return period value for any trivariate (and bivariate) flood events surpasses the *OR*-joint return period.

The discussion above concludes that when assessing flood risk, it is extremely crucial to consider both the cases of joint return periods (*OR* & *AND*-joint cases) to make a justifiable risk-based decision. Considering based on only either the *OR* or *AND*-joint return periods or either based on univariate return periods would be problematic and could lead to underestimations or overestimations of flood risk. Hence, proper flood planning and risk mitigation cannot be achieved.

## CONCLUSION

This study employed trivariate copula functions to examine and analyze how three flood elements which are peak discharge, flood volume, and flood duration relate to one another. Unlike many existing studies that used MLE and inference function margin for flood frequency analysis, the L-moment approach was utilized to estimate the parameters of the flood marginal distributions because flood variables are often influenced by extreme values and outliers. The results show that the Lognormal distribution emerged as the best for the peak discharge variable. While the Weibull and GEV distributions offered the greatest fit as marginal distributions for the flood volume and flood duration, respectively.

After successfully fitting the marginal distribution, the copula parameter was then fitted using the maximum likelihood approach and it was discovered that the most effective choice for representing the dependent structure between the variables peak discharge–volume–duration was found to be the Clayton copula model as it has the lowest AIC value. After that, the most suitable copulas are utilized to calculate the joint return periods for various combinations of flood characteristics. Based on the aforementioned analyses, it can be inferred that the *AND*-joint return period surpasses the *OR*-joint return periods across different flood combinations. The study concludes that using the information gathered from this analysis, one may better comprehend the connections between peak discharge, flood volume, and flood duration specifically the hydrologist and decision-makers. It is worth emphasizing that conventional parameter estimation methods may not provide a complete understanding, especially when dealing with extreme values. This is why we highlight the importance of adopting the L-moment approach, which is capable of addressing these limitations. This information may be useful for enhancing flood risk assessment, hydrological modeling, and decision-making in water resource management and flood mitigation efforts.

The current study is focused on three-dimensional flood variables which include flood peak discharge, flood volume, and flood duration. However, it could be enhanced by incorporating another crucial flood characteristic, peak time, as previous research has highlighted its correlation with other flood attributes. By including peak time in the analysis, a more comprehensive understanding of flood dynamics can be achieved. However, the commonly used multivariate copulas, such as the Gaussian, student-*t*, and Archimedean, cannot accurately express the dependence structure in high dimensions, making it challenging to extend beyond three-dimensional variables. For future investigations, it is recommended to explore alternative copula models, such as the vine copula, which offers increased flexibility in copula modeling. The vine copula was used because it adds flexibility to the model by accounting for tail dependencies and non-linear relationships. The vine copula approach allows for a better representation of complex dependence structures among higher dimensional flood variables where it can model all four crucial flood factors simultaneously. This recommendation arises from the limitation of the proposed model, which focuses solely on three variables and is unable to cater to the needs of four or more variables. By adopting these recommendations, researchers can strengthen flood risk assessment methodologies, contributing to more effective flood management and decision-making processes.

## ACKNOWLEDGEMENTS

The authors would like to acknowledge the generous support and funding provided for this study by the University Fundamental Research Grant from Universiti Teknologi Malaysia, under vote number Q.J130000.2554.21H65. Furthermore, heartfelt gratitude is expressed to the Department of Irrigation and Drainage (DID), Malaysia, for their invaluable contribution in supplying the authors with the daily discharge data for the JRB.

## FUNDING

This research was funded by the University Fundamental Research Grant from Universiti Teknologi Malaysia, under vote number Q.J130000.2554.21H65.

## COMPETING INTERESTS

The authors have no relevant financial or non-financial interests to disclose.

## AUTHOR CONTRIBUTIONS

N. A. Jafry and J. Suhaila contributed to conceptualization; F. Yusof contributed to methodology; N. A. Jafry contributed to software development and wrote the original draft; S. R. M. Nor and N. E. Alias validated the work; J. Suhaila and F. Yusof wrote the original draft and reviewed and edited the manuscript; S. R. M. Nor visualized the published work; and N. E. Alias supervised the work. All authors have read and agreed to the final version of the manuscript.

## DATA AVAILABILITY STATEMENT

Data cannot be made publicly available; readers should contact the corresponding author for details.

## CONFLICT OF INTEREST

The authors declare there is no conflict.

## REFERENCES

*4 Dead, 40,000 Flee Homes as Floods Hit Malaysia*. CNN. (accessed 15 March 2023)